CommunityNews

Training Language Models to Self-Correct via Reinforcement Learning

Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing approaches for training self-correction either require multiple models or rely on a more capable model or other forms of supervision. To this end, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM’s self-correction ability using entirely self-generated data. To build SCoRe, we first show that variants of supervised fine-tuning (SFT) on offline model-generated correction traces are insufficient for instilling self-correction behavior. In particular, we observe that training via SFT either suffers from a distribution mismatch between the training data and the model’s own responses or implicitly prefers only a certain mode of correction behavior that is often not effective at test time. SCoRe addresses these challenges by training under the model’s own distribution of self-generated correction traces and using appropriate regularization to steer the learning process into learning a self-correction strategy that is effective at test time as opposed to simply fitting high-reward responses for a given prompt. This regularization prescribes running a first phase of RL on a base model to generate a policy initialization that is less susceptible to collapse and then using a reward bonus to amplify self-correction during training. When applied to Gemini 1.0 Pro and 1.5 Flash models, we find that SCoRe achieves state-of-the-art self-correction performance, improving the base models’ self-correction by 15.6% and 9.1% respectively on the MATH and HumanEval benchmarks.
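The abstract only says that SCoRe uses "a reward bonus to amplify self-correction". The sketch below is a rough illustration and is not the paper's code. It assumes a binary correctness reward per attempt and a bonus of the form alpha * (r2 - r1) on the second attempt. The names `Attempt`, `shaped_reward` and `self_correction_stats`, along with the `alpha` parameter, are hypothetical and were made up for this example. The stats function shows one common way to quantify self-correction: the change in accuracy between the first and second attempt, split into wrong-to-right and right-to-wrong flips.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Attempt:
    """One two-turn episode: was each attempt judged correct? (hypothetical type)"""
    first_correct: bool
    second_correct: bool


def shaped_reward(first_correct: bool, second_correct: bool, alpha: float = 1.0) -> float:
    """Episode reward with an assumed self-correction bonus.

    Assumption: 0/1 reward per attempt, plus alpha * (r2 - r1), which pays
    extra for fixing a wrong first answer and penalizes breaking a right one.
    """
    r1 = float(first_correct)
    r2 = float(second_correct)
    return r1 + r2 + alpha * (r2 - r1)


def self_correction_stats(attempts: Sequence[Attempt]) -> Dict[str, float]:
    """Accuracy at each turn and the change between them."""
    n = len(attempts)
    if n == 0:
        raise ValueError("need at least one episode")
    acc_t1 = sum(a.first_correct for a in attempts) / n
    acc_t2 = sum(a.second_correct for a in attempts) / n
    # Fraction of episodes that flipped wrong -> right, and right -> wrong.
    i_to_c = sum((not a.first_correct) and a.second_correct for a in attempts) / n
    c_to_i = sum(a.first_correct and (not a.second_correct) for a in attempts) / n
    return {
        "acc_t1": acc_t1,
        "acc_t2": acc_t2,
        "delta": acc_t2 - acc_t1,  # equals i_to_c - c_to_i
        "i_to_c": i_to_c,
        "c_to_i": c_to_i,
    }


if __name__ == "__main__":
    episodes = [
        Attempt(False, True),
        Attempt(True, True),
        Attempt(True, False),
        Attempt(False, True),
    ]
    print(self_correction_stats(episodes))
    print(shaped_reward(False, True, alpha=2.0))
```

With alpha = 2, fixing a wrong first attempt scores 3.0. That is higher than being right twice, which scores 2.0. A right-to-wrong flip drops to -1.0. This is the kind of pressure toward revising answers, rather than just repeating the first one, that the abstract attributes to the bonus.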

Read in full here:

This thread was posted by one of our members via one of our news source trackers.


#learning #training


2024-09-20 15:23:34 UTC
