CommunityNews

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Test-time inference has emerged as a powerful paradigm for enabling language models to “think” longer and more carefully about complex challenges, much like skilled human experts. While reinforcement learning (RL) can drive self-improvement in language models on verifiable tasks, some models exhibit substantial gains while others quickly plateau. For instance, we find that Qwen-2.5-3B far exceeds Llama-3.2-3B under identical RL training for the game of Countdown. This discrepancy raises a critical question: what intrinsic properties enable effective self-improvement? We introduce a framework to investigate this question by analyzing four key cognitive behaviors – verification, backtracking, subgoal setting, and backward chaining – that both expert human problem solvers and successful language models employ. Our study reveals that Qwen naturally exhibits these reasoning behaviors, whereas Llama initially lacks them. In systematic experimentation with controlled behavioral datasets, we find that priming Llama with examples containing these reasoning behaviors enables substantial improvements during RL, matching or exceeding Qwen’s performance. Importantly, the presence of reasoning behaviors, rather than correctness of answers, proves to be the critical factor – models primed with incorrect solutions containing proper reasoning patterns achieve comparable performance to those trained on correct solutions. Finally, leveraging continued pretraining with OpenWebMath data, filtered to amplify reasoning behaviors, enables the Llama model to match Qwen’s self-improvement trajectory. Our findings establish a fundamental relationship between initial reasoning behaviors and the capacity for improvement, explaining why some language models effectively utilize additional computation while others plateau.
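For readers wondering what makes Countdown a "verifiable" task: a proposed answer can be checked mechanically, which is what lets RL assign a reward without a human grader. Below is a minimal sketch of such a checker. It is not the paper's code; the function name `verify_countdown`, the rule that each given number may be used at most once, and the restriction to `+ - * /` are all assumptions made for illustration.

```python
import ast
from collections import Counter
from fractions import Fraction

# Allowed binary operators (an assumed rule set, not taken from the paper).
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def _evaluate(node, used):
    """Evaluate a parsed expression exactly, recording every integer literal."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Anything else (names, calls, unary minus, ...) is rejected.
    raise ValueError("unsupported expression")


def verify_countdown(expression, numbers, target):
    """Return True if `expression` reaches `target` using only `numbers`.

    Hypothetical helper: each given number may be used at most once.
    """
    used = []
    try:
        tree = ast.parse(expression, mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    # Counter subtraction keeps only literals used more often than supplied.
    if Counter(used) - Counter(numbers):
        return False
    return value == target


if __name__ == "__main__":
    print(verify_countdown("(25 - 5) * 4", [25, 5, 4, 3], 80))
    print(verify_countdown("25 * 4", [25, 5, 4, 3], 80))
```

The expression is walked as a syntax tree rather than passed to `eval`, so model output is never executed as code, and `Fraction` keeps division exact. A binary reward built on a check like this says nothing about how the model reasoned, which fits the abstract's point that the behaviors in the trace, not answer correctness, were what mattered for priming.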


2025-03-14 17:52:40 UTC

