CommunityNews

CommunityNews

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Test-time inference has emerged as a powerful paradigm for enabling language models to ``think’’ longer and more carefully about complex challenges, much like skilled human experts. While reinforcement learning (RL) can drive self-improvement in language models on verifiable tasks, some models exhibit substantial gains while others quickly plateau. For instance, we find that Qwen-2.5-3B far exceeds Llama-3.2-3B under identical RL training for the game of Countdown. This discrepancy raises a critical question: what intrinsic properties enable effective self-improvement? We introduce a framework to investigate this question by analyzing four key cognitive behaviors – verification, backtracking, subgoal setting, and backward chaining – that both expert human problem solvers and successful language models employ. Our study reveals that Qwen naturally exhibits these reasoning behaviors, whereas Llama initially lacks them. In systematic experimentation with controlled behavioral datasets, we find that priming Llama with examples containing these reasoning behaviors enables substantial improvements during RL, matching or exceeding Qwen’s performance. Importantly, the presence of reasoning behaviors, rather than correctness of answers, proves to be the critical factor – models primed with incorrect solutions containing proper reasoning patterns achieve comparable performance to those trained on correct solutions. Finally, leveraging continued pretraining with OpenWebMath data, filtered to amplify reasoning behaviors, enables the Llama model to match Qwen’s self-improvement trajectory. Our findings establish a fundamental relationship between initial reasoning behaviors and the capacity for improvement, explaining why some language models effectively utilize additional computation while others plateau.

Read in full here:

Popular General Dev topics Top

First poster: AstonJ
Pocketlang is a small (~3000 semicolons) and fast functional language written in C. It’s syntactically similar to Ruby and it can be lear...
New
First poster: OvermindDL1
Rust Is The Future of JavaScript Infrastructure – Lee Robinson. Why is Rust being used to replace parts of the JavaScript web ecosystem ...
New
First poster: OvermindDL1
GitHub - deadpixi/wasm-maze-generator: A simple WASM maze generator in Go. A simple WASM maze generator in Go. Contribute to deadpixi/wa...
New
First poster: bot
It has some interesting features: It’s entirely wireless (the left half speaks Bluetooth to the right half, and the right half speaks B...
New
First poster: gflashner
Why Flutter is the most popular cross-platform mobile SDK. Running a development team for each mobile platform sucks up resources from o...
New
CommunityNews
C++ Cheat Sheets & Infographics. Graphics and cheat sheets, each capturing one aspect of C++: algorithms/containers/STL, language ba...
New
First poster: bot
GitHub - nim-works/nimskull: An in development statically typed systems programming language; with sustainability at its core. We, the co...
New
CommunityNews
ABSTRACT In lieu of a traditional , I’ve tried to distill the essence of the talk into a collection of maxims: All programmers are API ...
New
First poster: bot
Large Language Models like ChatGPT say The Darnedest Things. The Errors They MakeWhy We Need to Document Them, and What We Have Decided ...
New
First poster: jkdiaz
Dark mode isn’t as good for your eyes as you believe. The shadowy display mode has leagues of fans claiming it helps reduce eye strain, ...
New

Other popular topics Top

Devtalk
Hello Devtalk World! Please let us know a little about who you are and where you’re from :nerd_face:
New
AstonJ
Thanks to @foxtrottwist’s and @Tomas’s posts in this thread: Poll: Which code editor do you use? I bought Onivim! :nerd_face: https://on...
New
Exadra37
On modern versions of macOS, you simply can’t power on your computer, launch a text editor or eBook reader, and write or read, without a ...
New
Rainer
Not sure if following fits exactly this thread, or if we should have a hobby thread… For many years I’m designing and building model air...
New
AstonJ
Continuing the discussion from Thinking about learning Crystal, let’s discuss - I was wondering which languages don’t GC - maybe we can c...
New
PragmaticBookshelf
Author Spotlight Mike Riley @mriley This month, we turn the spotlight on Mike Riley, author of Portable Python Projects. Mike’s book ...
New
First poster: bot
The overengineered Solution to my Pigeon Problem. TL;DR: I built a wifi-equipped water gun to shoot the pigeons on my balcony, controlle...
New
PragmaticBookshelf
Author Spotlight Erin Dees @undees Welcome to our new author spotlight! We had the pleasure of chatting with Erin Dees, co-author of ...
New
DevotionGeo
I have always used antique keyboards like Cherry MX 1800 or Cherry MX 8100 and almost always have modified the switches in some way, like...
New
First poster: bot
zig/http.zig at 7cf2cbb33ef34c1d211135f56d30fe23b6cacd42 · ziglang/zig. General-purpose programming language and toolchain for maintaini...
New

Latest in In The News