CommunityNews

CommunityNews

Test-time inference has emerged as a powerful paradigm for enabling language models to ``think’’ longer and more carefully about complex challenges, much like skilled human experts. While reinforcement learning (RL) can drive self-improvement in language models on verifiable tasks, some models exhibit substantial gains while others quickly plateau. For instance, we find that Qwen-2.5-3B far exceeds Llama-3.2-3B under identical RL training for the game of Countdown. This discrepancy raises a critical question: what intrinsic properties enable effective self-improvement? We introduce a framework to investigate this question by analyzing four key cognitive behaviors – verification, backtracking, subgoal setting, and backward chaining – that both expert human problem solvers and successful language models employ. Our study reveals that Qwen naturally exhibits these reasoning behaviors, whereas Llama initially lacks them. In systematic experimentation with controlled behavioral datasets, we find that priming Llama with examples containing these reasoning behaviors enables substantial improvements during RL, matching or exceeding Qwen’s performance. Importantly, the presence of reasoning behaviors, rather than correctness of answers, proves to be the critical factor – models primed with incorrect solutions containing proper reasoning patterns achieve comparable performance to those trained on correct solutions. Finally, leveraging continued pretraining with OpenWebMath data, filtered to amplify reasoning behaviors, enables the Llama model to match Qwen’s self-improvement trajectory. Our findings establish a fundamental relationship between initial reasoning behaviors and the capacity for improvement, explaining why some language models effectively utilize additional computation while others plateau.

Read in full here:

Popular General Dev topics Top

AstonJ
Always interested in seeing what apps people use and how they organise their phones/home screens! Here’s mine…
New
AstonJ
I’ve been hearing quite a lot of comments relating to the sound of a keyboard, with one of the most desirable of these called ‘thock’, he...
New
Rainer
Not sure if following fits exactly this thread, or if we should have a hobby thread… For many years I’m designing and building model air...
New
herminiotorres
Someone where use Doom Emacs right now? I like to starting this topic to discuss it and learn a little bit more, not just only the emacs ...
New
AstonJ
Things like smart speakers (such Amazon Alexa), smart TVs or other devices with built in microphones, cameras or with other features that...
New
New
First poster: bot
To build a web application you need to make architecture decisions across a range of topics. The beauty of Ruby on Rails or Django is tha...
New
First poster: bot
Developing Godot Projects with Neovim. When I started using Godot Engine, what surprised me the most is the built-in Language Server Pro...
New
CommunityNews
9 fintech engineering mistakes. Read this list unless you want to build a money dissappearing system
New
New

Other popular topics Top

wolf4earth
@AstonJ prompted me to open this topic after I mentioned in the lockdown thread how I started to do a lot more for my fitness. https://f...
New
PragmaticBookshelf
A PragProg Hero’s Journey with Brian P. Hogan @bphogan Have you ever worried that your only legacy will be in the form of legacy...
New
Exadra37
I am thinking in building or buy a desktop computer for programing, both professionally and on my free time, and my choice of OS is Linux...
New
New
AstonJ
Just done a fresh install of macOS Big Sur and on installing Erlang I am getting: asdf install erlang 23.1.2 Configure failed. checking ...
New
Exadra37
I am asking for any distro that only has the bare-bones to be able to get a shell in the server and then just install the packages as we ...
New
AstonJ
Seems like a lot of people caught it - just wondered whether any of you did? As far as I know I didn’t, but it wouldn’t surprise me if I...
New
mafinar
This is going to be a long an frequently posted thread. While talking to a friend of mine who has taken data structure and algorithm cou...
New
New
First poster: bot
Large Language Models like ChatGPT say The Darnedest Things. The Errors They MakeWhy We Need to Document Them, and What We Have Decided ...
New