CommunityNews

Outcome-based Reinforcement Learning to Predict the Future

Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting. One sticking point is that outcome-based reinforcement learning for forecasting must learn from binary, delayed, and noisy rewards, a regime where standard fine-tuning is brittle. We show that outcome-only online RL on a 14B model can match frontier-scale accuracy and surpass it in calibration and hypothetical prediction market betting by adapting two leading algorithms, Group-Relative Policy Optimisation (GRPO) and ReMax, to the forecasting setting. Our adaptations remove per-question variance scaling in GRPO, apply baseline-subtracted advantages in ReMax, hydrate training with 100k temporally consistent synthetic questions, and introduce lightweight guard-rails that penalise gibberish, non-English responses and missing rationales, enabling a single stable pass over 110k events. Scaling ReMax to 110k questions and ensembling seven predictions yields a 14B model that matches frontier baseline o1 on accuracy on our holdout set (Brier = 0.193, p = 0.23) while beating it in calibration (ECE = 0.042, p < 0.001). A simple trading rule turns this calibration edge into $127 of hypothetical profit versus $92 for o1 (p = 0.037). This demonstrates that refined RLVR methods can convert small-scale LLMs into potentially economically valuable forecasting tools, with implications for scaling this to larger models.
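For readers unfamiliar with the quantities named in the abstract, here is a minimal sketch of the Brier score, the expected calibration error (ECE), and a group-relative advantage that is mean-centred but not divided by the per-question standard deviation. The function names, the choice of 10 equal-width bins, and the example rewards are assumptions made for illustration; the abstract does not specify the authors' exact binning or reward definition.

```python
from statistics import fmean


def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return fmean((p - o) ** 2 for p, o in zip(probs, outcomes))


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average gap between mean forecast and observed frequency.

    Assumption: equal-width probability bins; n_bins=10 is an illustrative default.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        # A forecast of exactly 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))
    total = sum(len(b) for b in bins)
    ece = 0.0
    for b in bins:
        if b:
            mean_forecast = fmean(p for p, _ in b)
            observed_rate = fmean(o for _, o in b)
            ece += len(b) / total * abs(mean_forecast - observed_rate)
    return ece


def mean_centred_advantages(rewards):
    """Group-relative advantages without variance scaling.

    Standard GRPO divides (reward - group mean) by the group's standard
    deviation; this sketch keeps only the mean subtraction, which is one
    reading of "remove per-question variance scaling".
    """
    baseline = fmean(rewards)
    return [r - baseline for r in rewards]


if __name__ == "__main__":
    forecasts = [0.9, 0.2, 0.7, 0.4]   # hypothetical model probabilities
    resolved = [1, 0, 1, 1]            # hypothetical binary outcomes
    print("Brier:", brier_score(forecasts, resolved))
    print("ECE:", expected_calibration_error(forecasts, resolved))
    # Hypothetical rewards for four sampled answers to one question.
    print("Advantages:", mean_centred_advantages([1.0, 0.0, 0.0, 1.0]))
```

Lower is better for both metrics. Dropping the division by the standard deviation means a question whose sampled answers earn nearly identical rewards contributes a small gradient signal rather than an amplified one, which is a plausible motivation when rewards are noisy.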
