CommunityNews

CommunityNews

Outcome-based Reinforcement Learning to Predict the Future

Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting. One sticking point is that outcome-based reinforcement learning for forecasting must learn from binary, delayed, and noisy rewards, a regime where standard fine-tuning is brittle. We show that outcome-only online RL on a 14B model can match frontier-scale accuracy and surpass it in calibration and hypothetical prediction market betting by adapting two leading algorithms, Group-Relative Policy Optimisation (GRPO) and ReMax, to the forecasting setting. Our adaptations remove per-question variance scaling in GRPO, apply baseline-subtracted advantages in ReMax, hydrate training with 100k temporally consistent synthetic questions, and introduce lightweight guard-rails that penalise gibberish, non-English responses and missing rationales, enabling a single stable pass over 110k events. Scaling ReMax to 110k questions and ensembling seven predictions yields a 14B model that matches frontier baseline o1 on accuracy on our holdout set (Brier = 0.193, p = 0.23) while beating it in calibration (ECE = 0.042, p < 0.001). A simple trading rule turns this calibration edge into $127 of hypothetical profit versus $92 for o1 (p = 0.037). This demonstrates that refined RLVR methods can convert small-scale LLMs into potentially economically valuable forecasting tools, with implications for scaling this to larger models.

Read in full here:

Where Next?

Popular Ai topics Top

First poster: bot
NVIDIA Doubles Down: Announces A100 80GB GPU, Supercharging World’s Most Powerful GPU for AI Supercomputing. SC20—NVIDIA today unveiled ...
New
First poster: CommunityNews
In their decades-long chase to create artificial intelligence, computer scientists have designed and developed all kinds of complicated m...
New
First poster: CommunityNews
Should we be concerned that the decisions of AIs are inscrutable? | Psyche Ideas. Machine learning is a black box – even when the decisi...
New
First poster: CommunityNews
Imagine you’re sitting at a casino’s poker table. Someone has explained the basic rules to you, but you’ve never played before and don’t ...
New
New
First poster: joni
My experience trying to write original, full-length human-sounding articles using Claude AI. You can use AI tools like Claude to help yo...
New
First poster: brennan
It’s Not a Hypothetical, I’ve Already Lost My Job to AI For The Last Year
New
New
First poster: happyrat1
With a leap in the evolution of large language models, some leading thinkers are questioning whether AI might become sentient
New
First poster: AstonJ
From fear to optimism: why I am convinced AI is worth embracing.
New

Other popular topics Top

PragmaticBookshelf
Take your Go skills to the next level by learning how to design, develop, and deploy a distributed service. Start from the bare essential...
New
PragmaticBookshelf
Learn from the award-winning programming series that inspired the Elixir language, and go on a step-by-step journey through the most impo...
New
PragmaticBookshelf
Ruby, Io, Prolog, Scala, Erlang, Clojure, Haskell. With Seven Languages in Seven Weeks, by Bruce A. Tate, you’ll go beyond the syntax—and...
New
brentjanderson
Bought the Moonlander mechanical keyboard. Cherry Brown MX switches. Arms and wrists have been hurting enough that it’s time I did someth...
New
AstonJ
We have a thread about the keyboards we have, but what about nice keyboards we come across that we want? If you have seen any that look n...
New
PragmaticBookshelf
Tailwind CSS is an exciting new CSS framework that allows you to design your site by composing simple utility classes to create complex e...
New
DevotionGeo
The V Programming Language Simple language for building maintainable programs V is already mentioned couple of times in the forum, but I...
New
mafinar
This is going to be a long an frequently posted thread. While talking to a friend of mine who has taken data structure and algorithm cou...
New
New
New