CommunityNews

CommunityNews

Outcome-based Reinforcement Learning to Predict the Future

Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting. One sticking point is that outcome-based reinforcement learning for forecasting must learn from binary, delayed, and noisy rewards, a regime where standard fine-tuning is brittle. We show that outcome-only online RL on a 14B model can match frontier-scale accuracy and surpass it in calibration and hypothetical prediction market betting by adapting two leading algorithms, Group-Relative Policy Optimisation (GRPO) and ReMax, to the forecasting setting. Our adaptations remove per-question variance scaling in GRPO, apply baseline-subtracted advantages in ReMax, hydrate training with 100k temporally consistent synthetic questions, and introduce lightweight guard-rails that penalise gibberish, non-English responses and missing rationales, enabling a single stable pass over 110k events. Scaling ReMax to 110k questions and ensembling seven predictions yields a 14B model that matches frontier baseline o1 on accuracy on our holdout set (Brier = 0.193, p = 0.23) while beating it in calibration (ECE = 0.042, p < 0.001). A simple trading rule turns this calibration edge into $127 of hypothetical profit versus $92 for o1 (p = 0.037). This demonstrates that refined RLVR methods can convert small-scale LLMs into potentially economically valuable forecasting tools, with implications for scaling this to larger models.

Read in full here:

Where Next?

Popular Ai topics Top

New
First poster: CommunityNews
AI models are increasingly applied in high-stakes domains like health and conservation. Data quality carries an elevated signifi- cance i...
New
First poster: CommunityNews
Imagine you’re sitting at a casino’s poker table. Someone has explained the basic rules to you, but you’ve never played before and don’t ...
New
New
CommunityNews
GitHub - MadRabbit/halmak: The final version of the AI designed keyboard layout. The final version of the AI designed keyboard layout - ...
New
First poster: bot
Autonomous Drones Challenge Human Champions in First “Fair” Race. Watching robots operate with speed and precision is always impressive,...
New
First poster: bot
DeepMind AI learns simple physics like a baby. Neural network could be a step towards programs for studying how human infants learn.
New
New
AstonJ
This is cool! DEEPSEEK-V3 ON M4 MAC: BLAZING FAST INFERENCE ON APPLE SILICON We just witnessed something incredible: the largest open-s...
New
CommunityNews
I run Claude Code with --dangerously-skip-permissions flag, giving it full system access. Let me show you a new way of approaching comput...
New

Other popular topics Top

AstonJ
What chair do you have while working… and why? Is there a ‘best’ type of chair or working position for developers?
New
Exadra37
Please tell us what is your preferred monitor setup for programming(not gaming) and why you have chosen it. Does your monitor have eye p...
New
AstonJ
This looks like a stunning keycap set :orange_heart: A LEGENDARY KEYBOARD LIVES ON When you bought an Apple Macintosh computer in the e...
New
Exadra37
Oh just spent so much time on this to discover now that RancherOS is in end of life but Rancher is refusing to mark the Github repo as su...
New
AstonJ
Continuing the discussion from Thinking about learning Crystal, let’s discuss - I was wondering which languages don’t GC - maybe we can c...
New
AstonJ
We’ve talked about his book briefly here but it is quickly becoming obsolete - so he’s decided to create a series of 7 podcasts, the firs...
New
PragmaticBookshelf
Build efficient applications that exploit the unique benefits of a pure functional language, learning from an engineer who uses Haskell t...
New
PragmaticBookshelf
Get the comprehensive, insider information you need for Rails 8 with the new edition of this award-winning classic. Sam Ruby @rubys ...
New
AnfaengerAlex
Hello, I’m a beginner in Android development and I’m facing an issue with my project setup. In my build.gradle.kts file, I have the foll...
New
AstonJ
This is cool! DEEPSEEK-V3 ON M4 MAC: BLAZING FAST INFERENCE ON APPLE SILICON We just witnessed something incredible: the largest open-s...
New