CommunityNews

CommunityNews

Outcome-based Reinforcement Learning to Predict the Future

Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting. One sticking point is that outcome-based reinforcement learning for forecasting must learn from binary, delayed, and noisy rewards, a regime where standard fine-tuning is brittle. We show that outcome-only online RL on a 14B model can match frontier-scale accuracy and surpass it in calibration and hypothetical prediction market betting by adapting two leading algorithms, Group-Relative Policy Optimisation (GRPO) and ReMax, to the forecasting setting. Our adaptations remove per-question variance scaling in GRPO, apply baseline-subtracted advantages in ReMax, hydrate training with 100k temporally consistent synthetic questions, and introduce lightweight guard-rails that penalise gibberish, non-English responses and missing rationales, enabling a single stable pass over 110k events. Scaling ReMax to 110k questions and ensembling seven predictions yields a 14B model that matches frontier baseline o1 on accuracy on our holdout set (Brier = 0.193, p = 0.23) while beating it in calibration (ECE = 0.042, p < 0.001). A simple trading rule turns this calibration edge into $127 of hypothetical profit versus $92 for o1 (p = 0.037). This demonstrates that refined RLVR methods can convert small-scale LLMs into potentially economically valuable forecasting tools, with implications for scaling this to larger models.

Read in full here:

Where Next?

Popular Ai topics Top

New
First poster: CommunityNews
Imagine you’re sitting at a casino’s poker table. Someone has explained the basic rules to you, but you’ve never played before and don’t ...
New
First poster: bot
DeepMind AI predicts incoming rainfall with high accuracy. Having flexed its muscles in predicting kidney injury, toppling Go champions ...
New
CommunityNews
DeepMind’s New AI With a Memory Outperforms Algorithms 25 Times Its Size. DeepMind’s model, with just 7 billion parameters, outperformed...
New
First poster: bot
How a robot is being used in the highly specialised business of creating flavours.
New
New
First poster: CommunityNews
OpenJourney is a Text-to-Image AI model which has the goal of bringing an open source equivalent to Midjourney to the people. It is curre...
New
alvinkatojr
This was/is a great read that counters the common “woe is me” fear of AI. Author knows his stuff and breaks down the 8 fallacies tied to...
New
First poster: gflashner
Google’s openly available Gemma collection of AI models has reached a milestone: over 150 million downloads. Omar Sanseviero, a developer...
New
gfqdjb
With all the AI buzz around coding assistants, and being a bit concerned about being dependent on third-party cloud providers here, I dec...
New

Other popular topics Top

PragmaticBookshelf
Brace yourself for a fun challenge: build a photorealistic 3D renderer from scratch! In just a couple of weeks, build a ray tracer that r...
New
siddhant3030
I’m thinking of buying a monitor that I can rotate to use as a vertical monitor? Also, I want to know if someone is using it for program...
New
PragmaticBookshelf
Tailwind CSS is an exciting new CSS framework that allows you to design your site by composing simple utility classes to create complex e...
New
Exadra37
Oh just spent so much time on this to discover now that RancherOS is in end of life but Rancher is refusing to mark the Github repo as su...
New
mafinar
Crystal recently reached version 1. I had been following it for awhile but never got to really learn it. Most languages I picked up out o...
New
rustkas
Intensively researching Erlang books and additional resources on it, I have found that the topic of using Regular Expressions is either c...
New
AstonJ
Saw this on TikTok of all places! :lol: Anyone heard of them before? Lite:
New
New
New
First poster: AstonJ
Jan | Rethink the Computer. Jan turns your computer into an AI machine by running LLMs locally on your computer. It’s a privacy-focus, l...
New