CommunityNews

Outcome-based Reinforcement Learning to Predict the Future

Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting. One sticking point is that outcome-based reinforcement learning for forecasting must learn from binary, delayed, and noisy rewards, a regime where standard fine-tuning is brittle. We show that outcome-only online RL on a 14B model can match frontier-scale accuracy and surpass it in calibration and hypothetical prediction market betting by adapting two leading algorithms, Group-Relative Policy Optimisation (GRPO) and ReMax, to the forecasting setting. Our adaptations remove per-question variance scaling in GRPO, apply baseline-subtracted advantages in ReMax, hydrate training with 100k temporally consistent synthetic questions, and introduce lightweight guard-rails that penalise gibberish, non-English responses and missing rationales, enabling a single stable pass over 110k events. Scaling ReMax to 110k questions and ensembling seven predictions yields a 14B model that matches frontier baseline o1 on accuracy on our holdout set (Brier = 0.193, p = 0.23) while beating it in calibration (ECE = 0.042, p < 0.001). A simple trading rule turns this calibration edge into $127 of hypothetical profit versus $92 for o1 (p = 0.037). This demonstrates that refined RLVR methods can convert small-scale LLMs into potentially economically valuable forecasting tools, with implications for scaling this to larger models.
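For readers unfamiliar with the quantities named in the abstract, here is a minimal Python sketch of the two evaluation metrics (Brier score and expected calibration error) and of the two advantage computations as the abstract describes them. It is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the ECE uses ten equal-width bins (the abstract does not say which binning is used), and the ReMax baseline is taken to be the reward of a greedy rollout, as in the standard ReMax formulation.

```python
from statistics import mean


def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return mean((p - y) ** 2 for p, y in zip(probs, outcomes))


def expected_calibration_error(probs, outcomes, n_bins=10):
    """ECE with equal-width bins (binning scheme is an assumption).

    Each bin contributes |mean forecast - observed frequency|,
    weighted by the share of forecasts that fall into it.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_forecast = mean(p for p, _ in bucket)
        observed_freq = mean(y for _, y in bucket)
        ece += len(bucket) / total * abs(avg_forecast - observed_freq)
    return ece


def mean_centred_advantages(rewards):
    """GRPO-style group advantages without per-question variance scaling.

    Standard GRPO divides (reward - group mean) by the group standard
    deviation; this variant only subtracts the mean.
    """
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def remax_advantages(sample_rewards, greedy_reward):
    """Baseline-subtracted advantages, ReMax style.

    Assumes the baseline is the reward of a greedy rollout
    for the same question.
    """
    return [r - greedy_reward for r in sample_rewards]
```

As a sanity check, `brier_score([0.5, 0.5], [0, 1])` returns `0.25`, the score of an uninformative 50% forecast. The abstract's reported Brier of 0.193 sits below that reference point.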
