
CommunityNews
Reinforcement Pre-Training
In this work, we introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). Specifically, we reframe next-token prediction as a reasoning task trained using RL, where it receives verifiable rewards for correctly predicting the next token for a given context. RPT offers a scalable method to leverage vast amounts of text data for general-purpose RL, rather than relying on domain-specific annotated answers. By incentivizing the capability of next-token reasoning, RPT significantly improves the language modeling accuracy of predicting the next tokens. Moreover, RPT provides a strong pre-trained foundation for further reinforcement fine-tuning. The scaling curves show that increased training compute consistently improves the next-token prediction accuracy. The results position RPT as an effective and promising scaling paradigm to advance language model pre-training.
Read in full here:
Popular Ai topics









Other popular topics









Categories:
Sub Categories:
Popular Portals
- /elixir
- /rust
- /wasm
- /ruby
- /erlang
- /phoenix
- /keyboards
- /rails
- /js
- /python
- /security
- /go
- /swift
- /vim
- /clojure
- /java
- /haskell
- /emacs
- /svelte
- /onivim
- /typescript
- /crystal
- /c-plus-plus
- /tailwind
- /kotlin
- /gleam
- /react
- /flutter
- /elm
- /ocaml
- /ash
- /vscode
- /opensuse
- /centos
- /php
- /deepseek
- /html
- /zig
- /scala
- /debian
- /nixos
- /lisp
- /textmate
- /sublime-text
- /agda
- /react-native
- /kubuntu
- /arch-linux
- /revery
- /ubuntu
- /spring
- /manjaro
- /django
- /diversity
- /nodejs
- /lua
- /julia
- /slackware
- /c
- /markdown