CommunityNews

CommunityNews

Language Models Need Sleep

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration $N$ for our models improves performance, with the largest gains on examples that require deeper reasoning.

Read in full here:

Where Next?

Popular Ai topics Top

First poster: CommunityNews
Now that DeepMind has taught AI to master the game of Go—and furthered its advantage in chess—they’ve turned their attention to another b...
New
First poster: CommunityNews
A simple algorithm that revolutionizes how neural networks approach language is now taking on image classification as well. It may not st...
New
First poster: CommunityNews
Getting a glimpse into Nvidia’s R&D has become a regular feature of the spring GTC conference with Bill Dally, chief scientist and se...
New
First poster: CommunityNews
Chat-bots are amazing these days! About a month ago LaMDA made the news when it apparently convinced an engineer at Google that it was se...
New
First poster: DevotionGeo
Voice synthesis PR stunt calls upon the dead to help sell an AI product.
New
AstonJ
This is cool! DEEPSEEK-V3 ON M4 MAC: BLAZING FAST INFERENCE ON APPLE SILICON We just witnessed something incredible: the largest open-s...
New
First poster: gflashner
Google’s openly available Gemma collection of AI models has reached a milestone: over 150 million downloads. Omar Sanseviero, a developer...
New
First poster: TimButterfield
A new agentic IDE that works alongside you from prototype to production
New
CommunityNews
Comparison and ranking the performance of over 100 AI models (LLMs) across key metrics including intelligence, price, performance and spe...
New
CommunityNews
But the reality is that 75% of the people on our engineering team lost their jobs here yesterday because of the brutal impact AI has had ...
New

Other popular topics Top

Devtalk
Hello Devtalk World! Please let us know a little about who you are and where you’re from :nerd_face:
New
PragmaticBookshelf
Design and develop sophisticated 2D games that are as much fun to make as they are to play. From particle effects and pathfinding to soci...
New
AstonJ
You might be thinking we should just ask who’s not using VSCode :joy: however there are some new additions in the space that might give V...
New
DevotionGeo
The V Programming Language Simple language for building maintainable programs V is already mentioned couple of times in the forum, but I...
New
AstonJ
If you get Can't find emacs in your PATH when trying to install Doom Emacs on your Mac you… just… need to install Emacs first! :lol: bre...
New
AstonJ
Was just curious to see if any were around, found this one: I got 51/100: Not sure if it was meant to buy I am sure at times the b...
New
AstonJ
If you want a quick and easy way to block any website on your Mac using Little Snitch simply… File > New Rule: And select Deny, O...
New
New
hilfordjames
There appears to have been an update that has changed the terminology for what has previously been known as the Taskbar Overflow - this h...
New
RobertRichards
Hair Salon Games for Girls Fun Girls Hair Saloon game is mainly developed for kids. This game allows users to select virtual avatars to ...
New