CommunityNews

Language Models Need Sleep

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration $N$ for our models improves performance, with the largest gains on examples that require deeper reasoning.
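The wake/sleep cycle described above can be sketched in a few lines. This is a toy illustration only, not the paper's implementation: the class `ToySleeper`, the helper `recall_error`, the learning rate, and the plain delta rule used as the "local rule" are all assumptions made for this example (the abstract says the rule is learned and lives in SSM blocks, which this sketch does not model). It shows only the general shape: accumulate context in a cache, replay it for N offline passes into persistent fast weights, then clear the cache.

```python
from typing import List, Tuple

Vector = List[float]


def matvec(weights: List[Vector], x: Vector) -> Vector:
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class ToySleeper:
    """Hypothetical toy model: a key-value cache plus a fast-weight matrix."""

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.dim = dim
        self.lr = lr  # assumed step size for the stand-in local rule
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        self.kv_cache: List[Tuple[Vector, Vector]] = []

    def observe(self, key: Vector, value: Vector) -> None:
        """Wake phase: store recent context in the cache."""
        self.kv_cache.append((key, value))

    def sleep(self, n_passes: int) -> None:
        """Sleep phase: replay the cache n_passes times, then clear it.

        The update is a delta rule (an assumption, standing in for the
        learned local rule): W += lr * outer(value - W @ key, key).
        """
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                pred = matvec(self.fast_weights, key)
                for i in range(self.dim):
                    err = value[i] - pred[i]
                    for j in range(self.dim):
                        self.fast_weights[i][j] += self.lr * err * key[j]
        self.kv_cache.clear()

    def recall(self, key: Vector) -> Vector:
        """Wake-time read from fast weights only; the cache is not consulted."""
        return matvec(self.fast_weights, key)


def recall_error(n_passes: int) -> float:
    """Total absolute recall error after sleeping for n_passes."""
    pairs = [
        ([1.0, 0.0, 0.0], [0.0, 2.0, 0.0]),
        ([0.0, 1.0, 0.0], [1.0, 0.0, -1.0]),
    ]
    model = ToySleeper(dim=3, lr=0.5)
    for key, value in pairs:
        model.observe(key, value)
    model.sleep(n_passes)
    assert not model.kv_cache  # the cache is empty after sleep
    return sum(
        abs(got - want)
        for key, value in pairs
        for got, want in zip(model.recall(key), value)
    )


if __name__ == "__main__":
    for n in (0, 1, 2, 4):
        print(n, recall_error(n))
```

In this toy setting the keys are orthonormal, so each extra pass halves the remaining recall error, and the cost of `recall` does not depend on N. That mirrors, in a very simplified way, the trade the abstract describes: more computation during sleep, unchanged latency at wake time.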

2026-05-27 03:29:51 UTC
