CommunityNews

CommunityNews

Solving a Million-Step LLM Task with Zero Errors

LLMs have achieved remarkable breakthroughs in reasoning, insights, and tool use, but chaining these abilities into extended processes at the scale of those routinely executed by humans, organizations, and societies has remained out of reach. The models have a persistent error rate that prevents scale-up: for instance, recent experiments in the Towers of Hanoi benchmark domain showed that the process inevitably becomes derailed after at most a few hundred steps. Thus, although LLM research is often still benchmarked on tasks with relatively few dependent logical steps, there is increasing attention on the ability (or inability) of LLMs to perform long range tasks. This paper describes MAKER, the first system that successfully solves a task with over one million LLM steps with zero errors, and, in principle, scales far beyond this level. The approach relies on an extreme decomposition of a task into subtasks, each of which can be tackled by focused microagents. The high level of modularity resulting from the decomposition allows error correction to be applied at each step through an efficient multi-agent voting scheme. This combination of extreme decomposition and error correction makes scaling possible. Thus, the results suggest that instead of relying on continual improvement of current LLMs, massively decomposed agentic processes (MDAPs) may provide a way to efficiently solve problems at the level of organizations and societies.

Read in full here:

Where Next?

Popular Ai topics Top

First poster: bot
Kicking off another busy Spring GPU Technology Conference for NVIDIA, this morning the graphics and accelerator designer is announcing th...
New
CommunityNews
Artificial intelligence is now smart enough to write tracks that earn streaming service royalties.
New
First poster: CommunityNews
Getting a glimpse into Nvidia’s R&D has become a regular feature of the spring GTC conference with Bill Dally, chief scientist and se...
New
First poster: bot
You can’t solve AI security problems with more AI. One of the most common proposed solutions to prompt injection attacks (where an AI la...
New
First poster: bot
Chri Besenbruch, CEO of Deep Render, sees many problems with the way video compression standards are developed today. He thinks they aren...
New
First poster: brennan
It’s Not a Hypothetical, I’ve Already Lost My Job to AI For The Last Year
New
First poster: AstonJ
From fear to optimism: why I am convinced AI is worth embracing.
New
First poster: TimButterfield
A new agentic IDE that works alongside you from prototype to production
New
CommunityNews
Comparison and ranking the performance of over 100 AI models (LLMs) across key metrics including intelligence, price, performance and spe...
New
CommunityNews
Reproducibility is a bedrock of scientific progress. However, it’s remarkably difficult to get reproducible results out of large language...
New

Other popular topics Top

AstonJ
If it’s a mechanical keyboard, which switches do you have? Would you recommend it? Why? What will your next keyboard be? Pics always w...
New
PragmaticBookshelf
Write Elixir tests that you can be proud of. Dive into Elixir’s test philosophy and gain mastery over the terminology and concepts that u...
New
DevotionGeo
I know that these benchmarks might not be the exact picture of real-world scenario, but still I expect a Rust web framework performing a ...
New
AstonJ
We have a thread about the keyboards we have, but what about nice keyboards we come across that we want? If you have seen any that look n...
New
AstonJ
If you are experiencing Rails console using 100% CPU on your dev machine, then updating your development and test gems might fix the issu...
New
New
New
First poster: AstonJ
Jan | Rethink the Computer. Jan turns your computer into an AI machine by running LLMs locally on your computer. It’s a privacy-focus, l...
New
AstonJ
This is cool! DEEPSEEK-V3 ON M4 MAC: BLAZING FAST INFERENCE ON APPLE SILICON We just witnessed something incredible: the largest open-s...
New
PragmaticBookshelf
A concise guide to MySQL 9 database administration, covering fundamental concepts, techniques, and best practices. Neil Smyth MySQL...
New