LLMs Corrupt Your Documents When You Delegate

Large Language Models (LLMs) are poised to disrupt knowledge work, with delegated work (e.g., vibe coding) emerging as a new interaction paradigm. Delegation requires trust: the expectation that the LLM will faithfully execute the task without introducing errors into documents. We introduce DELEGATE-52 to study the readiness of AI systems for delegated workflows. DELEGATE-52 simulates long delegated workflows that require in-depth document editing across 52 professional domains, such as coding, crystallography, and music notation. Our large-scale experiment with 19 LLMs reveals that current models degrade documents during delegation: even frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT 5.4) corrupt an average of 25% of document content by the end of long workflows, and other models fail more severely. Additional experiments reveal that agentic tool use does not improve performance on DELEGATE-52, and that degradation worsens with larger documents, longer interactions, and the presence of distractor files. Our analysis shows that current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents and compound over long interactions.

Read in full here:

#llms

2026-05-13 04:19:44 UTC
