CommunityNews

The Curse of Recursion: Training on Generated Data Makes Models Forget

Stable Diffusion revolutionised image creation from descriptive text. GPT-2,
GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of
language tasks. ChatGPT introduced such language models to the general public.
It is now clear that large language models (LLMs) are here to stay, and will
bring about drastic change in the whole ecosystem of online text and images. In
this paper we consider what the future might hold. What will happen to GPT-{n}
once LLMs contribute much of the language found online? We find that use of
model-generated content in training causes irreversible defects in the
resulting models, where tails of the original content distribution disappear.
We refer to this effect as Model Collapse and show that it can occur in
Variational Autoencoders, Gaussian Mixture Models and LLMs. We build
theoretical intuition behind the phenomenon and portray its ubiquity amongst
all learned generative models. We demonstrate that it has to be taken seriously
if we are to sustain the benefits of training from large-scale data scraped
from the web. Indeed, data collected from genuine human interactions with
systems will become increasingly valuable in the presence of LLM-generated
content in data crawled from the Internet.
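
For intuition, here is a minimal, self-contained sketch (not from the paper) of the collapse the abstract describes, using a single Gaussian in place of the paper's Gaussian Mixture Models: each generation is fit only to samples drawn from the previous generation's fit, so estimation error compounds and the tails of the original distribution are gradually forgotten. All names, sample sizes, and generation counts below are illustrative assumptions.

```python
# Illustrative sketch of model collapse with a single Gaussian "model".
# Generation 0 is fit to real data; every later generation is fit only to
# samples drawn from its predecessor. The fitted standard deviation drifts
# toward zero, i.e. the tails of the original distribution disappear.
import numpy as np

rng = np.random.default_rng(0)

n_samples = 100       # data available to each generation (illustrative)
n_generations = 500   # number of recursive training rounds (illustrative)

# "Real" data from the original distribution: mean 0, std 1.
real_data = rng.normal(loc=0.0, scale=1.0, size=n_samples)
mu, sigma = real_data.mean(), real_data.std()
print(f"gen   0: mu={mu:+.3f}  sigma={sigma:.3f}")

for gen in range(1, n_generations + 1):
    # Train the next model only on data generated by the current model.
    synthetic = rng.normal(loc=mu, scale=sigma, size=n_samples)
    mu, sigma = synthetic.mean(), synthetic.std()
    if gen % 100 == 0:
        print(f"gen {gen:3d}: mu={mu:+.3f}  sigma={sigma:.3f}")

# Typical runs end with sigma far below 1: the recursively trained model
# has forgotten the tails of the real distribution.
```

The same mechanism, finite samples repeatedly re-fit to generated data, is what the paper argues affects VAEs, Gaussian Mixture Models, and LLMs alike.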

Read in full here:

This thread was posted by one of our members via one of our news source trackers.
