CommunityNews

ProgramBench: Can Language Models Rebuild Programs From Scratch?

Turning ideas into full software projects from scratch has become a popular use case for language models. Agents are being deployed to seed, maintain, and grow codebases over extended periods with minimal human oversight. Such settings require models to make high-level software architecture decisions. However, existing benchmarks measure focused, limited tasks such as fixing a single bug or developing a single, specified feature. We therefore introduce ProgramBench to measure the ability of software engineering agents to develop software holistically. In ProgramBench, given only a program and its documentation, agents must architect and implement a codebase that matches the reference executable’s behavior. End-to-end behavioral tests are generated via agent-driven fuzzing, enabling evaluation without prescribing implementation structure. Our 200 tasks range from compact CLI tools to widely used software such as FFmpeg, SQLite, and the PHP interpreter. We evaluate 9 LMs and find that none fully resolve any task, with the best model passing 95% of tests on only 3% of tasks. Models favor monolithic, single-file implementations that diverge sharply from human-written code.
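To make the headline numbers concrete, here is a minimal sketch of how a per-task pass rate and a "share of tasks at or above a threshold" figure could be computed. This is not the ProgramBench harness: the data layout, the function names, and the reading of "passing 95% of tests" as "at least 95%" are all assumptions for illustration, and the toy results are invented.

```python
from typing import Dict, List


def pass_rate(results: List[bool]) -> float:
    """Fraction of behavioral tests passed for a single task (hypothetical helper)."""
    return sum(results) / len(results) if results else 0.0


def share_of_tasks_at_or_above(per_task: Dict[str, List[bool]], threshold: float) -> float:
    """Fraction of tasks whose pass rate is >= threshold (hypothetical helper)."""
    if not per_task:
        return 0.0
    hits = sum(1 for results in per_task.values() if pass_rate(results) >= threshold)
    return hits / len(per_task)


# Invented toy data: one list of test outcomes per task, not real benchmark results.
toy_results = {
    "task_a": [True] * 19 + [False],       # 19 of 20 tests pass
    "task_b": [True] * 10 + [False] * 10,  # 10 of 20 tests pass
    "task_c": [True] * 3 + [False] * 17,   # 3 of 20 tests pass
    "task_d": [False] * 20,                # no tests pass
}

# Assumed reading: "fully resolved" means every test passes (threshold 1.0).
fully_resolved = share_of_tasks_at_or_above(toy_results, 1.0)
# Assumed reading: "near resolved" means at least 95% of tests pass.
near_resolved = share_of_tasks_at_or_above(toy_results, 0.95)

print(f"fully resolved: {fully_resolved:.0%}, at least 95% of tests: {near_resolved:.0%}")
```

Under this reading, a model can have no fully resolved tasks while still clearing the 95% bar on a small share of them, which is the shape of the result the abstract reports.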

2026-05-08 16:38:47 UTC
