CommunityNews
ProgramBench: Can Language Models Rebuild Programs From Scratch?
Turning ideas into full software projects from scratch has become a popular use case for language models. Agents are being deployed to seed, maintain, and grow codebases over extended periods with minimal human oversight. Such settings require models to make high-level software architecture decisions. However, existing benchmarks measure focused, limited tasks such as fixing a single bug or developing a single, specified feature. We therefore introduce ProgramBench to measure the ability of software engineering agents to develop software holistically. In ProgramBench, given only a program and its documentation, agents must architect and implement a codebase that matches the reference executable’s behavior. End-to-end behavioral tests are generated via agent-driven fuzzing, enabling evaluation without prescribing implementation structure. Our 200 tasks range from compact CLI tools to widely used software such as FFmpeg, SQLite, and the PHP interpreter. We evaluate 9 LMs and find that none fully resolve any task, with the best model passing 95% of tests on only 3% of tasks. Models favor monolithic, single-file implementations that diverge sharply from human-written code.
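The abstract doesn’t spell out the test harness, but the core idea of a behavioral test against a reference executable is differential testing: run the reference binary and the candidate rebuild on the same fuzz-generated input and compare observable behavior. A minimal sketch of that check, assuming hypothetical `./reference/wc` and `./candidate/wc` binaries (the paper’s actual harness and comparison criteria may differ):

```python
import subprocess

def behaviors_match(reference_bin, candidate_bin, argv, stdin_bytes, timeout=10):
    """Run both executables on the same fuzzed input and compare
    observable behavior (exit code and stdout)."""
    def run(binary):
        return subprocess.run(
            [binary, *argv],
            input=stdin_bytes,
            capture_output=True,
            timeout=timeout,
        )
    ref = run(reference_bin)
    cand = run(candidate_bin)
    return ref.returncode == cand.returncode and ref.stdout == cand.stdout

# One fuzz-generated test case for a hypothetical CLI tool:
passed = behaviors_match("./reference/wc", "./candidate/wc",
                         ["-l"], b"one\ntwo\nthree\n")
print("pass" if passed else "fail")
```

Because the comparison only inspects externally observable behavior, the candidate is free to organize its source however it likes, which is what lets the benchmark avoid prescribing implementation structure.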
Read in full here: