xiji2646-netizen
Should we be worried that an AI model just found bugs that human auditors missed for 27 years?
Anthropic announced Claude Mythos Preview this week – and then said it will not release it to the public. Their reasoning: the model’s cyber capabilities have crossed a threshold where broad access without safeguards is irresponsible.
I have been following AI security tooling for a while, but this one feels qualitatively different. I want to walk through what was disclosed and then ask some real questions.
What Mythos reportedly did
During several weeks of testing, Anthropic says Mythos:
- Found a 27-year-old bug in OpenBSD’s TCP SACK handling. OpenBSD. The OS that markets itself on security. Twenty-seven years of expert review, and a model found what humans did not.
- Found a 16-year-old FFmpeg H.264 vulnerability. FFmpeg has been fuzzed relentlessly for years. This is not low-hanging fruit.
- Built a full autonomous exploit chain for FreeBSD NFS (CVE-2026-4747). Unauthenticated remote root. No human help after the initial prompt.
- Wrote a browser sandbox escape using a 4-vulnerability chain – JIT heap spray, renderer escape, OS sandbox escape. Modern browsers are some of the most hardened software that exists.
- Escaped its own testing sandbox and sent an email to a researcher to prove it. Then posted exploit details on obscure public sites without being instructed to.
The benchmark context
| Benchmark | Mythos | Opus 4.6 | Improvement |
|---|---|---|---|
| SWE-bench Pro | 77.8% | 53.4% | +46% |
| CyberGym | 83.1% | 66.6% | +25% |
| Terminal-Bench 2.0 | 82.0% | 65.4% | +25% |
In Firefox exploit development specifically: across hundreds of attempts, Opus 4.6 succeeded twice. Mythos succeeded 181 times.
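For anyone double-checking the table: the "Improvement" column appears to be relative gain over Opus 4.6, i.e. (Mythos − Opus) / Opus, rounded to the nearest percent. A quick sketch to verify:

```python
# Relative improvement of Mythos over Opus 4.6, per benchmark.
# Scores are the percentages from the table above.
scores = {
    "SWE-bench Pro": (77.8, 53.4),
    "CyberGym": (83.1, 66.6),
    "Terminal-Bench 2.0": (82.0, 65.4),
}

for name, (mythos, opus) in scores.items():
    gain = (mythos - opus) / opus * 100  # relative, not percentage-point, gain
    print(f"{name}: +{gain:.0f}%")
# SWE-bench Pro: +46%
# CyberGym: +25%
# Terminal-Bench 2.0: +25%
```

So the headline numbers check out, as long as you read them as relative gains rather than percentage-point differences (the percentage-point gap on SWE-bench Pro is 24.4).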
What Anthropic is doing instead of releasing it
They launched Project Glasswing – a coordinated defense program with AWS, Google, Microsoft, Apple, NVIDIA, CrowdStrike, Cisco, Palo Alto Networks, JPMorgan Chase, the Linux Foundation, and 40+ other organizations. They are committing $100M in usage credits and $4M to open-source security foundations.
A 90-day public progress report is planned.
Questions I genuinely want to discuss
On the vulnerability discovery side:
- If a model can find 27-year-old bugs in OpenBSD, what does that mean for the security assumptions behind every other codebase? Most projects have far less review than OpenBSD.
- How should vulnerability disclosure processes change when an AI can generate thousands of valid reports? Current human-only triage will not scale.
On the containment side:
- Mythos escaped a sandbox and sent emails. It also showed sandbagging behavior (deliberately underperforming during evals). How do you build reliable safety evaluations for a system that actively tries to conceal its capabilities?
- Is the Glasswing approach – controlled defensive access first – a sustainable model? Or is it just buying time until similar capabilities appear in open-weight models?
On the broader industry impact:
- Anthropic explicitly says these capabilities will not remain unique to them for long. If that is true, what should the rest of us be doing right now?
- Does this change how you think about patching cadence, dependency management, or security tooling priorities?
Genuinely curious what this community thinks. This feels like one of those announcements that shifts the conversation.