xiji2646-netizen

xiji2646-netizen

Has anyone else noticed Opus 4.6 getting worse at coding tasks?

I’ve been tracking this for the past two weeks and wanted to see if others are experiencing the same thing.

BridgeBench (independent hallucination benchmark) now shows Opus 4.6 at #10 with a 33% fabrication rate — down from #2 with 83.3% accuracy just weeks ago. That’s one in three responses containing fabricated information.

The root cause appears to be two default changes:

  • Effort level default dropped from “high” to “medium” (March 3, 2026)

  • Adaptive thinking introduced (Feb 9, 2026) — under medium effort, some turns get zero reasoning tokens

An AMD exec analyzed 6,852 sessions and measured a 67% reasoning depth drop. @om_patel5’s A/B test (same prompt, 4.6 vs 4.5) showed 4.6 failing 5/5 while 4.5 passed 5/5.

What’s working for me:


export CLAUDE_CODE_EFFORT_LEVEL=max

export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1

Or just /effort max\ per session.

Some devs are switching back to Opus 4.5 entirely (`claude-opus-4-5-20251101`).

Curious: are you seeing the same patterns? Have the env vars helped? Anyone found other workarounds?

References: BridgeBench (bridgebench.ai/hallucination), GitHub Issue #42796

Most Liked

peterchancc

peterchancc

Saw on X/Twitter that some people are also experiencing the same issue.

Where Next?

Popular Ai topics Top

AstonJ
I saw this clip of Elon Musk talking about AI and wondered what others think - are you looking forward to AI? Or do you find it concerning?
New
AstonJ
This video about multi-agent AI is a really nice watch - it only took them a few million tries to master certain strategies - doing much ...
#ai
New
AstonJ
Loads of news stories about DeepSeek here in the last few days, no surprise as it’s been making headlines across the world! Currently a h...
New
nix0097
Hello I hope you’re doing well. I’m looking to develop a custom chatbot and would love to collaborate with you on this project. The chat...
New
xiji2646-netizen
Woke up to this today: Claude Code’s complete source code exposed via npm source map. Not a snippet. All 512,000 lines. 1,900 TypeScript ...
New
xiji2646-netizen
I’ve been following Seedance 2.0 since ByteDance dropped it in February, and after a few weeks of testing through third-party APIs, I wan...
New
xiji2646-netizen
Anthropic launched Claude Design this week and there’s a lot of noise about the generation demos and the stock reaction. But the feature ...
New
xiji2646-netizen
Been using Claude Code on Max for a few weeks and kept running into rate limits by early afternoon. Same tasks as colleagues who weren’t ...
New
xiji2646-netizen
Been using the skills repo (77K stars, #1 on GitHub Trending recently) with Claude Code. Sharing what worked and what did not. What work...
New
xiji2646-netizen
Codex mobile in the ChatGPT app https://techcrunch.com/wp-content/uploads/2026/05/App-view.png?resize=1200,675) Codex shipped a batch o...
New

Other popular topics Top

Devtalk
Hello Devtalk World! Please let us know a little about who you are and where you’re from :nerd_face:
New
PragmaticBookshelf
Free and open source software is the default choice for the technologies that run our world, and it’s built and maintained by people like...
New
Exadra37
Please tell us what is your preferred monitor setup for programming(not gaming) and why you have chosen it. Does your monitor have eye p...
New
siddhant3030
I’m thinking of buying a monitor that I can rotate to use as a vertical monitor? Also, I want to know if someone is using it for program...
New
AstonJ
You might be thinking we should just ask who’s not using VSCode :joy: however there are some new additions in the space that might give V...
New
Rainer
My first contact with Erlang was about 2 years ago when I used RabbitMQ, which is written in Erlang, for my job. This made me curious and...
New
PragmaticBookshelf
Rust is an exciting new programming language combining the power of C with memory safety, fearless concurrency, and productivity boosters...
New
AstonJ
This looks like a stunning keycap set :orange_heart: A LEGENDARY KEYBOARD LIVES ON When you bought an Apple Macintosh computer in the e...
New
Maartz
Hi folks, I don’t know if I saw this here but, here’s a new programming language, called Roc Reminds me a bit of Elm and thus Haskell. ...
New
NewsBot
Node.js v22.14.0 has been released. Link: Release 2025-02-11, Version 22.14.0 'Jod' (LTS), @aduh95 · nodejs/node · GitHub
New