xiji2646-netizen

Has anyone else noticed Opus 4.6 getting worse at coding tasks?

I’ve been tracking this for the past two weeks and wanted to see if others are experiencing the same thing.

BridgeBench (an independent hallucination benchmark) now ranks Opus 4.6 at #10 with a 33% fabrication rate, down from #2 with 83.3% accuracy just weeks ago. That's roughly one in three responses containing fabricated information.

The root cause appears to be two default changes:

  • Effort level default dropped from “high” to “medium” (March 3, 2026)

  • Adaptive thinking introduced (Feb 9, 2026) — under medium effort, some turns get zero reasoning tokens

An AMD exec analyzed 6,852 sessions and measured a 67% reasoning depth drop. @om_patel5’s A/B test (same prompt, 4.6 vs 4.5) showed 4.6 failing 5/5 while 4.5 passed 5/5.
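If you want to run that kind of A/B comparison yourself, here is a minimal sketch of a pass counter. It assumes you already have some command that exits 0 when a run passes and non-zero when it fails; `count_passes` is a made-up helper name, not part of any tool mentioned in this thread.

```shell
# count_passes N CMD [ARGS...]
# Runs CMD N times and prints "passes/N", where a pass is a zero exit status.
# Hypothetical helper for illustration; CMD is whatever check you trust.
count_passes() {
  cp_total=$1
  shift
  cp_passes=0
  cp_i=0
  while [ "$cp_i" -lt "$cp_total" ]; do
    # Discard the command's output; only its exit status is counted.
    if "$@" >/dev/null 2>&1; then
      cp_passes=$((cp_passes + 1))
    fi
    cp_i=$((cp_i + 1))
  done
  echo "$cp_passes/$cp_total"
}
```

For example, `count_passes 5 ./run_case.sh claude-opus-4-5-20251101` would report how many of five runs passed, where `run_case.sh` is a hypothetical script of your own that sends the same prompt to the given model and checks the result. Five runs is a small sample, so treat the ratio as a rough signal rather than a measurement.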

What’s working for me:


    export CLAUDE_CODE_EFFORT_LEVEL=max
    export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1

Or just run `/effort max` in each session.
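To make the env vars stick across terminals, something like the following could go in a shell profile. It is a sketch, assuming a POSIX-style shell; the variable names are the two from this post, and `show_claude_env` is a made-up helper that only prints what a child process would inherit.

```shell
# Keep any value already set in the environment; otherwise fall back to the
# values suggested in this thread.
export CLAUDE_CODE_EFFORT_LEVEL="${CLAUDE_CODE_EFFORT_LEVEL:-max}"
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING="${CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING:-1}"

# Hypothetical helper: print the two settings so you can confirm they are set.
show_claude_env() {
  printf 'effort=%s adaptive_thinking_disabled=%s\n' \
    "$CLAUDE_CODE_EFFORT_LEVEL" "$CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING"
}
show_claude_env
```

The `${VAR:-default}` form means a one-off override such as `CLAUDE_CODE_EFFORT_LEVEL=high` set before the profile is sourced still wins, which makes it easier to compare settings without editing the profile.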

Some devs are switching back to Opus 4.5 entirely (`claude-opus-4-5-20251101`).

Curious: are you seeing the same patterns? Have the env vars helped? Anyone found other workarounds?

References: BridgeBench (bridgebench.ai/hallucination), GitHub Issue #42796

peterchancc

Saw on X/Twitter that some people are also experiencing the same issue.
