xiji2646-netizen

xiji2646-netizen

Has anyone else noticed Opus 4.6 getting worse at coding tasks?

I’ve been tracking this for the past two weeks and wanted to see if others are experiencing the same thing.

BridgeBench (independent hallucination benchmark) now shows Opus 4.6 at #10 with a 33% fabrication rate — down from #2 with 83.3% accuracy just weeks ago. That’s one in three responses containing fabricated information.

The root cause appears to be two default changes:

  • Effort level default dropped from “high” to “medium” (March 3, 2026)

  • Adaptive thinking introduced (Feb 9, 2026) — under medium effort, some turns get zero reasoning tokens

An AMD exec analyzed 6,852 sessions and measured a 67% reasoning depth drop. @om_patel5’s A/B test (same prompt, 4.6 vs 4.5) showed 4.6 failing 5/5 while 4.5 passed 5/5.

What’s working for me:


export CLAUDE_CODE_EFFORT_LEVEL=max

export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1

Or just /effort max\ per session.

Some devs are switching back to Opus 4.5 entirely (`claude-opus-4-5-20251101`).

Curious: are you seeing the same patterns? Have the env vars helped? Anyone found other workarounds?

References: BridgeBench (bridgebench.ai/hallucination), GitHub Issue #42796

Most Liked

peterchancc

peterchancc

Saw on X/Twitter that some people are also experiencing the same issue.

Where Next?

Popular Ai topics Top

AstonJ
I saw this clip of Elon Musk talking about AI and wondered what others think - are you looking forward to AI? Or do you find it concerning?
New
AstonJ
This video about multi-agent AI is a really nice watch - it only took them a few million tries to master certain strategies - doing much ...
#ai
New
AstonJ
Can you spot the AI generated person in the pic below? ▶ Spoiler Video here:
New
Eiji
Today, I tried to find some information and few times I not only got completely wrong answers, but even fake GitHub links … Every time I ...
#ai
New
AstonJ
I have a feeling we’re going to see a lot of threads about DeepSeek, so have put up a portal for it :003:
New
AstonJ
This is a very quick guide, you just need to: Download LM Studio: https://lmstudio.ai/ Click on search Type DeepSeek, then select the o...
New
AstonJ
AI has been a hot topic here on Devtalk recently, so along that theme: How useful do you think AI dev tools are right now and how useful ...
New
apoorv-2204
I’m reaching out to all software engineers, especially senior developers — I really want to hear your thoughts. I’ve always loved buildi...
New
kammy
Hi everyone! The other day I was having a debate with my friends about whether or not the top LLM models are “good at design.” I’d love ...
New
xiji2646-netizen
Anthropic shipped something called Dreaming for Managed Agents this week. It’s a scheduled background process that runs between sessions ...
New

Other popular topics Top

PragmaticBookshelf
Andy and Dave wrote this influential, classic book to help their clients create better software and rediscover the joy of coding. Almost ...
New
PragmaticBookshelf
Free and open source software is the default choice for the technologies that run our world, and it’s built and maintained by people like...
New
PragmaticBookshelf
Write Elixir tests that you can be proud of. Dive into Elixir’s test philosophy and gain mastery over the terminology and concepts that u...
New
AstonJ
This looks like a stunning keycap set :orange_heart: A LEGENDARY KEYBOARD LIVES ON When you bought an Apple Macintosh computer in the e...
New
dimitarvp
Small essay with thoughts on macOS vs. Linux: I know @Exadra37 is just waiting around the corner to scream at me “I TOLD YOU SO!!!” but I...
New
PragmaticBookshelf
Author Spotlight Jamis Buck @jamis This month, we have the pleasure of spotlighting author Jamis Buck, who has written Mazes for Prog...
New
PragmaticBookshelf
Explore the power of Ash Framework by modeling and building the domain for a real-world web application. Rebecca Le @sevenseacat and ...
New
RobertRichards
Hair Salon Games for Girls Fun Girls Hair Saloon game is mainly developed for kids. This game allows users to select virtual avatars to ...
New
mindriot
Ok, well here are some thoughts and opinions on some of the ergonomic keyboards I have, I guess like mini review of each that I use enoug...
New
CommunityNews
Open-source implementation of the classic GTA engine now running directly in your browser. Experience the reVC technology demo on DOS.Zon...
New