xiji2646-netizen
Has anyone else noticed Opus 4.6 getting worse at coding tasks?
I’ve been tracking this for the past two weeks and wanted to see if others are experiencing the same thing.
BridgeBench (an independent hallucination benchmark) now ranks Opus 4.6 at #10 with a 33% fabrication rate, down from #2 (83.3% accuracy) just weeks ago. That's roughly one in three responses containing fabricated information.
The root cause appears to be two default changes:
- The default effort level dropped from “high” to “medium” (March 3, 2026).
- Adaptive thinking was introduced (Feb 9, 2026); under medium effort, some turns get zero reasoning tokens.
An AMD exec analyzed 6,852 sessions and measured a 67% drop in reasoning depth. @om_patel5's A/B test (same prompt, 4.6 vs 4.5) showed 4.6 failing all 5 runs while 4.5 passed all 5.
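Five runs per model is a small sample, so it's worth asking whether a 5/5 vs 0/5 split could just be luck. A one-sided Fisher exact test is one quick sanity check. Minimal Python sketch, standard library only, assuming each run is an independent pass/fail trial (which a same-prompt test doesn't strictly guarantee):

```python
from math import comb


def fisher_one_sided(a_pass: int, a_fail: int, b_pass: int, b_fail: int) -> float:
    """One-sided Fisher exact test on a 2x2 pass/fail table.

    Returns P(model A gets >= a_pass passes) under the null hypothesis that
    passing is independent of the model, with row and column totals fixed
    (i.e. the upper tail of a hypergeometric distribution).
    """
    n_a = a_pass + a_fail          # runs for model A
    n_b = b_pass + b_fail          # runs for model B
    total_pass = a_pass + b_pass   # passes across both models
    denom = comb(n_a + n_b, total_pass)
    p = 0.0
    # Sum the probabilities of every table at least as favourable to A.
    for k in range(a_pass, min(n_a, total_pass) + 1):
        p += comb(n_a, k) * comb(n_b, total_pass - k) / denom
    return p


# Opus 4.5 passed 5/5, Opus 4.6 passed 0/5 (figures quoted from the post).
print(f"one-sided p = {fisher_one_sided(5, 0, 0, 5):.4f}")
```

For 5/5 vs 0/5 this comes out to 1/252 (about 0.004), so chance alone looks unlikely. It says nothing about whether a single prompt is representative of coding tasks in general, though.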
What’s working for me:
```bash
export CLAUDE_CODE_EFFORT_LEVEL=max
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
```
Or just run `/effort max` per session.
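If you'd rather not put these in your shell profile, a small launcher can apply them to a single session only. Rough Python sketch: the two variable names and values are copied from the post and I haven't checked them against Anthropic's docs, and it assumes the `claude` CLI is on your `PATH`.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

# Variable names/values as given in the post (unverified assumption).
OVERRIDES = {
    "CLAUDE_CODE_EFFORT_LEVEL": "max",
    "CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING": "1",
}


def build_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: the current environment) with the
    overrides applied; the original mapping is left untouched."""
    env = dict(os.environ if base is None else base)
    env.update(OVERRIDES)
    return env


def launch(args: List[str]) -> int:
    """Run `claude` with the overridden environment and return its exit code."""
    return subprocess.call(["claude", *args], env=build_env())
```

Calling `launch([])` from a REPL or your own script starts an interactive session with the overrides set; because the variables only go into the child process's environment, nothing leaks back into your shell.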
Some devs are switching back to Opus 4.5 entirely (`claude-opus-4-5-20251101`).
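If you want to run a same-prompt A/B comparison yourself (including against the pinned 4.5 model), here's a rough Python harness. Assumptions: the `claude` CLI is installed and its `-p` (print) and `--model` flags work as documented; `check` is a hypothetical pass/fail function you write for your own task; and I've left the 4.6 model ID for you to fill in rather than guess it.

```python
import subprocess
from typing import Callable

PINNED_45 = "claude-opus-4-5-20251101"  # model ID quoted in the post


def run_claude(model: str, prompt: str) -> str:
    """One non-interactive Claude Code call (print mode) on the given model."""
    result = subprocess.run(
        ["claude", "--model", model, "-p", prompt],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def pass_count(
    model: str,
    prompt: str,
    check: Callable[[str], bool],
    runs: int = 5,
    runner: Callable[[str, str], str] = run_claude,
) -> int:
    """Send the same prompt `runs` times and count outputs accepted by `check`."""
    return sum(1 for _ in range(runs) if check(runner(model, prompt)))
```

Run it with `runs=5` against each model and compare the counts. Passing a fake `runner` lets you dry-run the logic without spending tokens.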
Curious: are you seeing the same patterns? Have the env vars helped? Anyone found other workarounds?
References: BridgeBench (bridgebench.ai/hallucination), GitHub Issue #42796