xiji2646-netizen

Gemini 3.5 Flash launched today - quick breakdown for anyone running agent workloads

Google shipped 3.5 Flash at I/O 2026. The “budget” Flash model now beats 3.1 Pro on coding and tool-calling benchmarks.

Key numbers (from Google):

  • MCP Atlas (tool calling): 83.6% vs 3.1 Pro’s 78.2%
  • Terminal-Bench (coding): 76.2% vs 70.3%
  • Finance Agent v2: 57.9% vs 43.0%
  • 4x faster and ~40% cheaper than 3.1 Pro
  • $1.50/M input, $9/M output, $0.15/M cached
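
For anyone budgeting an agent loop, here is a rough sketch of what those per-token prices work out to. The prices are the ones quoted above; the function name and the assumption that cached tokens are a subset of input tokens billed at the cached rate instead of the input rate are mine, so check the actual billing rules before relying on it.

```python
# Prices in USD per million tokens, as quoted in the post.
PRICE_PER_M = {"input": 1.50, "output": 9.00, "cached": 0.15}


def estimate_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate the cost in USD of one request.

    Assumption (not from Google): cached_tokens is the part of
    input_tokens served from cache, billed at the cached rate
    instead of the regular input rate.
    """
    fresh_tokens = input_tokens - cached_tokens
    if fresh_tokens < 0:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    total = (
        fresh_tokens * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cached"]
        + output_tokens * PRICE_PER_M["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical 20-step agent loop: 30k input tokens per step
    # (25k of them cached) and 1k output tokens per step.
    per_step = estimate_cost(30_000, 1_000, cached_tokens=25_000)
    print(f"per step: ${per_step:.5f}, 20 steps: ${per_step * 20:.3f}")
```

With those made-up step sizes the loop comes to roughly $0.41 for 20 steps, and most of that is uncached input plus output, so cache hit rate matters a lot.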

Where it does NOT win:

  • Computer Use: not supported (GPT-5.5 only)
  • SWE-Bench Pro: Opus 4.7 still leads
  • Abstract reasoning: 3.1 Pro still edges it

My quick take on model routing:

  • Multi-tool agent loops → Flash
  • Heavy code refactoring → Opus 4.7
  • GUI automation → GPT-5.5
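
If you want that routing in code, a minimal sketch could look like the following. The task labels are names I made up for illustration, and the values are just the model names from the list above, not real API model IDs.

```python
# Routing table mirroring the list above.
# Keys are hypothetical task labels; values are model names as written
# in the post, NOT actual API model identifiers.
ROUTES = {
    "multi_tool_agent": "Flash",
    "heavy_refactor": "Opus 4.7",
    "gui_automation": "GPT-5.5",
}


def pick_model(task_kind):
    """Return the model name for a task kind, or raise if it is unknown."""
    try:
        return ROUTES[task_kind]
    except KeyError:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(
            f"unknown task kind {task_kind!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    for kind in ROUTES:
        print(kind, "->", pick_model(kind))
```

Unknown task kinds raise instead of falling back to a default, since picking a default model is a judgment call the list above does not make.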

Anyone tested it on real agent workflows yet? Curious how the 4x speed claim holds up in practice.
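
If anyone wants to check the speed claim on their own prompts, something like this rough timing harness might be a starting point. It assumes you wrap each model in your own callable that takes a prompt and blocks until the response is complete; `time_calls` and `speedup` are hypothetical helper names, and no real SDK is called here.

```python
import statistics
import time


def time_calls(call, prompts, repeats=3):
    """Time call(prompt) for each prompt, `repeats` times each.

    `call` is any callable you supply (for example a thin wrapper
    around your model client). Returns wall-clock stats in seconds.
    """
    samples = []
    for prompt in prompts:
        for _ in range(repeats):
            start = time.perf_counter()
            call(prompt)
            samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "max_s": max(samples),
        "n": len(samples),
    }


def speedup(baseline, candidate):
    """Ratio of median latencies; 4.0 would match the '4x faster' claim."""
    return baseline["median_s"] / candidate["median_s"]
```

Medians are used so one slow request does not skew the comparison. Wall-clock time also includes network latency and queueing, so run both models from the same machine and at the same time of day.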
