CommunityNews

CommunityNews

Web Bench - A new way to compare AI Browser Agents

TL;DR: Web Bench is a new dataset to evaluate web browsing agents that consists of 5,750 tasks on 452 different websites, with 2,454 tasks being open sourced. Anthropic Sonnet 3.7 CUA is the current SOTA, with the detailed results here.

Over the past few months, Web

Read in full here:

Where Next?

Popular Ai topics Top

New
AstonJ
Well done DeepMind… wonder what else they’re working on… One of biology’s biggest mysteries has been solved using artificial intelligen...
New
First poster: bot
AI Is Discovering Patterns in Pure Mathematics That Have Never Been Seen Before. We can add suggesting and proving mathematical theorems...
New
New
New
First poster: CommunityNews
Making Things Think: How AI and Deep Learning Power the Products We Use — Holloway. AI now shapes our lives, yet few people know how mac...
New
First poster: CommunityNews
Chat-bots are amazing these days! About a month ago LaMDA made the news when it apparently convinced an engineer at Google that it was se...
New
AstonJ
This is cool! DEEPSEEK-V3 ON M4 MAC: BLAZING FAST INFERENCE ON APPLE SILICON We just witnessed something incredible: the largest open-s...
New
CommunityNews
The glamourous AI coding agent for your favourite terminal :heart_with_arrow: - charmbracelet/crush
New
First poster: jkdiaz
TechCrunch spoke to experienced coders about their time using AI-generated code about what they see as the future of vibe coding.
New

Other popular topics Top

AstonJ
What chair do you have while working… and why? Is there a ‘best’ type of chair or working position for developers?
New
dasdom
No chair. I have a standing desk. This post was split into a dedicated thread from our thread about chairs :slight_smile:
New
AstonJ
There’s a whole world of custom keycaps out there that I didn’t know existed! Check out all of our Keycaps threads here: https://forum....
New
AstonJ
I’ve been hearing quite a lot of comments relating to the sound of a keyboard, with one of the most desirable of these called ‘thock’, he...
New
PragmaticBookshelf
Learn different ways of writing concurrent code in Elixir and increase your application's performance, without sacrificing scalability or...
New
AstonJ
Was just curious to see if any were around, found this one: I got 51/100: Not sure if it was meant to buy I am sure at times the b...
New
New
sir.laksmana_wenk
I’m able to do the “artistic” part of game-development; character designing/modeling, music, environment modeling, etc. However, I don’t...
New
PragmaticBookshelf
Explore the power of Ash Framework by modeling and building the domain for a real-world web application. Rebecca Le @sevenseacat and ...
New
Fl4m3Ph03n1x
Background Lately I am in a quest to find a good quality TTS ai generation tool to run locally in order to create audio for some videos I...
New