
CommunityNews
Web Bench - A new way to compare AI Browser Agents
TL;DR: Web Bench is a new dataset to evaluate web browsing agents that consists of 5,750 tasks on 452 different websites, with 2,454 tasks being open sourced. Anthropic Sonnet 3.7 CUA is the current SOTA, with the detailed results here.
Over the past few months, Web
Read in full here:
Popular Ai topics

Artificial intelligence and machine learning exist on the back of a lot of hard work from humans.
Alongside the scientists, there are th...
New

Use AI to turn simple brushstrokes into realistic landscape images. Create backgrounds quickly, or speed up your concept exploration so y...
New

How a robot is being used in the highly specialised business of creating flavours.
New

New

We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understandin...
New

Upcoming “Hopper” GPU broke records in its MLPerf debut, according to Nvidia.
New

Ghostwriter generates, completes, or transforms code in 16 languages, similar to GitHub Copilot.
New

ChatGPT aims to produce accurate and harmless talk—but it’s a work in progress.
New

OpenJourney is a Text-to-Image AI model which has the goal of bringing an open source equivalent to Midjourney to the people. It is curre...
New

Ollama now supports new multimodal models with its new engine.
New
Other popular topics

Any thoughts on Svelte?
Svelte is a radical new approach to building user interfaces. Whereas traditional frameworks like React and Vue...
New

Which, if any, games do you play? On what platform?
I just bought (and completed) Minecraft Dungeons for my Nintendo Switch. Other than ...
New

Bought the Moonlander mechanical keyboard. Cherry Brown MX switches. Arms and wrists have been hurting enough that it’s time I did someth...
New

You might be thinking we should just ask who’s not using VSCode :joy: however there are some new additions in the space that might give V...
New

My first contact with Erlang was about 2 years ago when I used RabbitMQ, which is written in Erlang, for my job. This made me curious and...
New

Oh just spent so much time on this to discover now that RancherOS is in end of life but Rancher is refusing to mark the Github repo as su...
New

Here’s the story how one of the world’s first production deployments of LiveView came to be - and how trying to improve it almost caused ...
New

Intensively researching Erlang books and additional resources on it, I have found that the topic of using Regular Expressions is either c...
New

Will Swifties’ war on AI fakes spark a deepfake porn reckoning?
New

Background
Lately I am in a quest to find a good quality TTS ai generation tool to run locally in order to create audio for some videos I...
New
Categories:
Sub Categories:
Popular Portals
- /elixir
- /rust
- /ruby
- /wasm
- /erlang
- /phoenix
- /keyboards
- /rails
- /js
- /python
- /security
- /go
- /swift
- /vim
- /clojure
- /emacs
- /java
- /haskell
- /onivim
- /svelte
- /typescript
- /crystal
- /c-plus-plus
- /kotlin
- /tailwind
- /gleam
- /react
- /ocaml
- /flutter
- /elm
- /vscode
- /ash
- /opensuse
- /centos
- /php
- /deepseek
- /html
- /zig
- /scala
- /textmate
- /sublime-text
- /debian
- /nixos
- /lisp
- /agda
- /react-native
- /kubuntu
- /arch-linux
- /revery
- /ubuntu
- /spring
- /django
- /manjaro
- /diversity
- /lua
- /nodejs
- /slackware
- /julia
- /c
- /neovim