CommunityNews
Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence | Andon Labs
Can LLMs control robots? We answer this by testing how good models are at passing the butter – or more generally, do delivery tasks in a household setting. State of the art models struggle, with the best model scoring 40% at Butter-Bench, compared to 95% for humans.
Read in full here:
Popular Ai topics
Use AI to turn simple brushstrokes into realistic landscape images. Create backgrounds quickly, or speed up your concept exploration so y...
New
AI Wrote and Performed a Jerry Seinfeld Routine!.
I used GPT-3 to write a Jerry Seinfeld stand-up routine about cats - and then used Dee...
New
DeepMind AI learns simple physics like a baby.
Neural network could be a step towards programs for studying how human infants learn.
New
Adept’s ACT-1 has learned how to automate complex UI tasks in web apps using an AI model.
New
Klarna CEO says the company stopped hiring a year ago because AI ‘can already do all of the jobs’.
Klarna CEO Sebastian Siemiatkowski sa...
New
AI web crawling bots are the cockroaches of the internet, many developers believe. FOSS devs are fighting back in ingenuous, humorous wa...
New
I presented an invited keynote at the AI Engineer World’s Fair in San Francisco this week. This is my third time speaking at the event—he...
New
Reproducibility is a bedrock of scientific progress. However, it’s remarkably difficult to get reproducible results out of large language...
New
Fuck you people. Raping the planet, spending trillions on toxic, unrecyclable equipment while blowing up society, yet taking the time to ...
New
A social network built exclusively for AI agents. Where AI agents share, discuss, and upvote. Humans welcome to observe.
New
Other popular topics
Take your Go skills to the next level by learning how to design, develop, and deploy a distributed service. Start from the bare essential...
New
Ruby, Io, Prolog, Scala, Erlang, Clojure, Haskell. With Seven Languages in Seven Weeks, by Bruce A. Tate, you’ll go beyond the syntax—and...
New
Please tell us what is your preferred monitor setup for programming(not gaming) and why you have chosen it.
Does your monitor have eye p...
New
Bought the Moonlander mechanical keyboard. Cherry Brown MX switches. Arms and wrists have been hurting enough that it’s time I did someth...
New
There’s a whole world of custom keycaps out there that I didn’t know existed!
Check out all of our Keycaps threads here:
https://forum....
New
Do the test and post your score :nerd_face:
:keyboard:
If possible, please add info such as the keyboard you’re using, the layout (Qw...
New
Hi folks,
I don’t know if I saw this here but, here’s a new programming language, called Roc
Reminds me a bit of Elm and thus Haskell. ...
New
Build efficient applications that exploit the unique benefits of a pure functional language, learning from an engineer who uses Haskell t...
New
New
Develop, deploy, and debug BEAM applications using BEAMOps: a new paradigm that focuses on scalability, fault tolerance, and owning each ...
New
Categories:
Sub Categories:
Popular Portals
- /elixir
- /rust
- /wasm
- /ruby
- /erlang
- /phoenix
- /keyboards
- /python
- /js
- /rails
- /security
- /go
- /swift
- /vim
- /clojure
- /java
- /emacs
- /haskell
- /typescript
- /svelte
- /onivim
- /kotlin
- /c-plus-plus
- /crystal
- /tailwind
- /react
- /gleam
- /ocaml
- /elm
- /flutter
- /vscode
- /ash
- /html
- /deepseek
- /opensuse
- /zig
- /centos
- /php
- /scala
- /react-native
- /lisp
- /sublime-text
- /textmate
- /nixos
- /debian
- /agda
- /deno
- /django
- /kubuntu
- /arch-linux
- /nodejs
- /ubuntu
- /spring
- /revery
- /manjaro
- /diversity
- /lua
- /julia
- /laravel
- /markdown









