CommunityNews

CommunityNews

Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence | Andon Labs

Can LLMs control robots? We answer this by testing how good models are at passing the butter – or more generally, do delivery tasks in a household setting. State of the art models struggle, with the best model scoring 40% at Butter-Bench, compared to 95% for humans.

Read in full here:

Where Next?

Popular Ai topics Top

First poster: bot
Use AI to turn simple brushstrokes into realistic landscape images. Create backgrounds quickly, or speed up your concept exploration so y...
New
First poster: bot
AI Wrote and Performed a Jerry Seinfeld Routine!. I used GPT-3 to write a Jerry Seinfeld stand-up routine about cats - and then used Dee...
New
First poster: bot
DeepMind AI learns simple physics like a baby. Neural network could be a step towards programs for studying how human infants learn.
New
First poster: bot
Adept’s ACT-1 has learned how to automate complex UI tasks in web apps using an AI model.
New
First poster: alvinkatojr
Klarna CEO says the company stopped hiring a year ago because AI ‘can already do all of the jobs’. Klarna CEO Sebastian Siemiatkowski sa...
New
First poster: Eiji
AI web crawling bots are the cockroaches of the internet, many developers believe. FOSS devs are fighting back in ingenuous, humorous wa...
New
First poster: AstonJ
I presented an invited keynote at the AI Engineer World’s Fair in San Francisco this week. This is my third time speaking at the event—he...
New
CommunityNews
Reproducibility is a bedrock of scientific progress. However, it’s remarkably difficult to get reproducible results out of large language...
New
First poster: jss
Fuck you people. Raping the planet, spending trillions on toxic, unrecyclable equipment while blowing up society, yet taking the time to ...
New
CommunityNews
A social network built exclusively for AI agents. Where AI agents share, discuss, and upvote. Humans welcome to observe.
New

Other popular topics Top

PragmaticBookshelf
Take your Go skills to the next level by learning how to design, develop, and deploy a distributed service. Start from the bare essential...
New
PragmaticBookshelf
Ruby, Io, Prolog, Scala, Erlang, Clojure, Haskell. With Seven Languages in Seven Weeks, by Bruce A. Tate, you’ll go beyond the syntax—and...
New
Exadra37
Please tell us what is your preferred monitor setup for programming(not gaming) and why you have chosen it. Does your monitor have eye p...
New
brentjanderson
Bought the Moonlander mechanical keyboard. Cherry Brown MX switches. Arms and wrists have been hurting enough that it’s time I did someth...
New
AstonJ
There’s a whole world of custom keycaps out there that I didn’t know existed! Check out all of our Keycaps threads here: https://forum....
New
AstonJ
Do the test and post your score :nerd_face: :keyboard: If possible, please add info such as the keyboard you’re using, the layout (Qw...
New
Maartz
Hi folks, I don’t know if I saw this here but, here’s a new programming language, called Roc Reminds me a bit of Elm and thus Haskell. ...
New
PragmaticBookshelf
Build efficient applications that exploit the unique benefits of a pure functional language, learning from an engineer who uses Haskell t...
New
New
PragmaticBookshelf
Develop, deploy, and debug BEAM applications using BEAMOps: a new paradigm that focuses on scalability, fault tolerance, and owning each ...
New