CommunityNews

Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence | Andon Labs

Can LLMs control robots? We answer this by testing how well models can pass the butter or, more generally, perform delivery tasks in a household setting. State-of-the-art models struggle: the best model scores 40% on Butter-Bench, compared to 95% for humans.

#llm

2025-11-01 02:09:30 UTC
