CommunityNews

CommunityNews

Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence | Andon Labs

Can LLMs control robots? We answer this by testing how good models are at passing the butter – or more generally, do delivery tasks in a household setting. State of the art models struggle, with the best model scoring 40% at Butter-Bench, compared to 95% for humans.

Read in full here:

Where Next?

Popular Ai topics Top

First poster: bot
NVIDIA Uses AI to Slash Bandwidth on Video Calls. NVIDIA Research has invented a way to use AI to dramatically reduce video call bandwid...
New
New
First poster: jss
We are in the middle of an AI boom. Machine Learning experts command extraordinary salaries, investors are happy to open their hearts and...
New
First poster: bot
A research group has taught AI to magnetically wrangle a high-powered stream of plasma used for fusion research — but wait! Put away your...
New
First poster: bot
When Hyundai acquired Boston Dynamics at the end of 2020, there were plenty of open questions. Chief among them was why we should assume ...
New
First poster: bot
You can’t solve AI security problems with more AI. One of the most common proposed solutions to prompt injection attacks (where an AI la...
New
CommunityNews
AI supercomputer will use “tens of thousands” of Nvidia A100 and H100 GPUs.
New
First poster: AstonJ
SRE Fred Hebert provides you with a list of questions to ask about potential AI solutions, including where humans should be involved.
New
First poster: gflashner
Google’s openly available Gemma collection of AI models has reached a milestone: over 150 million downloads. Omar Sanseviero, a developer...
New
New

Other popular topics Top

AstonJ
Or looking forward to? :nerd_face:
483 11975 256
New
Exadra37
I am thinking in building or buy a desktop computer for programing, both professionally and on my free time, and my choice of OS is Linux...
New
siddhant3030
I’m thinking of buying a monitor that I can rotate to use as a vertical monitor? Also, I want to know if someone is using it for program...
New
AstonJ
SpaceVim seems to be gaining in features and popularity and I just wondered how it compares with SpaceMacs in 2020 - anyone have any thou...
New
brentjanderson
Bought the Moonlander mechanical keyboard. Cherry Brown MX switches. Arms and wrists have been hurting enough that it’s time I did someth...
New
AstonJ
There’s a whole world of custom keycaps out there that I didn’t know existed! Check out all of our Keycaps threads here: https://forum....
New
AstonJ
If you are experiencing Rails console using 100% CPU on your dev machine, then updating your development and test gems might fix the issu...
New
AstonJ
Was just curious to see if any were around, found this one: I got 51/100: Not sure if it was meant to buy I am sure at times the b...
New
PragmaticBookshelf
Author Spotlight: VM Brasseur @vmbrasseur We have a treat for you today! We turn the spotlight onto Open Source as we sit down with V...
New
PragmaticBookshelf
Get the comprehensive, insider information you need for Rails 8 with the new edition of this award-winning classic. Sam Ruby @rubys ...
New