CommunityNews

Something weird is happening with LLMs and chess

Something weird is happening with LLMs and chess.
Are they good or bad?

This thread was posted by one of our members via one of our news source trackers.

jmagnani

Been playing around with LLMs, but it feels like writing the right prompt is a trial-and-error thing.

Eiji

Doing something with a tool that doesn’t support it doesn’t make sense, so the results aren’t worth analysing. :see_no_evil:


  1. e4 e6 2. d3 c5 3. Nf3 Nc6 4. g3 Nf6 5.

With input like this, it’s pointless to show the first 4 moves on the graph. Saying

Wow, recent LLMs can sort of play chess! They fall apart after the early game (…)

is like saying:

While the input data was good, the results were bad

since, as we can see, the LLMs were already losing around those first 4 moves. :chart_with_downwards_trend:


Since OpenAI is lame and doesn’t support full grammars, for the closed (OpenAI) models I tried generating up to 10 times and if it still couldn’t come up with a legal move, I just chose one randomly.

so …

Because the runner was crippled, his time was randomly chosen from a pool of possible times.

And? How does this add anything to the discussion if the author has generated part of the output, or possibly even all of it? :-1:
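For reference, the retry-then-random fallback quoted above could look roughly like this. It is only a sketch: `ask_model` is a hypothetical stand-in for whatever call returns a move string, and the list of legal moves is assumed to come from a chess library. Neither name is from the article.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def pick_move(
    ask_model: Callable[[], str],      # hypothetical: returns the model's proposed move, e.g. "Bg2"
    legal_moves: Sequence[str],        # assumed to be supplied by a chess library
    max_tries: int = 10,               # the quoted article retried up to 10 times
    rng: Optional[random.Random] = None,
) -> Tuple[str, bool]:
    """Return (move, from_model); from_model is False if we fell back to a random move."""
    rng = rng or random.Random()
    legal = set(legal_moves)
    for _ in range(max_tries):
        candidate = ask_model().strip()
        if candidate in legal:
            return candidate, True
    # The model never produced a legal move: choose one at random instead.
    return rng.choice(list(legal_moves)), False
```

Returning the `from_model` flag is the part that matters for the objection above: it lets you report how many moves in a game were actually the model’s and how many were random.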


You are a chess grandmaster.
(…)
1. e4 e6 2. d3 c5 3. Nf3 Nc6 4. g3 Nf6 5.

It’s just limiting the number of possibilities … The linked article doesn’t mention how a chess engine rated the games. There is no information on how many blunders and mistakes there were. No tool was used to estimate whether the moves looked random or whether there was actually some plan behind the game. :face_with_diagonal_mouth:
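To make the missing numbers concrete: given an engine evaluation (in centipawns, from the mover’s point of view) before and after each move, counting mistakes and blunders takes a few lines. This is a sketch only. The thresholds of 100 and 300 centipawns are my assumption, not a standard and not something the article used, and `count_errors` is a made-up name.

```python
from typing import Dict, Iterable, Tuple

MISTAKE_CP = 100   # assumed threshold, in centipawns
BLUNDER_CP = 300   # assumed threshold, in centipawns


def count_errors(evals: Iterable[Tuple[int, int]]) -> Dict[str, int]:
    """evals holds one (before, after) pair per move, scored from the mover's side."""
    counts = {"ok": 0, "mistake": 0, "blunder": 0}
    for before, after in evals:
        loss = before - after  # how much the move cost the player who made it
        if loss >= BLUNDER_CP:
            counts["blunder"] += 1
        elif loss >= MISTAKE_CP:
            counts["mistake"] += 1
        else:
            counts["ok"] += 1
    return counts
```

A per-game table of these counts would say far more about the quality of play than a win/loss chart.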

Yet people found that LLMs could play all the way through to the end game, with never-before-seen boards.

Yes, chess engines do the same and do it better, so? It’s not really hard to write a simple algorithm that filters all moves down to the legal ones, makes a move and checks whether the game is over. It’s a small surprise that there was no forced draw, but I guess even the weakest level can avoid that. :next_track_button:
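That loop is short enough to sketch. To keep the snippet self-contained it uses a toy game instead of chess (players alternately take 1 to 3 sticks, and whoever takes the last one wins); `legal_moves`, `play` and the game itself are made up for illustration. For real chess the same three steps would go through a chess library’s move generator.

```python
import random
from typing import Callable, List


def legal_moves(sticks: int) -> List[int]:
    """Filter the candidate moves (take 1, 2 or 3) down to the legal ones."""
    return [n for n in (1, 2, 3) if n <= sticks]


def play(sticks: int, choose: Callable[[List[int]], int]) -> int:
    """Play until the game is over and return the winning player (0 or 1)."""
    player = 0
    while True:
        move = choose(legal_moves(sticks))  # 1. only legal moves are offered
        sticks -= move                      # 2. make the move
        if sticks == 0:                     # 3. check whether the game is over
            return player
        player = 1 - player


# A player that picks uniformly at random still always finishes the game.
winner = play(10, random.choice)
```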


The results are not generated from nowhere. There always needs to be a source. While asking descriptive questions often helps, it may drastically decrease the number of possible results. In Google search, for example, if you do not force a specific term, the engine looks for similar ones and the results may not always be the best. :confused:

Also, LLMs prefer the mainstream narrative, for example a preference for renewable energy among possible energy sources despite its disadvantages. The most popular LLMs are made by huge companies, and they can support anything, including the worst things and ideologies, as long as it is not against those companies. Good results were never the highest priority. :point_up:

At first we may be surprised by gpt-3.5-turbo-instruct, but then we notice that gpt-4o gives better output at the start, so it’s not “just better than others”, it’s just different. If it’s different (whatever that means), it’s not really worth comparing them. It’s like comparing 2 LLMs, each based on extremely different, ideologically loaded sources, and being surprised that they argue over whether the best ideology is Nazism or Stalinism. :bulb:


And yeah … as always … that’s the powerful “AI” that will take our jobs and destroy humanity. I only play chess for fun and I’m still better than an LLM that possibly contains information about thousands of chess games. The only thing this article has definitely shown is that LLMs are far, far away from becoming an AI. :+1:

dani

I agree, but let’s see in a couple more years.
