I asked ChatGPT about playing chess: it says tests have shown it makes an illegal move within 10-15 moves, even if prompted to play carefully and not make any illegal moves. It'll fail within the first 3 or 4 if you ask it to play reasonably quickly.
That means it can literally never win a chess match, given that an intentional illegal move is an immediate loss.
It can't beat a human who can't play chess.
It literally can't even lose properly.
It will disqualify itself every time.
--
> It shows clearly where current models shine (problem-solving)
Yeh - that's not what's happening.
I say that as someone that pays for and uses an LLM pretty much every day.
--
Also - I didn't fact check any of the above about playing chess.
I choose to believe.
Preventing an LLM from making illegal moves should be very simple: provide it with tool access to something that tells it if a move is legal or not, then watch it iterate in a loop until it finds a move that it is allowed to make.
I expect this would dramatically improve the chess-playing abilities of the competent tool-using models, such as o3.
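A minimal sketch of that loop, assuming the python-chess library; ask_llm_for_move is a hypothetical stand-in for whatever model API you're actually calling:

    import chess

    def ask_llm_for_move(fen: str, feedback: str) -> str:
        # Hypothetical stand-in for the actual LLM API call.
        raise NotImplementedError

    def get_legal_move(board: chess.Board, max_tries: int = 5) -> chess.Move:
        # Ask, validate, and feed the rejection back until the move is legal.
        feedback = ""
        for _ in range(max_tries):
            san = ask_llm_for_move(board.fen(), feedback)
            try:
                return board.parse_san(san)  # raises ValueError on illegal or garbled SAN
            except ValueError:
                feedback = f"{san} is not a legal move in this position. Pick another."
        # Give up and play any legal move rather than forfeiting.
        return next(iter(board.legal_moves))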
Nope. The list is very limited. For the starting position:
a3, a4, b3, b4, ..., h3, h4,
Na3, Nc3, Nf3, Nh3
That's 20 moves. The count grows a bit in the early middlegame, then drops again in the endgame. There do exist rather artificial positions with more than 200 legal moves, but the average number of legal moves in a position is around 40.
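Easy to verify with python-chess, if you'd rather trust code than either of us:

    import chess

    board = chess.Board()
    print(board.legal_moves.count())                       # 20
    print(sorted(board.san(m) for m in board.legal_moves))
    # ['Na3', 'Nc3', 'Nf3', 'Nh3', 'a3', 'a4', ..., 'h3', 'h4']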
I mentally counted the starting moves as 8 pawns × 2 = 16 pawn moves and 2 knights × 2 = 4 knight moves, but then I doubled it for both sides to get 40 (which with hindsight was obviously wrong), and then assumed that once the pawns had moved a bit there would be more options from non-pawn pieces.
With an upper bound of ~200 in edge cases, listing all possible moves wouldn't take up much room in the context at all. I wonder if it would give better results, too.
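Something like this, say (assuming SAN is the notation the model copes with best):

    import chess

    def legal_moves_for_prompt(board: chess.Board) -> str:
        # Render every legal move; even ~200 short strings is only a few hundred tokens.
        return "Legal moves: " + ", ".join(sorted(board.san(m) for m in board.legal_moves))

    print(legal_moves_for_prompt(chess.Board()))
    # Legal moves: Na3, Nc3, Nf3, Nh3, a3, a4, b3, b4, ...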
You could also constrain the output grammar to legal moves, but if we're comparing its chess performance to humans', it would be unfair to not let it think.
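A crude way to get both: let it ramble, then accept only the last thing in its output that parses as a legal move. Real grammar-constrained decoding would mask illegal tokens at sampling time instead; this is just a sketch, same python-chess assumption as above:

    import re
    import chess

    def extract_final_move(board: chess.Board, model_text: str) -> str | None:
        # Scan the model's free-form "thinking" backwards for a legal SAN move.
        legal = {board.san(m) for m in board.legal_moves}
        for token in reversed(re.findall(r"[\w+=#-]+", model_text)):
            if token in legal:
                return token
        return None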
I have tried playing chess with ChatGPT a couple of times recently, and I found it was making illegal moves after about 4 or 5 moves.
The first few could be resolved by asking it to check its moves. After a few more, I was having to explain that knights can jump and therefore can't be blocked. It was also trying to move pieces that weren't there, onto squares already occupied by its own pieces, and asking it to review was not getting anywhere.
10-15 moves is very optimistic, unless it’s counting each move by either side, i.e., White moves 5-8 times and Black moves 5-8 times. Even that seems optimistic, but the lower end could be right.
I just tried again, and ChatGPT did much better. A notification said it was using GPT-4o mini, and it reached move 10 for White (me) before it lost the plot.
It didn't get much further with suggestions to review. Also, the small ASCII board it generated was incorrect much earlier, but it sometimes plays without that, so I let that go.