One thing I have not seen commented on is that ARC-AGI is a visual benchmark, but LLMs are primarily text-based. For instance, when I see one of the ARC-AGI puzzles, I form a visual representation in my brain and apply some sort of visual reasoning to solve it. I can "see" the solution to the puzzle in my mind's eye. If I didn't have that capability, I don't think I could reason my way through it in words - it would certainly be much more difficult.
I hypothesize that something similar is going on here. OpenAI has not published (or I have not seen) the number of reasoning tokens it took to solve these - we do know that each task cost thousands of dollars. If "a picture is worth a thousand words", could we make AI systems that can reason visually with much better performance?
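For reference, ARC tasks are distributed as JSON grids of integers 0-9, so a text-only model sees something like the sketch below (the grid values are made up, and the row-by-row serialization is just one common prompting choice):

    # A toy ARC-style grid as a text-only model actually receives it: a nested
    # list of color indices (0-9), flattened into a token stream, not an image.
    grid = [
        [0, 0, 3, 0],
        [0, 3, 3, 0],
        [0, 0, 3, 0],
        [0, 0, 0, 0],
    ]

    # One common prompt format: serialize the grid row by row. The shape the
    # puzzle relies on is only implicit in the character positions.
    serialized = "\n".join(" ".join(str(cell) for cell in row) for row in grid)
    print(serialized)
    # 0 0 3 0
    # 0 3 3 0
    # 0 0 3 0
    # 0 0 0 0

Every spatial relationship the puzzle depends on - adjacency, symmetry, enclosure - has to be reconstructed from positions in that 1D token stream.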
Yeah, this part is what makes the high performance even more surprising to me. The fact that LLMs are able to do so well on visual tasks (also seen in their ability to draw an image purely through textual output: https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/) implies not only that they actually have some "world model", but that they manage it despite the disadvantage of having to fit a round peg into a square hole. It's like trying to map out the entire world using the orderly left brain, without a more holistic, spatial right brain.
I wonder if anyone has experimented with having some sort of "visual" scratchpad instead of the "text-based" scratchpad that CoT uses.
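Something like the toy sketch below is what I have in mind: re-render the intermediate grid into the context before every reasoning step so the spatial layout stays explicit (entirely hypothetical; call_model is just a stand-in stub, not any real API):

    # Hypothetical "visual" scratchpad loop: instead of only appending text
    # chain-of-thought, redraw the current working grid as fixed-width text
    # before each step so the model can "look at" its intermediate state.

    def render(grid):
        # Draw the grid as a block of digits, one character per cell.
        return "\n".join("".join(str(cell) for cell in row) for row in grid)

    def call_model(prompt):
        # Placeholder for an actual LLM call; here it just returns a dummy grid.
        return [[0, 0], [0, 0]]

    def solve_with_visual_scratchpad(grid, steps=3):
        for _ in range(steps):
            prompt = ("Scratchpad:\n" + render(grid) +
                      "\nApply the next transformation and return the new grid.")
            grid = call_model(prompt)  # the model "sees" the rendered grid each step
        return grid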
A file is a stream of symbols encoded as bits according to some format. It’s pretty much 1D. It would be surprising if an LLM couldn’t extract information from a file or a data stream.