This seems incorrect. I don't need Claude 3.5 Sonnet to operate a robot body for...

trzy · 2024-10-29T18:28:17

Of course not, but the task requires excellent image understanding, a large context window, a mix of structured and unstructured output, high-level and spatial reasoning, and a conversational layer on top.

I find it’s predictive of relative performance in other tasks I use LLMs for. Claude is the best. The only shortcoming is its peculiar verbosity.

Definitely superior to anything OpenAI has and miles beyond the “open weights” alternatives like Llama.

int_19h · 2024-10-29T19:54:21

The problem is that it also fails on fairly simple logic puzzles that ChatGPT can do just fine.

For example, even the new 3.5 Sonnet can't solve this reliably:

> Doom Slayer needs to teleport from Phobos to Deimos. He has his pet bunny, his pet cacodemon, and a UAC scientist who tagged along. The Doom Slayer can only teleport with one of them at a time. But if he leaves the bunny and the cacodemon together alone, the bunny will eat the cacodemon. And if he leaves the cacodemon and the scientist alone, the cacodemon will eat the scientist. How should the Doom Slayer get himself and all his companions safely to Deimos?

In fact, not only is its solution wrong, but it can't figure out why it's wrong on its own when asked to self-check.

In contrast, GPT-4o consistently gives the correct answer.