
This seems incorrect. I don't need Claude 3.5 Sonnet to operate a robot body for me, and I don't know anyone else who does. And general-purpose robotics is not going to be the most efficient way for robots to do many tasks, ever, and certainly not in the short term.



Of course not, but the task requires excellent image understanding, a large context window, a mix of structured and unstructured output, high-level and spatial reasoning, and a conversational layer on top.

I find it’s predictive of relative performance in other tasks I use LLMs for. Claude is the best. The only shortcoming is its peculiar verbosity.

Definitely superior to anything OpenAI has and miles beyond the “open weights” alternatives like Llama.


The problem is that it also fails on fairly simple logic puzzles that ChatGPT can do just fine.

For example, even the new 3.5 Sonnet can't solve this reliably:

> Doom Slayer needs to teleport from Phobos to Deimos. He has his pet bunny, his pet cacodemon, and a UAC scientist who tagged along. The Doom Slayer can only teleport with one of them at a time. But if he leaves the bunny and the cacodemon together alone, the bunny will eat the cacodemon. And if he leaves the cacodemon and the scientist alone, the cacodemon will eat the scientist. How should the Doom Slayer get himself and all his companions safely to Deimos?

In fact, not only is its solution wrong, but it can't figure out on its own why it's wrong if you ask it to self-check.

In contrast, GPT-4o consistently gives the correct response.
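For reference, the puzzle is a reskinned wolf, goat and cabbage river-crossing problem: the cacodemon plays the goat, since it can't be left alone with either the bunny or the scientist. A brute-force breadth-first search over (who is on Phobos, which moon the Doom Slayer is on) finds the shortest safe plan. This is a minimal sketch for checking answers, not anything from the thread; the state encoding and the names `safe` and `solve` are illustrative.

```python
from collections import deque

ITEMS = ("bunny", "cacodemon", "scientist")
# Pairs that must never be left together without the Doom Slayer.
FORBIDDEN = [{"bunny", "cacodemon"}, {"cacodemon", "scientist"}]


def safe(group):
    """True if no forbidden pair is contained in the unattended group."""
    return not any(pair <= group for pair in FORBIDDEN)


def solve():
    """BFS over states (companions on Phobos, slayer_on_phobos).

    Returns the shortest list of (passenger, direction) moves, or None.
    """
    start = (frozenset(ITEMS), True)
    goal = (frozenset(), False)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (phobos, slayer_here), path = queue.popleft()
        if (phobos, slayer_here) == goal:
            return path
        # Companions on the same moon as the Doom Slayer.
        side = phobos if slayer_here else frozenset(ITEMS) - phobos
        # He may teleport alone (None) or with exactly one companion.
        for passenger in [None, *sorted(side)]:
            moved = {passenger} if passenger else set()
            if slayer_here:
                new_phobos = phobos - moved
                left_behind = new_phobos
            else:
                new_phobos = phobos | moved
                left_behind = frozenset(ITEMS) - new_phobos
            # Only the moon he is leaving is unattended after the jump.
            if not safe(left_behind):
                continue
            state = (new_phobos, not slayer_here)
            if state in seen:
                continue
            seen.add(state)
            direction = "to Deimos" if slayer_here else "back to Phobos"
            queue.append((state, path + [(passenger or "nobody", direction)]))
    return None


if __name__ == "__main__":
    for step, (who, where) in enumerate(solve(), 1):
        print(step, who, where)
```

The search finds a seven-jump plan: take the cacodemon over, come back alone, take the bunny over, bring the cacodemon back, take the scientist over, come back alone, and finally take the cacodemon over. Swapping the bunny and the scientist in the middle also works; the key insight is that the cacodemon has to go first, come back once, and go last.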



