Hacker News

LLM responses are random. One person's failure is another's success. When evaluating, we should all do reruns and see how many times it fails or succeeds.

Without the number of reruns, the result is as good as random.
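A minimal sketch of what "count successes over reruns" could look like, assuming each trial is a function that returns True or False. `fake_llm_trial` is a hypothetical stand-in for "ask the model and check the answer", not any real API. The interval is a Wilson score interval, a standard way to put error bars on a pass rate, chosen here for illustration.

```python
import math
import random

def pass_rate(run_once, n_runs=20):
    """Call run_once() n_runs times; return (successes, success rate)."""
    successes = sum(1 for _ in range(n_runs) if run_once())
    return successes, successes / n_runs

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

# Hypothetical trial: succeeds about 60% of the time, like a flaky prompt.
def fake_llm_trial():
    return random.random() < 0.6

if __name__ == "__main__":
    random.seed(0)
    k, rate = pass_rate(fake_llm_trial, n_runs=20)
    lo, hi = wilson_interval(k, 20)
    print(f"{k}/20 succeeded ({rate:.0%}), 95% CI ~ [{lo:.2f}, {hi:.2f}]")
```

With only a handful of runs the interval stays wide, which is the point: a single success or failure says little on its own.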




Okay?

OC was saying that the article said that Claude recognized the “artistic” lines of the image from just the scatter plot data.

That isn’t what happened.

The author added a PNG of the plot to the conversation.

Idk why I need to explain that twice.




