
How long has SimpleBench been posted? Out of the first 6 questions at https://simple-bench.com/try-yourself, o1-pro got 5/6 right.

It was interesting to see how it failed on question 6: https://chatgpt.com/c/6765e70e-44b0-800b-97bd-928919f04fbe

Apparently LLMs do not consider global thermonuclear war to be all that big a deal, for better or worse.




Don't worry, I also got that wrong :) I thought her affair would be the biggest problem for John.


John was an ex, not her partner. Tricky.



