You're right, I was wrong to say "most challenging" as there have been harder on... | Hacker News

modeless 4 months ago | on: OpenAI O3 breakthrough high score on ARC-AGI-PUB

You're right, I was wrong to say "most challenging" as there have been harder ones coming out recently. I think the correct statement would be "most challenging long-standing benchmark" as I don't believe any other test designed in 2019 has resisted progress for so long. FrontierMath is only a month old. And of course the real key feature of ARC is that it is easy for humans. FrontierMath is (intentionally) not.

esafak 4 months ago

They should put some famous, unsolved problems in the next edition so ML researchers do some actually useful work while they're "gaming" the benchmarks :)

modeless 4 months ago

I'm certain that the big labs will be gunning for the Millennium Prize problems.
