Hacker News new | past | comments | ask | show | jobs | submit login

Not OP, but likely because that was the only metric/benchmark/however you want to call it OpenAI showcased in the stream and on the blog to highlight the improvement between 4o and 4.5. To say that this is not really a good metric for comparison, not least because prompting can have a massive impact in this regard, would be an understatement.



Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: