
It's trivial to come up with prompts that 4o fails on. If it's hard to come up with prompts that o1 succeeds on but 4o fails on, that implies the delta is not that great.



Or the delta depends on the nature of the problem or prompt, and we haven't figured out how yet. If only a relatively narrow range of prompts shows a large delta, then finding those examples is still a work in progress?



