In the wake of the o1 release, and with the old aider benchmark approaching saturation, Paul from aider has created a new, much harder benchmark. o1 leads by a substantial margin.

https://aider.chat/docs/leaderboards/
https://aider.chat/2024/12/21/polyglot.html