
It's not $6,000 per task (i.e. per question). $6,000 is roughly the retail cost of evaluating the entire benchmark (about 400 questions) at the high-efficiency setting.



From reading the blog post and Twitter, and from the cost of other models, I think it's evident that it IS actually the cost per task; see this tweet: https://files.catbox.moe/z1n8dc.jpg

And o1 costs $15/$60 per 1M input/output tokens, so the estimated costs on the graph would match a single task, not the whole benchmark.
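
For anyone who wants to sanity-check that kind of estimate, here is a minimal sketch of the per-task arithmetic. It uses the $15/$60 per 1M token prices quoted above; the function name and the token counts are made up for illustration and are not measured figures from the benchmark.

```python
# o1 list prices as quoted in this comment: USD per 1M tokens.
INPUT_USD_PER_M = 15.0
OUTPUT_USD_PER_M = 60.0


def task_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one task from its input and output token counts."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical token counts, for illustration only.
example = task_cost_usd(input_tokens=10_000, output_tokens=50_000)
print(f"${example:.2f} per task")  # prints "$3.15 per task"
```

Plugging in whatever token counts you believe a single task uses shows whether a given dollar figure is plausible per task or only for the whole benchmark.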


The blog clarifies that it's $17-20 per task. Maybe it runs into thousands for tasks it can't solve?


That cost is for o3 low; o3 high goes into the thousands per task.



