
>that result is not verifiable, not reproducible, unknown if it was leaked and how it was measured. It's kinda hype science.

It will be verifiable when the model is released. OpenAI hasn't released any benchmark scores that were later shown to be falsified, so unless you have an actual reason to believe they're outright lying, it's not something to take seriously.

Frontier Math is a private benchmark. Of its highest tier of difficulty, Terence Tao says:

“These are extremely challenging. I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages…”

Unless you have a reason to believe answers were leaked then again, not interested in baseless speculation.




> OpenAI hasn't released any benchmark scores

There are multiple research results demonstrating that various benchmarks have leaked heavily into GPT training data.

Whether it is intentional or not we can't tell, but they have a very strong incentive to cheat to attract more investment.

> Unless you have a reason to believe answers were leaked then again, not interested in baseless speculation.

This is scientific methodology: results have to be reproduced or confirmed before they are believed.


Again, Frontier Math is private. Benchmarks leaked to GPT-4 are all public datasets on the internet. Frontier Math literally cannot leak that way.

If you don't want to take the benchmarks at face value then good for you but this entire conversation is pointless.


> Again, Frontier Math is private.

It's private for outsiders, but it was developed in "collaboration" with OAI, and GPT was tested on it in the past, so they have it in logs somewhere.

> If you don't want to take the benchmarks at face value then good for you but this entire conversation is pointless.

If you think this entire conversation is pointless, then why do you continue?


>It's private for outsiders, but it was developed in "collaboration" with OAI, and GPT was tested on it in the past, so they have it in logs somewhere.

They probably have logs of the questions, but that's not enough. Frontier Math isn't something that can be fully solved without gathering top experts in multiple disciplines. Even Tao says he only knows who to ask for the most difficult set.

Basically, what you're suggesting, at least with this benchmark in particular, is far more difficult than you're implying.

>If you think this entire conversation is pointless, then why do you continue?

There's no point arguing about how efficient the models are being (the original point) if you won't even accept the results of the benchmarks. Why am I continuing? For now, it's only polite to clarify.


> Frontier Math isn't something that can be fully solved without gathering top experts

Tao's quote above referred to the hardest 20% of problems. They have 3 levels of difficulty, and presumably the first level is much easier. Also, as I mentioned, OAI collaborated on creating the benchmark, so they could have access to all the solutions too.

> There's no point arguing

Lol, let me ask again: why are you arguing then? Yes, I have strong, reasonable (imo) doubt that those results are valid.


The lowest set is easier but still incredibly difficult. Top experts are no longer required, sure, but that's it. You'll still need the best of the best undergrads at the very least to solve it.

>Also, as I mentioned, OAI collaborated on creating the benchmark, so they could have access to all the solutions too.

OpenAI didn't have any hand in providing problems. Why you assume they have the solutions, I have no idea.

>Lol, let me ask again: why are you arguing then? Yes, I have strong, reasonable (imo) doubt that those results are valid.

Are you just being obtuse or what? I stopped arguing with you a couple responses ago. You have doubts? Good for you. They don't make much sense but hey, good for you.

This is my last response here so have a nice day.


> You'll still need the best of the best undergrads at the very least to solve it.

Ok, so I hope you admit that OAI could manually solve them now?

> OpenAI didn't have any hand in providing problems

And you know this exactly how?

> I stopped arguing with you a couple responses ago

sure, of course, lmao



