>that result is not verifiable, not reproducible, and it's unknown whether it was leaked or how it was measured. It's kinda hype science.
It will be verifiable when the model is released. OpenAI hasn't released any benchmark scores that were later shown to be falsified, so unless you have an actual reason to believe they're outright lying, it's not something to take seriously.
Frontier Math is a private benchmark. Of its highest tier of difficulty, Terence Tao says:
“These are extremely challenging. I think that in the near term basically the only way to solve them, short of having a real ___domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages…”
Unless you have a reason to believe answers were leaked then again, not interested in baseless speculation.
>it's private for outsiders, but it was developed in "collaboration" with OAI, and GPT was tested on it in the past, so they have it in logs somewhere.
They probably have logs of the questions, but that's not enough. Frontier Math isn't something that can be fully solved without gathering top experts in multiple disciplines. Even Tao says he only knows who to ask for the most difficult set.
Basically, what you're suggesting, at least with this benchmark in particular, is far more difficult than you're implying.
>If you think this entire conversation is pointless, then why do you continue?
There's no point arguing about how efficient the models are (the original point) if you won't even accept the results of the benchmarks. Why am I continuing? For now, it's only polite to clarify.
> Frontier Math isn't something that can be fully solved without gathering top experts
Tao's quote above referred to the hardest 20% of problems; the benchmark has 3 tiers of difficulty, and presumably the first tier is much easier. Also, as I mentioned, OAI collaborated on creating the benchmark, so they could have access to all the solutions too.
> There's no point arguing
Lol, let me ask again: why are you arguing then? Yes, I have strong and (imo) reasonable doubt that those results are valid.
The lowest set is easier but still incredibly difficult. Sure, top experts are no longer required, but that's about it: you'd still need the best of the best undergrads at the very least to solve it.
>Also, as I mentioned OAI collaborated on creating benchmark, so they could have access to all solutions too.
OpenAI didn't have any hand in providing the problems, so why you assume they have the solutions, I have no idea.
>Lol, let me ask again, why you are arguing then? Yes, I have strong reasonable(imo) doubt that those results are valid.
Are you just being obtuse or what? I stopped arguing with you a couple of responses ago. You have doubts? Good for you. They don't make much sense, but hey, good for you.