> that result is not verifiable, not reproducible, unknown if it was leaked and h...

riku_iki · 2024-12-23T00:14:02 1734912842

> Open ai haven't released any benchmark scores

There are multiple research results showing that various benchmarks have leaked heavily into GPT training data.

Whether it's intentional or not we can't tell, but they have a very strong incentive to cheat to attract more investment.

> Unless you have a reason to believe answers were leaked then again, not interested in baseless speculation.

This is scientific methodology: results have to be reproduced or confirmed before they are believed.

og_kalu · 2024-12-23T00:26:24 1734913584

Again, Frontier Math is private. Benchmarks leaked to GPT-4 are all public datasets on the internet. Frontier Math literally cannot leak that way.

If you don't want to take the benchmarks at face value, then good for you, but this entire conversation is pointless.

riku_iki · 2024-12-23T00:31:02 1734913862

> Again, Frontier Math is private.

It's private to outsiders, but it was developed in "collaboration" with OAI, and GPT has been tested on it in the past, so they have it in their logs somewhere.

> If you don't want to take the benchmarks at face value then good for you but this entire conversation is pointless.

If you think this entire conversation is pointless, then why do you continue?

og_kalu · 2024-12-23T00:53:11 1734915191

>its private for outsiders, but it was developed in "collaboration" with OAI, and GPT was tested in the past on it, so they have it in logs somewhere.

They probably have logs of the questions, but that's not enough. Frontier Math isn't something that can be fully solved without gathering top experts in multiple disciplines. Even Tao says that for the most difficult set, he only knows who to ask.

Basically, what you're suggesting, at least for this benchmark in particular, is far more difficult than you're implying.

>If you think this entire conversation is pointless, then why do you continue?

There's no point arguing about how efficient the models are (the original point) if you won't even accept the benchmark results. Why am I continuing? For now, it's only polite to clarify.

riku_iki · 2024-12-23T02:12:35 1734919955

> Frontier Math isn't something that can be fully solved without gathering top experts

Tao's quote above referred to the hardest 20% of problems. They have 3 levels of difficulty, and presumably the first level is much easier. Also, as I mentioned, OAI collaborated on creating the benchmark, so they could have access to all the solutions too.

> There's no point arguing

Lol, let me ask again: why are you arguing, then? Yes, I have strong, reasonable (imo) doubts that those results are valid.

og_kalu · 2024-12-23T14:32:24 1734964344

The lowest set is easier but still incredibly difficult. Top experts are no longer required, sure, but that's it. You'll still need the best of the best undergrads, at the very least, to solve it.

>Also, as I mentioned OAI collaborated on creating benchmark, so they could have access to all solutions too.

OpenAI didn't have any hand in providing the problems; why you assume they have the solutions, I have no idea.

>Lol, let me ask again, why you are arguing then? Yes, I have strong reasonable(imo) doubt that those results are valid.

Are you just being obtuse or what? I stopped arguing with you a couple of responses ago. You have doubts? Good for you. They don't make much sense, but hey, good for you.

This is my last response here so have a nice day.

riku_iki · 2024-12-23T22:08:55 1734991735

> You'll still need the best of the best undergrads at the very least to solve it.

Ok, so I hope you admit that OAI could manually solve them now?

> Open AI didn't have any hand in providing problem

And you know this exactly how?

> I stopped arguing with you a couple responses ago

sure, of course, lmao