
Is this an LLM?

It was not public until later, and it was actually revealed first by others. So the statements seem identical to me.




The FrontierMath benchmark people say OpenAI had shared-folder access to some subset of eval questions, which have since been replaced. Take a few leaps and yes, that counts as getting "data under the table" - but those are a few leaps! - and that, let's be clear, is the motte here.


This is nonsense: obviously the problem with getting "data under the table" is that they may have used it to train their models, thus rendering the benchmarks invalid. Apart from this danger, there is no other risk in their having had access to it beforehand. We do not know if they used it for training, but the only reassurance being some "verbal agreement", as is reported, is not very reassuring. People are free to adjust their P(model_capabilities|frontiermath_results) based on their own priors.
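(To make the "adjust your P" point concrete, here is a minimal Bayes-rule sketch. Every number below is a made-up placeholder, and the two-explanation model is a deliberate simplification, not anyone's published estimate:)

    # Toy Bayes-rule sketch of updating P(capable | frontiermath_results).
    # All probabilities are hypothetical, chosen only to show the mechanics.

    p_capable = 0.5             # prior: the model is as capable as claimed
    p_results_if_capable = 0.9  # P(strong benchmark results | capable)
    p_used_leak = 0.2           # prior: the leaked questions were trained on
    p_results_if_used = 0.9     # P(strong results | not capable, trained on leak)

    # P(strong results), marginalized over the two explanations
    p_results = (p_capable * p_results_if_capable
                 + (1 - p_capable) * p_used_leak * p_results_if_used)

    # Posterior: how much the benchmark result should actually move you
    posterior = p_capable * p_results_if_capable / p_results
    print(f"P(capable | results) = {posterior:.2f}")  # -> 0.83

Pushing p_used_leak toward 1 drags the posterior back toward the prior, which is the real dispute here: not whether updating is allowed, but how much the contamination possibility dilutes the evidence.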


> This is nonsense

What is "this"?

> obviously the problem with getting "data under the table" is that they may have used it to train their models

I've been avoiding mentioning the maximalist version of the argument (they got data under the table AND used it to train models), because training hadn't been stated until now, and it would have been unfair to bring it up unprompted. That is, it's 2 baileys out from "they had access to a shared directory that had some test questions in it, and this was reported publicly, and fixed publicly".

There's been a fairly severe communication breakdown here. I don't want to distract from, e.g., what the nonsense is, so I won't belabor that point, but I don't want you to think I don't want to engage on it - I just won't in this one post.

> but the only reassurance being some "verbal agreement", as is reported, is not very reassuring

It's about as reassuring as it gets without them releasing the entire training data, which is, at best and with charity, marginally - oh so marginally - reassuring, I assume? If the premise is that we can't trust anything self-reported, they could lie there too.

> People are free to adjust their P(model_capabilities|frontiermath_results) based on their own priors.

Certainly, that's not in dispute (perhaps the idea that you are forbidden from adjusting your opinion is the nonsense you're referring to? I certainly can't control that :) Nor would I want to!)


What is nonsense is the suggestion that there is a "reasonable" argument that they had access to the data (which we now know they did), and a separate "ambitious" argument that they used it. Nobody claimed to know for certain that the data was used; that is a strawman. The point is that there is now a non-zero probability that it was. This is obviously what we have been discussing since the beginning; otherwise we would not care whether they had access at all, and it would not have been mentioned. There is a single, simple argument being made in this thread.

And FFS, I assume the dispute is about the value of the P people assign, not about whether people are allowed to have a P at all.



