> Even if they provide the training set (which is not typically the case), you still have to take their word for it—that’s not really "verification."
If they've done it right, you can re-run the training and get the same weights. And you could probably spot-check parts of it without running the full training (e.g., if there are glitch tokens in the weights, you'd look for where they came from in the training data, and if they weren't there at all, that would be a red flag). Is it possible to release the wrong training set (or the wrong instructions) and hope you don't get caught? Sure, but demanding that it be published and available to check raises the bar and makes cheating much riskier.
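To make the glitch-token spot check concrete, here's a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the function names are made up, the published training set is treated as a plain iterable of text documents, and "suspect tokens" just means strings you noticed behaving oddly in the released model. A real check would need to go through the model's actual tokenizer and a corpus far too large for naive substring search.

```python
from collections import Counter
from typing import Iterable, List


def count_token_occurrences(documents: Iterable[str], suspect_tokens: List[str]) -> Counter:
    """Count how many documents contain each suspect token string.

    Hypothetical helper: plain substring matching stands in for a
    proper tokenizer-aware search.
    """
    counts = Counter({tok: 0 for tok in suspect_tokens})
    for doc in documents:
        for tok in suspect_tokens:
            if tok in doc:
                counts[tok] += 1
    return counts


def missing_tokens(documents: Iterable[str], suspect_tokens: List[str]) -> List[str]:
    """Return suspect tokens that never appear in the published data.

    A token the model clearly learned but that is absent from the
    released corpus is the red flag described above.
    """
    counts = count_token_occurrences(documents, suspect_tokens)
    return [tok for tok in suspect_tokens if counts[tok] == 0]


if __name__ == "__main__":
    # Toy stand-in for the published training set (made-up strings).
    published = ["some text with oddtoken_a inside", "an unrelated document"]
    print(missing_tokens(published, ["oddtoken_a", "oddtoken_b"]))
```

An empty result doesn't prove the data is honest, of course. It only means this particular check didn't catch anything.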