Legend2440 41 days ago | on: Gemini 2.5

I would say they are a fairly good measure of how well the model has integrated information from pretraining.

They are not so good at measuring reasoning, out-of-domain performance, or creativity.