However, the way things are progressing is that the SOTA saturates the current benchmarks, and then a new one is conceived as people better understand what it means to be intelligent. It seems only natural to concentrate on one benchmark at a time.

Francois Chollet mentioned that the test tries to avoid curve fitting (which he states is the main ability of LLMs), and that the number of examples was specifically restricted for that purpose. It is not beyond the realms of possibility, though, that many more examples were generated by hand, and that the result was achieved through curve fitting rather than discrete programming.

Anyway, it’s all supposition. It’s difficult to know how genuine the results are without knowing how they were actually achieved.