Moreover, ARC-AGI-1 is now saturating – besides o3's new score, the fact is that a large ensemble of low-compute Kaggle solutions can now score 81% on the private eval.
If low-compute Kaggle solutions already score 81%, then why is o3's 75.7% considered such a breakthrough?