You're right, I was wrong to say "most challenging" given that harder benchmarks have come out recently. The correct statement would be "most challenging long-standing benchmark": I don't believe any other test designed in 2019 has resisted progress for so long. FrontierMath is only a month old. And of course the real key feature of ARC is that it is easy for humans; FrontierMath is (intentionally) not.
They should put some famous unsolved problems in the next edition, so ML researchers do some actually useful work while they're "gaming" the benchmarks :)