I wonder if the authors have tried incorporating error feedback from Lean into their models.
Work from 2023 [1] showed that general-purpose models did better when they could incorporate error feedback; humans incorporate error feedback too, but none of the SOTA models on miniF2F seem to.
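To make concrete what "incorporating error feedback" means at inference time, here's a minimal sketch of the kind of repair loop I have in mind. `generate_proof` and `check_with_lean` are hypothetical stand-ins, not the API of any of the systems discussed; the point is just that the checker's error message goes back into the prompt so the next attempt is a repair rather than a blind reroll.

```python
def generate_proof(prompt: str) -> str:
    """Placeholder for a call to the proof-generation model."""
    raise NotImplementedError

def check_with_lean(proof: str) -> tuple[bool, str]:
    """Placeholder for compiling the candidate proof with Lean.
    Returns (succeeded, error_message)."""
    raise NotImplementedError

def prove_with_feedback(statement: str, max_rounds: int = 4) -> str | None:
    prompt = f"Prove the following theorem in Lean 4:\n{statement}"
    for _ in range(max_rounds):
        candidate = generate_proof(prompt)
        ok, error = check_with_lean(candidate)
        if ok:
            return candidate
        # Feed Lean's error back so the next attempt revises the last one.
        prompt += (
            f"\n\nPrevious attempt:\n{candidate}"
            f"\nLean reported:\n{error}"
            "\nPlease fix the proof."
        )
    return None
```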
In a way, DeepSeek Prover's subgoal decomposition is a partial step toward error/proof-state feedback. (DS Prover breaks a proof down into subgoals, attacks each subgoal separately with batched sampling, then puts the pieces back together.)
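A toy Lean 4 sketch of what that decomposition looks like (illustrative only, not DS Prover's actual pipeline): the outer proof is drafted with `have` subgoals left as `sorry`, each subgoal is then attacked independently, and the pieces are stitched back in.

```lean
-- Draft: the proof skeleton, with each subgoal left open.
theorem toy (a b : Nat) : a + b + 0 = b + a := by
  -- Subgoal 1: drop the `+ 0`.
  have h1 : a + b + 0 = a + b := by sorry
  -- Subgoal 2: commute the sum.
  have h2 : a + b = b + a := by sorry
  exact h1.trans h2

-- Assembled: each subgoal proved separately, then plugged back in.
theorem toy' (a b : Nat) : a + b + 0 = b + a := by
  have h1 : a + b + 0 = a + b := Nat.add_zero (a + b)
  have h2 : a + b = b + a := Nat.add_comm a b
  exact h1.trans h2
```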
This is distinct from the approach of the previous open-weights SOTA (Kimina Prover), which generated at the full-proof level.
While it was very impressive to see Kimina's ability to generate medium-length proofs (think AIME-level problems) without subgoals or feedback at intermediate steps, it's likely that at least subgoal decomposition will be required for longer proofs (think IMO-level problems).
I certainly agree that where and how error/proof-state feedback is best incorporated (training data synthesis / reward function / CoT during inference / etc.) is a fascinating area of research. (It's rumored that GDM's AlphaProof already uses proof-state / Lean feedback.)
[1]: https://arxiv.org/abs/2310.04353