It's not possible to write (usefully novel) proofs with an LLM, but we have other algorithms that can do that. Perhaps a reinforcement learning component could improve upon the search strategy in some way, but there's no compelling reason to use a predictive text model. (There's not even good reason to believe that naïve reinforcement learning would improve the mathematical ability of a system: RL says "that was good: do more of that", and mathematics is about discovery: thinking thoughts that nobody has ever thought before.)