I don't know a lot about this, but it seems like, if sampling performance were adequate, external checks like theorem verification could get models "over the data wall."
There have already been good results there with DeepMind's math Olympiad work. As I understand it, the LLM was only used to translate problems from informal to formal statements during training; at competition time they still translated the problems into formal descriptions manually. The solver itself was transformer-based and RL-trained, and I believe it did not start from any language-model base, yet it managed to learn a distribution useful for solving the problems from RL, a verifier, and light scaffolding of the tree search alone.
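To make the verifier-in-the-loop idea concrete, here is a toy sketch (entirely hypothetical, not DeepMind's actual system): a policy proposes candidate solutions, an external checker accepts only provably correct ones, and the verified pairs become new training data, so no human labels are needed. The "problems" here are trivial arithmetic equations standing in for formal theorem statements, and `propose` stands in for the learned model.

```python
import random

def verify(problem, candidate):
    """External ground-truth check: does `candidate` solve x + a = b?

    Stands in for a formal verifier (e.g. a proof checker) that can
    certify correctness without any labeled data.
    """
    a, b = problem
    return candidate + a == b

def propose(problem, n_samples=50):
    """Stand-in for a policy: sample guesses near the right range."""
    a, b = problem
    return [random.randint(b - a - 5, b - a + 5) for _ in range(n_samples)]

def expert_iteration_round(problems):
    """Keep only verifier-approved (problem, solution) pairs.

    In a real system these pairs would be fed back as training data,
    improving the policy for the next round.
    """
    new_training_data = []
    for p in problems:
        for cand in propose(p):
            if verify(p, cand):
                new_training_data.append((p, cand))
                break  # one verified solution per problem is enough
    return new_training_data

random.seed(0)
data = expert_iteration_round([(3, 10), (7, 22), (1, 100)])
print(data)  # only pairs the verifier certified, e.g. ((3, 10), 7)
```

The key property is that correctness comes from the verifier, not from imitating human solutions, which is why this kind of loop can in principle keep generating fresh training signal past the limits of existing data.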