there is a straight-faced argument that transformers are already a mathematical research tool and even a source of new theorems. and if that's true, it's surely more revolutionary than some ultra-complicated corner of math
Does there exist a person who would make this argument straight-faced? I am a professional mathematician and have yet to hear of anyone coaxing an even slightly interesting new theorem out of AI. I think the day is clearly coming but it's not here.
fair enough, i suppose i'm a believer that the seeds are planted and the day is soon. and i must say, it seems more worthwhile to try to figure out how to fine-tune an llm / implement reinforcement learning that could do some form of pure math than it is to try to do new pure math by hand
I do research in this field. LLMs can be used as (ridiculously inefficient) implementations of some search algorithms that we haven't yet identified and implemented in software ourselves, but which can be inferred from a statistical analysis of the literature. Sometimes those search algorithms generalise to new areas, but more often than not, they flail. The primary advantage of a language model is that it's one big algorithm: when one subcomponent would flail, another (more "confident") subcomponent becomes dominant; but that doesn't solve the case where none of the subcomponents are competent and confident in equal measure. In short: to the extent it's useful, it's a research dead-end. Any potential improvements we understand are better implemented as actual search algorithms.
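To make "actual search algorithms" concrete, here is a toy sketch of my own (not anyone's production prover; the names and example facts are made up for illustration). A forward-chaining pass over Horn clauses either finds a derivation or it doesn't, every step is auditable, and each inference costs almost nothing compared with sampling from a language model:

```python
# Toy illustration only: a tiny forward-chaining search over Horn clauses.
# "facts" is a set of known atoms, "rules" is a list of (premises, conclusion)
# pairs, and we keep applying rules until nothing new can be derived.
def forward_chain(facts, rules, goal):
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)   # derive a new fact
                changed = True
    return goal in known

# Example: from A, A -> B and B -> C, derive C.
print(forward_chain({"A"}, [(["A"], "B"), (["B"], "C")], "C"))  # True
```

Real provers are enormously more sophisticated than this, but they share the property that an improvement to the search strategy can be implemented directly, rather than hoped for as an emergent behaviour of a text predictor.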
You've probably seen that thing where ChatGPT cracked Enigma[0]. It used several orders of magnitude more computational power than a Bombe (even allowing for Moore's Law, still thousands of times more electrical power), and it still took about two dozen times longer. You would literally be better off doing a brute-force search with a German dictionary. So it is with mathematics: a brute-force search is usually cheaper and better than trying to use a language model.
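To be concrete about what "brute-force search with a German dictionary" means, here's a deliberately silly sketch of mine: a Caesar shift stands in for the Enigma rotor settings and a five-word list stands in for a real dictionary (both are toy placeholders, not part of the actual attack). You enumerate the keyspace, decrypt, and keep the key whose output contains the most dictionary words:

```python
# Toy stand-in for a dictionary-scored brute-force key search. The real
# Enigma keyspace (rotor orders, ring settings, plugboard) is vastly larger,
# but the shape of the search is the same: try keys, score candidate
# plaintexts by dictionary hits, keep the best.
GERMAN_WORDS = {"wetter", "bericht", "sechs", "uhr", "angriff"}  # toy dictionary

def decrypt(ciphertext, shift):
    return "".join(
        chr((ord(c) - ord("a") - shift) % 26 + ord("a")) if c.isalpha() else c
        for c in ciphertext.lower()
    )

def score(plaintext):
    return sum(word in GERMAN_WORDS for word in plaintext.split())

def crack(ciphertext):
    # Exhaustive search over the (tiny) keyspace, keeping the best-scoring key.
    return max(range(26), key=lambda k: score(decrypt(ciphertext, k)))

ciphertext = "zhwwhu ehulfkw vhfkv xku"   # "wetter bericht sechs uhr", shift 3
best = crack(ciphertext)
print(best, decrypt(ciphertext, best))     # 3 wetter bericht sechs uhr
```

Every candidate key here costs a few string operations; the language-model version spends billions of floating-point operations per token to do strictly worse.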
Terry Tao is one of the staunchest knowledgeable advocates of GPT models in mathematical research, and afaik he doesn't even bother trying to use the models for proof search. It's like trying to build a house with a box of shoes: sure, the shoe is technically more versatile, because a hammer is no use for tightening bolts (the shoe's sole has enough friction for that), for foot protection (the shoe is the right shape for that), or for electrical insulation (the bottom of the shoe is largely rubber), but please just use a hammer if you want to drive nails.
It's not possible to write (usefully novel) proofs with an LLM, but we have other algorithms that can. Perhaps a reinforcement learning component could improve the search strategy in some way, but there's no compelling reason to bolt it onto a predictive text model. (There's not even a good reason to believe that naïve reinforcement learning would improve the mathematical ability of a system: RL says "that was good: do more of that", and mathematics is about discovery: thinking thoughts that nobody has ever thought before.)
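A caricature of that "do more of that" dynamic, in case it isn't obvious why it's a poor fit for discovery (toy code of mine, not a description of any real system): a greedy bandit over a few made-up "proof strategies" collapses onto whichever one was rewarded in the past, and nothing in the update ever pushes it toward a strategy nobody has tried:

```python
# Caricature of naive RL: reinforce whatever scored well historically.
strategies = ["induction", "contradiction", "construction"]
value = {s: 0.0 for s in strategies}    # estimated payoff of each strategy
counts = {s: 0 for s in strategies}     # how often each strategy was used

def reward(strategy):
    # Pretend "induction" is the only strategy that ever paid off before.
    return 1.0 if strategy == "induction" else 0.0

for _ in range(100):
    best = max(strategies, key=lambda s: value[s])   # "do more of that"
    r = reward(best)
    counts[best] += 1
    value[best] += (r - value[best]) / counts[best]  # running-average update

print(counts)  # everything collapses onto the already-rewarded strategy
```

Exploration bonuses and the like patch this in small action spaces, but "a thought nobody has ever had" isn't an arm you can enumerate and then decide to explore.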