Hacker News

fair enough, i suppose i'm a believer that the seeds are planted, the day is soon. and i must say, it seems more worthwhile trying to figure out how to finetune an llm/implement reinforcement learning that could do some form of pure math, than it is to try and do new pure math by hand





I do research in this field. LLMs can be used as (ridiculously inefficient) implementations of some search algorithms that we haven't yet identified and implemented in software ourselves, but which can be inferred from a statistical analysis of the literature. Sometimes those search algorithms generalise to new areas, but more often than not, they flail. The primary advantage of a language model is that it's one big algorithm: when one subcomponent would flail, another (more "confident") subcomponent becomes dominant; but that doesn't solve the case where none of the subcomponents are competent and confident in equal measure. In short: to the extent it's useful, it's a research dead-end. Any potential improvements we understand are better implemented as actual search algorithms.

You've probably seen that thing where ChatGPT cracked Enigma[0]. It used several orders of magnitude more computational power than a Bombe (even allowing for Moore's Law, still thousands of times more electrical power), and still took two dozen times longer. You would literally be better off doing brute-force search with a German dictionary. So it is with mathematics: a brute-force search is usually cheaper and better than trying to use a language model.
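To make "brute-force search with a dictionary" concrete, here is a toy sketch on a far simpler cipher than Enigma (a Caesar shift): enumerate every key and score each candidate decryption against a word list. Exhaustive search over a small keyspace is cheap, exact, and needs no model. All names and the word list here are illustrative.

```python
# Toy brute-force attack on a Caesar cipher: try all 26 shifts and
# keep the one whose output contains the most dictionary words.
WORDS = {"attack", "at", "dawn", "the", "weather", "is", "clear"}

def shift(text, k):
    """Shift every letter of `text` forward by `k` positions."""
    return "".join(
        chr((ord(c) - ord("a") + k) % 26 + ord("a")) if c.isalpha() else c
        for c in text.lower()
    )

def crack(ciphertext):
    """Brute-force all keys; score each decryption against WORDS."""
    best_key = max(
        range(26),
        key=lambda k: sum(w in WORDS for w in shift(ciphertext, k).split()),
    )
    return shift(ciphertext, best_key)

ciphertext = shift("attack at dawn", 3)   # encrypt with key 3
print(crack(ciphertext))                  # -> attack at dawn
```

Enigma's keyspace is vastly larger, of course, but the Bombe exploited known-plaintext cribs to prune it; the point is only that exact, exhaustive methods dominate when they apply.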

Terry Tao is one of the staunchest knowledgeable advocates of GPT models in mathematical research, and afaik he doesn't even bother trying to use the models for proof search. It's like trying to build a house with a box of shoes: sure, the shoe is technically more versatile because you can't use a hammer for tightening bolts (the shoe's sole has enough friction to do this) or foot protection (the shoe is the right shape for this) or electrical isolation (the bottom surface of the shoe is largely rubber), but please just use a hammer if you want to manipulate nails.

[0]: https://www.techradar.com/news/we-watched-an-ai-crack-the-en... – and I know that's not the original ChatGPT®, but I am not rewarding this company for such a wasteful and pointless publicity stunt.


You are talking about current capabilities versus what is probably possible. Sure, it's not possible to write proofs with an LLM right now.

Somewhere inside OpenAI is a reinforcement learning loop that looks something like:

    current_code = open("code_base.txt").read()
    modify_prompt = "some prompt to modify code with expected outcome"
    result = model.run(modify_prompt, current_code)
    if check(result):
        provide_positive_feedback(model)
    else:
        provide_negative_feedback(model)
and it's clear that not only does this work, it may be the future of what unravels software engineering.
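The loop above can be sketched end to end with a stand-in for the model. Here random hill climbing plays the model's role and hitting a numeric target plays the role of "the tests pass"; all names (`model_run`, `check`) are illustrative, not OpenAI's actual training code:

```python
import random

random.seed(0)   # deterministic for reproducibility
TARGET = 42      # stand-in for "the modified code passes its checks"

def model_run(state):
    # Stand-in for model.run(...): propose a small random modification.
    return state + random.choice([-3, -1, 1, 3])

def check(candidate, best):
    # Stand-in for the test suite: did the modification help?
    return abs(candidate - TARGET) < abs(best - TARGET)

best = 0
for _ in range(1000):
    candidate = model_run(best)
    if check(candidate, best):
        best = candidate   # positive feedback: keep the proposal
    # negative feedback: the proposal is simply discarded

print(best)
```

The essential ingredient is the automatic verifier (`check`): the loop only works in domains where success can be judged cheaply and reliably, which is why code (run the tests) and formal math (run the proof checker) are the natural candidates.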

the current models are being trained for coding, but there's no reason this couldn't be tried for other domains, like pure math.


It's not possible to write (usefully novel) proofs with an LLM, but we have other algorithms that can do that. Perhaps a reinforcement learning component could improve upon the search strategy in some way, but there's no compelling reason to use a predictive text model. (There's not even good reason to believe that naïve reinforcement learning would improve the mathematical ability of a system: RL says "that was good: do more of that", and mathematics is about discovery: thinking thoughts that nobody has ever thought before.)
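For a sense of what "other algorithms" means here: proof assistants such as Lean ship with decision procedures and symbolic search tactics that find proofs mechanically, with no predictive text model involved. A minimal sketch (Lean 4):

```lean
-- `decide` invokes a decision procedure; `simp` does rewriting search.
-- No language model is involved in either.
example : 2 + 2 = 4 := by decide

example (n : Nat) : n + 0 = n := by simp

-- A proof can also be built directly from library lemmas:
theorem double_le (a b : Nat) (h : a ≤ b) : a + a ≤ b + b :=
  Nat.add_le_add h h
```

These procedures are exact where they apply; the open question is guiding them on problems too large for blind search, which is the niche RL proposals target.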

Wow, so easy. Why didn’t anybody think of that before? How shall I inscribe your Fields Medal, sir?


only low iq people talk about the Dunning-Kruger effect

Only low iq people talk confidently about fields of study they barely comprehend.


