Hacker News

Educate me. I find them useful, but less so when you try to do something novel. To me, it seems like fancy regurgitation with some novel pattern matching, but not quite intuition or reasoning per se.

At the base of LLM reasoning and knowledge is a whole corpus of human reasoning and knowledge. I am not quite convinced that LLMs will breach the confines of that corpus and the logical implications of the data in it. No “eureka” discoveries, just applying what we already have lying around.




Let's say I can't fully disclose the details because it is an area I am actively working on, but I had an algorithmic problem that was already solved in an ancient paper, yet after a few hours of research I could find no open implementation of it anywhere. I thus spent quite some time re-implementing the algorithm from scratch, but my version kept failing on edge cases that should have been covered by the original design.

Just to try it out, I uploaded the paper to DeepSeek-R1 along with a paragraph describing the desired algorithm, asking it to implement it in Python and to keep the code as simple as possible while still working exactly as described in the paper. About ten minutes later (quite a long reasoning time, but inspecting the chain of thought, it did almost no overthinking and only reasoned about ideas I had considered or should have), it generated a perfect implementation that passed every single test case. I then uploaded my own attempt, and it correctly found two errors in my code, which turned out to be attributable to naming inconsistencies in the original paper that the model had spotted and worked around on the fly. (The model did not point this out; I had to figure it out myself.) Just two years ago, I would never have expected AI to do that in my lifetime.

I don't know whether that counts as "novel" to you, but before DeepSeek, I also thought that Copilot-like AI would not be able to really disrupt programming. This one experience completely changed my view. It might be the case that the model was trained on similar examples, but I find that unlikely, simply because the concrete algorithm cannot be found online outside the paper.


This fits my experience. When the information is already encoded somehow, LLMs excel at translating it to another medium.

Combine that with the old “nothing new under the Sun” maxim (most ideas are re-hashes or new combinations of existing ideas), and you’ve got a changed landscape.


Clearly NOT novel, as you yourself explained: "an algorithmic problem that was already solved in an ancient paper".


Well, of course. Realistically, I would not expect AI systems like this to be very useful for novel cutting-edge scientific results, proving mathematical theorems, etc. in the next few years.

But that is not the majority of what software developers are doing and working on today. Most have a set of features or goals to implement in code satisfying certain constraints, which is exactly what current reasoning models seem able to do very well. Of course, this test was not rigorous in any meaningful way, but it really changed my mind about the pace of this technology.


I think the trap people fall into is assuming that LLMs need to be novel, or to reason as well as a human, to revolutionize society. They don't.

Plenty of value is already added just by converting unstructured data to structured data. If that were all LLMs did, they would still be a revolution in programming and human development. So much manual entry and development work has essentially evaporated overnight.

If there had never been a chat-based LLM "agent", LLMs merely converting arbitrary text into JSON conforming to a schema would still be the biggest advancement in computer science since the internet. Nothing equivalent existed before, apart from manual extraction or hard-coded rules.
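To make the text-to-structured-JSON point concrete, here is a minimal sketch of the pattern. Everything is illustrative: the invoice text, the field names, and the stand-in model reply are made up; a real pipeline would send the text to an LLM in JSON mode and validate whatever string comes back before trusting it.

```python
import json

# Hypothetical free text; in practice this would be sent to an LLM with a
# prompt like "extract an invoice record as JSON matching this schema".
free_text = "Invoice #4521 from Acme Corp, dated 2024-03-01, total $1,240.50"

# The schema the model is asked to conform to (field names are illustrative).
REQUIRED_FIELDS = {"invoice_id": str, "vendor": str, "date": str, "total": float}

# Stand-in for the model's reply; a real API call would return this string.
llm_output = (
    '{"invoice_id": "4521", "vendor": "Acme Corp",'
    ' "date": "2024-03-01", "total": 1240.50}'
)

def validate(raw: str) -> dict:
    """Parse the model's reply and enforce the schema before using it."""
    record = json.loads(raw)  # raises ValueError if the model emitted invalid JSON
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"missing or mistyped field: {field}")
    return record

record = validate(llm_output)
print(record["vendor"], record["total"])
```

The validation step is the part that replaces the old rule-based extractors: the model does the fuzzy reading, and a few lines of deterministic code guarantee the downstream system only ever sees well-formed records.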

Judging LLMs based on some criteria of creativity or intuition from a chat is missing the forest for the trees.


> find them useful but they are less so when you try to do something novel.

Well over 90% of work out there is not novel. It just needs someone to do it.




