
"LLMs can train on the official documentation of tools/libraries but they can't experiment and figure out solutions to weird problems"

LLMs train on way more than just the official documentation: they train on the code itself, the unit tests for that code (which, for well-written projects, cover all sorts of undocumented edge cases) and - for popular projects - thousands of examples of that library being used (and unit tested) "in the wild".

This is why LLMs are so effective at helping figure out edge-cases for widely used libraries.

The best coding LLMs are also trained on additional custom examples written by humans who were paid to build proprietary training data for those LLMs.

I suspect they are increasingly trained on artificially created examples which have been validated (to a certain extent) through executing that code before adding it to the training data. That's a unique advantage for code - it's a lot harder to "verify" non-code generated prose since you can't execute that and see if you get an error.
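A minimal sketch of what that execution-based filter could look like (this is a hypothetical illustration, not any lab's actual pipeline): run each generated snippet in a fresh interpreter and only keep the ones that exit cleanly.

```python
import subprocess
import sys

def passes_execution_check(code: str, timeout: int = 5) -> bool:
    """Return True if the snippet runs without error in a fresh interpreter."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

# Hypothetical generated candidates: keep only those that execute cleanly.
candidates = [
    "print(sum(range(10)))",   # runs fine
    "print(undefined_name)",   # raises NameError at runtime
]
verified = [c for c in candidates if passes_execution_check(c)]
```

This only catches snippets that crash outright - it says nothing about whether the code is *correct* - which is why "validated (to a certain extent)" is the right hedge.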




> they train on the code itself, the unit tests for that code

If understanding the code was enough, we wouldn't have any bugs or counterintuitive behaviors.

> and - for popular projects - thousands of examples of that library being used (and unit tested) "in the wild".

If people stopped contributing to forums, we wouldn't have any such data for new things that are being made.


The examples I'm talking about come from openly licensed code in sources like GitHub, not from StackOverflow.

I would argue that code on GitHub is much more useful, because it's presented in the context of a larger application and is also more likely to work.



