> The problem is eventually what are LLMs going to draw from?
Published documentation.
I'm going to make up a number but I'll defend it: 90% of the information content of stackoverflow is regurgitated from some manual somewhere. The problem is that the specific information you're looking for in the relevant documentation is often hard to find, and even when found is often hard to read. LLMs are fantastic at reading and understanding documentation.
I've answered dozens of questions on stackoverflow.com with tags like SIMD, SSE, AVX, NEON. Only a minority of these asked for a single SIMD instruction which does something specific. Usually people ask how to use the complete instruction set to accomplish something higher level.
Documentation alone doesn't answer questions like that, you need an expert who actually used that stuff.
And even for trivial questions, there is a lot out there that the doc developers ignored or hid.
For me, the current state of affairs is the difficulty of aiming search engines at the right answer. The answer might be out there, but figuring out HOW to ask the question, which are the right keywords, etc - basically requires knowing where the answer is. LLMs have potential in rephrasing both the question and what they might have glanced at here and there - even if obvious in hindsight.
Don’t use search engines unless you know what you’re searching for. Which means I start with other material first (books, wiki, manual,…) which gives me enough idea of what the answer will look like. Starting with a search engine is like searching for a jigsaw piece with only the picture on the box: you have to know its shape first and where it would go.
It's a very similar situation with concurrent programming. Knowing which instruction does a CAS, or the exact memory semantics of a particular load/store/etc., doesn't do much on its own.
Knowledge gained from experience that is not included in documentation is also a significant part of SO. For example "This library will not work with service Y because they do not support feature X, as I discovered when I tried to use it myself" or other empirical evidence about the behavior of software that isn't documented.
Published documentation has been and can be wrong. In the late 1990's and early 2000's when I still did a mix of Microsoft technologies and Java, I found several bad non-obvious errors in MSDN documentation. AI today would likely regurgitate those errors in a mild but authoritative-sounding way. At least when discussing with real people, after the arrows fly and the dust settles, we can figure out the truth.
Everything (and everyone for that matter) can be and has been wrong. What matters is whether it is useful. And AI as it is now is pretty decent at finding ("regurgitating") information in large bodies of data much faster than humans and with enough accuracy to be "good enough" for most uses.
Nothing will ever replace your own critical thinking and judgment.
> At least when discussing with real people after the arrows fly and the dust settles, we can figure out the truth.
You can actually do that with AI now. I have been able to correct AI many times via a Socratic approach (where I didn't know the correct answer, but I knew the answer the AI gave me was wrong).
From personal experience, I'm skeptical of the quantity and especially quality of published documentation available, the completeness of that documentation, the degree to which it both recognizes and covers all the relevant edge cases, etc. Even Apple, which used to be quite good at that kind of thing, has increasingly effectively referred developers to their WWDC videos. I'm also skeptical of the ability of the LLMs to ingest and properly synthesize that documentation - I'm willing to bet the answers from SO and Reddit are doing more heavy lifting on shaping the LLM's "answers" than you're hoping here.
There is nothing in my couple decades of programming or experience with LLMs that suggests to me that published documentation is going to be sufficient to let an LLM produce sufficient quality output without human synthesis somewhere in the loop.
yes, a lot of stuff is like that, and LLMs are a good replacement for searching the docs; but the most useful SO answers are regarding issues that are either not documented or poorly documented, and someone knows because they tried it and found what did or did not work