The use case is creating embeddings for paragraphs from documents. Then, when the user issues a query, you create an embedding for the query (you can also do more complicated stuff), find the most similar paragraph embeddings from the documents, and insert those paragraphs as context into ChatGPT so that it can rephrase the documents to answer the user's question.
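In case it helps to see the shape of the pipeline, here is a minimal sketch of the retrieve-then-prompt flow. The `embed` function is just a toy stand-in for a real embedding model, and all the names and the prompt template are illustrative, not any particular product's API:

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash each token into a fixed-size unit vector.
    In practice you'd call a real embedding model here instead."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def top_k_paragraphs(query: str, paragraphs: list[str], k: int = 3) -> list[str]:
    """Rank document paragraphs by cosine similarity to the query embedding."""
    q = embed(query)
    ranked = sorted(paragraphs, key=lambda p: float(np.dot(q, embed(p))), reverse=True)
    return ranked[:k]

def build_prompt(query: str, paragraphs: list[str]) -> str:
    """Insert the most similar paragraphs as context ahead of the user's question."""
    context = "\n\n".join(top_k_paragraphs(query, paragraphs))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

if __name__ == "__main__":
    docs = [
        "Employees accrue 25 days of paid leave per year.",
        "The office is closed on public holidays.",
        "Expense reports must be filed within 30 days.",
    ]
    print(build_prompt("How many vacation days do I get?", docs))
```

The prompt string is then what actually gets sent to the LLM; the model never sees the documents that didn't make the top-k cut.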
This is fascinating, thanks for explaining this use case. It sounds like it's a fancy way to get a generic LLM to understand your ___domain by including the relevant context in a prompt.
What about specializing the LLM in the first place? Could you fine-tune the model by training it on those internal documents instead? Then you don't need to mess about with the prompt, right, since the LLM has "learned" the data?
With fine-tuning you are training on a couple hundred thousand tokens, compared to the billions of tokens in the dataset the LLM was originally trained on, so it won't have as much of an effect. Maybe the original dataset includes a company policy from five years ago and you fine-tune on the new policy; it's hard to guarantee which version the model will spit out.
By putting the information directly in the prompt, it's much more likely the model will use what you just gave it.
Of course, there are use cases where fine-tuning is worth it. For example, someone recently fine-tuned a model on a Messenger group chat that had been going for years, about 500k messages, and now they have a model that can imitate every member of the chat.
In-context learning (i.e., steering the LLM by adding all the relevant info to the prompt) seems to be preferred over fine-tuning. I'm not sure exactly why, perhaps because fine-tuning often doesn't make sense in a dynamic setting since it's fairly expensive? On the other hand, the only reason the entire vector-DB pipeline exists is that context size is relatively limited.
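To make that context-size constraint concrete, here's a rough sketch of the budgeting step that forces retrieval in the first place. The 4-characters-per-token estimate and the budget number are assumptions for illustration, not properties of any particular model:

```python
def pack_context(ranked_chunks: list[str], token_budget: int = 3000) -> list[str]:
    """Greedily add the highest-ranked chunks until the token budget is exhausted."""
    packed, used = [], 0
    for chunk in ranked_chunks:
        cost = len(chunk) // 4 + 1  # crude estimate: ~4 characters per token
        if used + cost > token_budget:
            break
        packed.append(chunk)
        used += cost
    return packed
```

If the context window were effectively unlimited, you could just dump every document into the prompt and skip the vector DB entirely.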
I'd love to see a comparison between the two in terms of accuracy of the outputs and the degree of hallucinations.