Hopefully we can get a better RAG out of it. Currently people do incredibly primitive stuff like splitting text into fixed-size chunks and dumping them into a vector DB.
An actually useful RAG would convert the text into Q&A pairs and use the questions' embeddings as the index. A large context window can take advantage of in-context learning to produce better Q&A pairs.
A lot of people doing RAG already do this. I do it with my product: we process each page, create a list of potential questions that the page would answer, and then embed those.
We also embed the actual text, though, because I found that only doing the questions resulted in inferior performance.
So in this case, what your workflow might look like is:
1. Get text from page/section/chunk
2. Generate possible questions related to the page/section/chunk
3. Generate an embedding using { each possible question + page/section/chunk }
4. Embed the incoming question and match it against the { question + source } embeddings
Is this roughly it? How many questions do you generate? Do you save a separate embedding for each question? Or just stuff all of the questions back with the page/section/chunk?
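In code terms, I'm picturing something like this minimal sketch of steps 1–4, assuming an OpenAI-style client. The model names, prompt, and in-memory index are illustrative stand-ins, not a description of your actual stack:

```python
from openai import OpenAI

client = OpenAI()

def generate_questions(chunk: str, n: int = 5) -> list[str]:
    """Step 2: ask the model for questions this chunk could answer."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"List {n} questions, one per line, that the "
                       f"following text answers:\n\n{chunk}",
        }],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [q.strip() for q in lines if q.strip()]

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-3-small", input=text,
    ).data[0].embedding

def build_index(chunks: list[str]) -> list[dict]:
    """Step 3: one embedding per chunk over { questions + chunk text }."""
    index = []  # stand-in for a real vector DB
    for chunk in chunks:
        questions = generate_questions(chunk)
        index.append({
            "vector": embed("\n".join(questions) + "\n\n" + chunk),
            "chunk": chunk,
        })
    return index

# Step 4: embed the incoming question the same way and run a
# nearest-neighbour (cosine similarity) search over the stored vectors.
```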
Right now I just throw the different questions together in a single embedding for a given chunk, with the idea that there’s enough dimensionality to capture them all. But I haven’t tested embedding each question, matching on that vector, and then returning the corresponding chunk. That seems like it’d be worth testing out.
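If I do test it, I imagine it would look roughly like this (reusing the hypothetical generate_questions/embed helpers from the sketch above), with each question getting its own vector that points back to its source chunk:

```python
def build_per_question_index(chunks: list[str]) -> list[dict]:
    """One vector per generated question, each mapped back to its chunk."""
    index = []
    for chunk in chunks:
        for q in generate_questions(chunk):
            index.append({"vector": embed(q), "question": q, "chunk": chunk})
    return index

# At query time the nearest question vector wins, but you return its chunk,
# de-duplicating when several questions from the same chunk all match.
```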