> My original project had all sorts of complex stuff for detecting hallucinations and incorrect, spurious additions to the text (like "Here is the corrected text" preambles)
> asks it to correct OCR errors
So, if I understand correctly, you add some prompt like "fix this text" and then the broken text?
Why not do it differently: instead of a chat model, use a completion model, feed the broken OCR'd text into it token by token, get the next-token probabilities, and select the token that best matches the original document, maybe looking 3-5 tokens ahead?
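Roughly what I'm imagining, as a minimal sketch (the GPT-2 checkpoint, the top-k cutoff, and the match-bonus scoring are placeholder assumptions for illustration, not anything from your project):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "gpt2"  # placeholder; any causal completion model would do
    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL)
    model.eval()

    def correct_ocr(ocr_text: str, top_k: int = 5, lookahead_chars: int = 20) -> str:
        """Greedy decoding restricted to the model's top-k next tokens,
        preferring whichever candidate matches the upcoming OCR'd text."""
        start_id = tok.bos_token_id if tok.bos_token_id is not None else tok.eos_token_id
        out_ids = [start_id]
        max_len = 2 * len(tok.encode(ocr_text)) + 8  # guard against stalling forever
        pos = 0  # character position consumed in the original OCR'd text
        while pos < len(ocr_text) and len(out_ids) < max_len:
            with torch.no_grad():
                logits = model(torch.tensor([out_ids])).logits[0, -1]
            logprobs = torch.log_softmax(logits, dim=-1)
            window = ocr_text[pos:pos + lookahead_chars]
            best_id, best_score = None, float("-inf")
            for cid in torch.topk(logprobs, top_k).indices.tolist():
                piece = tok.decode([cid])
                # Model confidence plus a bonus when the candidate lines up
                # with the damaged original text (the "looking ahead" idea).
                bonus = 5.0 if window.startswith(piece) else (
                    2.0 if piece.strip() and piece.strip() in window else 0.0)
                score = logprobs[cid].item() + bonus
                if score > best_score:
                    best_id, best_score = cid, score
            out_ids.append(best_id)
            chosen = tok.decode([best_id])
            if window.startswith(chosen):
                pos += len(chosen)  # advance only when we stayed on the original
        return tok.decode(out_ids[1:])

The lookahead here is character-based for simplicity; a real version would probably align at the token level and handle insertions and deletions in the OCR'd text.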
Wouldn't this greatly decrease "hallucinations"?
I'm not trying to insult your approach, I'm just asking for your opinion.
What you describe is a very different approach. It would require orders of magnitude more inference requests, and it would miss out on much of the power and "intelligence" of these new models, because they wouldn't have enough context to make sensible decisions about what might be wrong or how to fix it. Also, there aren't many hallucinations anymore now that these better models are available. That said, what you describe may well work; I'm not sure.