"real improvements came from adjusting the prompts to make things clearer for the model, and not asking the model to do too much in a single pass"
This is spot on, and it's the same as how humans behave. If you give a human too many instructions at once, they won't follow all of them accurately.
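To make the "don't ask the model to do too much in a single pass" idea concrete, here is a minimal Python sketch. It assumes a generic `call_model` function that takes a prompt string and returns the model's text reply. `build_prompt`, `extract_in_passes`, and `fields_per_pass` are hypothetical names for illustration only, not any particular product's API. The idea is to split a long list of extraction fields into small batches and send one short, focused prompt per batch.

```python
from typing import Callable, Dict, List


def build_prompt(document: str, fields: List[str]) -> str:
    """Build a prompt that asks for only a small batch of fields (hypothetical format)."""
    field_list = "\n".join(f"- {name}" for name in fields)
    return (
        "Extract the following fields from the document below. "
        "Answer with one 'field: value' line per field.\n"
        f"{field_list}\n\nDocument:\n{document}"
    )


def extract_in_passes(
    document: str,
    fields: List[str],
    call_model: Callable[[str], str],  # assumed: prompt in, text reply out
    fields_per_pass: int = 3,
) -> Dict[str, str]:
    """Run several small extraction passes instead of one large prompt."""
    results: Dict[str, str] = {}
    for start in range(0, len(fields), fields_per_pass):
        batch = fields[start:start + fields_per_pass]
        reply = call_model(build_prompt(document, batch))
        # Keep only 'name: value' lines for fields requested in this pass.
        for line in reply.splitlines():
            name, sep, value = line.partition(":")
            if sep and name.strip() in batch:
                results[name.strip()] = value.strip()
    return results
```

With seven fields and `fields_per_pass=3`, this makes three model calls, each with a shorter instruction list. The batch size is a tuning knob, not a recommended value.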
I spend a lot of time thinking about LLMs + documents, and in my opinion, as the models get better, OCR is soon going to be a fully solved problem. The challenge then becomes explaining the ambiguity and intricacies of complex documents to AI models effectively, rather than the OCR capabilities themselves.
disclaimer: I run an LLM document processing company called Extend (https://www.extend.app/).
Extend looks great, and your real estate play is very interesting. I’ve been playing around with extracting key terms from residential leasehold (condominium-type) agreements. I’d be interested to know whether you’re doing this sort of thing.