Hacker News

As far as I'm concerned, all of these specialty services are dead compared to a generalized LLM like OpenAI's GPT models or Google's Gemini.

I wrote in a previous post about how NLP services were dead because of LLMs, and people in NLP obviously took great offense to that. But I was able to use the NLP abilities of an LLM without needing to know anything about the intricacies of NLP or any APIs, and it worked great. This post on OCR pretty much shows exactly what I meant. Gemini does OCR almost as well as OmniAI (granted, I'd never heard of OmniAI before), but at 1/10th the cost. OpenAI will only get better, and quickly. Kudos to OmniAI for releasing honest data, though.

Sure you might get an additional 5% accuracy from OmniAI vs Gemini but a generalized LLM can do so much more than just OCR. I've been playing with OpenAI this entire weekend and literally the sky's the limit. Not only can you OCR images, you can ask the LLM to summarize it, transform it into HTML, classify it, give a rating based on whatever parameters you want, get a lexile score, all in a single API call. Plus it will even spit out the code to do all of the above for you to use as well. And if it doesn't do what you need it to do right now, it will pretty soon.
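To make the "all in a single API call" point concrete, here's a rough sketch of what one such request could look like against OpenAI's Chat Completions API with an image input. The model name, prompt wording, and output keys are my own illustrative choices, not anything from the article:

```python
# Sketch: one vision request that OCRs an image and also summarizes,
# classifies, and converts it to HTML. Prompt and model are illustrative.

def build_request(image_url):
    """Build a single Chat Completions payload asking for several tasks at once."""
    prompt = (
        "1. Transcribe all text in this image (OCR).\n"
        "2. Summarize it in two sentences.\n"
        "3. Classify the document type (invoice, letter, form, other).\n"
        "4. Render the content as HTML.\n"
        "Return a JSON object with keys: text, summary, type, html."
    )
    return {
        "model": "gpt-4o",  # any vision-capable model
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

# To actually send it (requires an API key):
# from openai import OpenAI
# resp = OpenAI().chat.completions.create(**build_request("https://example.com/doc.png"))
```

One request, one image, several outputs; whether the answers are all correct is a separate question, but the ergonomics are hard to beat.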

I think the future of AI is going to be pretty bleak for everyone except the extremely big players that can afford to invest hundreds of billions of dollars. I also think there's going to be a real battle of copyright in less than 5 years which will also favor the big rich players as well.




5% accuracy can be worth a lot.

The price of any of these services pales in comparison to getting a human involved in any fraction of cases.

It is probably reasonable to expect the base LLMs to keep getting better and for there to be no long-term moat on accuracy, but businesses are not built on benchmark accuracy alone and have plenty of other ways to survive, even if the technology under the hood changes.


YES

>> 5% accuracy can be worth a lot.

Most surprising to me about these results is that the BEST error rate was over 8% (91.7% accuracy) and the worst was 40%.

Their method of calculating errors seems quite good:

>> Accuracy is measured by comparing the JSON output from the OCR/Extraction to the ground truth JSON. We calculate the number of JSON differences divided by the total number of fields in the ground truth JSON. We believe this calculation method lines up most closely with a real world expectation of accuracy.

>> Ex: if you are tasked with extracting 31 values from a document, and make 4 mistakes, that results in an 87% accuracy.
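That metric is simple enough to sketch. This is my reading of the description above, not OmniAI's actual code: flatten both JSON objects into field paths, count mismatched fields, divide by the ground-truth total.

```python
# Sketch of the accuracy metric described above: mismatched fields
# divided by total fields in the ground truth.

def flatten(obj, prefix=""):
    """Flatten nested JSON into {"a.b.0": value} leaf pairs."""
    items = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            items.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            items.update(flatten(v, f"{prefix}{i}."))
    else:
        items[prefix.rstrip(".")] = obj
    return items

def accuracy(ground_truth, predicted):
    gt = flatten(ground_truth)
    pred = flatten(predicted)
    # a field counts as a mistake if it is missing or its value differs
    mistakes = sum(1 for k, v in gt.items() if pred.get(k) != v)
    return 1 - mistakes / len(gt)
```

With 31 ground-truth fields and 4 mistakes, this returns 27/31, i.e. roughly the 87% in their example.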

Especially when dealing with numbers and money, having 10% of them be wrong seems unusable, often worse than doing nothing.

Having humans check the results instead of doing the transcriptions would be better, but humans are notoriously bad at maintaining vigilance while doing the same task across many documents.

What would be interesting is finding which two OCR/AI systems make the most different mistakes and running documents against both. Flagging only the disagreements for human verification would reduce the task substantially.
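The disagreement-flagging idea is easy to sketch. Assuming both systems emit the same field names (a simplification on my part), you'd compare field by field and send only the conflicts to a human:

```python
# Sketch: run two OCR/extraction systems and flag only the fields
# where they disagree for human verification.

def disagreements(output_a, output_b):
    """Fields where the two systems differ (or one is missing a field)."""
    keys = set(output_a) | set(output_b)
    return {k for k in keys if output_a.get(k) != output_b.get(k)}
```

If the systems' error patterns are largely independent, the human only ever sees the flagged fields; the catch, as noted below, is that hard-to-read spots tend to make every model fail in the same place.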


> What would be interesting is finding which two OCR/AI systems make the most different mistakes and running documents against both. Flagging only the disagreements for human verification would reduce the task substantially.

There have been OCR products that do that for decades, and I would hope all the OCR startups are doing the same already. Oftentimes something is objectively difficult to read and the various models will all fail in the same place, reducing the expected utility of this method. It still helps, of course. I forget its name, but there was one product that used about five OCR engines and picked its output by consensus. It could never beat ABBYY FineReader, though; it was a distant second place.


I think 87% vs. 92% accuracy really isn't much of a difference. You're still going to get enough errors that the level and amount of checking you need to do isn't affected. Even at 98-99% you still have to do a lot of error checking.

But you get most of the bang for the buck at 1/10th the cost, so I think overall it's far, far superior.


Wouldn't one issue be that, while for LLMs replacing NLP you often don't care about the super-rare hiccup or hallucination, in the places where OCR tends to be used that's often a no-go? Just saying this while trying to remember all the places where I've implemented it or seen it implemented. A common one was billing.



