
My experience is that the price-competitive models, at least (roughly: open-weight and small enough to run on an RTX 3090 or 4090, such as MiniCPM-V, Phi-3-V, Kosmos-2.5), are not as good as Tesseract or EasyOCR. They're often more accurate on plain text, where their language knowledge is useful, but on symbols, numbers, and weird formatting they're at best on par. Sometimes they go completely off the rails when they see a dashed line, handwriting, or an image, things which conventional OCR tools can ignore or at least recover from.
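One way to put a number on a comparison like this is character error rate (CER): the edit distance between the ground-truth text and the OCR output, divided by the length of the ground truth. Below is a minimal sketch, assuming you already have the reference text and each tool's output as plain strings. The function names `edit_distance` and `cer` are hypothetical helpers written for illustration, not part of Tesseract, EasyOCR, or any of the models mentioned.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # drop a character from ref
                curr[j - 1] + 1,     # add a character from hyp
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of an OCR output `hyp` against ground truth `ref`."""
    if not ref:
        # Assumption: an empty reference scores 0 only if the output is empty too.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Made-up strings for illustration: a digit/letter confusion on a price.
    truth = "Total: $1,204.50"
    output = "Total: $1,2O4.5O"
    print(f"CER = {cer(truth, output):.3f}")
```

Scoring plain-text regions separately from regions full of numbers and symbols would show whether the gap described above holds on your own documents.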



Did you test the MiniCPM release (v2.6) from last week? It was able to extract (and label) most complex examples I gave it on their Hugging Face Space:

https://huggingface.co/spaces/openbmb/MiniCPM-V-2_6



