Vision transformers are good enough that you can use them alone even on cursive handwriting. I've had amazing results with Microsoft's models and have my own little piece of wrapper software I use to transcribe blog posts I write in my notebook.



I'd like to hear more about this! I keep coming back to trying to OCR my journals, but nothing I've tried so far works well (enough) on handwriting.


A couple of other people in the thread are apparently using them too: they're Microsoft's TrOCR models. You do need a moderate amount of software to deskew, preprocess, and segment the image before handing it to the model, but after that it's typically extremely accurate in my experience.

Setting up my software online and monetizing it is next in the queue after my current side project, although I haven't checked the model licenses yet.
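
For anyone wanting to try the segment-then-transcribe flow described above, here is a minimal sketch. It is not the commenter's software. It assumes the page is already deskewed, that lines can be separated by blank horizontal gaps, and that the Hugging Face `transformers` and `Pillow` packages are installed. The helper names (`find_line_bands`, `transcribe_page`), the `microsoft/trocr-base-handwritten` checkpoint choice, and the threshold values are illustrative assumptions.

```python
def find_line_bands(ink_per_row, min_ink=1, min_height=2):
    """Group consecutive rows containing ink into (top, bottom) bands.

    ink_per_row: number of dark pixels in each pixel row of the page.
    Bands shorter than min_height rows are treated as noise and dropped.
    """
    bands = []
    start = None
    for y, ink in enumerate(ink_per_row):
        if ink >= min_ink and start is None:
            start = y  # first inked row of a new band
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None
    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(ink_per_row) - start >= min_height:
        bands.append((start, len(ink_per_row)))
    return bands


def transcribe_page(path, model_name="microsoft/trocr-base-handwritten",
                    threshold=128, min_ink=5, min_height=10):
    """Hypothetical wrapper: crop each text line and run TrOCR on it."""
    # Heavy imports are local so find_line_bands works without them.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name)

    page = Image.open(path).convert("L")  # grayscale; assumed already deskewed
    width, height = page.size
    pixels = page.load()
    # Horizontal projection profile: count dark pixels in each row.
    ink_per_row = [
        sum(1 for x in range(width) if pixels[x, y] < threshold)
        for y in range(height)
    ]

    lines = []
    for top, bottom in find_line_bands(ink_per_row, min_ink, min_height):
        crop = page.crop((0, top, width, bottom)).convert("RGB")
        pixel_values = processor(images=crop, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values)
        text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
        lines.append(text)
    return "\n".join(lines)
```

Calling `transcribe_page("notebook_page.png")` (a hypothetical file name) would return one transcribed string per detected line, joined by newlines. TrOCR expects single-line crops, which is why the segmentation step matters; real notebook pages with slanted or overlapping lines would likely need a more robust segmenter than this projection-profile approach.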


Have you tried uploading an image of your handwriting to the ChatGPT interface with GPT-4o?

If so, what were the results? If not, could you try it and let us know how it goes?


Not with 4o, but I tried it with 4 (through Copilot) a while ago and the results were abysmal, even with very neatly printed handwriting.


Try again with 4o through the ChatGPT interface; I'm getting very good results with it. I don't think GPT-4 was multimodal the way GPT-4o is, so it must have used some other methodology?



