Hacker News new | past | comments | ask | show | jobs | submit login

This has been my experience with Japanese texts as well. I have a number of fairly obscure Japanese books and magazines I’ve collected as part of a research interest. During the pandemic, I began digitizing them and found that nothing but Google OCR could extract the text correctly. I recently tried again with the libraries you mentioned, but they also performed worse than traditional tools.



Good to know :3

I'm currently planning to develop a tool to correct Arabic outputs for ASR and OCR. It will function like spell-correction but with a focus specifically on these two areas. Perhaps you could start something similar for Japanese? English (and Latin languages in general) perform at a different level across multiple tasks, to be honest...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: