> Why should we have to OCR something that exists already in a perfectly interch...

caspper69 · 2025-02-11T04:43:24 1739249004

Not everyone does their work in a web browser.

And even still, you don’t have to parse raw markup to grab properties from DOM elements. That could be handled by a browser plugin coupled with some some user guided training.

PDF is another beast entirely. I think there’s already a whole thread about that going on now. I’m going to zip my lips. I’m still waiting on Adobe to return my call from two years ago inquiring about the licensing costs of their parsing library for a small shop. Good thing I wasn’t relying on them to get that project done, and thank goodness for oss.