How to segment the document without LLM? I prefer to do all of this in 1 step wi...

jszymborski · 2024-08-10T10:38:50 1723286330

Segmenting can likely be done at a really low resolution with a CNN, which would make it really fast.

There are some heuristic ways of doing it, but I doubt you'll be able to distinguish equations from text.
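
For a rough idea of what a heuristic approach can look like, here is a minimal sketch of one classic option, a horizontal projection profile, which is an assumption about the kind of heuristic meant and not something specified above. The function name `split_rows` and its parameters are made up for illustration. It splits a grayscale page into vertical blocks wherever enough blank rows appear; note that it only finds blocks and says nothing about whether a block is text or an equation.

```python
def split_rows(image, ink_threshold=128, min_gap=2):
    """Split a page into horizontal blocks separated by blank rows.

    image: list of rows, each a list of grayscale ints (0 = black, 255 = white).
    Returns a list of (top, bottom) row ranges, bottom exclusive.
    """
    # A row "has ink" if any pixel is darker than the threshold.
    has_ink = [any(px < ink_threshold for px in row) for row in image]

    blocks = []
    start = None  # first row of the block currently being collected
    gap = 0       # number of consecutive blank rows seen inside that block
    for y, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = y
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Enough blank rows: close the block at its last inked row.
                blocks.append((start, y - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close a block that runs to the end, dropping trailing blank rows.
        blocks.append((start, len(has_ink) - gap))
    return blocks
```

Gaps shorter than `min_gap` rows are treated as line spacing inside a block, so tuning that value per document is part of why this stays a heuristic.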

troysk · 2024-08-12T09:35:02 1723455302

Segmenting at lower resolution and then applying the segments at higher resolution using resolution multipliers doesn't work, as other items bleed in. The FastSAM paper has some interesting ideas on doing this with CNNs, which I guess SAM2 has superseded. However, the complication in the pipeline is not worth the result, as I find vision LLMs are able to do almost the same task within the same OCR prompt.
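
To make the multiplier step concrete, here is a minimal sketch of mapping a low-resolution box back to full-resolution pixels. The helper name `upscale_box` and the box convention (x0, y0, x1, y1 with exclusive right and bottom edges) are assumptions for illustration, not taken from any particular library. Each low-res pixel covers `scale` x `scale` full-res pixels, so every edge is only known to within about `scale` pixels, which is one plausible way neighbouring items end up inside the crop.

```python
import math


def upscale_box(box, scale, width, height):
    """Map a low-res box to full-res pixel coordinates.

    box: (x0, y0, x1, y1) in low-res pixels, x1 and y1 exclusive.
    scale: full-res pixels per low-res pixel (the resolution multiplier).
    width, height: full-res image size, used to clamp the result.
    """
    x0, y0, x1, y1 = box
    return (
        max(0, math.floor(x0 * scale)),
        max(0, math.floor(y0 * scale)),
        min(width, math.ceil(x1 * scale)),
        min(height, math.ceil(y1 * scale)),
    )
```

With a multiplier of 8, for example, a box edge that is off by a single low-res pixel moves by 8 pixels at full resolution, which can easily be a whole line of small text.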

wahnfrieden · 2024-08-10T16:58:51 1723309131

Apple APIs such as Live Text, subject identification, and Vision. You can run them on a server, too.