
Question for ya if you don't mind: I had to do some PDF scraping a while back as part of a side project collecting alternative social/economic data sources.

Even within a single site, there were often errors at the fringes, especially when things like layout or styling changed. My concern about giving bad data to users (or about constantly having to check data quality and adjust custom parameters for each target site) held me back from ever feeling confident enough to convert it into a paid product.

I don't mean for you to give up your secret sauce here, but I'm wondering if you ran into this same issue, and what your approach was from a business/customer-expectations perspective?




Oh yes, I ran into this issue many, many times. The way I dealt with it is a bit insane: I classify bank statements using images or text on the first page, then run custom code for that document type.

I also have a "pretty good" fallback algorithm if the statement cannot be classified.
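For illustration, here's a minimal sketch of that classify-then-dispatch shape in Python, assuming pdfplumber for text extraction. The bank markers and parser bodies are hypothetical placeholders, not the commenter's actual implementation:

    import pdfplumber

    def parse_rbc(text):
        # Hypothetical per-bank parser, tuned to one statement template.
        return {"bank": "RBC", "transactions": []}

    def parse_generic(text):
        # "Pretty good" fallback for statements that can't be classified.
        return {"bank": "unknown", "transactions": []}

    # First-page markers identify the document type; one entry per template.
    CLASSIFIERS = {
        "Royal Bank of Canada": parse_rbc,
    }

    def parse_statement(path):
        with pdfplumber.open(path) as pdf:
            first_page = pdf.pages[0].extract_text() or ""
            full_text = "\n".join(p.extract_text() or "" for p in pdf.pages)
        for marker, parser in CLASSIFIERS.items():
            if marker in first_page:
                return parser(full_text)  # custom code for this document type
        return parse_generic(full_text)   # fallback algorithm

The appeal of this shape is that each new template costs only one parser plus a marker, and the fallback keeps unclassified statements from failing outright.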


Usually banks have a template, so the edge cases aren't so edgy. I had to do this with Canadian banks, and each one had its own template, but once you parsed it, it generally worked until they updated their template again.


True, Canadian banks are quite nice to work with. US, Indian and South African banks are hell!



