> Flash 1.5 accepts whole PDFs just fine. Sometimes models cannot extract the te...

ajcp · 2024-08-11T14:39:51 1723387191

Ah, yes, I've found pre-processing the PDFs to sanitize against things like that has been helpful. That's a whole other process though.

sumedh · 2024-08-12T00:46:07 1723423567

What steps does that involve?

ajcp · 2024-08-12T14:43:19 1723473799

Essentially what you're already doing, with one more step :) Get PDF > convert (read: rebuild) to TIFF > convert to PDF.

In my case all documents to be sent to the LLM (PDFs, images, emails, etc.) are already staged in a file repository as part of a standard storage process. This entails every document being converted into a TIFF (read: rebuilt cleanly) for storage, and then into a PDF upon export. This ensures that all docs are well-formed and don't retain whatever went into originally creating them. I've found any number of "PDF" documents are not actually PDFs, while others try to enforce some "protection" that the LLM doesn't like.
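For anyone wanting to try the PDF > TIFF > PDF round trip outside of a document repository, here is a minimal sketch. The tooling is my assumption, not what the storage system described above uses: Ghostscript (`gs` on PATH) does the rasterizing and Pillow rebuilds the PDF. The function names and the 300 DPI default are made up for illustration. Note that rasterizing throws away the text layer, so the model ends up reading page images.

```python
import shutil
import subprocess
from pathlib import Path


def gs_rasterize_cmd(src_pdf, out_tiff, dpi=300):
    """Build a Ghostscript command that renders all pages of a PDF
    into a single multi-page 24-bit RGB TIFF."""
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER", "-q",
        "-sDEVICE=tiff24nc",          # 24-bit RGB TIFF output device
        f"-r{dpi}",                   # render resolution in DPI
        f"-sOutputFile={out_tiff}",
        str(src_pdf),
    ]


def tiff_to_pdf(tiff_path, out_pdf, dpi=300):
    """Rebuild a PDF from the pages of a multi-page TIFF using Pillow."""
    from PIL import Image, ImageSequence  # imported lazily; requires Pillow

    with Image.open(tiff_path) as im:
        # convert() copies each frame, so the pages outlive the open file
        pages = [frame.convert("RGB") for frame in ImageSequence.Iterator(im)]
    pages[0].save(out_pdf, "PDF", save_all=True,
                  append_images=pages[1:], resolution=dpi)


def rebuild_pdf(src_pdf, out_pdf, dpi=300):
    """PDF > TIFF > PDF. The intermediate TIFF is kept next to the output."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    tiff = Path(out_pdf).with_suffix(".tiff")
    subprocess.run(gs_rasterize_cmd(src_pdf, tiff, dpi), check=True)
    tiff_to_pdf(tiff, out_pdf, dpi)
    return Path(out_pdf)
```

Usage would be something like `rebuild_pdf("scan.pdf", "scan_clean.pdf")`. On Windows the Ghostscript binary is usually named `gswin64c` rather than `gs`, so the command would need adjusting there.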

sumedh · 2024-08-17T00:30:23 1723854623

Interesting, I will try the TIFF approach for some of the problem PDFs I have.

Thanks