Is the PDF format itself broken, or just the awful Adobe Reader? There are dozen...

ajross · on March 10, 2016

Early PDF was quite sane. It was the PostScript imaging model turned into a binary bytecode format with almost all the programmability features removed.

Later on it got wonky (though never even close to the extent to which Flash did!) with all the hypertextification features. But basic PDF is actually one of the Great File Formats in computer history.

wlesieutre · on March 11, 2016

Hypertextification features? Ha!

Try 3D model viewer: https://youtu.be/n8KgxaNYRe4?t=27

icebraining · on March 11, 2016

The sane version is the one defined as the PDF/A ISO standard. Stuff like pulling remote resources, embedding executable code, etc., is all forbidden.

https://en.wikipedia.org/wiki/PDF/A
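
If you just want a quick look at whether a file uses some of the scripting features, a crude byte scan can help. This is a sketch, not a PDF/A validator: `find_risky_names` and `RISKY_NAMES` are names I made up, the list covers only a few name tokens, and a clean result proves nothing, since names can be hex-escaped and objects can live inside compressed streams.

```python
import re

# A small, non-exhaustive set of PDF name tokens tied to scripting and
# launch actions (an assumption chosen for this example).
RISKY_NAMES = (b"/JavaScript", b"/JS", b"/Launch")


def find_risky_names(data: bytes) -> dict:
    """Count raw occurrences of each risky name token in the file bytes."""
    counts = {}
    for name in RISKY_NAMES:
        # Require a non-alphanumeric character (or end of data) after the
        # name so that /JS does not also match /JavaScript.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        hits = len(re.findall(pattern, data))
        if hits:
            counts[name.decode("ascii")] = hits
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        print(find_risky_names(fh.read()))
```

Run it as `python scan.py some.pdf` (the script name is arbitrary); an empty dict means only that none of these tokens appear in plain, uncompressed form.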

ajross · on March 11, 2016

I didn't realize that this standard existed. Thanks for the link, that's very helpful to know. I've always viewed "modern PDF" as an ad hoc thing defined by the intersection of whatever was supported by the popular free renderers.

jfoutz · on March 11, 2016

The JavaScript stuff drove me nuts when I was working on a save-as-PDF project.

lqdc13 · on March 10, 2016

The standard is 1000 pages long. Most reader implementations are written in C/C++.

They are of course exploitable in different ways.

Adobe sometimes does not follow its own spec.

People publishing PDFs sometimes rely on that non-standard behavior to display some graphics. This is especially true of many research papers, which render correctly only in Adobe Reader.

cozzyd · on March 11, 2016

In particular, other viewers often display zero-width lines, which is annoying for colormaps. Those can't safely be saved as bitmaps without oversampling either, as not all viewers can be made to avoid interpolating.

wepple · on March 11, 2016

The PDF format is unbelievably complex, far more than is necessary for the average sales brochure or report.

Given that nearly all reader implementations are written in C/C++, it's always going to be an easy target. Sandboxing has helped a lot, but there's just a lot to go wrong and always will be.