Not that I know of. In fact, real-life HTML emails often contain complete gibberish (made-up tags, etc.) that doesn't even validate; we have a bunch of fix-up rules for this stuff.
CSS is a complicating factor as well: some CSS is unsafe and must be stripped, which requires a real (error-tolerant) CSS parser to go along with your real HTML parser.
And identifying remote images: you'd think that would be easy, right? But do you know how many ways there are to reference a .png image in HTML and CSS?
At one point Facebook emails were even hiding web bugs in BGSOUND tags -- sneaky!
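To illustrate why this gets hairy, here is a rough Python sketch (not our actual rules) that uses the standard library's error-tolerant `html.parser` to flag a handful of the places a remote resource can hide: `srcset`, legacy `background` attributes, inline `style`, `<style>` blocks, `@import`, and `<bgsound>`. The `URL_ATTRS` table, `RemoteRefFinder`, and the other names are hypothetical and the list is far from complete. The CSS side is only a regex approximation, not a real CSS parser, so it misses things like CSS escapes (`\75rl(...)`) and comments, which is exactly why a proper error-tolerant CSS parser is needed.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately incomplete table of tag -> attributes that can
# make a mail client fetch a remote resource when the message is rendered.
URL_ATTRS = {
    "img": ("src", "srcset", "lowsrc", "dynsrc"),
    "input": ("src",),        # <input type="image">
    "bgsound": ("src",),      # legacy IE tag, still seen in the wild
    "video": ("src", "poster"),
    "audio": ("src",),
    "source": ("src", "srcset"),
    "link": ("href",),        # external stylesheets
    "body": ("background",),
    "table": ("background",),
    "td": ("background",),
    "th": ("background",),
}

# Regex approximations of CSS url(...) and @import "..."; these miss escapes,
# comments, etc. A real filter needs an error-tolerant CSS tokenizer.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)
CSS_IMPORT = re.compile(r"""@import\s+(['"])(.*?)\1""", re.IGNORECASE)


def is_remote(url):
    """Treat absolute http(s) and protocol-relative URLs as remote."""
    return url.strip().lower().startswith(("http:", "https:", "//"))


def css_urls(css):
    """Return every url(...) and quoted @import target found in a CSS string."""
    found = [m.group(2) for m in CSS_URL.finditer(css)]
    found += [m.group(2) for m in CSS_IMPORT.finditer(css)]
    return found


class RemoteRefFinder(HTMLParser):
    """Collect remote URLs from HTML attributes, inline styles and <style> blocks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.urls = []
        self._style_chunks = None  # a list while inside <style>, else None

    def _add(self, url):
        url = url.strip()
        if url and is_remote(url):
            self.urls.append(url)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:  # bare attribute such as <td nowrap>
                continue
            if name in URL_ATTRS.get(tag, ()):
                if name == "srcset":
                    # "a.png 1x, b.png 2x": keep the URL part of each candidate
                    for candidate in value.split(","):
                        parts = candidate.split()
                        if parts:
                            self._add(parts[0])
                else:
                    self._add(value)
            elif name == "style":
                for url in css_urls(value):
                    self._add(url)
        if tag == "style":
            self._style_chunks = []

    def handle_data(self, data):
        if self._style_chunks is not None:
            self._style_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "style" and self._style_chunks is not None:
            for url in css_urls("".join(self._style_chunks)):
                self._add(url)
            self._style_chunks = None


sample = """<table background="http://t.example/bg.png"><tr>
<td style="background-image: url('https://t.example/a.png')">hi</td></tr></table>
<bgsound src="http://t.example/pixel.wav">
<img srcset="https://t.example/1x.png 1x, https://t.example/2x.png 2x" src="cid:part1">
<style>@import "https://t.example/x.css"; p { list-style-image: url(//t.example/dot.png) }</style>"""

finder = RemoteRefFinder()
finder.feed(sample)
finder.close()
print(finder.urls)
```

Even this toy version needs four different extraction paths for one short message, and every path has gaps (commas inside `srcset` URLs, escaped CSS, `<svg>` `href`s, and so on).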
Huh. I would have thought the only ones benefiting from the current state of affairs are scammers and spammers; it's strange that the big players haven't gotten together and done something about it.