A couple years ago Evan Phoenix (of rubinius) and I collaborated (by which I mea...

garethadams · on Sept 6, 2012

Sure, I mean `me@yahoo` is an RFC-compliant email address; it refers to a local server 'yahoo'. However, in your online app it's almost certainly an error if this email address turns up in a registration form.

Don't test for an RFC-compliant address if you don't want to accept all RFC-compliant addresses. Being able to send an email is a much better test because it matches what you're going to use the email address for in your app.

jph · on Sept 6, 2012

But your way, if the user makes a typo like "foo@@bar,com" then he will expect to receive an email but won't. Better UX, in my opinion, is to do validation before sending, both client-side and server-side (esp. good for REST JSON APIs), and then, if these all pass, send the welcome email.

drbawb · on Sept 7, 2012

I wouldn't mind using a simple regex or validator to check the e-mail addr for validity.

It wouldn't be RFC-compliant, but it would catch 99% of typos.

Instead of being an error when the e-mail fails validation though, it would say something like: "your e-mail does not appear valid; please double check your entry. You will be sent an activation e-mail; click [Continue] if you're sure the address is valid."

Basically, if the address fails the "99%" test, let the user decide whether their e-mail is in the 1% or not.
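
A minimal sketch of that "warn, don't reject" flow, in Python. The pattern and the `check_email` helper are illustrative assumptions, not anything from the libraries discussed in this thread; the pattern is deliberately loose and makes no claim to RFC compliance.

```python
import re

# Deliberately loose typo catcher (an assumption for illustration):
# exactly one "@", a non-empty local part, and a dotted domain with
# no whitespace or commas. This is NOT an RFC validator.
LOOSE_EMAIL = re.compile(r"^[^@\s]+@[^@\s,]+\.[^@\s,.]+$")

WARNING = (
    "Your e-mail does not appear valid; please double check your entry. "
    "You will be sent an activation e-mail; click [Continue] if you're "
    "sure the address is valid."
)


def check_email(address):
    """Return (looks_ok, message). A failed check is a warning, never a hard error."""
    if LOOSE_EMAIL.match(address.strip()):
        return True, ""
    return False, WARNING
```

With this sketch, `foo@@bar,com` and `me@yahoo` both trigger the warning, while the form can still let the user continue and rely on the activation e-mail as the real test.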

saurik · on Sept 6, 2012

I see that the API you feel most people would want is validate_2822_addr (aka validate_addr), which validates an addr-spec (as people should really not be typing angle addresses with display names into forms asking for their e-mail address ;P).

However, that specification, and the implementation you provide, is really designed for parsing e-mail address headers (as you say: for an MTA/MUA), and so contains a bunch of properties specific to structured MIME fields that really have nothing to do with e-mail addresses.

Instead, if you are verifying an e-mail address that someone types into a form, you probably are looking for "the kind of e-mail address that SMTP would accept for delivery", and that is covered by a different standard with a different and unrelated grammar.

Specifically, you implemented RFC 2822, the successor to RFC 822 that has now been obsoleted by RFC 5322, the standard on "Internet Message Format" (in essence, MIME). The related RFC 2821, the successor to RFC 821 that is now obsoleted by RFC 5321, is for SMTP.

For an example of the kinds of differences this would cause, RFC 5322 believes that ""@example.com is invalid (by errata), but that hello(ignore)@example.com is valid (the parenthesized part is a MIME comment); RFC 5321, on the other hand, believes the exact opposite in both cases.
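
To make the SMTP side of that concrete, here is a simplified sketch of the RFC 5321 local-part grammar (dot-string or quoted-string) as a Python regex. It is an approximation written for this example: it checks only the part to the left of the @, ignores length limits, and the `smtp_local_part_ok` name is made up here.

```python
import re

# atext characters allowed in an unquoted "dot-string" local part.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_STRING = rf"{ATEXT}+(?:\.{ATEXT}+)*"

# Quoted string: printable ASCII except '"' and '\', or a backslash
# followed by any printable ASCII character. May be empty.
QUOTED_STRING = r'"(?:[\x20\x21\x23-\x5b\x5d-\x7e]|\\[\x20-\x7e])*"'

LOCAL_PART = re.compile(rf"(?:{DOT_STRING}|{QUOTED_STRING})\Z")


def smtp_local_part_ok(local):
    """True if `local` fits the simplified SMTP local-part grammar above."""
    return LOCAL_PART.match(local) is not None
```

Under this sketch the empty quoted string `""` is accepted, while `hello(ignore)` is rejected because "(" is neither an atext character nor inside quotes; SMTP has no notion of a comment.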

(edit: When I realized that I should probably write a blog post about this, given how much time I've put into implementing this stuff recently, I realized that there was more to say on this general subject, and I'm including it below.)

That said, I will go even further: these formats are designed for escaping e-mail addresses in the context of a larger standard and protocol, one that might already have special characters. This is why they contain so much quoting support.

This is then why the grammar is often so highly restricted for things that don't need to be quoted: given that an @ cannot be found in a domain name, you really shouldn't need to quote anything to the left of the @ to get a valid e-mail address.

However, "(" is a special character in a MIME field (begins a comment), and thereby if you want to include it in the local part of an e-mail address, you will need to escape it somehow; the same is true of things like whitespace, commas, or angle brackets.
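
As a rough illustration of why "(" is special in a header context, here is a small Python sketch of what a header-style parser does with comments: it drops parenthesized text (including nested comments) unless the parentheses sit inside a quoted string. The `strip_comments` helper is hypothetical and covers only this one aspect of the header grammar.

```python
def strip_comments(value):
    """Remove parenthesized comments from a header-style address.

    Handles nested comments, backslash escapes, and double-quoted
    strings (parentheses inside quotes are kept as literal text).
    """
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # True while inside a "quoted string"
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # Escaped pair: keep it only when outside a comment.
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)
```

So `hello(ignore)@example.com` comes out as `hello@example.com`, whereas `"a(b)"@example.com` is left untouched because the quotes protect the parentheses. A user typing into a web form has no reason to know any of this.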

The user typing the e-mail address into the form, however, isn't dealing with these restrictions: asking him to escape special characters in his e-mail address seems silly: one might as well be asking people to HTML escape their username in the username field.

That said, there is then a separate RFC 3696 which talks about the semantics of contemporary e-mail addresses and how one might go about validating them, and it includes the idea of quoting in its implementation (so maybe it believes that RFC 5321 is king).

tedunangst · on Sept 7, 2012

Best comment ever. :) I feel that's something that's always missed in these discussions. Users are not entering RFC-compliant email header values into your form. Maybe my next web app will make people base64 encode their name, and submit it in =?utf-8?B?..?= format.

jph · on Sept 6, 2012

I'm the author of the Rails wrapper for the big chunk of regex code; you're correct, it is for use cases that are akin to an MUA/MTA.

http://github.com/sixarm/sixarm_ruby_email_address_validatio...

We use a combination of client-side JavaScript validation and server-side validation in Rails. Typical server-side validation is for REST JSON API calls by third-party apps, and also for parsing freeform text fields like "tell your friends about us".

Original code is by Tim Fletcher & Cal Henderson.