Ask HN: List of words – List of regexes tool?

_jomo · on Dec 21, 2014

There is this insane email validating RegEx [0]. The page says:

> I did not write this regular expression by hand. It is generated by the Perl module by concatenating a simpler set of regular expressions that relate directly to the grammar defined in the RFC.

There's also the famous xkcd Regex Golf [1]. Peter Norvig writes:

> So that got me thinking: can I come up with an algorithm to find a short regex that matches the winners and not the losers?

He then walks through the steps for building a regex from a list of words that must match and a list of words that must not [2].

[0]: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

[1]: https://xkcd.com/1313/

[2]: http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313....
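
For anyone who wants the flavor of that approach without reading the whole notebook, here is a rough sketch of the greedy idea in Python. It is a simplification under my own assumptions, not Norvig's actual code: the function names, the substring length cap of 4, and the scoring weight are all made up for illustration. It generates candidate pieces from the winners, discards any piece that matches a loser, then repeatedly picks the piece that covers the most still-unmatched winners for its length.

```python
import re


def candidate_parts(winners, max_len=4):
    """Candidate sub-regexes: each whole word anchored, plus short substrings."""
    parts = {"^" + re.escape(w) + "$" for w in winners}
    for w in winners:
        for n in range(1, max_len + 1):
            for i in range(len(w) - n + 1):
                parts.add(re.escape(w[i:i + n]))
    return parts


def matches(regex, words):
    """Return the subset of words in which the regex matches somewhere."""
    return {w for w in words if re.search(regex, w)}


def find_regex(winners, losers):
    """Greedy cover: OR together parts until every winner is matched."""
    winners, losers = set(winners), set(losers)
    # Keep only parts that match no loser, so the final OR cannot match one.
    pool = {p for p in candidate_parts(winners) if not matches(p, losers)}
    chosen, remaining = [], set(winners)
    while remaining:
        useful = [p for p in pool if matches(p, remaining)]
        if not useful:
            raise ValueError("some word is both a winner and a loser")
        # Hypothetical score: reward coverage, penalize length.
        # The part itself is the tie-breaker, which keeps the result stable.
        best = max(useful, key=lambda p: (4 * len(matches(p, remaining)) - len(p), p))
        chosen.append(best)
        remaining -= matches(best, remaining)
    return "|".join(chosen)
```

Calling `find_regex(["washington", "adams", "jefferson"], ["clinton", "pinckney", "king"])` returns a short alternation that matches every word in the first list and none in the second. Greedy selection does not guarantee the shortest possible regex, only a reasonably short one.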

e3pi · on Dec 21, 2014

If how to create these giant matchstick regexes interests you, there is a wonderful (famous?) Perl script that generates a regex 6,598 chars long, more optimized and faster than an earlier attempt at 4,724 bytes, in Jeffrey Friedl's book Mastering Regular Expressions, 1st edition, O'Reilly, pp. 312-316, Appendix B: Email Regex Program.

alansammarone · on Dec 21, 2014

Well, this is somewhat vague. There's more than one way of matching any string - you'd have to be more specific about what exact form you want your regexes to have.

e7mac · on Dec 21, 2014

True. I was thinking about a tool that would give you a list of regexes, ranked by some factor that tries to surface the pattern most likely to be useful. For example, if I gave it [name@email.com, name2@something.com], it would produce all kinds of regex patterns, but ideally point to something like [a-z0-9]+@[a-z]+\.com

logn · on Dec 22, 2014

I think you need to consider more specifically what you want. Another possible regex would be /name2?@(?:som)?e(mail|thing)\.com/ or even /(name@email\.com|name2@something\.com)/
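
To make that concrete, here is a quick Python check showing that very different patterns all accept the same two inputs. The two sample addresses and the list of candidate patterns are assumptions chosen for illustration; the point is only that "matches my examples" does not pick out a single regex.

```python
import re

# Assumed sample inputs, used only for illustration.
samples = ["name@email.com", "name2@something.com"]

candidates = [
    r"name2?@(?:som)?e(mail|thing)\.com",  # tight, hand-fitted to the samples
    r"[a-z0-9]+@[a-z]+\.com",              # loose, roughly "email-shaped"
    r".*",                                 # accepts any string at all
]


def accepts_all(pattern, strings):
    """True if the pattern matches every string in full."""
    return all(re.fullmatch(pattern, s) for s in strings)


# Every candidate accepts both samples, so the samples alone cannot rank them.
results = {p: accepts_all(p, samples) for p in candidates}

# The candidates only differ on inputs outside the sample set.
unseen = "bob@example.com"
generalizes = {p: bool(re.fullmatch(p, unseen)) for p in candidates}
```

All three entries in `results` are true, while `generalizes` separates the hand-fitted pattern (which rejects the unseen address) from the other two. Any ranking tool would need some extra criterion like this, for example negative examples or a preference for shorter or more general patterns.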

I was able to find this: https://github.com/noprompt/frak
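
frak is a Clojure library that turns a collection of strings into a regex matching exactly those strings. As far as I can tell the core idea is to build a character trie and then serialize it, so shared prefixes are written only once. Below is a minimal Python sketch of that idea. It is not frak's implementation or output format, and the function names are made up.

```python
import re


def build_trie(words):
    """Nested dicts keyed by character; the empty key marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie


def trie_to_regex(node):
    """Serialize a trie node into a regex fragment."""
    ends_here = "" in node
    branches = [re.escape(ch) + trie_to_regex(child)
                for ch, child in sorted(node.items()) if ch != ""]
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        return branches[0]
    group = "(?:" + "|".join(branches) + ")"
    # If a word can end at this node, everything after it is optional.
    return group + "?" if ends_here else group


def words_to_regex(words):
    return trie_to_regex(build_trie(words))
```

With this sketch, `words_to_regex(["foo", "bar", "baz"])` gives `(?:ba(?:r|z)|foo)`. Note that it only compresses the exact list you give it. It never generalizes to something like `[a-z]+`, which is a different problem from the ranking tool asked about above.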