Does anyone know of a tool that takes a list of strings as input and outputs a list of regexes that match them? An intelligent regex pattern matcher? If not, what do you think about building something like this?
There is this insane email validating RegEx [0]. The page says:
> I did not write this regular expression by hand. It is generated by the Perl module by concatenating a simpler set of regular expressions that relate directly to the grammar defined in the RFC.
There's also the famous xkcd Regex Golf [1].
Peter Norvig writes:
> So that got me thinking: can I come up with an algorithm to find a short regex that matches the winners and not the losers?
He then describes his steps for building a regex from a list of words that must be matched and a list of words that must not be matched [2].
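To make the "match the winners, not the losers" idea concrete, here is a minimal greedy sketch in Python. It is not Norvig's actual code: the candidate parts (anchored whole words plus short substrings), the scoring weight, and the names `candidate_parts`, `matches` and `find_regex` are all assumptions made for illustration, and the word lists are toy data.

```python
import re

def candidate_parts(winners, max_len=4):
    """Candidate sub-patterns: each whole word anchored, plus short substrings."""
    parts = set()
    for w in winners:
        parts.add("^" + re.escape(w) + "$")
        for n in range(1, max_len + 1):
            for i in range(len(w) - n + 1):
                parts.add(re.escape(w[i:i + n]))
    return parts

def matches(pattern, words):
    """Return the subset of words that the pattern matches somewhere."""
    return {w for w in words if re.search(pattern, w)}

def find_regex(winners, losers):
    """Greedily OR together loser-free parts until every winner is covered."""
    remaining = set(winners)
    # Discard any part that matches a loser.
    pool = [p for p in candidate_parts(remaining) if not matches(p, losers)]
    chosen = []
    while remaining:
        useful = [p for p in pool if matches(p, remaining)]
        if not useful:
            raise ValueError(f"cannot cover {sorted(remaining)} without matching a loser")
        # Assumed score: reward coverage, penalise length; ties broken by text.
        best = max(useful, key=lambda p: (4 * len(matches(p, remaining)) - len(p), p))
        chosen.append(best)
        remaining -= matches(best, remaining)
    return "|".join(chosen)

# Toy data, not taken from the thread.
winners = ["washington", "adams", "jefferson"]
losers = ["clinton", "pinckney", "king"]
regex = find_regex(winners, losers)
print(regex)
```

A greedy cover like this finds a short regex, not necessarily the shortest one; it simply guarantees that every winner is matched and no loser is, as long as the two lists are disjoint.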
If you are interested in how these giant matchstick regexes are created,
there is a wonderful (famous?) Perl script that generates a regex 6,598 characters long, more optimized and faster than an earlier attempt at 4,724 bytes,
in Jeffrey Friedl's book, Mastering Regular Expressions, 1st edition, O'Reilly, pp. 312-316, Appendix B: Email Regex Program.
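The general trick behind such generated regexes, building a big pattern by concatenating small named ones that mirror grammar rules, can be sketched in a few lines of Python. This is only an illustration: the fragments below are heavily simplified assumptions (no comments, quoted strings or domain literals), not the RFC 822 grammar and not the Perl module's or Friedl's actual definitions.

```python
import re

# Simplified building blocks, loosely named after grammar terms.
# Assumption: these are NOT the full RFC definitions.
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
DOT_ATOM = rf"{ATOM}(?:\.{ATOM})*"   # atoms separated by single dots
LOCAL_PART = DOT_ATOM
DOMAIN = DOT_ATOM
ADDR_SPEC = rf"{LOCAL_PART}@{DOMAIN}"

ADDRESS_RE = re.compile(ADDR_SPEC)

def looks_like_address(text):
    """True if the whole string fits the simplified addr-spec above."""
    return ADDRESS_RE.fullmatch(text) is not None

print(len(ADDRESS_RE.pattern), "characters in the assembled pattern")
print(looks_like_address("name@email.com"))
print(looks_like_address("name@@email.com"))
```

Each fragment stays readable on its own, while the assembled pattern grows quickly as more grammar rules are substituted in, which is how such regexes reach thousands of characters.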
Well, this is somewhat vague. There's more than one way to match any string; you'd have to be more specific about the exact form you want your regexes to take.
True. I was thinking about a tool that would give you a list of regexes, ranked by some factor that aims to surface the pattern most likely to be useful. For example, if I gave it [name@email.com, name2@something.com], it would produce all kinds of regex patterns, but ideally point to something like [a-z0-9]+@[a-z]+\.com
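A rough sketch of what such a ranking tool could look like, in Python. Everything here is an assumption for illustration: the three generalisation levels, the ranking rule (fewest alternatives first, then the more specific level), the function names, and the two sample addresses, which are inferred from the regexes quoted elsewhere in this thread.

```python
import re
from itertools import groupby

def classify(ch, level):
    """Regex class for ch at a generalisation level, or None to keep it literal."""
    is_lower = "a" <= ch <= "z"
    is_digit = "0" <= ch <= "9"
    if level == 1:
        if is_lower:
            return "[a-z]"
        if is_digit:
            return "[0-9]"
    elif level == 2 and (is_lower or is_digit):
        return "[a-z0-9]"
    return None  # level 0, or a character we keep as a literal

def generalise(s, level):
    """Collapse runs of same-class characters into 'class+' and escape the rest."""
    out = []
    for cls, run in groupby(s, key=lambda ch: classify(ch, level)):
        text = "".join(run)
        out.append(re.escape(text) if cls is None else cls + "+")
    return "".join(out)

def candidates(samples):
    """Return candidate regexes, best-ranked first."""
    ranked = []
    for level in (0, 1, 2):
        patterns = sorted({generalise(s, level) for s in samples})
        ranked.append((len(patterns), level, "|".join(patterns)))
    ranked.sort()  # fewest alternatives first, then the most specific level
    return [regex for _, _, regex in ranked]

samples = ["name@email.com", "name2@something.com"]  # assumed sample inputs
for regex in candidates(samples):
    print(regex)
```

With these two samples the only level that folds both strings into a single pattern is the alphanumeric one, so it is ranked first; a real tool would need a far better notion of "useful" than counting alternatives.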
I think you need to consider more specifically what you want. Another possible regex would be
/name2?@(?:som)?e(mail|thing)\.com/ or even /(name@email\.com|name2@something\.com)/
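A quick Python check makes the ambiguity visible: several quite different patterns accept the same samples, but they disagree on strings outside the sample set. The sample addresses are inferred from the regexes above, the third pattern is a hypothetical looser alternative, and `accepts_all` is a helper name invented for this sketch.

```python
import re

samples = ["name@email.com", "name2@something.com"]  # assumed sample inputs

patterns = {
    "compact": r"name2?@(?:som)?e(mail|thing)\.com",
    "literal": r"(name@email\.com|name2@something\.com)",
    "loose": r"[a-z0-9]+@[a-z]+\.com",  # hypothetical, not from the thread
}

def accepts_all(pattern, strings):
    """True if the pattern matches every string in full."""
    return all(re.fullmatch(pattern, s) for s in strings)

for name, pattern in patterns.items():
    print(name, accepts_all(pattern, samples))

# Strings that were never in the sample set show how the patterns differ.
unseen = ["name@ething.com", "bob@example.com"]
for name, pattern in patterns.items():
    print(name, [s for s in unseen if re.fullmatch(pattern, s)])
```

All three accept the samples, yet only the literal alternation rejects everything else, which is why a generator needs extra input (negative examples, or a preferred level of generality) to choose between them.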
[0]: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
[1]: https://xkcd.com/1313/
[2]: http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313....