Ask HN: List of words – List of regexes tool?
4 points by e7mac on Dec 21, 2014 | 5 comments
Does anyone know of a tool that takes a list of strings as input and outputs a list of regexes that match them? An intelligent regex pattern matcher? If not, what do you think about building something like this?



There is this insane email validating RegEx [0]. The page says:

> I did not write this regular expression by hand. It is generated by the Perl module by concatenating a simpler set of regular expressions that relate directly to the grammar defined in the RFC.

There's also the famous xkcd Regex Golf [1]. Peter Norvig writes:

> So that got me thinking: can I come up with an algorithm to find a short regex that matches the winners and not the losers?

He then describes his steps for building a regex from a list of words that must be matched and a list of words that must not be matched [2].

[0]: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

[1]: https://xkcd.com/1313/

[2]: http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313....
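
For a rough idea of what that looks like, here is a minimal Python sketch loosely in the spirit of Norvig's approach. It is not his code: the candidate generation is cut down to literal substrings of the winners (his notebook does more), and the names candidate_parts and find_regex, along with the scoring weight of 4, are illustrative assumptions.

```python
import re


def candidate_parts(winners):
    """Candidate sub-patterns: each whole winner anchored, plus every
    literal substring of length 1 to 4 taken from the winners."""
    parts = {"^" + re.escape(w) + "$" for w in winners}
    for w in winners:
        for n in range(1, 5):
            for i in range(len(w) - n + 1):
                parts.add(re.escape(w[i:i + n]))
    return parts


def find_regex(winners, losers):
    """Greedy set cover: repeatedly pick the part that covers many
    remaining winners while staying short, then OR the picks together."""
    remaining = set(winners)
    # Discard any part that matches a loser; sort for deterministic output.
    pool = [p for p in sorted(candidate_parts(remaining))
            if not any(re.search(p, loser) for loser in losers)]
    chosen = []
    while remaining:
        def score(p):
            covered = sum(1 for w in remaining if re.search(p, w))
            if not covered:
                return (float("-inf"), p)
            return (4 * covered - len(p), p)

        best = max(pool, key=score)
        covered = {w for w in remaining if re.search(best, w)}
        if not covered:
            raise ValueError("a winner cannot be separated from the losers")
        chosen.append(best)
        remaining -= covered
    return "|".join(chosen)


winners = ["washington", "adams", "jefferson"]
losers = ["clinton", "pinckney", "king"]
print(find_regex(winners, losers))
```

The result is an alternation of short literals, each matching at least one winner and no loser. Because the selection is greedy, it is not guaranteed to be the shortest possible regex.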


If you are interested in how these giant matchstick regexes are created, there is a wonderful (famous?) Perl script that generates a regex 6,598 characters long, more optimized and faster than an earlier attempt at 4,724 bytes, in Jeffrey Friedl's book Mastering Regular Expressions, 1st edition, O'Reilly, pp. 312-316, Appendix B: Email Regex Program.
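
The general trick of assembling a large regex from small named pieces can be shown on a toy scale. The sketch below is Python rather than Perl, and the grammar is a drastically simplified assumption (dot-separated atoms on both sides of the @), nowhere near the real RFC 822 grammar or Friedl's program. The names ATOM, DOT_ATOM, ADDR_SPEC and is_address are made up for this example.

```python
import re

# Simplified building blocks (an assumption, not the RFC 822 grammar):
# an "atom" is a run of letters, digits and a few punctuation characters.
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"

# Larger pieces are built by interpolating the smaller ones.
DOT_ATOM = rf"{ATOM}(?:\.{ATOM})*"
LOCAL_PART = DOT_ATOM
DOMAIN = DOT_ATOM
ADDR_SPEC = rf"{LOCAL_PART}@{DOMAIN}"

ADDRESS_RE = re.compile(ADDR_SPEC)


def is_address(text):
    """Return True if the whole string matches the simplified grammar."""
    return ADDRESS_RE.fullmatch(text) is not None


# The final pattern is already much longer than any piece written by hand.
print(len(ADDR_SPEC), is_address("name@email.com"))
```

Each piece stays readable on its own, and the unreadable final pattern is only ever produced by the program, which is the same reason the generated email regexes end up thousands of characters long.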


Well, this is somewhat vague. There's more than one way of matching any string - you'd have to be more specific about what exact form you want your regexes to have.


True. I was thinking about a tool that would give you a list of regexes, ranked by some factor that tries to surface the pattern most likely to be useful. For example, if I gave it [name@email.com, name2@something.com], it would produce all kinds of regex patterns, but ideally point to something like [a-z]@[a-z].com
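
One very simple way to get candidates like that is to reduce each string to its "shape" of character classes and then rank the distinct shapes. The Python sketch below is only an illustration under assumptions: ASCII input, a ranking by (inputs matched, pattern length) chosen arbitrarily, two sample addresses picked for the example, and hypothetical function names char_class, shape and ranked_shapes.

```python
import re
from itertools import groupby


def char_class(ch):
    """Map one ASCII character to a character class, or to itself escaped."""
    if "a" <= ch <= "z":
        return "[a-z]"
    if "A" <= ch <= "Z":
        return "[A-Z]"
    if "0" <= ch <= "9":
        return "[0-9]"
    return re.escape(ch)


def shape(text):
    """Collapse runs of the same class into 'class+'; keep literals as-is."""
    parts = []
    for cls, run in groupby(char_class(ch) for ch in text):
        count = len(list(run))
        if cls.startswith("["):
            parts.append(cls + "+")
        else:
            parts.append(cls * count)
    return "".join(parts)


def ranked_shapes(strings):
    """Distinct shapes, ordered by inputs fully matched, then by brevity."""
    shapes = {shape(s) for s in strings}

    def hits(pattern):
        return sum(1 for s in strings if re.fullmatch(pattern, s))

    return sorted(shapes, key=lambda p: (-hits(p), len(p), p))


samples = ["name@email.com", "name2@something.com"]
for pattern in ranked_shapes(samples):
    print(pattern)
```

With these two samples the shapes differ (one has a digit run before the @), so two patterns come back. A real tool would need a further step that merges near-identical shapes, and the ranking factor is exactly the part that needs to be specified.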


I think you need to consider more specifically what you want. Another possible regex would be /name2?@(?:som)?e(mail|thing)\.com/ or even /(name@email\.com|name2@something\.com)/

I was able to find this: https://github.com/noprompt/frak
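
For the narrower problem of "one regex that matches exactly these strings", a common technique is to put the strings in a trie and emit the trie as nested alternations. The Python sketch below shows that technique only; it makes no claim about how frak works internally, and build_trie, trie_to_regex and words_to_regex are names invented for this example.

```python
import re


def build_trie(words):
    """Nested dicts keyed by character; the key "" marks end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie


def trie_to_regex(node):
    """Emit the trie as a pattern, sharing common prefixes."""
    ends_here = "" in node
    branches = [re.escape(ch) + trie_to_regex(child)
                for ch, child in sorted(node.items()) if ch != ""]
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    # If a word ends at this node, everything after it is optional.
    return body + "?" if ends_here else body


def words_to_regex(words):
    """Compile a pattern intended to be used with fullmatch."""
    return re.compile(trie_to_regex(build_trie(words)))


print(trie_to_regex(build_trie(["foo", "bar", "baz"])))  # (?:ba(?:r|z)|foo)
```

This produces an exact matcher for the input list with no generalization at all, which sits at the opposite end of the spectrum from a pattern like [a-z]@[a-z].com.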



