Hacker News new | past | comments | ask | show | jobs | submit login

Unfortunately, there is a hint as to how this has happened:

"This strategy is no longer practical: users have come to rely on backreferences for at least occasional use, and backreferences are part of the POSIX standard for regular expressions."

What better excuse is there for a poor implementation than standards compliance? In many ways, using regex with backtracking by default is like programming in Lisp without tail-call optimizations. If the POSIX standard is going to require backreferences, then it should also require a non-backtracking implementation for regular expressions without backreferences, just like the Scheme specification requires implementations to support tail-call optimization.

The comparison is valid because they both can create a quadratic runtime from a linear time algorithm.




> If the POSIX standard is going to require backreferences, then it should also require a non-backtracking implementation for regular expressions without backreferences

At least in some regexp engines, this is possible. There is a concept of possessive (as opposed to reluctant, or as it's often called, "non-greedy") quantifiers. Given the string "abb", consider the following regexes:

/[ab]+b/ -- matches "abb"

/[ab]+?b/ -- matches "ab"

/[ab]++b/ -- DOES NOT MATCH!! (Because there is no back-tracking.)

More info: http://www.regular-expressions.info/possessive.html

One regexp engine which implements this (see section 4): https://raw.githubusercontent.com/k-takata/Onigmo/master/doc... -- this is the engine used by modern Ruby versions.


Yes, I recall that he talks about the multi-engine approach with respect to both PCRE and RE2 somewhere in these articles:

https://swtch.com/~rsc/regexp/

Sorry I don't have the exact reference handy. Apparently the author of PCRE tried to implement an NFA/DFA engine for some regexes as well. And I think RE2 might also have different strategies for different regexes as well, but I forget the details (it might be related to capturing).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: