It doesn't actually take that many lines of code to implement a linear time NFA ...

mamcx · on July 23, 2016

if is that "simple" why then are few? Or is because I don't know where to look at?

Maybe regex are just enough and few bother to have something faster?

I have found some start code at https://t0yv0.blogspot.com/2011/02/home-made-regular-express.... The thing is that I don't know how much else is necessary to have a well made library...

cyphar · on July 21, 2016

I'd look at GNU sed if you wanted a very featureful substitution engine that IIRC is also linear time.

chubot · on July 21, 2016

Hm it seems to be a copy of the GNU libc regex engine. And coreutils, grep, and awk also package the same engine in user space! That's really annoying.

But yes it looks like it uses a struct re_dfa_t for matching, and some special flags for backreferences? That seems to indicate that they are using linear time maybe with a fallback? Hm.

I think the general turn of events was that libc supported BRE at first, which was implemented using the DFA algorithm. Then Perl implemented a backtracking engine and Perl syntax for REs, and Python/Ruby/etc. wanted to be compatible with Perl, so they also implemented backtracking as well. Because Perl has other features like forward and back assertions that require backtracking.

And then perhaps libc got EREs with backreferences which they bolted onto the existing infrastructure?

Anyway, thanks for the pointer... I may look into these more in the future.

musl libc also uses the TRE regex engine, which apparently uses the DFA algorithm.