Hacker News new | past | comments | ask | show | jobs | submit login

It doesn't actually take that many lines of code to implement a linear time NFA engine. Most of the code is actually in the regex compiler. That is, there are only a few actual "instructions" or node types in a regex engine (alternation, concatenation, etc.). The rest is just compiling the bizarre syntax to a those nodes/instructions. (And dealing with Unicode if you need that.)

The whole awk implementation is 958 lines, so it can't be that bad. It supports a fairly featureful regex language (though no capturing AFAICT).




if is that "simple" why then are few? Or is because I don't know where to look at?

Maybe regex are just enough and few bother to have something faster?

I have found some start code at https://t0yv0.blogspot.com/2011/02/home-made-regular-express.... The thing is that I don't know how much else is necessary to have a well made library...


I'd look at GNU sed if you wanted a very featureful substitution engine that IIRC is also linear time.


Hm it seems to be a copy of the GNU libc regex engine. And coreutils, grep, and awk also package the same engine in user space! That's really annoying.

But yes it looks like it uses a struct re_dfa_t for matching, and some special flags for backreferences? That seems to indicate that they are using linear time maybe with a fallback? Hm.

I think the general turn of events was that libc supported BRE at first, which was implemented using the DFA algorithm. Then Perl implemented a backtracking engine and Perl syntax for REs, and Python/Ruby/etc. wanted to be compatible with Perl, so they also implemented backtracking as well. Because Perl has other features like forward and back assertions that require backtracking.

And then perhaps libc got EREs with backreferences which they bolted onto the existing infrastructure?

Anyway, thanks for the pointer... I may look into these more in the future.

musl libc also uses the TRE regex engine, which apparently uses the DFA algorithm.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: