Weirdly, I've "known" this since I started writing Perl in the mid-'90s. Not sure where I originally read it (or was told it). Funny how that works.
I try to write my regexes so that they anchor at the front of the string or at the back, or they describe the whole string; never an either-or anchoring situation like this example.
Spaces at beginning of string (100,000 iterations):

               Rate onestep twostep
    onestep 62500/s      --     -2%
    twostep 63694/s      2%      --

    real    0m3.093s
    user    0m3.066s
    sys     0m0.018s
Spaces at end of string (100,000 iterations):

               Rate twostep onestep
    twostep 55249/s      --     -9%
    onestep 60976/s     10%      --

    real    0m3.453s
    user    0m3.421s
    sys     0m0.022s
Spaces in middle of string (only 500 iterations because I don't want to sit here for four hours):

               Rate onestep twostep
    onestep  7.11/s      --   -100%
    twostep 16667/s 234333%      --

    real    1m10.741s
    user    1m10.207s
    sys     0m0.228s
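For anyone who wants to see the shape of the two variants, here is a minimal sketch in Python. The Perl code behind the numbers above isn't shown, so the function names `trim_onestep` and `trim_twostep` and the exact patterns are my assumptions about what "onestep" and "twostep" mean. It only illustrates the structure; it says nothing about how any particular engine will perform.

```python
import re

# Assumed patterns; the benchmarked Perl source is not shown in the thread.
LEADING = re.compile(r"^\s+")        # anchored at the front only
TRAILING = re.compile(r"\s+$")       # anchored at the back only
EITHER = re.compile(r"^\s+|\s+$")    # the either-or alternation


def trim_onestep(s: str) -> str:
    """Strip both ends with a single alternation."""
    return EITHER.sub("", s)


def trim_twostep(s: str) -> str:
    """Strip the front, then the back, with one anchored regex each."""
    return TRAILING.sub("", LEADING.sub("", s))


# Both variants produce the same result; only the work done differs.
sample = "  hello   world \t\n"
assert trim_onestep(sample) == trim_twostep(sample) == "hello   world"
```

Whether the two-step form is actually faster depends on the engine, so it's worth timing both on representative input before committing to either.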
You can see in my other post that this doesn't always work. For example, Python's regex engine chokes on `\s+$` if you use `re.search`, but works fine if you use `re.match`.
It's likely that Perl has an optimization for `\s+$` but not `^\s+|\s+$` (the former regex only ever matches at the end of the string, which is amenable to optimizations).
I did see your other post, and upvoted it. This rule of thumb has served me well between different regex dialects and implementations, but it's not surprising that there are some specific cases that are "broken" for lack of a better word.
I haven't done much Python but the documentation for re.search() and re.match() is very clear: use search to find an expression anywhere in a string, use match to find an expression at the beginning of a string. It appears to ignore anchors in both cases? Left undetermined then is how to anchor an expression at the end of a string. You say re.match() works, but this is pretty confusingly described, and it's easy to see how the ambiguity can lead to problems for even the most experienced programmers.
Anchors aren't ignored. For re.match, `^abc` is equivalent to `abc`, so the leading anchor is just redundant; a trailing `$` still matters, because re.match doesn't require the match to reach the end of the string (re.fullmatch does). (N.B. `^^abc$$` will match the same set of strings as `^abc$`.) For re.search, `^abc$` is equivalent to `re.match('^abc$', s)`, but `abc` is roughly equivalent to `re.match('.*?abc.*?', s)`.
But yes, this is a very subtle point of semantics, and different regex engines handle it differently. My favorite approach is to always use `.*?theregex.*?` semantics and not provide any additional methods. Instead, if the user wants to match from the beginning to the end, they just need to use the well-known anchors. (Go's regexp package does this, and Rust borrowed that API choice as well.)
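If you want to check this yourself, here's a small sketch that runs one pattern through Python's three entry points and reports which of them find a match. The helper name `anchoring_demo` is made up for illustration; the `re` calls are the standard library ones.

```python
import re


def anchoring_demo(pattern: str, text: str) -> dict:
    """Report which of re.search / re.match / re.fullmatch find a match."""
    return {
        "search": re.search(pattern, text) is not None,
        "match": re.match(pattern, text) is not None,
        "fullmatch": re.fullmatch(pattern, text) is not None,
    }


# Unanchored pattern: only search looks past position 0.
print(anchoring_demo(r"abc", "xx abc yy"))
# Pattern at the start of the text: match succeeds, fullmatch does not.
print(anchoring_demo(r"abc", "abc yy"))
# A trailing $ is honored by every entry point, including match.
print(anchoring_demo(r"abc$", "abc yy"))
# Fully anchored pattern on an exact string: all three agree.
print(anchoring_demo(r"^abc$", "abc"))
```

The takeaway is that the anchors in the pattern are always respected; the entry points only differ in what implicit anchoring they add on top.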