Nick explained on Reddit why the regex was used[1]:
> While I can't speak for the original motivation from many moons ago, .Trim() still doesn't trim \u200c. It's useful in most cases, but not the complete strip we need here.
This would have probably been my train of thought (assuming that I consider regex to be a valid solution):
Trim() would have been the correct solution, were it not for that behavior. Substring is therefore the correct solution. Problem is, IndexOfAny only accepts a char array (not a set of some form, e.g. HashSet), and it finds characters that are in the array rather than the first one that isn't. You'd need to write the <Last>IndexOfNonWhitespace methods yourself. So you use a regex instead, because it's expressive and regex "is designed to solve this type of problem," and you make sure that it doesn't backtrack. The real problem/solution here isn't substring, it's finding where to substring.
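For what it's worth, the hand-written version is not much code. Here is a rough sketch of the "find where to substring" idea, in Python rather than C# and not the code from the actual site. The helper names and the set of extra characters are my own assumptions for illustration; only \u200c comes from the quote above.

```python
# Characters to strip in addition to ordinary whitespace.
# Only \u200c is mentioned in the thread; the others are assumed examples.
EXTRA_WHITESPACE = frozenset("\u200c\u200b\ufeff")


def is_strippable(ch: str) -> bool:
    """True for ordinary whitespace or one of the extra characters."""
    return ch.isspace() or ch in EXTRA_WHITESPACE


def index_of_non_whitespace(text: str) -> int:
    """Index of the first character to keep, or -1 if there is none."""
    for i, ch in enumerate(text):
        if not is_strippable(ch):
            return i
    return -1


def last_index_of_non_whitespace(text: str) -> int:
    """Index of the last character to keep, or -1 if there is none."""
    for i in range(len(text) - 1, -1, -1):
        if not is_strippable(text[i]):
            return i
    return -1


def strip_all(text: str) -> str:
    """Cut the string down to the span between the two indices."""
    start = index_of_non_whitespace(text)
    if start == -1:
        return ""
    end = last_index_of_non_whitespace(text)
    return text[start:end + 1]
```

Each scan looks at every character at most once, so the cost stays linear in the input length no matter what the input looks like.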
I consider regex too dangerous to use in any circumstance, but I can certainly see why someone would find it attractive at first.
Oh totally. I assumed it was Unicode BS immediately. And anyone would make this mistake easily. That's the point -- gotta have it imprinted in the brain that regexes are for finding things in files, not for your production code.
I've used them myself, but I'd like to think that when I type that regex in, I stop and think about whether I will be feeding raw user input into it.
Compressing multiple forms of non-Unicode whitespace to a single space. Used for cleaning text from input fields that often contains unwanted characters from copy/paste.
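Roughly this kind of thing, sketched in Python. I'm reading "non-Unicode whitespace" as the ASCII whitespace characters, which is an assumption, and the function name is made up for the example.

```python
import re

# ASCII whitespace only (assumed reading of "non-Unicode whitespace"):
# space, tab, newline, carriage return, form feed, vertical tab.
_ASCII_WS_RUN = re.compile(r"[ \t\n\r\f\v]+")


def collapse_ascii_whitespace(text: str) -> str:
    """Replace each run of ASCII whitespace with a single space."""
    return _ASCII_WS_RUN.sub(" ", text)
```

The pattern is a single character class with one quantifier and nothing after it, so there is no trailing condition that could force the engine to retry a run from every starting position.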