Why is it that in recent years, with this and the more recent ls quoting fiasco, maintainers of longstanding UNIX utilities suddenly got the urge to fix what isn't broken?
Actually I recently found that coreutils and ls behave fairly well with funny filenames:
Here is an invalid utf-8 byte and then a valid utf-8 sequence
$ x=$'\xce\xce\xbc'
$ touch "$x"
You can list it:
$ ls
?μ
And here 'ls' does better than other tools that display filenames. It shows the invalid byte and then keeps decoding with error recovery:
$ ls --escape
\316μ
However GNU stat (which I think is also in coreutils) does something similar, but weirdly messed up:
$ stat *
File: ''$'\316''μ'
(it looks like it's outputting a valid shell string, except with extra quotes)
-----
Most command line tools are not aware of stuff like this. For example you can touch "x$ANSI_TERMINAL_CODES" and if you do "bash x??" or "python x??", then your terminal will change color because of the escape codes printed back to the terminal.
I just changed Oil to use a well-defined format I called QSN (quoted string notation):
It adapts Rust's string literal syntax to express arbitrary byte strings precisely and losslessly. (JSON can't express arbitrary byte strings.)
The QSN encoder does UTF-8 decoding with a specific error recovery mechanism. So it's basically like what ls and stat do, but it's more precise.
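To make the idea concrete, here is a minimal Python sketch of that kind of encoder. It is not the QSN specification or Oil's implementation: the function name `quote_bytes` and the exact set of escapes are assumptions for illustration. It leans on Python's `surrogateescape` error handler for the decode-with-recovery step, so each undecodable byte becomes a `\xNN` escape and decoding resumes at the next byte.

```python
def quote_bytes(data: bytes) -> str:
    """Render arbitrary bytes as a printable, single-quoted string.

    Illustrative sketch only: the escape set is an assumption, not a spec.
    """
    # surrogateescape maps each undecodable byte 0xNN to the lone
    # surrogate U+DCNN, so valid UTF-8 around it still decodes normally.
    text = data.decode("utf-8", errors="surrogateescape")

    named = {"\\": "\\\\", "'": "\\'", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    parts = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # An invalid byte: show it as a byte escape.
            parts.append("\\x%02x" % (cp - 0xDC00))
        elif ch in named:
            parts.append(named[ch])
        elif cp < 0x20 or cp == 0x7F:
            # Other ASCII control characters, including ESC (0x1b).
            parts.append("\\x%02x" % cp)
        else:
            # Printable ASCII and validly decoded non-ASCII pass through.
            parts.append(ch)
    return "'" + "".join(parts) + "'"
```

For the filename from the example above, `quote_bytes(b'\xce\xce\xbc')` gives `'\xceμ'`, and a name containing terminal escape codes comes back with the ESC byte shown as `\x1b` instead of being sent to the terminal raw.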
(If anyone is interested in QSN, please contact me. I think it's more generally useful in a lot of places. It formalizes something tools already do, but with a precise definition, the way JSON has one.)
Not broken at all. I really don’t understand the hate for this change.
I deal with a lot of filenames with spaces and think this change is a great improvement for listing such files. With this change it’s much easier to see where one filename ends and the other begins. Before this change, I had to use the `-1` option to ensure that each filename was listed on a line by itself. Now the filename listings are much more readable and it takes less cognitive effort to take it all in.
The way it handles filenames with ASCII apostrophes/single quotes works particularly well (wraps the filename in double quotes instead of single quotes) and makes it very easy to copy and paste filenames to and from the terminal.
Best of all, this change only applies when standard output is a TTY device so this does not break any shell scripts (even though parsing `ls` is a bad idea in any case) and is still compliant with the POSIX specification[1] which states that “If the output is to a terminal, the format is implementation-defined”.
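Roughly, the behavior described above (quote only for a terminal, and switch to double quotes when the name contains an apostrophe) can be sketched in Python like this. This is not GNU ls's actual algorithm: `display_name` and the "safe character" set are hypothetical, and control characters are not handled here.

```python
import re
import shlex
import sys

# Names made only of these characters are shown bare (an assumed safe set).
_SAFE = re.compile(r"[A-Za-z0-9_@%+=:,./-]+")


def display_name(name: str, to_terminal: bool) -> str:
    """Return the name as it would be shown, quoting only for a terminal."""
    if not to_terminal or _SAFE.fullmatch(name):
        # Pipes and files get the raw name, so scripts see no quoting.
        return name
    if "'" not in name:
        return "'" + name + "'"
    if not any(c in name for c in '"$`\\!'):
        # An apostrophe, but nothing special inside double quotes.
        return '"' + name + '"'
    # Awkward mixes fall back to the standard library's shell quoting.
    return shlex.quote(name)


def print_names(names):
    """Print each name, deciding once whether stdout is a terminal."""
    to_terminal = sys.stdout.isatty()
    for name in names:
        print(display_name(name, to_terminal))
```

The point of the design is that the decision hangs on `sys.stdout.isatty()`: the same call prints `'my file.txt'` at an interactive prompt and plain `my file.txt` when piped into another program.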
The old behavior was the bad one; the new behavior is good. And that link explains why very well.
I guess I should have figured that oblique references to "ls quoting fiasco" are shorthand for "I don't understand what's wrong but I'm angry about it..."
(On the other hand I would say the grep -a issue is bad both before and after because either way it relies on autodetection. The fundamental issue there is that there is too much variance in encodings, which isn't easy to fix. Luckily UTF-8 is growing in popularity, and it doesn't have this issue because it doesn't require metadata for extremely common operations like "find ascii substring".)
“The old behavior was the bad one; the new behavior is good”
If you’re a human, yes. If you’re a script, it breaks you in half. If you’re a script that has to run on various versions, then maybe it’s time to fix yourself and use find. You’re a sophisticated script after all, not one of those that require a human with a debugger.
Modern culture may not appreciate that little ‘compat’ thing, but it is essential if you want something to continue to work and not just stop and wait for someone’s educated guesses. Good software doesn’t point fingers at you, it just works.
I remember how recently I wanted to check network interfaces on some machine and typed ‘ifconfig’. Now it’s called ‘ip a’, and there is no ifconfig. I can guess the reason: ifconfig was bad and ip is good. There is also an eternal “FAT” label issue in the unetbootin app, which resurfaces every time Apple changes its fdisk output format (in every release, it seems). The workaround is to run it with a CLI option, the very thing that unetbootin was created to skip. This is what makes systems so much fun. Without all these cool things, we would just sit there and cry over our uselessness.
Edit: I read below that ls does that in interactive mode only, so maybe it’s not that bad then.
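For scripts, the "use find" advice boils down to asking the OS for the names directly instead of parsing output meant for humans. A Python sketch of the same idea follows; `list_names` is a hypothetical helper, and passing a bytes path is a choice made here so that names come back as raw bytes, untouched by any quoting or decoding.

```python
import os


def list_names(directory: bytes = b".") -> list:
    """Return the entry names in a directory as raw bytes, sorted."""
    # With a bytes path, os.scandir yields entries whose .name is bytes,
    # so spaces, quotes, newlines and invalid UTF-8 all survive as-is.
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries)
```

Nothing in this path depends on how ls chooses to display names, so a change to the terminal format cannot affect it.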
You can't just expect compatibility about random things - that's why we have formal contracts: standards, specifications, documentation. A human-readable output of any app, in particular, should never be assumed to be stable, or to have a specific format (even if observations imply it), unless its docs specifically say otherwise.
It would seem that calling ls with the -N switch would disable the quoting. Might be easier to just have ls aliased to "ls -N" rather than recompiling. Unless the -N switch also does something else you don't like.
The point is that this breaks an unbelievable number of already deployed scripts. The new functionality should be optional, and accessed via switches and aliases if you like it.
This was very much a change only a few people liked, which they decided to force down literally everyone else's throats. It's very poor stewardship.
Okay. The GNU page says the quoting is only done when the output is a terminal, so it shouldn't generally affect scripts. Although considering that the world of Unix shells is fairly complex, I wouldn't be surprised if there were some weird but explainable situation where a script would still break, so I suppose it could be a problem somewhere.
There was a change in ls which caused it to quote file names with spaces or special characters instead of (I think?) escaping them. Some people were upset because they considered this a breaking change. I'll just add that ls only does this if it's running interactively so I wouldn't expect it to break scripts.