There was a time in PowerShell’s life when redirecting stdout to a file (>) wrote UTF-16; then that changed to UTF-8 with a BOM, and now it’s UTF-8 without a BOM. Along the way, some PowerShell versions let you change this globally and some did not. Even where you could change it globally, the names for the same encodings have changed.
In PowerShell 5.1, “UTF8” actually meant “UTF-8 with BOM”.
In PowerShell 7, “UTF8” means no BOM.
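To make those names concrete, here is a small Python sketch of the bytes each variant puts on disk for the same text. Python is only used for illustration; the codec names ("utf-16-le", "utf-8-sig") are Python's, not PowerShell's, and `encoded_variants` is a made-up helper.

```python
import codecs


def encoded_variants(text: str) -> dict[str, bytes]:
    """Return the bytes each encoding variant would write for `text`."""
    return {
        # UTF-16 little-endian preceded by its byte order mark (FF FE).
        "utf-16le+bom": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
        # UTF-8 preceded by a BOM (EF BB BF); Python calls this codec "utf-8-sig".
        "utf-8+bom": text.encode("utf-8-sig"),
        # Plain UTF-8 with no BOM.
        "utf-8": text.encode("utf-8"),
    }


if __name__ == "__main__":
    for name, data in encoded_variants("hi\n").items():
        print(f"{name:14} {data.hex(' ')}")
```

For "hi\n" this prints `ff fe 68 00 69 00 0a 00`, `ef bb bf 68 69 0a` and `68 69 0a`; only the last one is what most Unix tools expect.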
If you read a file in PowerShell that does not have a BOM, it decodes it as ANSI. Yes, ANSI, not ASCII. Which is a hilarious choice, because you can’t even write an ANSI file in PowerShell. The closest encoding you can use is “Default”, which uses the system default, but your system’s default might not be ANSI.
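The "no BOM means ANSI" rule is easy to model, and the model shows why it bites: a perfectly valid BOM-less UTF-8 file turns into mojibake. This is a rough Python sketch, not PowerShell's actual implementation; it assumes the system ANSI code page is Windows-1252 (`cp1252`), only sniffs UTF-8 and UTF-16 LE BOMs, and `read_like_windows_powershell` is a hypothetical name.

```python
import codecs


def read_like_windows_powershell(data: bytes, ansi_codepage: str = "cp1252") -> str:
    """Approximate a BOM-sniffing decode with an ANSI fallback (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # No BOM: assume the legacy ANSI code page instead of UTF-8.
    return data.decode(ansi_codepage, errors="replace")


if __name__ == "__main__":
    utf8_no_bom = "café".encode("utf-8")
    print(read_like_windows_powershell(utf8_no_bom))                     # cafÃ©
    print(read_like_windows_powershell(codecs.BOM_UTF8 + utf8_no_bom))  # café
```

The UTF-8 bytes for "é" (C3 A9) happen to be "Ã" and "©" in Windows-1252, so the same file reads correctly or not depending only on whether a BOM is present.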
PowerShell is a hot mess. Go read the official docs for character encoding[1] if you don’t believe me.
I have CI scripts that have to use .NET calls[2] in PowerShell to write SSH keys to disk, because it’s impossible to write UTF-8 without a BOM (or ASCII) without carriage returns (\r) being inserted before every newline (\n), even when the incoming string didn’t have them.
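What those scripts need is simply "put these exact bytes on disk". As a language-neutral illustration (a Python sketch, not the .NET workaround the comment links to), opening the file with an explicit `utf-8` encoding and `newline="\n"` disables newline translation, so no `\r` is inserted and no BOM is written. The helper name and the placeholder key are hypothetical.

```python
import os
import tempfile


def write_key_file(path: str, key_text: str) -> None:
    """Write key_text as UTF-8 without a BOM, with LF-only line endings (hypothetical helper)."""
    # Normalise any CRLF pairs in the input back to plain LF.
    normalised = key_text.replace("\r\n", "\n")
    # newline="\n" disables newline translation on every platform, including Windows,
    # and the plain "utf-8" codec never writes a BOM.
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(normalised)


if __name__ == "__main__":
    # Placeholder key material, not a real key.
    demo = "-----BEGIN OPENSSH PRIVATE KEY-----\r\nAAAA\r\n-----END OPENSSH PRIVATE KEY-----\r\n"
    path = os.path.join(tempfile.mkdtemp(), "demo_key")
    write_key_file(path, demo)
    with open(path, "rb") as fh:
        raw = fh.read()
    print(b"\r" in raw, raw.startswith(b"\xef\xbb\xbf"))  # False False
```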
File encoding is just ONE of the reasons I hate PowerShell and why it’s so obviously clear people who were not qualified to design a shell, designed a shell.
I'm not sure we should blame the designers of PowerShell (Jeffrey Snover and his team). If you followed the history of the project, they seemed to have very good ideas, but it was very hard to get buy-in for anything CLI-oriented in Longhorn/Vista-era Microsoft[1]. They all had experience with Unix shells, and their idea of object-based pipes was truly innovative as far as I know.
I can't speak for Jeffrey and his team, but I feel like a lot of their decisions came from trying to get corporate behind the shell and present it as a shell for Windows. They avoided picking political battles outside their main goals (a modern shell to replace cmd.exe, and the object-based pipeline model). What we got was a set of decisions that aligned with Windows and Microsoft practices of that day and age:
Microsoft is focusing on .NET as its general platform? We'll implement PowerShell on .NET.
Windows's standard for Unicode text files is either UTF-16 LE or a BOM for any other Unicode encoding? We'll default to UTF-16 and always add a BOM if you choose UTF-8.
Windows uses CRLF? Then we'll pipe CRLF-delimited text by default.
Visual Basic and C# programmers expect functions to have names like "GetChildItem" instead of something like "dir" or "ls"? No problem, we will set the canonical command name to be a long, programming-language-like name and set up aliases that look (but don't behave) like Unix and cmd.exe commands.
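One practical consequence of CRLF-delimited output is easy to show: a consumer that splits on `\n` alone keeps a stray `\r` at the end of every line. A short Python sketch with a made-up sample string:

```python
def naive_lines(text: str) -> list[str]:
    """Split on LF only, as many Unix-style tools effectively do."""
    return text.split("\n")


def robust_lines(text: str) -> list[str]:
    """Split on any line boundary (CRLF included) and drop the terminators."""
    return text.splitlines()


if __name__ == "__main__":
    crlf_output = "alpha\r\nbeta\r\n"
    print(naive_lines(crlf_output))   # ['alpha\r', 'beta\r', '']
    print(robust_lines(crlf_output))  # ['alpha', 'beta']
```

That trailing `\r` is exactly the kind of invisible character that breaks string comparisons further down a pipeline.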
The result was not pretty, but I still appreciate the ideas we got from PowerShell. Nushell took these ideas and implemented them in a more modern way.
I work in PowerShell regularly, and this bugged tf out of me. The result of my digging: .NET libraries emit textual output in <whatever the offending encoding was, UTF-16 LE or something>, and since PowerShell is implemented in .NET, it necessarily inherited that default.
Things have improved since then, now that everything is implemented in .NET (Core), so if you're working with PoSH 7.x, this is no longer a headache. (I just tested in 5.1, and it seems fixed there as well.)
Of course, by now, I've just developed muscle memory to add '-enc UTF8' to the end of everything that writes text to disk.
> File encoding is just ONE of the reasons I hate PowerShell and why it’s so obviously clear people who were not qualified to design a shell, designed a shell.
I find that unfair towards their creators. They invented a truly innovative shell, whereas everybody else continued with the text-based approach. The object-oriented approach makes it much simpler to process the results of Get-ChildItem (ls) or Get-Process (ps), because you get objects with properties instead of just text. That matters all the more because in text-based shells the output format depends on how you call, e.g., ps (aux, -efH, or whatever).
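The difference can be sketched outside PowerShell too. In this Python sketch, a hypothetical `Proc` record stands in for a process object, and a hard-coded, made-up sample of `ps aux`-style text stands in for a text pipeline. The parser only works because it hard-codes that one invocation's layout (11 columns, COMMAND last and possibly containing spaces); with `ps -efH` it would silently produce garbage.

```python
from dataclasses import dataclass

# Made-up sample output in the `ps aux` column layout.
SAMPLE_PS_AUX = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1 168000 11000 ?        Ss   10:00   0:01 /sbin/init splash
alice     4242 12.5  2.0 900000 80000 pts/0    Sl+  10:05   1:23 python3 train.py --epochs 5
"""


@dataclass
class Proc:
    user: str
    pid: int
    cpu: float
    command: str


def parse_ps_aux(text: str) -> list[Proc]:
    """Turn `ps aux`-style text into records; breaks if the column layout differs."""
    procs = []
    for line in text.splitlines()[1:]:  # skip the header row
        # COMMAND is the 11th column and may contain spaces, so split at most 10 times.
        fields = line.split(None, 10)
        procs.append(Proc(user=fields[0], pid=int(fields[1]),
                          cpu=float(fields[2]), command=fields[10]))
    return procs


if __name__ == "__main__":
    # Once the data is structured, filtering is by property, not by column position.
    busy = [p for p in parse_ps_aux(SAMPLE_PS_AUX) if p.cpu > 10]
    print([(p.pid, p.command) for p in busy])  # [(4242, 'python3 train.py --epochs 5')]
```

In PowerShell, Get-Process already hands you objects, so the fragile parsing step simply does not exist.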
Now, I find your statement unfair because in the Linux world you can just create a new shell, and whoever wants to use it can.
At Microsoft, a successor to cmd.exe had to ship with Windows, otherwise it would never have been adopted. Most big companies would never allow a third-party open-source shell on their Windows servers. So you have to navigate a big ocean of politics and power, guarantee backwards compatibility, and meet the expectations of thousands of companies. This inevitably leads to frustrating behaviour, such as the encoding defaults. That is, until you read the docs, which state quite comprehensively what to expect, as you discovered yourself.
It's also worth pointing out that those Microsoft docs distinguish between the two major versions of PowerShell using the names "Windows PowerShell" and "PowerShell".
The original project name was "Monad", but once it was nearing release, it got into the hands of the notorious naming prodigies in Microsoft marketing. Then it got changed to Microsoft Shell (MSH) and finally to Windows PowerShell. Great times.
Windows PowerShell is everything up to 5.1, and it's called that because it runs ONLY on Windows.
PowerShell 6 was named PowerShell Core because it was built on .NET Core. It was not fully compatible with 5.1 and lacked some functionality that depended on the WinAPI.
PowerShell 7 is still built on .NET Core, but has much better compatibility with 5.1. They dropped "Core" because nobody cared, it's shorter, and it works equally well on all platforms.
So PowerShell is v7, and Windows PowerShell is the old 5.1. Nothing confusing about that at all.
Also worth noting that PowerShell is completely independent of Windows now. Not only can you install it on almost every other platform, but since it's .NET you can embed the runtime in any .NET application. It's incredibly powerful.
Not sure what you mean? I've hosted PowerShell in C# code by referencing Microsoft.PowerShell.SDK. It pulls in the DLL at compile time and runs on whatever .NET runtime is installed and matches the project target.
Wait, you might just have helped me fix my Invoke-WebRequest tests against a third-party API we have to integrate with. The "Unexpected character encountered while parsing" error at line 0, position 0 suggests there is a BOM I wasn't expecting to send in my request, and the receiver definitely wasn't expecting to receive it.
Edit: yup, I just wrote a small command-line client and everything works. Thanks.
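The same failure is easy to reproduce with other strict JSON parsers. In this Python sketch (illustration only; the error above came from a different parser), a body that starts with a UTF-8 BOM is rejected when decoded as plain UTF-8, while decoding with `utf-8-sig` strips the BOM first. The payload and the `parse_body` helper are made up.

```python
import codecs
import json

# A made-up request body that accidentally starts with a UTF-8 BOM.
payload = codecs.BOM_UTF8 + b'{"id": 1}'


def parse_body(body: bytes):
    """Decode a JSON body, tolerating an optional leading UTF-8 BOM (hypothetical helper)."""
    # "utf-8-sig" drops a leading BOM if present and otherwise behaves like "utf-8".
    return json.loads(body.decode("utf-8-sig"))


if __name__ == "__main__":
    try:
        json.loads(payload.decode("utf-8"))  # the BOM survives as U+FEFF
    except json.JSONDecodeError as exc:
        print("strict decode failed:", exc.msg)
    print(parse_body(payload))  # {'id': 1}
```

The more robust fix is on the sending side: do not emit the BOM in the first place.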
UTF-8 with BOM is a terrible idea. “Why does the name of the first column in my CSV file not match?” is a question I answer at least once a month. It’s as if they’re trying to EEE (embrace, extend, extinguish) plain text.
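The CSV symptom is the BOM being read as part of the first header name. A Python sketch with made-up data: decoding with plain `utf-8` leaves U+FEFF glued to the first column name, while `utf-8-sig` strips it.

```python
import csv
import io

# A made-up CSV export that starts with a UTF-8 BOM.
RAW = b"\xef\xbb\xbfname,age\r\nAda,36\r\n"


def header_names(raw: bytes, encoding: str) -> list[str]:
    """Return the header row of CSV bytes decoded with `encoding`."""
    # newline="" lets the csv module handle the CRLF row terminators itself.
    reader = csv.reader(io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline=""))
    return next(reader)


if __name__ == "__main__":
    print(header_names(RAW, "utf-8"))      # ['\ufeffname', 'age']
    print(header_names(RAW, "utf-8-sig"))  # ['name', 'age']
```

So a lookup for the column "name" fails even though the header looks identical when printed in most terminals.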
Hey, let's not pretend it's better in the world of Linux/Unix/etc. That is one hot mess of everything, just as much as PowerShell. It's just that those have quirks we're used to and have kinda learned to (mostly) abstract away, without an overbearing behemoth like Microsoft, whose presence makes such a thing almost impossible.
As a comparison (for most of us semi-knowledgeable Windows/PowerShell users), ask any semi-knowledgeable Linux user about the difference between ksh and Bash (or that weird one Ubuntu pushed and aliased to one of the above for extra confusion). Or ask whether they remember how tf to use sed or awk without looking up the docs, because those damn things were made by Satan in the depths of hell. Or how to use parameters inside shell scripts, and the various incantations they have for slicing and dicing the params.
[1] https://learn.microsoft.com/en-us/powershell/module/microsof...
[2] https://stackoverflow.com/questions/5596982/using-powershell...