Some sanity for C and C++ development on Windows (nullprogram.com)
125 points by ingve on Dec 31, 2021 | 137 comments



The title of this article had me do a double-take: C and C++ development on Windows is great. No sanity is needed.

But that's not what the article is about. What the article is about is that the C runtime shim that ships with Visual Studio defaults to using the ANSI API calls without supporting UTF-8; it goes on to identify this as "almost certainly political, originally motivated by vendor lock-in" (which it's transparently not), and then talks about how Windows makes it impossible to port Unix programs without doing something special.

I half empathize. I'd empathize more if (as the author notes) you couldn't just use MinGW for ports, which has the benefit that you can just use GCC the whole way down and not deal with VC++ differences, but I get that, when porting very small console programs from Unix, this can be annoying. But when it comes to VC++, the accusations of incompetence and whatnot are just odd to me. Microsoft robustly caters to backwards compatibility. This is why the app binaries I wrote for Windows 95 still run on my 2018 laptop. There are heavy trade-offs with that approach which in general have been endlessly debated, one of which is definitely how encodings work, but they're trade-offs. (Just like how Windows won't allow you to delete or move open files by default, which on the one hand often necessitates rebooting on upgrades, and on the other hand avoids entire classes of security issues that the Unix approach has.)

But on the proprietary interface discussion that comes up multiple times in this article? Windows supports file system transactions, supports opting in to a file being accessed by multiple processes rather than advisory opt-out, has different ideas on what's a valid filename than *nix, supports multiple data streams per file, has an entirely different permission model based around ACLs, etc., and that's to say nothing of how the Windows Console is a fundamentally different beast than a terminal. Of course those need APIs different from the Unix-centric C runtime, and it's entirely reasonable that you might need to look at them if you're targeting Windows.


A lot of the article is about string handling. I very much agree with that part of the article, having worked on a lot of legacy code built over decades before the introduction of UTF-8 compatibility.

It gets worse with that old code if you try to share modules between Windows and Linux applications.

Additional complications come from trying to support TCHAR to allow either type of char for libraries.

Anyway, I have ended up supporting monstrosities of wstring, string, CString, char *, TCHAR mushed together, constantly marshalled and converted back and forth.

And more: https://docs.microsoft.com/en-us/cpp/text/how-to-convert-bet...


Agreed! And TCHAR was the dumbest of all of them. “Not all of our libs/tools/editors support Unicode yet so use TCHAR in the meantime and then one day when our stack supports it fully then you can throw the switch and all your char*’s will be wchar_t*’s and I’m sure that’ll go really well in your codebase.”


> And TCHAR was the dumbest of all of them. “Not all of our libs/tools/editors support Unicode yet so use TCHAR

The reason for TCHAR is not "libs/tools/editors" not supporting Unicode, but the operating system itself. With TCHAR and related types, the same source code could target both Windows 95 and Windows NT, you just have to change a single #define (ok, IIRC there are actually three #defines: UNICODE, _UNICODE, and another one I can't recall at the moment) and recompile.
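For illustration, a simplified sketch of how that mechanism works (the real definitions live in <tchar.h> and <winnt.h> and are split between the UNICODE and _UNICODE symbols; this is abbreviated, not the verbatim header contents):

    /* Simplified sketch of the TCHAR mechanism. */
    #ifdef _UNICODE
        typedef wchar_t TCHAR;
        #define _T(x)   L##x
        #define _tcslen wcslen
    #else
        typedef char TCHAR;
        #define _T(x)   x
        #define _tcslen strlen
    #endif

    /* The same source line compiles to either a narrow or a wide call:
       TCHAR msg[] = _T("hello");
       size_t n = _tcslen(msg);                                          */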


> The title of this article had me do a double-take: C and C++ development on Windows is great. No sanity is needed.

We certainly have a different opinion on what "great" means. It takes less time to rebuild my whole toolchain from scratch on Linux (~15 minutes) than it takes MSVC to download those friggin debug symbols it seems to require whenever I have to debug something (I sometimes have to wait 30-40 minutes, and I'm on friggin 2GB fiber! And that's seemingly every time I have to do something with that wretched MSVC!)

Thankfully now the clang / lld / libc++ / lldb ... toolchain works pretty well on Windows and allows a lot more sanity but still, it's pretty slow compared to Linux.


My main gripe with writing C++ on Linux is the dependency management. If you need to do stuff that's not covered by the standard library, like interfacing with GTK or X11, you are in a world of pain. You probably need to install a distro-specific package in a distro-specific way to get the headers/symbols, use some build tool to configure those distro-specific include/so locations, and hope to god that the distro maintainers didn't upgrade one of those packages (in a breaking way) between the source commit and the time of build.

If you suffer through this, you have an exe that works on your version of Ubuntu, maybe on other versions of Ubuntu or possibly other Debian-based distros. If you want it to also work on Fedora, it's back to tinkering.

Tbh I think the only sane-ish way of building is to dockerize your build env with baked-in versions.

In contrast, you pick an SDK and compiler version for Windows, and as long as you install those versions, things will work.


Versus no dependency management at all? This reasoning falls apart once you need to use a 3rd party library on Windows. There’s no standard way of sharing such a thing so you always wind up packaging the whole thing with your program, and handing the whole mess to your users.

Granted, writing an RPM is a special kind of hell, but at least you don't have to package everything with your program. But actually you can still do that - I've done that plenty of times in embedded. You can always ship your program with its dependent libraries the way you always have to on Windows. In fact it's a lot easier because most 3rd party libraries were originally coded on Linux and build more sanely on Linux. And RPATHs are pretty easy to figure out.

Linux gives you options.


Yeah, you're right, but you probably need a lot less stuff that's not in the SDK. My go-to solution for including dependencies is just checking in all the dependency .lib and include files into Git LFS (I think this is a rather common approach from what I've seen on Git). For your typical Linux C/C++ project, unless it's made by best-in-class C++ devs, building can be a pain, most likely because it depends on very particular lib versions; building a largish project from GitHub for Ubuntu 21.10, where the original dev used 20.04, is usually not possible without source/Makefile tweaks. And I don't particularly love the idea of using root access to install packages just to build some random person's project.

IMHO, C++ dependency management kinda stinks, regardless of platform.


> IMHO, C++ dependency management kinda stinks, regardless of platform.

Indeed, and the fact that it's platform-specific in the first place certainly doesn't help!


Writing an rpm isn’t difficult… is that a commonly held belief? Maybe I don’t know what I don’t know, but I’ve found wrapping my head around deb packaging much harder than rpms.


When I last wrote one I was very new to it and rpm.org (where most of the public docs on the format reside I guess) was down/abandoned. It looks like rpm.org and the docs are back now? I had a bunch of special requirements for the lib I was making and not having docs for the format, especially all the funny macros, was pretty frustrating.


> If you need to do stuff that's not covered by the standard library, like interfacing with GTK or X11 you are in a world of pain [...] If you suffer through this, you have an exe that works on your version of Ubuntu, maybe on other versions of Ubuntu or possibly other Debian-based distros. If you want it to also work on Fedora, it's back to tinkering.

GTK is known to break their ABI across major versions (GTK1->GTK2, GTK2->GTK3, GTK3->GTK4), but as a C ABI it should be compatible between minor versions, and everything can be assumed to have GTK2 and GTK3 available anyway. X11 as a protocol has always been backwards compatible and Xlib on Linux pretty much never broke the ABI since the 90s. Here is a screenshot of a toolkit I'm working on right now, running the exact same binary (built with dynamic linking - i.e. it uses the system's C and X libraries) on 1997 Red Hat in a VM and on 2018 Debian (I took that shot some time ago - btw the brown colors are because the VM runs in 4-bit VGA mode and I haven't implemented colormap use - it also looks weird in modern X if you run your server in 30bpp/10bpc mode)[0].

Of course that doesn't mean other libraries won't break, and what you need to do (at least the easy way to do it) is to build on the oldest version of Linux you plan on supporting so that any references are to those versions (there are ways around that), and stick with libraries that do not break their ABI. You can use ABI Laboratory's tracker to check that[1]. For example, notice how the 3.x branch of Gtk+ was always compatible[2] (there is only a minor change marked as breaking from 3.4.4 to 3.6.0[3], but if you check the actual report it is because two internal functions - which you shouldn't have been using anyway - were removed).

[0] https://i.imgur.com/YxGNB7h.png

[1] https://abi-laboratory.pro/index.php?view=abi-tracker

[2] https://abi-laboratory.pro/index.php?view=timeline&l=gtk%2B

[3] https://abi-laboratory.pro/index.php?view=objects_report&l=g...


Major versions of Gtk are, for all intents and purposes, different toolkits entirely. They are always parallel-installable, they don't conflict with each other.


Well, except for the part where development stops on previous versions and they don't get any real updates, while they "hog" the "Gtk" name, so any fork that may want to continue development in a backwards-compatible way, as if the incompatible change never happened, can't really be called "Gtk" without being misleading.


The problems you've described are the reason we have tools like CMake, no? CMake's reusable find modules handle the heavy lifting of coping with the annoying differences between Linux distros, and for that matter other OSs.

> you have an exe that works on your version of Ubuntu

This is indeed a downside of the Linux approach, it's the price we pay for the significant flexibility that distros have, and the consequent differences between them. Windows has remarkably good binary compatibility, but it's a huge engineering burden.

> Tbh i think the only sane-ish way of building to dockerize your build env with baked-in versions.

This is an option, but bundling an entire userland for every application has downsides that the various Linux package-management systems aim to avoid: wasted storage, wasted memory, and less robust protection against inadvertently using insecure unpatched dependencies.


The de-facto standard for dependency discovery is pkg-config. CMake being its own little world with its own finding system is annoying and part of the reason why the ecosystem has not migrated from autocrap to CMake en masse. Thankfully Meson came along, which does everything correctly.


pkg-config is anything but a standard. It barely works on Windows and macOS, which are the most common platforms.


About the distro-specific include locations, I try to use pkg-config where possible instead of directly specifying the directories and include flags.


> It takes less time to rebuild my whole toolchain from scratch on Linux (~15 minutes) than it takes to MSVC to download those friggin debug symbols it seems to require whenever I have to debug something

Fun fact: you can do the same in gdb, and some distributions (e.g. openSUSE) have it enabled by default. Though you also get the source code.

I was messing around with DRM/KMS the other day and had some weird issue with an error code, so I placed a breakpoint right before the call - gdb downloaded libdrm and libgbm source code (as well as some other stuff) and let me trace the call right into their code, which was super useful to figure out what was going on (and find a tiny bug in libgbm, which I patched and reported).


Yes, it's a relatively recent innovation, but it's pretty awesome. Symbol server has always been one of the things I actually liked about Windows development, which didn't require installing debug packages for every DLL before the bugs happen and you catch them. https://sourceware.org/elfutils/Debuginfod.html

NixOS has had a similar thing for a while called "DwarfFS", a FUSE filesystem that resolves filenames back to the package that needs to be installed; it was around well before debuginfod, but is very NixOS-specific. I'm happy this is now much more widely available.


Really? I've never had to wait more than 5 minutes, and only the first time since the symbols are cached. On the other hand, the Visual Studio debugger actually works, even on large and complex multi-process systems like Chromium. My experience debugging C/C++ with GDB/LLDB and any frontend using them has been so poor that I've essentially given up trying them in all but the most desperate circumstances.


I agree that downloading symbols can be oddly slow but you can just turn it off, or only turn it on for specific modules. It can be helpful to have symbols for library code to troubleshoot bugs but typically you only need your own symbols and they are already on your computer with your binaries.


Debug symbols are stored locally with MSVC, and booting into debug mode only takes a few seconds longer than non-debug, sounds like you are doing something wrong.


I just use the LLVM toolchain -- on Windows and Linux. You really can't beat clang and lld, the ecosystem and tooling is fantastic.

If something absolutely requires MSVC ABI compatibility, I use "clang-cl".

God bless LLVM developers.


That is why they are now lagging behind everyone else on C++20 support.


"than it takes to MSVC to download those friggin debug symbols it seems to require whenever I have to debug something"

I think you can just unclick the radio button in debug settings that requires that?


> Thankfully now the clang / lld / libc++ / lldb ... toolchain works pretty well on Windows

Off topic, but I wonder if anyone knows whether it's possible to use rustc with lld, instead of link.exe? I tend not to have Visual C++ on my home systems. Is it as simple as the Cargo equivalent of LD=lld-link.exe?


You would need libraries. E.g. the C runtime and system import libraries (msvcrt.lib, vcruntime.lib, kernel32.lib, etc).


Microsoft is certainly evil and all, but I use MSVC on Windows and clang on Linux, and the Windows tools for my project are much smarter and faster when it comes to compiling. Not counting the times when the Windows machine decides it has more important priorities to attend to rather than doing what I need.


I couldn’t disagree more with your opinion. Symbol servers and pdbs are a tremendous advantage to C and C++ development on Windows. Are you sure you have equivalent experience with both Windows and non-MSVC toolchains?


> I half empathize. I'd empathize more if (as the author notes) you couldn't just use MinGW for ports, which has the benefit that you can just use GCC the whole way down and not deal with VC++ differences

MinGW GCC doesn't make a difference to the article. It uses the same C runtime library as VC++ and has the same problems with defaulting to ANSI codepages and text-mode streams. In fact, as far as I know, there is no native open-source alternative to the VC++ runtime that isn't a full alternative programming environment like Cygwin.


> how the Windows Console is a fundamentally different beast than a terminal

Yes, and almost entirely in ways that are bad.

I think Microsoft have partially recognised that they're tied to compatibility with a set of choices that have lost the popularity wars and now look wrong. That's why they've produced the two different sorts of WSL, each of which has awkward tradeoffs of its own. And Windows Terminal to replace the console. But eventually I think they may be forced to:

- drop \ for /

- switch CRLF to LF as the default

- provide a pty interface

- provide a C environment that uses UTF-8 by default

It's been weird working with dotnet core and seeing the "cross platform, open source" side of Microsoft, who develop in a totally different style. It's like watching a new ecosystem being built in the ruins of the old.


> drop \ for /

API calls accept / as path separator (and interpret it correctly). Shell is a different beast though.
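For example, something like the following works as expected (a minimal sketch; the path is hypothetical and error handling is trimmed):

    #include <windows.h>

    int main(void)
    {
        /* Forward slashes are accepted and normalized by the Win32 path parser. */
        HANDLE h = CreateFileW(L"C:/Users/Public/test.txt",
                               GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h != INVALID_HANDLE_VALUE)
            CloseHandle(h);
        return 0;
    }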


cmd accepts /, but you need to enclose the path in quotes, otherwise it tries to interpret it as an option switch.

So you would also need to rewrite all command line utilities to use something like `ipconfig --all` instead of `ipconfig /all`.


Ah, quoting in cmd is its own kind of hell. Not-so-fun fact, and actually the only part of the Windows APIs that I truly hate (and I've worked with many of them): command-line arguments are passed to the program as a single flat string. Splitting it into the argc/argv array is left to the CRT startup code.
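You can see this directly from the Win32 side; a small sketch (GetCommandLineW returns the flat string, and CommandLineToArgvW applies essentially the same splitting rules the CRT uses at startup):

    #include <windows.h>
    #include <shellapi.h>   /* CommandLineToArgvW; link with Shell32.lib */
    #include <stdio.h>

    int main(void)
    {
        /* The OS hands the process one flat command-line string... */
        LPWSTR cmdline = GetCommandLineW();

        /* ...and splitting it into argv happens in user space. */
        int argc = 0;
        LPWSTR *argv = CommandLineToArgvW(cmdline, &argc);
        if (argv) {
            for (int i = 0; i < argc; i++)
                wprintf(L"argv[%d] = %ls\n", i, argv[i]);
            LocalFree(argv);
        }
        return 0;
    }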


That comes from compatibility with MS-DOS ways of dealing with arguments.


I just tested in PS and found that it eats / as well.


The PTY side has been covered for a couple of years now with the introduction of ConPTY.


That's the open source side of ASP.NET Core; the rest of the .NET tooling has a different agenda regarding cross-platform support.


>C and C++ development on Windows is great. No sanity is needed.

C++ is bearable (except the bloated piece of crap that is Visual Studio), but C is almost nonexistent. For many years their compiler lagged on standardized C features (this has only somewhat improved recently), and you cannot use the vast majority of system APIs, which use COM interfaces.


You can use COM from C... it's just even more painful.


Exposing a COM object from C is even worse than consuming one because you need to manually implement the polymorphism. I did it from Rust once in a toy project; it was certainly interesting making it work, but I would never want to do that in a production application.


> avoids entire classes of security issues that the Unix approach has.

I wonder what these might be. You mean potential race conditions regarding file operations?


When you open a file in Linux, you hold on to a reference to the underlying inode, which means if you load myLib 1.0, then update to myLib 1.0.1 using apt, all the previously running programs will be stuck on the old version. This is a security issue at best, since unless you restart there's no way of making sure nobody uses the old lib anymore, but more frequently a source of crashes and bugs, since myLib 1.0 and 1.0.1 might not be perfectly compatible. If I update Ubuntu, Chromium is almost 100% guaranteed to crash (since the newly started processes use different lib versions), but I've seen other crashes as well.

In summary, I can't recommend you continue using your Linux machine after an update without a restart, since you are open to an entire category of weird bugs.
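The mechanism being described is easy to demonstrate in a few lines of POSIX code (a minimal sketch; the filename is hypothetical):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("libexample.so", O_RDONLY);  /* hypothetical file */
        if (fd < 0) return 1;

        unlink("libexample.so");  /* an "update" removes/replaces the old file... */

        /* ...but reads through fd still see the original contents until the
           last descriptor is closed - which is why already-running processes
           keep using the pre-update library version. */
        char buf[64];
        ssize_t n = read(fd, buf, sizeof buf);
        printf("still readable: %zd bytes\n", n);
        close(fd);
        return 0;
    }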


there are various tools that will tell you if you have old libraries in use by walking /proc

and the Chrome thing sounds rather strange, as I thought Chrome forked processes from a "zygote" (a prototype process) rather than re-exec()ing the binary (which should retain the handle to the deleted library inodes)

not to mention the shared library naming scheme should prevent this sort of incompatible change from occurring


In Windows you have to restart anyway, so your problem seems to be that you have a choice in Linux?


Both have updates in which you do or don't need to restart to apply them. Both also have methods of hotpatching all the way to the kernel level depending how much it's worth it to someone as well.


Windows handles not holding onto the inode is a major, MAJOR pain in the ass.

I hate the fucking "file is busy, try again?" dialog so fucking much.


>Microsoft robustly caters to backwards compatibility.

This includes bugs too. I ran into an undocumented bug in select() IIRC that they couldn't fix since it would break backwards compatibility. I spent like a day trying to figure out why my program wouldn't work correctly on Windows.


> C and C++ development on Windows is great. No sanity is needed.

So, you might as well be insane? :D


> On Windows, C and C++ is so poorly hooked up to operating system interfaces that most portable or mostly-portable software — programs which work perfectly elsewhere — are subtly broken on Windows.

I believe that the problem is about the definition of "elsewhere". The elsewhere is called POSIX. The portability is based on POSIX. So the article can be summed up as "Windows is not POSIX compliant".


> The UTF-8 encoding was invented in 1992 and standardized by January 1993. UTF-8 was adopted by the unix world over the following years due to its backwards-compatibility

I'd like to add some historical background to the UTF-8 problem mentioned there. UTF-8 was standardized in 1992. But the first Linux using UTF-8 as the default encoding was RedHat in 2002 [1]. 10 years later!

On the other hand, Windows started using Unicode before it was standardized, and back then there was no UTF-8. Or as Raymond Chen put it, "Windows adopted Unicode before the C language did" [2].

[1] https://www.cl.cam.ac.uk/~mgk25/unicode.html#linux

[2] https://devblogs.microsoft.com/oldnewthing/20190830-00/?p=10...


*nix got lucky by always treating filenames mostly as a bag of bytes (except for / and NUL), which delegated encoding to applications, and the typical passthrough from argv to fopen worked fine. Arguably the WinNT way is more ambitious and reliable, but I think ignoring UTF-8's existence until 2019 was a huge mistake.


> But the first Linux using UTF-8 as the default encoding was RedHat in 2002

And then Microsoft spent the next 20 years explaining/complaining to everyone (and this includes Chen) why they couldn't make UTF-8 a code page, because it would mean revalidating all of those -A functions for 4-byte characters when they were originally intended for only 3 bytes?


There was a time window of a few months between the invention of UTF-8 and the release of Windows NT with Unicode support. If the two involved parties had known of each other (and assuming the Windows devs had understood what a great deal UTF-8 was), I guess Windows could have used UTF-8 right from the start.


>If the two involved parties would have known of each other [...], I guess Windows could have used UTF-8 right from the start.

Probably not. The author of this thread's article has incorrect history about UTF-8 being widely accepted in 1993. At least 4 different systems independently chose UCS2/UCS4 instead of UTF-8 in the early 1990s:

- Microsoft Windows NT 3.1

- Sun Java in 1995

- Netscape Javascript in 1995

- Python 2.x

Why? Because before 1996, the Unicode Consortium initially thought 16 bits / 65k characters was "more than enough", based on the CJK unification first recommended by language experts from Asia. The Wikipedia page on Unicode revision history shows that the explosion of extra CJK chars which pushed the count past 65k didn't happen until Unicode 3.1 in 2001: https://en.wikipedia.org/wiki/Unicode#Versions

The prevailing standard in 1993 was UCS-2 and not UTF-8. Considering that a huge revision of an operating system like Windows is a multi-year effort, they were really looking at what the standard was circa ~1991. So Rob Pike's napkin idea of UTF-8 in 1992, and the later presentation at USENIX in January 1993, are really not relevant for a July 1993 release of Windows NT 3.1. The author writing in Dec 2021 has hindsight bias, but MS/Java/Python just went with Unicode 1.x UCS-2, which was logical at the time.


Even by 1999, when Qt 2 was released, the new "Unicode-aware" QString class used UTF-16 internally, not UTF-8. This remains the internal Qt encoding even today (see https://wiki.qt.io/QString#Unicode_methods_in_QString), although the support for UTF-8 has improved substantially, to the point that Qt assumes that 8-bit strings not provided an alternate encoding are UTF-8, that 8-bit filenames can be expressed as UTF-8, and so on (see https://doc.qt.io/qt-6/unicode.html).


In 1991, the decision was way beyond made: https://betawiki.net/wiki/Windows_NT_3.1_October_1991_build


There are several mistakes in the history. Just like any article, it is based on the biases of the author.


Ken Thompson was the one scribbling on a napkin. Rob was the one telling the story later on.


The addition of Unicode support to Windows NT was probably done a great deal in advance of its actual release--it's not the sort of thing you can throw in the last minute.

It's also worth noting that dealing with UTF-8 correctly is more than just declaring "the encoding of this set of characters is UTF-8" - it does require you to rethink how string APIs are designed to work well. If the alternative is a 16-bit fixed-width character format, UTF-8's variable-width format doesn't necessarily look like a wiser idea. It's not until 1996, when Unicode moves from 16 bits to 21 bits, that UTF-8 actually looks like a good thing.


They did build a POSIX interface, mostly so they could mislead the DOD into buying NT to meet POSIX needs. https://en.wikipedia.org/wiki/Microsoft_POSIX_subsystem


They didn't mislead. I don't think the DOD mandated full POSIX compliance, and they really only wanted POSIX-style fine-grained ACLs.


Yes. It did. It's just another subsystem on top of the kernel. The kernel did not care about POSIX. The POSIX subsystem is yet another abstraction layer, not too different from Cygwin or MinGW for this use case. Since it was limited, it would not solve the portability problem that the author mentioned.


It also means mixing Windows and POSIX code was impossible. If you chose POSIX, you were locked into an extremely limited text-only world with no good path to any extras Windows could offer.

This architecture made it possible to claim POSIX compatibility on paper, while making it clear Microsoft would make your life hell if you tried to actually use it.


Except you're missing the part that it was later replaced with SFU.

https://en.m.wikipedia.org/wiki/Windows_Services_for_UNIX

However most people didn't care enough to use them.

Having WSL now in the box, to do what many of us have been doing with VMware, is more an attack on Apple, catering to those that do not pay Linux OEMs to develop for Linux, than anything done out of the goodness of their hearts.


Windows is what annoys me about C's portability story. C prides itself on supporting every platform, including exotic and dead ones, but only hypothetically. The spec is all about the potential. All the portability promises end the moment you try to compile anything. It doesn't work, but that is no longer C's problem. It's simultaneously the most portable language in the world, and one that barely works on the most popular desktop OS.


"C the language" is very portable and useful, "C the standard runtime library" much less so because it's essentially stuck in the 80's. Good for very simple UNIX-style command line tools, but not much else.

But for non-trivial projects that's not much of a problem because it's quite trivial to ignore the C stdlib functions and call directly into OS APIs (via your own or 3rd-party cross-platform libraries).


Yup, and to complement, a very important technique is to design platform-specific layers such that they call the (portable) application code - not the other way around. The more typical approach is attempting to find abstractions for I/O and other platform specific facilities that can be implemented by all targeted platforms. The issue with that is that it is very hard to really make use of the platform this way, or even implement all abstractions for all platforms without quirks. If instead the platform layer calls the (un-abstracted) portable application code, there is no code that needs to be pushed through misfitting abstractions.

I learned this approach from Handmade Hero / the Handmade community, and it's probably the deepest architectural lesson I learned anywhere, while it seems not to be widely known. I'm almost tempted to say that the conventional approach is wrong and misguided. However, an issue with the platform-calls-code approach is that the code usually can only communicate asynchronously with the platform (one of the issues with code-calls-platform is the one-shot synchronous nature of function calls). That might not always work; for example, I wanted RDTSC counters or copy-paste functionality, and I still do this by finding a platform abstraction that the code can call into directly.
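A minimal sketch of that inversion (all names here are made up for illustration): the portable core knows nothing about the OS, and each platform layer owns the loop and calls into it:

    #include <stdio.h>

    /* ---- portable application code: no platform headers here ---- */
    typedef struct { double elapsed; int frame; } app_state;

    static void app_update(app_state *s, double dt)
    {
        s->elapsed += dt;
        s->frame++;
        printf("frame %d at t=%.2fs\n", s->frame, s->elapsed);
    }

    /* ---- per-platform entry points: they call the app, not vice versa ---- */
    #ifdef _WIN32
    #include <windows.h>
    int main(void)
    {
        app_state s = {0};
        for (int i = 0; i < 3; i++) {
            app_update(&s, 1.0 / 60.0);  /* dt would really come from the platform clock */
            Sleep(16);                   /* platform-specific pacing stays in this layer */
        }
        return 0;
    }
    #else
    #include <unistd.h>
    int main(void)
    {
        app_state s = {0};
        for (int i = 0; i < 3; i++) {
            app_update(&s, 1.0 / 60.0);
            usleep(16000);
        }
        return 0;
    }
    #endif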


I know RDTSC was just an example, but I really wouldn't worry about going Platform->App->Platform for features that have no architectural concerns or impact (in this case, some number connected to performance, probably a cycle count, that's monotonically increasing).

I/O would be the major no-no.


I agree on that point. To extend: I said RDTSC but was actually thinking of QueryPerformanceCounter() (my bad). Going App->Platform becomes problematic when an abstracted facility cannot be implemented as a simple function call but requires a "protocol" that involves multiple function calls. QueryPerformanceCounter() is a simple case here, it only requires an initialization step using QueryPerformanceFrequency(), but it could already be seen as an abstraction leak. Either that, or the abstraction has to be designed to include calls and data structures that are only used on certain implementations. This requires foresight when designing the abstraction and it makes the abstraction inconvenient to use.
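For reference, the two-call "protocol" in question looks like this (a small sketch):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);  /* ticks per second; fixed at boot, query once */

        QueryPerformanceCounter(&t0);
        Sleep(100);                        /* the work being measured */
        QueryPerformanceCounter(&t1);

        /* apply the frequency only when turning ticks into seconds */
        double seconds = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
        printf("elapsed: %.3f s\n", seconds);
        return 0;
    }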


Yes, maybe something requires some setup in the top Platform layer, but QPC is also pretty light on that front. And you should only apply the frequency when displaying the value for humans anyway, which makes it quite easy to wrangle the values from QPC and CLOCK_MONOTONIC, etc., at the display site, IME.


I wonder if people would still call C portable if there were more non-POSIX-compliant operating systems


There are plenty of them, the FOSS culture just tends to ignore them.


Microcontrollers usually are not POSIX-compliant. Android and iOS are not POSIX-compliant.


The ANSI/ISO C standard library and the POSIX standard are two different things though that just happen to overlap here and there.


My point is that a lot of people treat the POSIX standard as if it was the C standard library. It's understandable because of how limited the actual standard library is, but it leads to people having a view that an OS is doing things wrong if it's not following POSIX.


I think the proliferation of POSIX-like OSes is mostly down to OS projects wanting there to be useful software that can run on their OS, so they minimize/eliminate the work of porting. It has led to the phenomenon you describe though -- instead of thinking an OS is just too much work to port to, it might be thought of as being "wrong".


Worse is when the OS does follow POSIX but people haven't read the fine print that certain POSIX APIs are implementation dependent, e.g. signal handling and threads.


D was initially designed (around 2000) to be agnostic about UTF8, UTF16, and UTF32. The language has full support for all three.

But it gradually became clear that UTF8 predominates.

D best practice now is to do all processing in UTF8. When encountering UTF16 or UTF32, promptly convert it to UTF8, do the processing, and then convert to UTF16 or UTF32 on output. In practice, the only time this is needed is to interface with Windows.

All the D standard library code that interfaces with Windows converts to/from UTF8 to UTF16.

If I was doing a do-over for D, there wouldn't be core language support for UTF16 or UTF32, only library support.


UTF-32 was an interesting choice. Was that done for the sake of completeness or was there a specific use case for it?


It was done for completeness.

I was accused of being an anglophile at one point for not doing everything in UTF-32 :-)


If I were to develop a native Windows app now, I would still go for C-Win32-GDI, instead of sorting through the jungle of stacks that have been coming and going in the last 20 years.


If I were to develop a native Windows app 20 years ago, I'd have picked Qt; in these 20 years I'd have had relatively simple changes to make, and in exchange I'd get portability to many more places and great performance:

https://www.qt.io/blog/2018/05/24/porting-from-qt-1-0

I'd do the same today, definitely !


Unless you're writing something low-level like a driver or representation engine (game engines or CAD/CAM or something like that), why would you still have to write native Windows applications?

I do see how a native application generally behaves and performs better, but it's a whole lot of extra work vs. a less-than-native application. I'd even opt for a generic backend (say, Rust or Go based) and simply having the 'interface' be OS-native.


It's definitely not "a whole lot of extra work".

The same attitude as yours has spawned an entire culture of repulsive inefficiency and egregious waste. Hundreds of megabytes or even gigabytes of memory for a bloody chat client, which still manages to lag noticeably on 2019-era hardware? How is that even acceptable?

...all because of some misguided and incredibly selfish notion that "developer experience" somehow trumps that of however many users there are (and because a lot of this is corporate shit, we have no choice but are forced to use it.)


Pride in one's craft. It starts to make a lot of sense when you are optimizing for anything other than profit.


You can develop native windows apps in Rust these days - even through the new old win32 APIs


You can, and it's actually more pleasant to deal with than Rust+GTK on Windows, if only for the compile time.


Technically anything with a bridge or FFI would be native enough I suppose ;-)


Life sciences laboratory devices, DAWs, factory automation, air-gapped dashboards, for some examples.


So essentially the representation engines (specialised science on Windows, DAW on Windows), but factory automation or air gapped dashboards, I doubt either needs Windows or 'native' applications on a desktop? Maybe a Windows-based HMI or something, but other than that I haven't seen any implemented like that.

Factory automation at TSMC and the likes is mostly just Java, web and webbased interfaces all in a closed network. Same for most modern logistics factory automations.

Classic PLC-based HMI interfaces from Siemens and the likes do still come with Windows as a dependency, but those have normal HTTP interfaces as well. Besides, this is such a niche with so much money going around they hardly depend on the ability to work inside of Microsofts limits. Calls into private APIs galore, hence the limits on OS patches and upgrades that might break those undocumented interfaces...


Most drivers for such hardware are based on DLLs or COM, and they aren't going to change any time now.


As far as I can see most are using the same protocols they always have over the buses they have always supported, so no special driver applies. But even then, why limit the end-user application to a Windows desktop native application in C or C++ when you can also just have it run a communication server and do the client-side things in a more universal implementation? Makes for better compartmentalisation anyway, like the example of the closed networks of chip manufacturers. The chip machines and control systems run Linux, but the desktops can be anything with a TCP/IP stack and a browser.


One reason would be support.


I'd choose Delphi instead for native Windows apps. This is where it still shines.


The classic Win32 GDI API isn't the worst thing imaginable. It has been the basis of untold amounts of successful GUI software and it still works, although much of the latest capability of Windows is hard to reach using it.

The best GUI development experience I had on Windows aside from that was WTL/ATL. I kept hoping for a long time that this framework would reemerge as a first class GUI platform on Windows. I think Microsoft missed a tremendous opportunity by neglecting it. It actually made writing COM components enjoyable and appealing.


And with Wine you have some hope for portability.


qt would be a good candidate too.


I can recommend PureBasic. You can call the Win32 API directly from PureBasic if you need to. It is easy to both build and call static and dynamic libraries.

https://www.purebasic.com/


I want to point out that this article doesn't acknowledge this major change, regarding ANSI/Unicode API variants in Windows:

  > "Until recently, Windows has emphasized "Unicode" -W variants over -A APIs. However, recent releases have used the ANSI code page and -A APIs as a means to introduce UTF-8 support to apps. If the ANSI code page is configured for UTF-8, -A APIs operate in UTF-8. This model has the benefit of supporting existing code built with -A APIs without any code changes."
https://docs.microsoft.com/en-us/windows/apps/design/globali...

It seems not everyone is aware of this, either.
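In practice, an application that has opted in (via its manifest or the system-wide setting) can check for this and pass UTF-8 straight to the -A functions; a rough sketch (the filename is hypothetical, error handling trimmed):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        if (GetACP() == CP_UTF8) {
            /* "été.txt" spelled out as UTF-8 bytes; -A calls take it as-is. */
            HANDLE h = CreateFileA("\xC3\xA9t\xC3\xA9.txt",
                                   GENERIC_READ, FILE_SHARE_READ, NULL,
                                   OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
            if (h != INVALID_HANDLE_VALUE) CloseHandle(h);
        } else {
            puts("ANSI code page is not UTF-8; -A calls use the legacy code page.");
        }
        return 0;
    }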


>I want to point out that this article doesn't acknowledge this major change

But it does? In "How to mostly fix Unicode support" section.


> I’m excluding Cygwin and its major fork, MSYS2, despite not inheriting any of these flaws. They change so much that they’re effectively whole new platforms, not truly “native” to Windows.

Ah, but Cygwin doesn't actually change so much that we can't scale some of it back, to have sane C and C++ development on Windows.

https://www.kylheku.com/cygnal/

Build a Cygwin program, then do a switcheroo on the cygwin1.dll to deploy as a Windows program.

I developed Cygnal to have an easy and sane way of porting the TXR language to Windows.

You can control the Windows console with <termios.h> and ANSI/VT100 escapes. Literally not a line of code has to change from Linux!
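For instance, ordinary POSIX/VT100 code like this (a generic sketch, nothing Cygnal-specific in it) is the kind of thing that carries over unchanged:

    #include <stdio.h>
    #include <termios.h>
    #include <unistd.h>

    int main(void)
    {
        struct termios orig, raw;
        tcgetattr(STDIN_FILENO, &orig);
        raw = orig;
        raw.c_lflag &= ~(ICANON | ECHO);            /* unbuffered input, no echo */
        tcsetattr(STDIN_FILENO, TCSANOW, &raw);

        printf("\x1b[1;32mpress any key\x1b[0m\n"); /* bold green via a VT100 escape */
        getchar();

        tcsetattr(STDIN_FILENO, TCSANOW, &orig);    /* restore the terminal state */
        return 0;
    }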

In Cygnal, I scaled back various POSIXy things in Cygwin. For instance:

- normal paths with drive letter names work.

- the chdir() function understands the DOS/Windows concept of a logged drive: that chdir("D:") will change to the current directory remembered for the D drive.

- drive relative paths like "d:doc.txt" work (using the working directory associated with the D drive).

- PATH is semicolon separated, not colon.

- The HOME variable isn't /home/username, but is rewritten from the value of USERPROFILE. (I had problems with native Windows Vim launched from a Cygwin program, and then looking for a nonexistent /home due to the frobbed variable, so I was sure to address this issue).

- popen and system do not look for /bin/sh. They use cmd.exe!

- when exec detects that "/bin/sh" "-c" "<arg>" is being spawned, it rewrites this to cmd.exe. The benefit is that interpreter programs which expose command invocation functions, which they implement from scratch by forking and execing /bin/sh -c command, are thereby retargeted to Windows.

- the /cygdrive directory is gone

- /dev is available as dev:/ so you can access devices.


> https://www.kylheku.com/cygnal/

Your SSL certificate is invalid and your server uses TLS 1.0 which is insecure and disabled by default in recent browsers.


> your server uses TLS 1.0

That is correct. Basically, the OS upgrades kept breaking my highly customized setup, resulting in downtime and wasted time, so eventually I dropped out of the program. I need to rebuild a new server and migrate everything over. With two small kids now and a full-time job, if I have any block of free time (basically late at night or early morning), I prioritize putting it into working on code. It's not just a web server setup and custom things around it, but Postgres databases, e-mail setup and other stuff. Every item could have some issue chewing up hours.

> which is insecure

Secure/insecure isn't a simple Boolean. It's less secure than newer versions of the protocol, but it's not "insecure" like, say, a plain password sent over telnet.

> Your SSL certificate is invalid

I don't believe that is the case. Firstly, I've never had a problem connecting to the site using https from any browser. Secondly, most online SSL checking tools do not report a problem other than remarking on it being self-signed.

For instance geocerts.com says "The hostname (www.kylheku.com) matches the certificate and the certificate is valid."

I found one which doesn't understand wildcard names: it reports that www.kylheku.com doesn't match *.kylheku.com. That's a bug in the checker. One other flags some unspecified issue with the Alternative Names property (which I didn't even specify when generating the cert).


Sanity is only required for those who fight against the platform and insist on carrying their UNIX habits to every platform they touch.

Embrace C++ Builder, Visual C++ or Qt.

No sanity medicine required.


With the exception of Qt, none of those platforms are, in any way, shape or form, cross-platform. Visual C++ is Windows-only, and C++ Builder supports only Windows and iOS (??). So you don't solve the problem, merely invert it. Instead of everything-except-Windows, you now support only-Windows.

I also don't see how using VC++ for example circumvents API issues, since it's just a compiler. I can imagine that Qt might solve some issues, since it offers a very wide set of abstractions. As such, it can probably do things like convert string formats on-the-fly between getting it from the OS and passing it to your application.


This thread is about developing on Windows....


I've done a lot of Win32 development in C and C++.

People here are saying don't expect it to be Unix, and that's fine. When I write for Windows, everything is PWSTR and that's fine.

But I have seen a lot of people get tripped up on this. People who are not well versed in the issue have no idea that using fopen() or CreateFileA() will make their program unable to access the full range of possible filenames, varying with the language setting. I've absolutely seen bugs arise with this in the 21st century.

The old "ACP" is meant for compatibility with pre-unicode software, but that hasn't been a big concern for something like 20 years. It is a much bigger issue that people are unaware of this history and expect char* functions to just work.


I have done a lot of Win32 as well, and recently started a new project after a long hiatus. The problem is there are some very occasional things that use char*, and so you have to keep that in mind: for example, some parameters in wWinMain, many cross-platform libraries that I want to link to, or some of the C++ library functions mentioned in that article. Converting between char* and wchar_t requires being a bit careful not to cause a buffer overflow. It's a lot of mental overhead to deal with it all, but you're right that if you're mainly just calling Win32, then sticking with the wide version of functions and strings will mostly be fine. Microsoft could certainly make this easier by offering a "wWin32" project that only has the wide version of libraries and documentation to prevent mistakes.
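For the conversion itself, the usual two-pass pattern sidesteps the overflow worry: ask MultiByteToWideChar for the required size first, then convert into a buffer of exactly that size. A small sketch (the helper name is made up; error handling abbreviated):

    #include <windows.h>
    #include <stdlib.h>

    /* Convert a NUL-terminated UTF-8 string to UTF-16; caller frees the result. */
    static wchar_t *utf8_to_wide(const char *utf8)
    {
        int n = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0); /* size query */
        if (n <= 0) return NULL;
        wchar_t *wide = malloc((size_t)n * sizeof(wchar_t));
        if (wide)
            MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, n);     /* real conversion */
        return wide;
    }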


> Microsoft could certainly make this easier by offering a "wWin32" project that only has the wide version of libraries

Using API functions ending in ...A (for the ANSI version), like CreateFileA, is actually a rather low-level style and means that you know what you are doing (the same holds for the ...W versions) and explicitly want it this way.

Normally, you use CreateFile and Visual C++ by default sets the proper preprocessor symbols such that CreateFileW will be used. If you pass a char*, you will get compile errors.

In other words: the infrastructure is there to avoid this kind of error - and it is even used by default. But if you make it explicit in your code that the ANSI version of an API function (CreateFileA) should be used instead of using CreateFile, don't complain when the compiler obeys and this turns out to be a bad idea.
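The mechanism is nothing more than a preprocessor switch in the Windows headers; roughly (a simplified sketch, not the verbatim header contents):

    #ifdef UNICODE
        #define CreateFile CreateFileW
    #else
        #define CreateFile CreateFileA
    #endif

    /* So CreateFile(TEXT("a.txt"), ...) resolves at preprocessing time, and
       passing a char* while UNICODE is defined fails to compile. */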


This used to be the case, but not anymore, I think. If you look at MSDN docs these days, they explicitly document ...A and ...W functions separately. If you create a new Win32 project in Visual Studio, it's halfway through - sometimes it will call the ...W functions explicitly, other times it will use the macros. But overall it feels like the intent is to push devs towards explicitness.

The macros in question were always a bad idea, because they applied to all identifiers. So e.g. if you had a method named CreateFile in your class somewhere, and somebody did #include <windows.h> along with the header defining that class, you'd get a linker error.


To me, it's stylistically very weird to include the W in source code that humans will read and write, because the fact that it's the Unicode version is immaterial, and it would appear on literally every API call. If macros are a problem, I feel like the ideal situation would be to declare without the W, and use some compiler or linker hint to append the 'W' to the symbol. (Kind of reminds me of how this topic is approached when doing interop from .NET.)


Not every call - only those that take TCHAR arguments, or structs with TCHAR fields.

Why do you think it's immaterial, though? It was back when A-funcs were considered legacy, and W-funcs were the way to go. But now the official stance is that A-funcs are for use with UTF-8. Being explicit about whether you're handling UTF-8 or UTF-16 strikes me like a good idea.


As you say, at various points the A funcs were talked about as legacy or deprecated. WinCE did not have them. During the win7 era refactor of things like kernel32 and advapi, they were also kind of deprioritized, left in the compatibility wrapper and not the first class interfaces. I haven't followed in recent years but when WinRT was coming on the scene they were talking about not including them?

But I think of the canonical name as not having either A or W. Rather what you get for TCHAR is a per compilation unit setting, and the A or W in the linker name is sort of an implementation detail. That is to say I'd internalized the world the macros presented, and labelled that as "what makes sense". In particular it is distracting from the semantics of the APIs to have the string type show up in the name.

If they could do C++-style name mangling in plain C I feel like they might've, and it'd be somewhat more "pure". The macro thing feels like sort of a hack to achieve this.


You're absolutely correct, this is all how it used to work. The flip to UTF-8 support and the use of A-funcs for that is very recent; the ability of the app to force itself into this mode first showed up in 2019. The naming doesn't really make much sense anymore, but then that wouldn't be the first for a 35-year old API set...


You've been able to set UTF-8 system-wide for 3+ years; I've been doing it on all my boxes and never had issues. I suspect in a few years this will just be the default: https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#U...


The only flaw Windows has is that you are not allowed to redistribute the cl.exe compiler. For my game engine it's a 35MB zip!

https://developercommunity.visualstudio.com/t/allow-redistri...


I'm doing it with clang. As a bonus I get better performance than I did with msvc.


I tried looking at all the other compilers; the only one that fit my size constraints was tcc... that would mean dropping std::string and only using ASCII char*... not a big deal for my game because my .ttf fonts don't have anything else, but for other projects throwing UTF-8 out of the window might be a show stopper!?

Also considered Rust, but that is so bloated/slow! Can't stop anyone from using Rust for the game in the future though, because any .so/.dll will be able to hot-deploy.


std::string is basically just the following:

    struct std_string {
      union {
        char small[16];
        struct { size_t len; char* buf; } big;
      } value;
      unsigned char is_big;
    };
There is nothing it does regarding support for non-ASCII characters over what you get from buf and len. And for UTF-8, you don't even need len, plain old strlen from the 1980s works fine on valid UTF-8, so plain char* from C works just as well.

And the union is just a performance optimization for small strings (is_big is probably not an additional field in good impls, but I separated it here), it's logically identical to just the buf and len.

Which is all just to say, go for it with tcc!


Yes, but the UTF-8 logic, how complex is that?

I don't even know what union does, but I can imagine you have many smaller chunks?


The union has nothing to do with UTF-8, so let's ignore it. If you want more details about it, search for "c++ small string optimization", but the one-sentence version is that it's just a way to avoid a heap allocation for strings <= 16 bytes long (including any NUL terminator).

So ignoring that irrelevant optimization, std::string is basically:

    struct std_string {
      size_t len;
      char* buf;
    };
For storing valid UTF-8, the len is unnecessary, since a NUL byte is not valid UTF-8. You can still tell how many bytes of UTF-8 you have by using strlen, because when you find a 0 byte, it is never part of the string, it's always the NUL terminator. So the len is not strictly necessary for valid UTF-8 -- leaving us with just char*.

And not trying to dodge your question about UTF-8 logic, but my point was you can dodge that whole question, because std::string provides the same amount of UTF-8 logic as char* -- that is, none at all. If you've been getting by on std::string, then you can get by on char*. If you only need to support UTF-8 input and output, and you don't need to manipulate strings (replace characters, truncate them, normalize them for use as keys in a data structure, etc.) or only need to do simple substring searches for ASCII characters, then you can just use char* or std::string. UTF-8 has a great design, which was consciously chosen to make all of that possible.
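To make the last point concrete, here's what "no UTF-8 logic" looks like with plain char* (a tiny sketch; the string is just an example):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* "héllo" encoded as UTF-8: 'é' takes two bytes, and no byte is 0,
           so strlen still finds the terminator and reports the byte length. */
        const char *s = "h\xC3\xA9llo";
        printf("%zu bytes\n", strlen(s));  /* prints: 6 bytes */
        return 0;
    }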


I don't understand: shipping 50MB of cl.exe is fine but shipping clang isn't?


Shipping cl.exe is illegal. I tried looking at clang, but it was a lot bigger than 50MB, and it was unclear how to untangle it from installers and other dependencies:

https://stackoverflow.com/questions/65807034/clang-llvm-zip-...


I'm shipping it as part of https://ossia.io (statically linked to the app) ; my installers are between 50 and 100mb for something that ships clang+llvm, boost, qt, ffmpeg and a lot of other things (in addition to its own code):

https://github.com/ossia/score/releases/tag/v3.0.0-rc7

you just need to build llvm/clang statically and target_link_libraries(<couple stuff>) in cmake (and ship the headers if you want to do useful things, this actually takes much more space uncompressed but it'd be the same whatever the compiler)


That sounds like a lot of work, I just want a zip with the compiler ready to extract into any project folder.



Thx! So back to MinGW it is!

I'm surprised this is so hard to find; why is there no official redistributable compiler for Windows?! What is Microsoft so afraid of?

This is the only disadvantage Windows has compared to Linux!

I guess you can make this a lot smaller if you remove the cross-compiling parts?


it's only "mingw" because it uses the mingw headers. It uses the microsoft modern C runtime (ucrt) and runs just fine under normal cmd.exe shell with c:/windows/formatted/paths, and does not require e.g. MSYS.


Should new programming languages built around internal use of UTF-8 be opting their binaries into this behavior by default?

(I'm mostly thinking about Rust, which I think has to do some contortions on Windows. Probably also Go, to the extent it uses native APIs; arguably Python too.)


The Rust standard library has contortions in order to support invalid UTF-16. To do this it uses an encoding called "WTF-8"[0], which is an extension to UTF-8.

If the standard library dropped this requirement and only supported valid Unicode then it could simply use normal UTF-8 instead of WTF-8.

[0]: https://simonsapin.github.io/wtf-8/


The problem right now is that forcing the codepage to UTF-8 only works on Windows 10 build 1903+. So quietly forcing this opt-in may result in situations where everything works fine for the devs, but some of their users see breakage for no clear reason.

Long-term, though? Absolutely.


Can anyone elucidate whether the new UTF-8 codepage allows working with filenames that aren’t valid UTF-16 (i.e. containing mismatched surrogate code points)?

In general I think it’s best to stick with the native "W" functions and/or wchar_t on Windows.


It does not. Invalid UTF-16 (aka unpaired surrogates) is replaced by the Unicode REPLACEMENT CHARACTER (�). However, in practice such filenames are only produced by malicious software, and many large products won't work with them anyway (e.g. Microsoft's own VSCode).


I've recently started dabbling in some C development on Windows, particularly drivers, and a lot of the newer APIs Microsoft provides are C++-only headers, so unless I add `extern "C"` to everything myself (and thus now also need to maintain that with any updates Microsoft issues), how can I interact with those APIs or use those headers in pure C? I don't see how, and it's quite frustrating.


I understand that _not_ recommending an alternative to the C-family keeps the post focused, but it does feel odd not to name one language that works well. Perhaps that's an indication of something...





