The Alpine Linux docker image is only a couple megabytes[0]. There are quite a few people (including myself) using it as a light-weight OS for their containers.
There is also my own 'minifs' https://github.com/buserror/minifs ; it's targeted at embedded, but it works fine on x86s too. Haven't ported it to musl yet (mostly due to the fact that musl wasn't yet merged into crosstool-NG) but that's high on my list.
One thing minifs does is use a 'cross linker' tool that jettisons every piece of code/library that is not actively linked by an executable; also, instead of installing everyone's crap everywhere, I build the distro the other way around, and 'pick and choose' the bits that packages need until they work.
There's also Void Linux, which has musl and glibc versions of packages and offers both musl-based and glibc-based download images: http://www.voidlinux.eu/
This isn't perfect, but kinda works (even with autoconf). I also tried to add musl support into LLVM/clang, but I've been too busy recently, and won't be able to work on it for a while.
A side note: Clang is a beauty; its structure is easy to understand yet very extensible. There are actually few things to be done on clang to support musl. Just implement a proper frontend, and you're mostly done. But it's kinda difficult to patch code that assumes glibc, and the fact that musl refuses to export a __MUSL__ macro makes the job even harder.
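To make the macro problem concrete, here is a minimal sketch of compile-time libc detection. The macros `__GLIBC__`, `__UCLIBC__` and `__BIONIC__` are defined by the respective libc headers; the function names `libc_name` and `print_libc` are made up for this illustration. Since musl exports no identifying macro, the best this approach can do is fall through to a guess.

```c
#include <assert.h>
#include <stdio.h>   /* including a libc header makes its identification macros visible */

/* Hypothetical helper: report which libc this file was compiled against. */
const char *libc_name(void)
{
#if defined(__UCLIBC__)
    /* Checked first on the assumption that uClibc also defines __GLIBC__ for compatibility. */
    return "uClibc";
#elif defined(__GLIBC__)
    return "glibc";
#elif defined(__BIONIC__)
    return "Bionic";
#else
    /* musl deliberately defines nothing here, so this is only a guess. */
    return "unknown (possibly musl)";
#endif
}

/* Hypothetical helper: print the detected libc to the given stream. */
void print_libc(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "built against: %s\n", libc_name());
}
```

Calling `print_libc(stdout)` from `main` prints the detected name. Code that needs a yes/no answer for musl usually ends up testing for specific features instead of for the library itself.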
Excuse the noob question, but why are there different libc implementations at all? I mean, their feature set is defined by a standard, so all implementations should strive for maximum performance - so where does the bloat come from?
Support for multiple archs is of course a valid source of perceived bloat, but that should matter only at compile time?
I think you'd be surprised at how much variation there is in the feature set. libc isn't just the C standard library as defined by the language standard. It's the OS standard library too, and includes POSIX stuff and some de facto standard stuff and some OS-specific stuff. The musl author has a comparison table: http://www.etalabs.net/compare_libcs.html
Historically, the biggest reasons for alternate libcs have been to make tradeoffs for use on embedded systems (uClibc) or political/license differences.
> Historically, the biggest reasons for alternate libcs have been to make tradeoffs for use on embedded systems (uClibc) or political/license differences.
And also the toxic maintainership of glibc in the Drepper era.
Have you probed around the musl code? I welcome the competition to glibc and think they have some good goals. It's remarkably stark in terms of documentation. memset.c has some good comments, being as how it's not exactly how most developers would implement it initially. memcpy.c looks as if it has none at all...
musl's printf is cleaner than glibc's, no question, but it also does less. Only because I've wasted time looking at many of them: it doesn't look bad, but it just doesn't look great either. glibc's is really ugly for 2 big reasons: it has extra sweeteners, and it does some exotic looking shit to try and be fast. I'd probably grok musl's faster but I'd not want to jump in and hack on either of them or debug them.
Some non-canonical constants, stdout instead of STDOUT.. It's a nit, but I remember at least 2 major TLS holes in the last 18 months that were pretty much a result of non-canonical C style; it's a dangerous thing when you do it and start to collaborate.
It looks clean enough from 10,000 feet, and I do admire the competition and welcome it, but it's just another collection of relatively undocumented stuff that implements some mildly exotic performance optimizations, just like glibc, only it doesn't support nearly as many platforms. It means everything in the world if you've run into a glibc bug, I get that, but it's amazing how few people have done that. Both could really benefit from a detailed and complete set of "how and why" documentation...
Omg that was a great article on printf. I had more Wtf's than I've had in a while despite how many times I've called out C and UNIX implementations for their complexity. Here's another good one in return:
There's still hope for Unix, if DragonFly BSD is any measure. A shame no one feels the need to follow its lead.
And yes, that PHK op ed is well known. It's been posted here before, though if I have to be honest PHK messes up his nomenclature, even if his points are mostly correct.
No Oberon, but maybe we can at least get some good capabilities like EROS if the work on Capsicum goes through.
Just looked up DragonFly BSD. Prior I just heard it was doing concurrency or something differently. Looking at the details... damn, I'm impressed. It's still a UNIX architecture with the good and bad that comes from that. Yet, they're applying rather than ignoring many good engineering choices for dealing with various UNIX problems such as concurrency, integration, OS-level faults, & kernel debugging. The HAMMER filesystem is also a nice development given it's not GPL or Oracle-related (right?). A BSD alternative to ZFS is by itself worth a whole project.
So yeah, there's still hope for UNIX. Looks like that hope goes two ways:
1. UNIX variants like DragonFly that intentionally break things or get rid of crud to better themselves over time.
2. People on such projects who stumble onto superior architectures while looking for improvements and start building a better non-UNIX.
I'll have to update myself on DragonFly in the near future. Good that you know about EROS and Capsicum: ahead of most ;). Check out CheriBSD on the CHERI processor if you like that. As far as software goes, JX Operating System and GenodeOS are two clean-slate ones with architectures you might find interesting. Interesting in terms of foundations to build better stuff on.
Oh yeah, there was a talk about CheriBSD at this year's BSDcan. Gotta look more into it. GenodeOS looks fascinating from first glance. I'm aware of the various JVM-based OSes, but I have no personal interest in them. Thanks anyway.
glibc is a project with a 30 year history of legacy code, and they made it portable between different UNIX kernels (it predates Linux).
It has maintained strict ABI compatibility since glibc 2.0, released in 1996; a lot of functions have multiple different implementations with versioned symbols, just so existing binaries built against older versions continue to run.
Naturally all of this compatibility gunk does not lend itself to simple and maintainable code; perhaps call all of that accidental complexity "the tax of the UNIX philosophy".
People on other sites I shared it on said the glibc was worse than many they worked with. They're from back in the day, too. So, for now I'll call it the tax of the GNU UNIX philosophy.
The BASIC, Pascal, Modula-2, and Ada code looked much less hacked together even when it was portable or old. Still looked weird & dated but more readable. So, like one commenter said, it's also because of C. Its design and culture lead people down the path of much dirty hackery to... implement a printf statement.
Note: Situations like this argue for real macros and metaprogramming like in LISP. A pseudo code of what each aspect does plus its implementation would make it more comprehensible.
> they made it portable between different UNIX kernels (it predates Linux).
In practice, the only platforms that really use it are GNU/Linux and GNU/Hurd. Even Android uses Bionic libc. Though I guess there's also novelty projects like GNU/kFreeBSD.
> perhaps call all of that accidental complexity "the tax of the UNIX philosophy".
I can so relate to that article. I compile distros all day for embedded systems, and I am certain that perhaps 80% of the build time is /just/ autocrap tools checking for the bazillionth time if 'strlen' exists on the system, and a myriad of other things that are either irrelevant, or constant anyway.
The /other/ problem is that most of the time spent fixing build issues is never fixing actual code; most of the time is spent trying to go around these autocrap tools failing to work as intended, and as the article describes, it's a nightmare.
All the while, with a bit of care, you can build most programs with a half page makefile, with very few problems; in fact, very often when I've banged my head for a while trying to fix an autotool nightmare, that's exactly what I do!
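For anyone who hasn't looked inside a configure run, the 'does strlen exist' check boils down to compiling and linking a tiny probe along these lines. This is a simplified sketch, not the literal program autoconf generates, and `probe_strlen` is a made-up name for the illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical probe: if this compiles and links, strlen exists.
 * Taking the function's address creates a reference the linker must resolve. */
int probe_strlen(void)
{
    size_t (*fn)(const char *) = strlen;

    assert(fn != NULL);
    return fn("probe") == 5;   /* 1 if strlen behaves as expected */
}
```

The build system only cares whether the probe links; the result ends up as a define such as `HAVE_STRLEN` in a generated header. Multiply that by a few hundred functions and headers per package and the build time adds up.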
All the bugs discovered by musl, and how many are glibc-related:
That's just a crazy number of bugs.
The page doesn't say how they were found. By building test suites during the creation of musl? Do conformance suites not already exist?? Or were these (hypothesized) tests just more rigorous?
One use case is embedded systems. The available code space can vary from a few kilobytes to a few megabytes, and memory is similarly constrained. Sometimes you won't even pull in an entire library, even if it's small; you'll statically link in only the functions you use.
In my experience, you will eventually run into issues using musl with a wrapper script, and you are better off using a musl-based distro. You can run one in a chroot (I used to use Sabotage like that quite a bit), or if you use Docker you can just use "FROM alpine" and everything will be nice and statically linked, or you can do an install in a VM.
musl is great, but it would be great if there was something at the compiler level that is better than both gcc and LLVM/Clang.
LLVM/Clang is a step in the right direction but it's quite bloated because of support for C++, Objective-C etc (for C standards at least); not to mention it's written in C++.
Being a Python/C hybrid enthusiast, I looked into taking something like Eli Bendersky's pycparser and making it featureful (preprocessor parsing at the minimum), but haven't done anything in that direction. Maybe some way of combining pycparser and tcc.
What's wrong with being written in C++? It's a high-level, fast language with higher-level abstractions and better typesafety than C and decades of expertise built up around it.
Sure, that means it has some legacy bits that need to be supported, but to some degree that's the price of having a mature codebase.
Having compiled LLVM and rooted through its internals trying to implement a plugin, it looks pretty well-architected and internally consistent. Codebases that cover as much as LLVM+Clang do being that clean is a rare sight, IMO. And it's not slow.
Actually, I love C, C++ and Java all the same. They are great tools for writing great code. But regarding the comments I've stumbled upon, I think you're right.
Nothing wrong with C++ per se, if you're a C++ programmer.
First, I assume this is a musl thread, so I'm assuming C++ people might not be interested in it in the first place (I'd be surprised if C++ communities discuss the bloat of glibc let alone seeking its minimal C alternatives like musl instead of going for C++ solutions like boost and what not).
Second, having hung around people who discuss the bloat of glibc and strive for white-box design and minimalism, of which musl is a result (this includes, but is not limited to, communities like suckless and cat-v.org), speaking of musl and clang in the same line would be odd at the least, considering clang is ~600k+ lines of C/C++, when there is a C-only tcc compiler at ~60k+, so it's likely clang is doing something (a lot of things) that C programmers have no interest in.
Although I should admit these minimalist C communities might not have a good opinion of Python either (that's one of my personal interests).
Part of those 600k lines of code come in the form of various things that tcc doesn't do -- things like loop unrolling, support for SSE, support for architectures other than Intel-based ones, support for C11, etc. So, with tcc you end up with a compiler whose output is significantly slower and which is less useful than the clang suite. So while clang has things the C community wouldn't support, like other languages, there are quite a few things that the C community would consider vital these days.
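For readers who haven't met loop unrolling, here is a hand-written sketch of the transformation an optimizing compiler applies automatically. The function names and the unroll factor of 4 are arbitrary choices for the illustration; a real compiler picks the factor per target and often vectorizes on top of it.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: one addition and one loop test per element. */
long sum_simple(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Same result, unrolled by 4: one loop test per four elements. */
long sum_unrolled4(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    /* Handle the 0 to 3 leftover elements. */
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

Both functions return the same sum for any input. The point is that with an optimizing compiler you write the first form and get something like the second for free, while a simple one-pass compiler emits the first form more or less as written.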
I'm not sure how well maintained those backends are, and I've read TCC isn't really stable enough to use in production, between the incomplete x64 backend and just general bugs... Solving those may yet push TCC a fair bit beyond 60k LOC.
Clang itself is ~400k once you exclude things like the static analyzer and all the various tooling built on top of it that's shipped with clang. Either way, that number is sort of a drop in the bucket compared to LLVM. LLVM is millions of lines of code, very little of which could be dropped simply by only supporting C. Even a C-only programmer wants things like optimization passes and backends for every platform they plan on running on.
That's one way to look at it. Another way to look at it is that Clang/LLVM is the open source compiler that isn't under the GPL and musl is the open source libc that isn't under the GPL.
There is pcc which looked like it had a chance for a while. There was at least a lot of excitement from the OpenBSD and NetBSD communities a while ago. I know it was imported but then I think it was removed. Development continues on but at a slower pace.