How to compile C apps with musl and Clang

vezzy-fnord · on June 27, 2015

The more awareness musl gets, the better. See also Sabotage Linux, a musl-based distro: https://github.com/sabotage-linux/sabotage

raphaelss · on June 27, 2015

There's Alpine Linux too: http://www.alpinelinux.org/

nickpsecurity · on June 28, 2015

I'll check out these distros in the near future. I at least got a laugh out of what I saw on Alpine's homepage:

"Simple. Small. Secure." right next to "ISO 298MB."

I know I'm getting old when people think a 298MB system is simple, small, or secure. Lol...

jzelinskie · on June 28, 2015

The Alpine Linux docker image is only a couple megabytes[0]. There are quite a few people (including myself) using it as a light-weight OS for their containers.

[0]: https://github.com/gliderlabs/docker-alpine

nickpsecurity · on June 28, 2015

Now that is more like it!

sams99 · on June 28, 2015

For example, my redis container (samsaffron/redis) is 20MB, while the official(tm) redis container, which is based on Ubuntu, is 100MB.

buserror · on June 29, 2015

There is also my own 'minifs' https://github.com/buserror/minifs ; it's targeted at embedded, but it works fine on x86s too. Haven't ported it to musl yet (mostly due to the fact that musl wasn't yet merged into crosstool-NG) but that's high on my list.

One thing minifs does is that it has a 'cross linker' tool that jettisons every piece of code/library that is not actively linked by an executable; also, instead of installing everyone's crap everywhere, I build the distro the other way around, and 'pick and choose' the bits that packages need until they work.

andolanra · on June 27, 2015

There's also Void Linux, which has musl and glibc versions of packages and offers both musl-based and glibc-based download images: http://www.voidlinux.eu/

esjeon · on June 28, 2015

I've been digging musl + clang too. Here's my wrapper script that I use to compile a number of projects: https://github.com/esjeon/musl-clang

This isn't perfect, but kinda works (even with autoconf). I also tried to add musl support into LLVM/clang, but I've been too busy recently, and won't be able to work on it for a while.

A side note: Clang is such a beauty whose structure is so easy to understand yet very extendible. There are actually few things to be done on clang to support musl. Just implement a proper frontend, and you're mostly done. But it's kinda difficult to patch codes which assume glibc, and the fact that musl refuses to export __MUSL__ macro makes the job even harder.

mschuster91 · on June 28, 2015

Excuse the noob question, but why are there different libc implementations at all? I mean, their feature set is defined by the standard, so all implementations should strive for maximum performance - so where does the bloat come from?

Support for multiple archs is of course a valid source of perceived bloat, but that should matter only at compile time?

wtallis · on June 28, 2015

I think you'd be surprised at how much variation there is in the feature set. libc isn't just the C standard library as defined by the language standard. It's the OS standard library too, and includes POSIX stuff and some de facto standard stuff and some OS-specific stuff. The musl author has a comparison table: http://www.etalabs.net/compare_libcs.html

Historically, the biggest reasons for alternate libcs have been to make tradeoffs for use on embedded systems (uClibc) or political/license differences.

kryptiskt · on June 28, 2015

> Historically, the biggest reasons for alternate libcs have been to make tradeoffs for use on embedded systems (uClibc) or political/license differences.

And also the toxic maintainership of glibc in the Drepper era.

vezzy-fnord · on June 28, 2015

Here's an article from the musl wiki that explains differences from glibc: http://wiki.musl-libc.org/wiki/Functional_differences_from_g...

All the bugs discovered by musl, and how many are glibc-related: http://wiki.musl-libc.org/wiki/Bugs_found_by_musl

If you want to know why someone would want to replace glibc, then I give you a challenge: figure out how glibc implements printf(3). Good luck.

The answer is: http://blog.hostilefork.com/where-printf-rubber-meets-road/

Nelson69 · on June 28, 2015

Have you probed around the musl code? I welcome the competition to glibc and think they have some good goals. It's remarkably stark in terms of documentation. memset.c has some good comments, being as how it's not exactly how most developers would implement it initially. memcpy.c looks as if it has none at all...

musl's printf is more clean than glibc's, no question, but it does less though. Only because I've wasted time to look at many of them, it doesn't look bad, but it just doesn't look great either. glibc's is really ugly for 2 big reasons, it has extra sweeteners and then it does some exotic looking shit to try and be fast. I'd probably grok musl's faster but I'd not want to jump in and hack on either of them or debug them.

Some non-canonical constants, stdout instead of STDOUT.. It's a nit, but I remember at least 2 major TLS holes in the last 18 months that were pretty much a result of non-canonical C style; it's a dangerous thing when you do it and start to collaborate.

It looks clean enough from 10,000 feet, and I do admire the competition and welcome it, but it's just another collection of relatively undocumented stuff that implements some mildly exotic performance optimizations, just like glibc, only it doesn't support nearly as many platforms. It means everything in the world if you've run into a glibc bug, I get that, but it's amazing how few people have done that. Both could really benefit from a detailed and complete set of "how and why" documentation...

nickpsecurity · on June 28, 2015

Omg that was a great article on printf. I had more Wtf's than I've had in a while despite how many times I've called out C and UNIX implementations for their complexity. Here's another good one in return:

http://queue.acm.org/detail.cfm?id=2349257

Makes me wish systems like Oberon got famous instead lol...

vezzy-fnord · on June 28, 2015

There's still hope for Unix, if DragonFly BSD is any measure. A shame no one feels the need to follow its lead.

And yes, that PHK op ed is well known. It's been posted here before, though if I have to be honest PHK messes up his nomenclature, even if his points are mostly correct.

No Oberon, but maybe we can at least get some good capabilities like EROS if the work on Capsicum goes through.

nickpsecurity · on June 28, 2015

Just looked up DragonFly BSD. Prior I just heard it was doing concurrency or something differently. Looking at the details... damn, I'm impressed. It's still a UNIX architecture with the good and bad that comes from that. Yet, they're applying rather than ignoring many good engineering choices for dealing with various UNIX problems such as concurrency, integration, OS-level faults, & kernel debugging. The HAMMER filesystem is also a nice development given it's not GPL or Oracle-related (right?). A BSD alternative to ZFS is by itself worth a whole project.

So yeah, there's still hope for UNIX. Looks like that hope goes two ways:

1. UNIX variants like DragonFly that intentionally break things or get rid of crud to better themselves over time.

2. People on such projects who stumble onto superior architectures while looking for improvements and start building a better non-UNIX.

I'll keep checking up on it to see what happens.

nickpsecurity · on June 28, 2015

I'll have to update myself on DragonFly in the near future. Good that you know about EROS and Capsicum: ahead of most ;). Check out CheriBSD on the CHERI processor if you like that. As far as software goes, JX Operating System and GenodeOS are two clean-slate ones with architectures you might find interesting. Interesting in terms of foundations to build better stuff on.

vezzy-fnord · on June 28, 2015

Oh yeah, there was a talk about CheriBSD at this year's BSDcan. Gotta look more into it. GenodeOS looks fascinating from first glance. I'm aware of the various JVM-based OSes, but I have no personal interest in them. Thanks anyway.

the_why_of_y · on June 28, 2015

glibc is a project with a 30 year history of legacy code, and they made it portable between different UNIX kernels (it predates Linux).

It has maintained strict ABI compatibility since glibc 2.0, released in 1996; a lot of functions have multiple different implementations with versioned symbols, just so existing binaries built against older versions continue to run.

Naturally all of this compatibility gunk does not lend itself to simple and maintainable code; perhaps call all of that accidental complexity "the tax of the UNIX philosophy".

nickpsecurity · on June 28, 2015

People on other sites I shared it on said the glibc was worse than many they worked with. They're from back in the day, too. So, for now I'll call it the tax of the GNU UNIX philosophy.

The BASIC, Pascal, Modula-2, and Ada code looked much less hacked together even when it was portable or old. Still looked weird & dated but more readable. So, like one commenter said, it's also because of C. Its design and culture lead people on the path of much dirty hackery to... implement a printf statement.

Note: Situations like this argue for real macros and metaprogramming like in LISP. A pseudo code of what each aspect does plus its implementation would make it more comprehensible.

vezzy-fnord · on June 28, 2015

> they made it portable between different UNIX kernels (it predates Linux).

In practice, the only platforms that really use it are GNU/Linux and GNU/Hurd. Even Android uses Bionic libc. Though I guess there's also novelty projects like GNU/kFreeBSD.

> perhaps call all of that accidental complexity "the tax of the UNIX philosophy".

I don't know why anyone would call it that.

the_why_of_y · on June 28, 2015

IIRC it was initially developed for some version of SunOS.

Edit: here's the list of supported platforms, from the 1.09 tarball:

		alpha-dec-osf1
		i386-bsd4.3
		i386-force_cpu386-none
		i386-gnu (for Hurd development only)
		i386-isc2.2
		i386-isc3
		i386-sco3.2
		i386-sco3.2v4
		i386-sequent-bsd
		i386-sysv
		i386-sysv4
		i960-nindy960-none
		m68k-hp-bsd4.3
		m68k-mvme135-none
		m68k-mvme136-none
		m68k-sony-newsos3
		m68k-sony-newsos4
		m68k-sun-sunos4
		mips-dec-ultrix4
		mips-sgi-irix4
		sparc-sun-solaris2
		sparc-sun-sunos4

> I don't know why anyone would call it that.

It was a tongue in cheek remark, given how projects that don't want to pay that tax get accused of being in violation of said philosophy :-)

buserror · on June 29, 2015

I can so relate to that article. I compile distros all day for embedded systems, and I am certain that perhaps 80% of the build time is /just/ autocrap tools checking for the bazillionth time if 'strlen' exists on the system, and a myriad of other things that are either irrelevant, or constant anyway.

The /other/ problem is that most of the time spent fixing build issues is never fixing actual code; most of the time is spent trying to go around these autocrap tools failing to work as intended, and as the article describes, it's a nightmare.

All the while, with a bit of care, you can build most programs with a half page makefile, with very few problems; in fact, very often when I've banged my head for a while trying to fix an autotool nightmare, that's exactly what I do!

WalterGR · on June 28, 2015

    All the bugs discovered by musl, and how many are glibc-related:

That's just a crazy number of bugs.

The page doesn't say how they were found. By building test suites during the creation of musl? Do conformance suites not already exist?? Or were these (hypothesized) tests just more rigorous?

laichzeit0 · on June 28, 2015

One use case is embedded systems. The code space available can vary from a few kilobytes to a few MB, as can the memory budget. Sometimes you won't even pull in an entire library even though it's small; you'll statically link in only the functions you used.

gct · on June 28, 2015

Must everything be an "app"? Whatever happened to programs or god forbid unabbreviated applications

IshKebab · on June 28, 2015

Programs were called apps long before the iphone.

stefanmielke · on June 27, 2015

Did anyone access the website through mobile? I can't get past the title.

Edit: Using an iPad 2.

aninteger · on June 28, 2015

It works fine on the default "older" web browser on the Samsung S4.

ksherlock · on June 28, 2015

yeah... use reader view.

justincormack · on June 28, 2015

You will after a bit run into issues using Musl with a wrapper script, in my experience, and you are better off using a Musl based distro. You can run one in a chroot, I used to use Sabotage like that quite a bit, or if you use Docker you can just use "FROM alpine" and everything will be nice and statically linked, or you can do an install in a VM.

fizixer · on June 27, 2015

musl is great, but it would be great if there was something at the compiler level that is better than both gcc and LLVM/Clang.

LLVM/Clang is a step in the right direction but it's quite bloated because of support for C++, Objective-C etc (for C standards at least); not to mention it's written in C++.

Being a Python/C hybrid enthusiast, I looked into taking something like Eli Bendersky's pycparser and making it featureful (preprocessor parsing at the minimum), but haven't done anything in that direction. Maybe some way of combining pycparser and tcc.

koko775 · on June 27, 2015

What's wrong with being written in C++? It's a high-level, fast language with higher-level abstractions and better typesafety than C and decades of expertise built up around it.

Sure, that means it has some legacy bits that need to be supported, but to some degree that's the price of having a mature codebase.

Having compiled LLVM and rooted through its internals trying to implement a plugin, it looks pretty well-architected and internally consistent. Codebases that cover as much as LLVM+Clang do being that clean is a rare sight, IMO. And it's not slow.

pmelendez · on June 27, 2015

> What's wrong with being written in C++?

Many C devs love to hate C++. I guess it's something like how some C++ devs love to hate Java. It's a bit of a heritage issue :)

C++ does the job for me... it's almost as powerful as it is complicated but I find that a fair trade off (YMMV)

andresmanz · on June 27, 2015

Actually, I love C, C++ and Java all the same. They are great tools for writing great code. But regarding the comments I've stumbled upon, I think you're right.

fizixer · on June 28, 2015

Nothing wrong with C++ per se, if you're a C++ programmer.

First, I assume this is a musl thread, so I'm assuming C++ people might not be interested in it in the first place (I'd be surprised if C++ communities discuss the bloat of glibc let alone seeking its minimal C alternatives like musl instead of going for C++ solutions like boost and what not).

Second, having hung around people who discuss the bloat of glibc and strive for white-box design and minimalism, of which musl is a result (this includes, but is not limited to, communities like suckless and cat-v.org), speaking of musl and clang in the same line would be odd at the least. Clang is ~600k+ lines of C/C++, when there is a C-only tcc compiler at ~60k+, so it's likely clang is doing something (a lot of things) that C programmers have no interest in.

Although I should admit these minimalist C communities might not have a good opinion of Python either (that's one of my personal interests).

Sanddancer · on June 28, 2015

Part of those 600k lines of code come in the form of various things that tcc doesn't do -- things like loop unrolling, support for SSE, support for architectures other than Intel-based ones, support for C11, etc. So, with tcc you end up with a compiler that is significantly slower and less useful than the clang suite. So while there are things the C community wouldn't support, like other languages, there are quite a few things that the C community would consider vital these days.

Solarsail · on June 28, 2015

As a (mostly ignorable) sidenote, TCC does support a couple of non-Intel ISAs. At least 4-5k LOC between arm-gen.c and C67-gen.c, compiled from an IR ( https://github.com/LuaDist/tcc/blob/master/il-opcodes.h ), if this repository's anything to go by: https://github.com/LuaDist/tcc

I'm not sure how well maintained those backends are, and I've read TCC isn't really stable enough to use in production, between the incomplete x64 backend and just general bugs... Solving those may yet push TCC a fair bit beyond 60k LOC.

plorkyeran · on June 28, 2015

Clang itself is ~400k once you exclude things like the static analyzer and all the various tooling built on top of it that's shipped with clang. Either way, that number is sort of a drop in the bucket compared to LLVM. LLVM is millions of lines of code, very little of which could be dropped simply by only supporting C. Even a C-only programmer wants things like optimization passes and backends for every platform they plan on running on.

cwyers · on June 28, 2015

That's one way to look at it. Another way to look at it is that Clang/LLVM is the open source compiler that isn't under the GPL and musl is the open source libc that isn't under the GPL.

aninteger · on June 28, 2015

There is pcc which looked like it had a chance for a while. There was at least a lot of excitement from the OpenBSD and NetBSD communities a while ago. I know it was imported but then I think it was removed. Development continues on but at a slower pace.

justincormack · on June 28, 2015

It is still in NetBSD but it still cannot compile the whole kernel so you need gcc too for now but there is support for a mixed build.

OpenBSD decided against it.

ksherlock · on June 28, 2015

Have you tried using the plan 9 [from user space] c compiler?

fizixer · on June 28, 2015

I got lost in their naming convention, what is it, 2c? 8c? 9c?

mappu · on June 28, 2015

The leading number just refers to the target architecture. 5x is arm32, 6x is x86_64 and 8x is x86_32.

andrewchambers · on June 28, 2015

It explains it in the man pages.

GFK_of_xmaspast · on June 28, 2015

Real classy sample program there.

sriku · on June 28, 2015

Seems impossible to read the article on an iphone.