A Way Out for A.out (lwn.net)
335 points by bitcharmer on March 24, 2022 | hide | past | favorite | 92 comments



This is a very good example of how an engineering discussion should proceed:

The project lead (Linus) first pushed back with a reasonable-looking point, but it was countered with thorough research. And finally a clever workaround broke through.

It was very energizing for me, as someone who has a habit of relying on non-engineering tools to push through these kinds of discussions.


In engineering in general, it is a great day when someone notices that something is unused and pushes for its removal.


Speaking as someone who maintains a fair bit of legacy infrastructure for a living, the hard part is knowing what is unused and what isn't. That's the whole rationale behind Linus' "don't break user space" policy.



Y'know, people always cite Chesterton's Fence, but they never talk about ways to figure out why the fence was there... that's the blog post I wanna read.


It's a process and culture thing: maintain descriptive context and history about all changes, and ensure that references and hyperlinks within those are temporally stable (that is, they'll still refer to the same content when they're examined by a future observer).

"Chesterton's Permalinked Fence" is generally straightforward to reason about.


Good thought. Maybe people should learn to put some signage on the damn thing.

“Beware of the bull”


Always comment your fences.


> knowing what is unused and what isn't

Or what I run into “Is this used …. and does it even ‘work’?!?!”


My "favourite" answers to those two questions at work are "yes" and "not really..." -- that's always a sign I'm going to have a fun week or two...


At my previous job, one of the nicest bits of kit was the moribund code logging system. You put a function call into the moribund logging library from a bit of code you thought was unused. The first time a process hit that line of code, it would fire off a UDP packet to a server, indicating the process name and the line hit. (It's nice to still have test coverage on the otherwise unused code, so it's great to see which processes are actually hitting the code.)

Another system watched the version control system for who added that line of logging, and sent them status emails. After the moribund logging had been in production for two weeks, the emails turned into nag emails to either remove the code, or at least remove the moribund logging if you decided against removing the code.


Gotta love raising those PRs that are nothing but red!


Somehow along the way I think the color convention for showing diffs got mixed up. I think it should be green for removals, red for additions :)


If only the python devs had a similar attitude.


A quick trivia question: when you compile a program using GCC and do not specify the name of the output binary, an A.out file gets created. Is that A.out binary using the A.out format, or is it an ELF binary?


The name and the format are distinct. The linker output (the a.out file) will be in whatever is considered native by the compiler, which for most people is the format of the system they are running the compiler on.

“a.out” is barely a format: just a block of executable, a block of data initialization and an integer to say how large the (uninitialized) heap should be. Later some symbols and a bit of debugging info was added. But remember back then a lot of stuff was still written in assembler and machines were not that powerful, so something simple was not only sufficient but better.
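Concretely, the whole header fits in eight 32-bit words. A rough sketch of parsing it in Python (this follows the Linux/386 layout; field sizes and ordering varied somewhat across systems):

```python
import struct

# The classic a.out exec header: eight little-endian 32-bit words.
_FIELDS = ("a_info",    # magic number (e.g. ZMAGIC, 0o413) plus machine type
           "a_text",    # size of the executable (text) segment
           "a_data",    # size of the initialized data segment
           "a_bss",     # amount of uninitialized data to allocate
           "a_syms",    # size of the symbol table
           "a_entry",   # entry point address
           "a_trsize",  # size of the text relocation section
           "a_drsize")  # size of the data relocation section

def parse_aout_header(raw):
    return dict(zip(_FIELDS, struct.unpack("<8I", raw[:32])))
```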

Separating code and data wasn't even really needed, but back then it wasn't clear that (almost) every machine would be a von Neumann architecture. There were still plenty of Harvard architecture machines (split I/D space). Of course these days we've partially circled back, with write-protected executable blocks and execute permission forbidden on data and stack.

When I started designing bfd in 89 or 90 I started with a.out. There were already other formats (predominantly forms of COFF) because the limitations of a.out had started to be a problem.


An aside on BFD, which doesn't seem to have come up, and I suspect relatively few people have read the History node of the ‘Binary File Descriptor’ Library manual on the back-constructed name: “The name came from a conversation David Wallace [gumby] was having with Richard Stallman about the library: RMS said that it would be quite hard--David said "BFD". Stallman was right, but the name stuck.”. [I gather that's ‘Big’, ‘Deal’, and something like an adjective.]


That’s basically correct. The first time I gave a talk on the subject, I was asked what the acronym stood for. I had to come up with an explanation on the spot and casually said “Oh, it’s Binary File Descriptors” as if that had always been the name. And that’s how it got an official name.


Both of these anecdotes just made my day :)


same!


I was just thinking about this in the shower. The death of the Harvard architecture combined with the slow adoption of w^x seems to vindicate the problem, but I think there's a deeper truth to it: Turing complete machines can emulate any other Turing complete machine. Harvard architecture is immune to stack-smashing, but it's not immune to privilege escalations - Fundamentally, there's a usability tradeoff of "Loading new executable code from the data segment" that makes true Harvard architectures inoperable. We've settled on a happy if imperfect medium.


Von Neumann is more general and provides more flexibility (what if you need more data space and don't need all that code space?).

There are still new Harvard architecture microcontrollers being designed. The split provides some security in the field as well as simplifying the chip layout.


It'd be pretty annoying for a JIT compiler to have to manage, too. Java can do a garbage collection with barely any memory left - cool! But it'd be a shame if similarly tricky machinery had to be written just to, e.g., discard compiled code to make way for more code space. Pages with only one of 'write' or 'execute' privileges work well enough.


Harvard architecture is pretty common in MCU and ASIC space, and especially in tiny little low-powered processors that might have a tiny bit of flash that instructions are read out of and an even smaller amount of SRAM for data. The ARM9 architecture is effectively Harvard for example.


These use cases could be served by asking the operating system or the hardware to copy data to the code memory. Weakens absolute protection, but it would be an escape hatch to enable JIT compilers.


That's half the definitional difference between Harvard and Von Neumann architectures, though - That never the twain shall meet.

There's a second insight on the tip of my tongue, here: Harvard architectures can emulate Von Neumann ones, and vice versa. And problems, specific problems, can be solved by either. But it seems to me that there's something interesting about emulation too - That Harvard architectures can open themselves up to precisely the same vectors of attacks if you enable "Copy this data block to code memory".


But not if instead of copying, they support a "Run this data as a program in virtual machine with access to these segments of memory" operation. The behavior would be functionally identical to emulating the architecture and interpreting the data, except that it wouldn't be slow and could be done recursively with no cumulative overhead.


If you couldn't transfer data memory to code memory at all, how would it be possible to write an ordinary compiler? If it's possible to run a compiler on the system, just have your JIT do whatever the compiler does.


Write it to the filesystem (ELF format presumably) and execute the file.

Some Harvard architecture implementations provide ways of shoving data into executable space and back (like special copy instructions) but it's not general purpose. Generally you compile on Von Neumann and flash to Harvard.


Well you write a bytecode compiler and run an interpreter for it out of code space. It could be a JIT compiler, just not one that generated native code.

This approach was pretty common in the 80s Lisp days IIRC.
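As a sketch of that approach: the interpreter below is the only "code," while the program it runs is plain data, so it could live entirely in a Harvard machine's data memory (the opcodes are invented for illustration):

```python
# Tiny stack-machine interpreter. Only run() needs to live in code space;
# the bytecode program it executes is ordinary data.
PUSH, ADD, MUL, HALT = range(4)

def run(prog):
    stack, pc = [], 0
    while True:
        op = prog[pc]
        pc += 1
        if op == PUSH:
            stack.append(prog[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack[-1]

# (2 + 3) * 4, expressed purely as data:
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
```

run(program) returns 20 without any native code ever being generated from the program, which is exactly why the technique works under a strict code/data split.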


Java already compiles to bytecode, which executes on the JVM. But this is too slow for most practical applications. JIT compilation to native code is crucial for execution of Java programs with competitive speed.

It would be possible to compile everything to machine code ahead of time (pretty much what GraalVM does), but that would compile everything even if it would never end up being executed*. Without JIT compiler, it would not be possible to recompile code to take advantage of tracing information gathered at runtime. Also, newly loaded Java code would not be able to be optimized at all.

There are workarounds for all of these. For example, the JIT compiler could generate a new executable with the optimized code and migrate execution state to a new process. But it seems very clunky.

*: Not really an issue with microservices and stripped-down binaries. A huge deal for IDEs and environments with plugin architecture though.


In a standard harvard architecture that isn't possible.


WASM is a Harvard architecture, so it's not as dead as one might think. :)


I believe this is for the same reason some microcontrollers do it, to wit: there is no page table and far fewer page-access bits. These days the browser is a greater threat vector than ordinary application code.


Wasm is a Harvard architecture mostly because it wants to support implementations based on static recompilation. Obviously, self-modifying code is a problem for static recompilation, and the easiest way to prevent self-modifying code is just to not expose code in the address space at all. There's also the historical fact that wasm is a successor to asm.js, and asm.js is a Harvard architecture by virtue of the fact that JS Typed Arrays don't allow you to execute code stored in them.


Wouldn't a GPU somewhat be a Harvard architecture, e.g. code + data are split? (Then again, it's not a general-purpose CPU, so this is way off, but it's coming closer every year...)


That was also my question. The Wikipedia article on the format addresses it in the second paragraph:

> "a.out" remains the default output file name for executables created by certain compilers and linkers when no output name is specified, even though the created files actually are not in the a.out format.

https://en.wikipedia.org/wiki/A.out


It has been ELF for a while now.


Since around 1995 to be specific.


I remember manually migrating my Slackware install from a.out to ELF around that time. Maybe it was 1996.


Me too, then later the same again with libc5 to glibc. Obviously learned a ton about the system by manually reconstructing it that way (using only source tarballs, no packages).


I upgraded my TAMU-based system to ELF. It seems there are a few of us "crazier people" that the article mentions still around.

For several years I didn't get any distribution updates but installed new versions of packages from source instead. It was fighting GNOME updates that finally made me throw in the towel.


'96 feels about right. I, too, was on a.out briefly on Slackware before the big ELF migration. Was actually rather smooth. Not nearly as bad as the lib32 to lib64 pain train


Aix uses COFF, which makes it interesting being the only UNIX with import libraries similar to Windows.


On Linux. Did FreeBSD switch?


Yeah; looks like in the FreeBSD 3.0 release (October 1998), mentioned in old versions of the FreeBSD handbook[1]:

> FreeBSD comes from the ``classic'' camp and used the a.out(5) format, a technology tried and proven through many generations of BSD releases, until the beginning of the 3.X branch. Though it was possible to build and run native ELF binaries (and kernels) on a FreeBSD system for some time before that, FreeBSD initially resisted the ``push'' to switch to ELF as the default format.

[...]

> So ELF had to wait until it was more painful to remain with a.out than it was to migrate to ELF.

> However, as time passed, the build tools that FreeBSD derived their build tools from (the assembler and loader especially) evolved in two parallel trees. The FreeBSD tree added shared libraries and fixed some bugs. The GNU folks that originally write these programs rewrote them and added simpler support for building cross compilers, plugging in different formats at will, and so on. Since many people wanted to build cross compilers targeting FreeBSD, they were out of luck since the older sources that FreeBSD had for as and ld were not up to the task. The new GNU tools chain (binutils) does support cross compiling, ELF, shared libraries, C++ extensions, etc. In addition, many vendors are releasing ELF binaries, and it is a good thing for FreeBSD to run them.

> ELF is more expressive than a.out and allows more extensibility in the base system. The ELF tools are better maintained, and offer cross compilation support, which is important to many people. ELF may be a little slower than a.out, but trying to measure it can be difficult. There are also numerous details that are different between the two in how they map pages, handle init code, etc. None of these are very important, but they are differences. In time support for a.out will be moved out of the GENERIC kernel, and eventually removed from the kernel once the need to run legacy a.out programs is past.

It looks like default support of a.out was removed in 5.0 (January 2003), but you can still load a kernel module (shipped with the generic kernel) or compile a custom kernel with support built in. At least on i386/amd64.

[1] https://docs.freebsd.org/doc/4.9-RELEASE/usr/share/doc/handb...


From a quick look at the FreeBSD 13.0-RELEASE Release Notes ( https://www.freebsd.org/releases/13.0R/relnotes/ ) and the a.out-related commits that it links to, I'd say that FreeBSD is close to having a.out fully walked to the door. (They suggest installing a ~2-decade-old version of FreeBSD if you still need to work with a.out...)


I see that in the release notes, but they still have the kernel module available in GENERIC, and the config available. Maybe they don't do anything though :)


Is it even possible to create a.out format binaries on the latest GCC? "objdump -i" doesn't show it as an option on my Ubuntu.


Yes. Look into gcc's -g3 and -Wl options and the linker's --oformat parameter; objdump -i only displays info about the formats it supports.


-Wl passes options through to the linker, and --oformat sets the output format (try a.out).


Why a.out format, of course… on my Sun 3/80.


The NetBSD kernel still has support for a.out. On BSD, a.out is something simpler than ELF that works without any problems.^1 On Linux, a.out may have caused problems when building shared libraries thus motivating the switch to ELF.^2

1. ELF is OK, but times have changed since ELF was created. The same constraints no longer exist. Personally, I prefer static binaries and can afford the space for them. It is good to have options for people who prefer shared libraries as well as those who prefer static binaries.

2. https://people.freebsd.org/~nik/advocacy/myths.html


> It seems to be a universal rule that somebody always has to come along to ruin the party. In this case, just as it seemed like there were no further obstacles to the removal of a.out, James Jones showed up to let it be known that he was still using a.out:

I find it surprising that the kernel doesn't have a formal and easy way to know who is using what for the purpose of dropping support, other than going fishing for complaints and hoping that by chance nobody bites.

There should be something like a list of legacy features on kernel.org where people can easily subscribe to a mailing list which will ask for them to know if they still need the feature when there are plans to phase it out. A phase-out list so to speak.


Anyone can start using Linux as part of their product or environment, and so today there are literally billions of resulting end users effectively using Linux systems with uncountable configurations.

Unless you want at least every developer and system administrator in the world to subscribe to that mailing list and regularly read it, I don’t know how this can really solve the problem.

Add to that the fact that the classification of a “feature” is rather fuzzy…

There are already announcements and discussions through various channels, but at the end you can never be sure. Fortunately systems don’t have to always be updated immediately (aside from for security concerns).


I see the workflow something like this:

1) Some old feature, like an obscure filesystem, is causing some pain to maintain.

2) Therefore people suggest to add it to the phase-out list.

3) After some discussion it is decided it is a worthy candidate for removal and so it is placed on the phase-out list.

4) Some people start subscribing, but not too many. Turns out it's some obscure distribution for an Atari emulator.

5) Time passes, and years later these people have moved on to better things; they reply to the reminder email saying it's fine now.

6) Feature gets removed safely.


Windows and macOS solve this with the much-hated "user tracking" (aka analytics).


I'm not sure they "solve" this. In this current case, it was solved by working with the person who needs the feature, rather than just tracking their usage of said feature.


"The" person. How do we know there's only one? Well, in this case, the story was sexy enough to be transmitted all the way to HN. Also, two decades have passed. But in general, "post a removal notice on some list and hope people notice" isn't a good method.


Apple solves this by removing whatever it is they don't like anymore and expecting developers to get on with migrating to the new thing that replaces it.

Microsoft hasn't solved it, Windows is drowning in its own moat.


Halfway through the article I was wondering if it wouldn't be possible to write an ELF wrapper for a.out binaries. I was pleasantly surprised to not be the first person to consider that, and that it turned out to be easy to do.


I like the binfmt_misc suggestion in the LWN comments. The fact that I could just run a jar file like it was an executable made me unreasonably happy when I first learned about it. It was one of those things (like X11 being network transparent) that made Linux seem almost magical to me as a 90's teen. Anyway, this seems like a perfect use case for that functionality.
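For reference, binfmt_misc handlers are registered with a one-line colon-delimited record. A sketch of building such a record for jar files (the interpreter path /usr/bin/jexec is illustrative, and actually writing the record requires root):

```python
# binfmt_misc record layout: ":name:type:offset:magic:mask:interpreter:flags".
# Type "M" matches by magic bytes; jar files start with the zip magic PK\x03\x04.
def binfmt_record(name, magic, interpreter):
    return f":{name}:M::{magic}::{interpreter}:"

record = binfmt_record("jar", r"PK\x03\x04", "/usr/bin/jexec")
# To activate (as root):
#   with open("/proc/sys/fs/binfmt_misc/register", "w") as f:
#       f.write(record)
```

One caveat: every zip file shares that magic, so real jar setups often match by extension (type "E") instead.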


Even crazier: you can use binfmt_misc with qemu-user to transparently run Linux executables compiled for other architectures. Debian (and Ubuntu) packages the required configuration as "qemu-user-binfmt".


For what it's worth, qemu-user-static is more general, e.g. for a foreign container image that won't find the shared library. (Wheeled out when it was said people needed to build ppc64le images on x86 boxes and it was impossible, rather than trivial.)


That's exactly what Docker does behind the scenes to run foreign arch images, see https://github.com/tonistiigi/binfmt


Yeah, and a.out format is trivial enough that it wouldn't be much code to load these things.



No, that only works on ELF and was intended to make it easier for the Linuxulator to distinguish between FreeBSD and Linux ELF binaries.


I wrote a French translation of this article[0] without noticing I was in violation of their copyright (my bad).

I quickly sent a mail to LWN and got their authorization within the next 10 minutes.

Big up to the team and awesome content! This motivated me to buy a subscription to support them.

  [0] - https://linuxfr.org/users/linkdd/journaux/lwn-une-porte-de-sortie-pour-a-out-63380a9d-20e0-44de-979d-74afcb6f3910


LWN.net is absolutely amazing and well worth every cent.

Just to pick one thing, their kernel index has been so useful when I run into an area of the kernel I need to learn about: https://lwn.net/Kernel/Index/


> The use case is running an old set of tools to build programs for the Atari Jaguar.

This seems really weird as an a.out removal opposition. If you're already using ancient tools to compile programs for ancient hardware, why do you need the latest kernel for it? Why not keep a VM with a system dedicated for that task which you'll never need to update or change in any way?

There's likely a single digit number of people who actually have that use case, right?


Because new kernels are better than old kernels.


I think the question of my GP is: Aren't better new kernels better for most people than slightly worse new kernels that are better for only very, very few people? :)


If your point is that a.out support makes kernels worse for everyone else, you can simply compile it out and it will not affect you.

The issue is keeping the code in tree. Its inclusion doesn't affect the quality of the kernel; it's just a maintenance burden.


In what way specifically are they better for the use case of compiling Atari Jaguar software using decades old toolchain?


If I want to run these binaries on my daily driver, it’s much easier if I don’t have to start up a VM. I want the latest Linux kernel on my daily driver for hopefully self-evident reasons.


I'm curious how much overhead and developer-time keeping a.out support adds?

I'm generally of the mindset to keep something until there's reason to remove it.


Perhaps it's potentially an attack vector for security holes, and removing it means no more worries?


ELF seems like a bigger attack surface due to its complexity.


It would be impossible to quantify, but all code has a maintenance cost. Removing a.out was brought up several times; that strongly implies its overhead crossed the threshold of being noticeable.


> Still, Linux used a.out for some time, until support for the newfangled ELF format was first added to the 0.99.13 development kernel in 1993.

When I got Slackware 2.0 as my first Linux distribution, the CD-ROM box had a sticker with a "now with ELF support!" kind of statement.


Excellent and publicly available book on this subject if you find it interesting: https://www.iecc.com/linker/


Wow... This level of backward compatibility + dedication to support (even if for that rare case) reminds me of the good old days of Windows!


I'm surprised nobody asked for the title to be changed from "A.out" to "a.out".


Kees Cook is a legend. He actually wrote a new ELF wrapper for a.out binaries!


Once again, XKCD[0] proved to be right

[0] https://xkcd.com/1172/


Thank you, sir. I had to laugh so damn hard. It's such a nice parody of these discussions in the kernel, which I actually love. I would appreciate it if more software producers would stop breaking or removing features just because they think in percentages of the users who will be pissed, etc. On the other hand, it can become quite silly.

I had a good laugh thanks again :)


What… what's bad about a.out? It seems to be the default executable produced by GCC? I read the article but I couldn't quite understand what the problem is to begin with. Is it a seldom-used or outmoded executable format that now survives only as a default output name, or what?


From the Wikipedia entry on a.out:

> "a.out" remains the default output file name for executables created by certain compilers and linkers when no output name is specified, even though the created files actually are not in the a.out format.


So you just rename the file, right? I honestly don’t understand why this is a big deal. Though in general I have always thought it was weird that UNIX defaults to that name instead of the name of the main source file.


GCC's a.out is an ELF file (except in extremely rare circumstances). "a.out" as an executable format is like ELF, but different and practically no longer used. This is a proposal to remove the latter.
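An easy way to convince yourself, sketched in Python: check the magic bytes at the start of the file (the values below are ELF's \x7fELF signature and the classic Linux a.out magic numbers):

```python
# Distinguish ELF from old-style a.out by the magic number in the first bytes.
AOUT_MAGICS = {0o407: "OMAGIC", 0o410: "NMAGIC", 0o413: "ZMAGIC", 0o314: "QMAGIC"}

def detect_format(path):
    with open(path, "rb") as f:
        head = f.read(4)
    if head == b"\x7fELF":
        return "ELF"
    magic = int.from_bytes(head[:2], "little")
    if magic in AOUT_MAGICS:
        return "a.out (" + AOUT_MAGICS[magic] + ")"
    return "unknown"
```

On any modern Linux system, pointing this at gcc's default a.out output should report ELF.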


Ah! Thanks.





