I am one of those programmers who likes to make dissectable code. In fact, I've included a whole manual with my most famous project that is entirely about dissecting the code. [1]
And I will certainly do the same in my current project.
Some of the most delightful moments I've had are when someone emails me with questions that indicate they've read my code, such as mentioning specific comments or asking about something in that dissection manual.
I consider it a badge of honor if people think my code is good enough to study.
I think if we sat over beers we’d find some common ground.
I sometimes like to think of good code as short stories. Badly factored code is like a novelization of a story. Too long with too many side quests and you only find out at the end that the butler didn’t do it. Whether that pleases or infuriates is a gamble.
Excessively terse code that requires broad life experiences to unpack? It’s interpreted by every observer to mean something different. Like a haiku, or even a song, the story being told is unclear. Beautiful, moving, but unclear.
I much prefer short stories. A little flavor text, but no time for more than two clever ideas at once.
I've often thought that code should tell a story about what the system is supposed to do and if the story is not clear, that will probably be reflected in a system that does not work well.
Code is such a weird thing. It's business logic, it's infrastructure, it's short stories, it's documentation. I also think of it like investments and debts.
Code is code. Humans are the ones with the obsession of abstracting and categorizing things to make ourselves feel better. In reality, none of those things actually exist
Although that person was actually mostly right about me; that project is the only real success I've had in my line of work, aside from getting married and staying married. (And quite frankly, that's just because my wife seems to like me more than I deserve.)
Family > computers. Besides, what portion of people get any of their programs pulled into even a single other project?! An accepted PR is more than 99% of the world will ever contribute to software development!
I still think it's literature, just that most of it is trash literature. But then again, I was an English major and was trained on how to dissect literature. I realized a while ago that there was a book and a method I learned in English Lit 101 that I think informed how I read code. It's called The Critic's Hornbook, by William Dowling [0]. In short, it's a methodology for meticulously breaking down the meaning of every word, phrase and reference in a text, without resorting to any subjective interpretation. He showed us how, with proper understanding and research, you can find a lot of latent information about what's going on in the text. I find myself doing something similar when I read code, and when I write code.
Code reading is also archeology. There are always layers of code accumulated over time, and you can start to see the minds of the people from years ago. And if you understand the constraints they were under and the assumptions they had, then it can help to understand "Why the hell did they do it _this_ way?"
From a business perspective, maybe it should be trash literature. (Edit: I mean in the "It was morning and the sun came up, which made it less cold." sense.) When I code, I try to write all of the comments first, then fill in the implementation that matches the comments. It's typically very dry prose, and you get things like "Walk through the collection and pick out all of the orange widgets", but at the end of the day you're not trying to stir emotions with this stuff, you're trying to make the next person have less of a headache understanding it.
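To make the comment-first style concrete, here's a minimal sketch of what it might look like, with the comments written before the implementation that fills them in. The names (`widgets`, `color`) are invented for illustration, not taken from any real project.

```python
def orange_widgets(widgets):
    # Walk through the collection and pick out all of the orange widgets.
    selected = []
    for widget in widgets:
        # Skip anything that isn't orange.
        if widget.get("color") != "orange":
            continue
        selected.append(widget)
    return selected
```

The prose is dry on purpose: the comments state intent, and the code beneath each one just carries it out.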
Clear and concise prose does not "trash literature" make. Vide Hemingway.
Code absolutely should be readable and as obvious as possible, but that itself can often be a sign of virtuosity; you've broken down the problem into such simple steps that reading it is easy.
To paraphrase Mark Twain, writing a short and clear program often takes much longer than a convoluted one. "Trash literature" is often full of unnecessarily adorned prose.
> it's a methodology for meticulously breaking down the meaning of every word, phrase and reference in a text, without resorting to any subjective interpretation
How many words have only one meaning? In the presence of words with multiple meanings, how does one non-subjectively choose the meaning to apply to a particular usage when dissecting a particular work? I'd love to hear more.
It’s been a long time since I did it at school, so I think what I do with code is more “informed by” that process than a duplication of it. But one specific thing I remember was that you look at context. One example I remember from school was reading The Love Song of J. Alfred Prufrock, and how by describing the fog as yellow and a few other details, we could infer that the setting was almost surely London, although that’s not stated. Another fun thing was reading old literature and looking up word after word in the OED to understand what a given word meant at the time that the text was written. This was a mind-blower. All these sentences that seemed very pretty but kind of confusing suddenly became very clear when you interpolate the older meaning. And then every clue you get builds up more context that you can use to infer other aspects about setting, character, etc. So, I think there is a fair amount of inference, but it should always be logically and objectively grounded.
I have a crazy theory that when we read code, we actually read it as a narrative, because that's how our brains are wired. And we can't help it. It's baked in.[0]
Good, readable, code tells a story. (Here is the protagonist(s), the value(s) you're starting out with, next is the journey, the transformations on the data, then the conclusion at the end is the return value.) This is why I like functions more than objects. Each function is a self-contained story. Bad code is like reading a stream of run-on sentences with no coherent beginning, middle or end. It's like hearing a story from someone who gives you all the irrelevant details of whatever they're trying to tell you and doesn't get to the point.[1]
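Reading a function as a story, per the above: the parameters are the protagonists, the transformations are the journey, and the return value is the conclusion. A small sketch (all names invented for illustration):

```python
def monthly_report(sales):
    # Protagonists: a list of (month, amount) sales records.
    # The journey: group the amounts by month...
    totals = {}
    for month, amount in sales:
        totals[month] = totals.get(month, 0) + amount
    # ...then put the months in chronological order.
    ordered = sorted(totals.items())
    # The conclusion: the value we set out to produce.
    return ordered
```

A self-contained function like this has a beginning, a middle, and an end; you don't need to hold the rest of the program in your head to follow it.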
I often like to point out that our instinctive need to see narratives (or aversion to the inexplicable) is so strong that we construct stories even onto situations where we know (in other moments) that the story is wrong.
Gambling is a great example, and it's easy to see when people are flipping coins while insisting that recent past flips determine the next flip because of "a hot streak" or "the other one is overdue".
As someone with formal education in both literature (including lots of analytical essay writing) and computer science, this resonates strongly with me and is a great addition to my toolbox for thinking about code.
I'm wondering how useful it would be to deploy among my colleagues who don't have similar backgrounds.
Not yet. I was working up to it at my last job, but I was laid off. What I’d love to do is something like John Ousterhout’s software design class. He admits that he based the format on an English writing class. People would write code, then their classmates would read it and make comments and then they’d revise it. I’d love to give this a shot in the real world.
If you’re up for discussing this, I’d be happy to talk about it more. My email is on my profile.
They're similar because they are both written by humans looking to project meaning onto the page/screen. You're analyzing the author's intentions and perspective, not just what was written.
Not too sure about this part. Lots of programs are written, then used, but never revised. Code is a tool and not an end in itself.
Honestly, do you want to revise your code on and on? I didn't. I still don't want to. Then when I have to, I'm sad that I didn't add more comments. Or write better code. Like code I don't have to improve. Or come back to... Well. F* ;)
But there's more to literature than just being written. A list of ingredients isn't literature. The time table for the train isn't literature. Newspaper articles aren't literature.
This is a bit weird to me since I never thought someone would just go and read source code of some program from start to finish. It would be something like reading an encyclopedia starting from the letter A.
That said, I personally DO read code. Besides the code reviews that were mentioned in the article, I read the code that my code depends on. Especially when I want to better understand some API, I read not only the headers, but the actual implementations of the functions that I'm using.
Sometimes I read code to better understand some algorithm. I only understood how Transformer architecture worked when I read some of its implementations.
So yeah, reading code is cool. We just don't read it as we read books.
> someone would just go and read source code of some program from start to finish.
Well, one has to start reading from somewhere, and the start seems like a good way to start, is it not? Well, actually, no, it's not, because you have to wade from main through like ten levels of procedure calls until some actual work starts being performed, but if you don't know where exactly the interesting parts are, the only other options are either to try to grep (which doesn't work reliably in repositories where the authors like short and cryptic names) or to read source files until you find the relevant place.
> It would be something like reading an encyclopedia starting from the letter A.
I actually did something similar as a child, except I'd just open it to a random page and start reading. There is nothing really wrong with that as recreational reading IMHO.
Yeah, the encyclopedia analogy is actually better than GP probably intended, because both encyclopedias and source code can take on a branching, fractal information flow. I obviously don't read an encyclopedia start to finish linearly, but as a kid I would start on a random Wikipedia article that caught my attention and just keep clicking blue links, backtracking whenever I'd gone too far. Reading a new codebase is usually not so deep, but it's a similar process of chasing definitions backward.
I used to do this with the paper encyclopedia too... I'd sit on the floor surrounded by books open to a page, with slips of paper, pencils and whatever else was around as markers. I'd try to keep track of where I came from so I could backtrack through my side quests.
When tabbed browsing came around it was a relief; the browser did all that for me well enough. Unsurprisingly, I frequently find out how many tabs is too many for my current computer (less frequently than 20 years ago, though, since browsers and computers are a lot better than they used to be).
I remember there was a blog post saying that everyone should write an ARCHITECTURE.md file for their projects as well. Its content can be a great starting point.
Yeah, weird. I read a lot of code. I often read code to understand exactly what's going on, how specific functions are used, or how the API was intended to fit together. That may mean that I read random code on GitHub, the code of my dependencies, etc.
But it's not a continuous reading of code. I don't read the code of my dependency from start to finish.
I think perhaps this is not even a valid concept...
The simple case of a program with a main maybe has a "start"? (Although due to program loading, there is quite a lot of code that executes before main, often not your code but sometimes static initialization, e.g.).
In the case of async programming (queues, services, interrupt handlers, etc.), a "start" is pretty arbitrary, and needs to be defined some other way (start of a data scenario e.g.)...
And in all scenarios except pure batch processing, a "finish" doesn't exist, except in the sense of a "quit" or termination scenario, but not in relation to the rest of the code base...
This is actually one of the major features of Knuth's tangle/weave: you write the code as a book or article and give information to a compiler that puts everything in the right place for the program.
> The biggest lesson so far is that code is very dense. A half hour presentation is just enough time to present maybe a dozen meaty lines of code and one main idea. It is also almost certainly the case that the presenters, who have to actually really dig down into a piece of code, get more out of it than anybody. But it does seem that a good presentation can at least expose people to the main ideas and maybe give them a head start if they do decide to read the code themselves.
Which exposes the disconnect: A computer program is not about expressing an idea; it's about doing a thing. It's a very specific set of instructions that explain how to do a thing; often having to handle corner cases that a human would not need explained to them.
I agree with you to some degree, but it needs to be said that _good_ code is very expressive. It should reveal the developer's intent clearly. "Doing a thing" is fine for amateurs or a quick script, but for large enterprise application development, it doesn't go nearly far enough. Good names, good function arrangement, plus everything else we should be studying needs to be applied well. If done well, it certainly should not take a half hour presentation to present a dozen lines of code. That tells me that the code is too complicated, too dense, too many levels of abstraction together, too much I have to keep in mind to understand some main idea. Very dense code is, more often than not, bad code.
I don't think reading codebases is an exercise worth doing often, but all the same good code should read like well written prose. It's difficult to appreciate that if an individual hasn't poked through a number of codebases of varying quality.
I think the problem with this assertion is that over time "good" code often ends up littered with important conditionals to handle cases that upset the general readability and expressiveness of the initial delivery. Imagine a FinTech working on some kind of trading platform. It seems simple and clear at first, but over time more and more safety mechanisms, edge cases, regulatory obligations, and all kinds of other things need to be added -- and often they need to be added _now_, which means that readability is not the primary concern. I think that's the point of the article; readability is often not the most important thing.
I don't think that suddenly catapults the code into "bad" code, and in fact it's this kind of accumulated wisdom that makes full rewrites so famously expensive. The initial core of the idea might be able to be expressed in a simple and beautiful way, but over time it turns out that almost nothing is truly simple, and complexity accumulates. But it's good complexity, it's important to the business, and it doesn't mean that it's bad code.
Complexity and quality can be completely orthogonal. When they aren't, complexity and quality of code are proportional, which is a smell. Readability is often cited in lists of what makes good code, and rightfully so, but it isn't the most important thing. The most important thing is the ETC principle; that the code is easy to change.
You bring up a good point about something that needs to be added NOW, which is a project management/business/cultural concern and something that needs to be addressed. Compromising code quality for speed is a classic trade-off and is probably the reason most professional developers on HN hate their projects.
Funny you bring up that example! I do work at a FinTech org and my 2020 was spent working on a trading platform frontend. (Hell of a year...)
I agree that complexity and quality can be unrelated. I think that quality is often misinterpreted as beauty, or brevity, or cleverness — and those are not the same as quality, in my opinion. Often a long function with a bunch of error and edge case handling is seen as ugly, and thus low quality, and what I’m getting at is that an ugly function can also be quite high quality.
And heh yeah it was on my mind because I just spent a few years at a FinTech too — and a lot of that code is incredibly sensitive, and must contain all kinds of “ugly” condition handling that I don’t think is really low quality, it’s just a complicated problem space that requires a ton of attention to detail. And details can be less fun to read, I think we all can get seduced by code golfing and making things prettier, which is again not the same thing as better.
(Which is I think the point of the article — readability and prose is perhaps key in literature, but not always in software.)
The real trick is doing both. Code that reads like a short story while including error handling and edge cases. This is achieved in a practical way by first keeping it as simple as possible only implementing strictly necessary abstractions. When the code reaches a "tipping point" then refactor. Rinse and repeat. If the code is structured reasonably that refactor should be mostly limited to the trouble spot.
Lots of good responses here. One thing I'd like to point out:
In my prior job I joined early enough that I was able to keep the codebase very readable. As the team grew, I started running "office hours" to help allow newcomers to onboard. We were a globally distributed team, so it was hard to have the casual interruptions that happen when most of the team is in the same place at the same time.
In my current job I inherited a very crufty codebase, and I've spent a lot of time improving readability. Working with .editorconfig helped; and that initiative took ~1 month!
There are a lot of habits that can be learned by reading code; but mostly the reading is to look at style instead of function or "ideas."
One example of a good habit: My prior role involved a file synchronization product. We followed a naming convention whenever an object represented a file or directory. Merely naming a variable "file" or "directory" would be very confusing, because it lost a lot of context: Is "file" the entry in the SQLite database? Is it the in-memory type that we used to communicate known state about the file? Is it an object that's used to get things about the file from the file system, like last time accessed?
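A hedged sketch of what that convention might look like in practice: instead of a bare `file`, each representation carries its context in its name. All type and field names below are hypothetical, invented to illustrate the idea rather than taken from the actual product.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """The file's entry in the SQLite database."""
    path: str
    synced_hash: str


@dataclass
class FileState:
    """The in-memory known state about the file."""
    path: str
    current_hash: str


def needs_upload(record: FileRecord, state: FileState) -> bool:
    # With distinct names, it is obvious at the call site which
    # representation of "the file" each argument refers to.
    return record.synced_hash != state.current_hash
```

The payoff is at every use site: `record` and `state` can never be silently confused the way two variables both named `file` can.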
But, and this is where code != literature: The file synchronization product wasn't an "idea." The program was a very detailed set of instructions on how to synchronize files. It handled all the corner cases, because the computer has no ability to make assumptions when it follows these instructions.
100%. A lot of new programmers who didn't program back in the day have gotten used to this idea of abstractions and "clean code", and that causes so many issues these days. Instead, the mental model should be: how does the computer achieve the thing that I am trying to do, and how can I make that happen with the fewest instructions possible? A computer program is an instruction set, as you rightly said. It is not a long-winded story with many "nouns" (objects).
> Instead the mental model should be how does the computer achieve the thing that I am trying and how can I make that happen with the fewest lines of instructions as possible
This is true for performance-critical code which will be written once and never modified or reviewed. For "living" code - code in a system which will be updated and extended as it is operating - it is crucial to at least pay attention to the human-readability of your code, in order to improve the efficiency of the future developers. Both extremes of the "inscrutable optimization" and "prolix verbosity" spectrum are incorrect - the correct midpoint is a judgement call.
It's often debated but in my mind, you know you're talking to a "senior" dev when nearly every response they have to a question is "well, it depends..."
Every software project is different since software is a physical manifestation and codification of human process. And human process is... messy as all hell and doesn't like to be constrained by a lot of rules no matter how much effort and management goes into things.
So, I prefer to advise teams build software that can be replaced easily, however that works. Isolate the points of change, make the system components replaceable (because they will be), but keep mind of where things need to be focused on efficiency and less on abstraction. This applies to a single code file all the way up to large scale distributed systems.
In larger orgs, this helps since inevitably we all fall into the trap of Conway's law, and re-orgs inevitably lead to refactoring of systems along new ownership/communication lines. So, the right way to do something will always conform to the unique situation in which a system is developed -- "it depends".
I think abstraction is key to efficiency. In order to have efficient systems we need to be able to prove properties of the software we write are correct with regards to our specifications. This is what abstractions do for us: they enable us to think in new semantic layers that are precise in the mathematical sense.
This isn’t my idea, I just agree wholeheartedly with Conal Elliott.
Resiliency is what we get when we want reliable systems but we cut corners with proving the correctness of the software we write: we add process supervisors, memory managers, tooling, exception handlers, etc: things that all cost us some efficiency in order to make our systems reliable enough to be useful.
If we want efficiency, at scale, we need tools (languages really) that allow us to write proofs and generate code from these specifications.
The trick for me though is that the foundational assumption on which to build our "proofs" change often and quickly. This creates drag on any software project to keep up, and becomes untenable if the abstractions are wrong (i.e. the interfaces for components aren't adaptable). This is what I meant by designing with the assumption that one component could be replaced. To me it's a about the challenge of designing to reduce the need to "cut corners with proving ... correctness", if that makes sense.
Totally makes sense. It's an expensive enough process, still, that doing it at the scale that "non-verified" software is written at would be basically impossible. It makes sense that we've invested enough times in resiliency over the years that computer systems work as well as they do, let alone at all. We've squeezed a lot of efficiency out of our systems even without correctness.
However I suspect we're beginning to reach a tipping point where it's becoming too expensive to avoid correctness and continue down this path of resiliency. At the scale of data-centers even small gains in efficiency have big effects. It's hard to get those kinds of gains without focusing on correctness.
I'm hoping we'll find practical ways to compile dependently-typed programs and build theorem proving tools that scale to modern software practices and teams.
What does "build software that can be replaced easily" mean, if not "ensure the abstraction for this component makes sense so that people can replace it if needed"?
Together with your followup qualification of "keep mind of where things need to be focused on efficiency and less on abstraction" I don't understand what you're trying to say at all.
The idea that software needs to be designed to be replaced (as opposed to maintenance/improvement) seems to be very specific to some particular company's organization practices ("re-orgs"). Perhaps my objection to your generic "advice" is that your "it depends" is not "meta" enough to cover the cases where these Conways or whatever re-org crap isn't a recurring thing.
For example, a person who's worked on too many failing projects might advise others to just write crap while looking productive by gaming LoC/commit/bugs fixed metrics, because all projects eventually fail anyway, and nobody cares about code quality if they discontinue the project.
Another person who's stuck maintaining the crap they wrote themselves 10 years ago might go around telling others to make sure you get the first version right so that you don't end up in a situation like them.
I'm not really sure which situation is more prevalent TBH. "It depends" indeed.
I think most people naturally write pretty readable code. When I have seen messy code, it is usually because of poorly thought out abstractions rather than a clear instruction set with a clear function name. "What does the computer need to do?" is a powerful idea imo.
The compiler doesn't know how to reduce three database round trips into one database round trip.
Modern compilers are awesome, but they still can't read your mind to understand your intent and suggest changes that would objectively change the program, but do what your intent is in a much better way.
Perhaps someday artificial intelligence could, but I think that is a long ways off.
The compiler's job is to turn source code into machine code. You are still writing human-readable source code. I am not suggesting we start writing everything in numeric machine code.
> The compiler's job is to turn source code into machine code. You are still writing human readable source code. I am not suggesting we start writing everything in numeric machine code.
Human-readable is a spectrum. If something can be more human-friendly while being compacted into an equivalent instruction set at compile time, then what is the benefit of writing extremely terse code?
The fewer lines of code, the less maintenance. While it shouldn't be unreadable, it should be succinct. Basically, using abstractions sparingly, letting the data flow through the system, and not wrapping things within each other are all great ideas. If you are instantiating a bunch of objects, you can simplify things.
Maybe there are domains/languages/paradigms where using the smallest possible number of lines to achieve a given compiled instruction set really makes things more efficient, but to me it sounds more like code golf. Using 1 very terse line to accomplish the same thing that 3 readable lines can accomplish should require the same exact amount of maintenance but would be much harder to read and modify.
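A small sketch of that comparison, with invented data: a dense one-liner and a plainer loop that do the same work and cost the compiler roughly the same, but differ a lot in how easy they are to modify later.

```python
from functools import reduce


def paid_total_terse(orders):
    # One dense line: fold, conditional, and tuple indexing all at once.
    return reduce(lambda t, o: t + (o[1] if o[0] == "paid" else 0), orders, 0)


def paid_total_readable(orders):
    # The same computation spread over a few plain lines.
    total = 0
    for status, amount in orders:
        if status == "paid":
            total += amount
    return total
```

Adding a second condition (say, a currency filter) is a one-line change to the loop version, but means untangling the lambda in the terse one.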
Code should be more like a children's popup book with little tabs to pull on to make things move around.
When I look at a foreign piece of code I want to see all the runtime values inline in my editor and a time-travel debug session with a slider and a stack trace of the function under my cursor, and a trace of the _value_ under my cursor showing every transform that happened to it, and all common data structures and their algorithms visualized and animated in appropriate diagrams. And available in all our tools instantly - like when viewing code on Github.com.
I really think you could figure out what is going on with most code if you simply move a slider back and forth and watch everything that changes. And it would be much more fun with a physical knob to turn (e.g. https://www.tourboxtech.com/ or https://www.binepad.com/product-page/bnr1-v2). Knob-driven development :D
Heh. I think you have the absolute opposite viewpoint from Rob Pike, who said of syntax highlighting [1]:
> When I was a child, I used to speak like a child, think like a child, reason like a child; when I became a man, I did away with childish things.
I'm with you! That sounds awesome. I enjoy not only syntax highlighting but also type annotations and doc references on hover. I'd love to see even richer and more interactive stuff.
The main argument appears to be that code is best dissected and understood, whereas literature is not? There's a fair number of philosophical texts which would disagree. Even "simple" engineering/physics/mathematics/science books often require a deep understanding, even though they're describing something, not providing directions.
It's also often important to understand the context in which a piece of literature was written, just as it is important to understand libraries and the runtime context.
In other words, I think it's fine to think of code as literature, just perhaps not the same literature the author frequently reads.
> The main argument appears to be that code is best dissected and understood, whereas literature is not?
There is no reference to "best". The issue is why codebases are not read as often as developer culture has doggedly ingrained that they should be. A codebase is not enjoyable as a narrative or mechanically instructive, as you have implied, due to the large number of confounding factors and approaches available. Literature is primarily a form of communication, while code is primarily a means to perform. It does not fit as "useful" literature.
Most of the time, reading a codebase is like reading a manual that's faded and torn and was 500 pages long (now 486). You might learn something, (fast inverse sqrt and such) but this is as incidentally useful as a pithy wisdom buried on page 322 of a VCR Manual. Nothing about why they wanted to specify 2 buttons in tandem to eject the cassette^ or why it has cooling off instructions^^.
^ The build system/factory was making a similar device that required 2 buttons so management saved some money.
^^ The rewinding mechanism tends to heat and kill the whole device for a time. The original rewinder didn't fit the case 4 weeks from production, so a primitive replacement from a vendor was used which has this problem.
The right analogy is not to literature, but schematics of the architectural or mechanical kind. Those can also be beautiful, works of art, and objects of study. Sometimes a clever blueprint will allow seeing some aspect of construction that isn't otherwise obvious. But the prints are not usually the centerpiece for a discussion in their own right. The object depicted - a building or whatever - is the main focus.
Going back to the literature analogy, reading groups are for fun (usually). Spark ideas and discussion. Code is too low level. It's like discussing the author's grammar and word choice -- sometimes necessary to understand the meaning of the text! - but most would prefer to discuss high level themes in the work.
Even then I would say code is not the blueprint but an artifact. Specifications of the precise, formal variety are more akin to blueprints [0]. They allow us to see the design of a system at a high level while eliding unnecessary details.
Source code for programs is limited in that it can only express ideas about its local state. Barring high-level languages with dependent-type systems, it is nearly impossible to model and check that your transaction consensus protocol works as intended without getting into a ton of gory detail... much more so if you want to check global properties of the concurrent system you're designing.
I think Sussman was right: we often start with a pure model of what we intend to design but the artifact we produce, the source code of the actual program, is usually full of details that only matter to get the program to produce the expected behaviour when executed... and all that extra detail tends to drown out the core ideas.
Along these lines, the perfect programming language would be one that eliminates all boilerplate that can be eliminated, one that leaves behind "pure design".
"The result of the execution of the blueprint is the building."
> As I prepared my presentation, I found myself falling into my usual pattern when trying to really understand a piece of code—in order to grok it I have to essentially rewrite it. I’ll start by renaming a few things so they make more sense to me and then I’ll move things around to suit my ideas about how to organize code. Pretty soon I’ll have gotten deep into the abstractions (or lack thereof) of the code and will start making bigger changes to the structure of the code. Once I’ve completely rewritten the thing I usually understand it pretty well and can even go back to the original and understand it too. I have always felt kind of bad about this approach to code reading but it's the only thing that's ever worked for me.
This has also been my experience over the last thirty-odd years, whenever I have to take over maintenance of someone else's code.
Interesting to consider this in the light of how Knuth intends literate programming. Specifically, his version of "literate" is specifically dissecting the code for presentation. Sometimes you show broad strokes of the overall structure of the program you are presenting, but most of the time you dive into the details.
It is funny, as the big complaint most of the time is that there is still code there. It is not uncommon for the code to start with a section showing a basic C file with some standard includes at the top, and those will sometimes just have a shorthand C comment embedded in them explaining why they are there. This seems to frustrate many who think the entire thing has to read like a novel, but it makes perfect sense in the "dissected program" presentation of the code.
There was a fun video I saw once of Knuth critiquing code from his class. He called out liking the way the first person was defined for use in the prose section, and he would spend time talking about the different stylistic choices. I'll have to try and find that again.
I found https://www.youtube.com/playlist?list=PL94E35692EB9D36F3, which is a giant list of all lectures. I can't remember the specific one I was thinking of. There are a few there that are titled writing and literate programming. I'm guessing it is one of those, but I don't have time right now to watch them and confirm.
I think it would be more accurate to say that "most code is not literature", since it is not documented in a fashion which makes it readable --- and it would be more interesting if the examination/re-writing of the code were done in a literate fashion.
The largest project I ever did (typesetting back-end for an on-line interactive ad design system) was done as a literate program, and over the course of years of maintenance was far easier to work with than other smaller code bases which I hadn't taken the effort to so structure and document.
I like to distinguish between Programming and Software Development, even though I don't have precise definitions for them. Programming is what you find in code samples and books and tutorials, where the code is clean and readable, and is what most of us fall in love with. Software Development is all about diverse and messy and malicious inputs and outputs, and the code is filled with hairy edge cases [0], and it is not fun to read, but it is what most of us get paid for.
I argue that (good) literate programming can make code literature. Your program is told as a sort of story and ought to contain background info, thought processes, and rationale for decisions. Additionally macros can be used to reuse code at build ('tangle') time that your underlying language may not otherwise support.
For example, if I have some constants I need throughout a Python program, I might put them in something like constant.py during normal programming, but if I were literate programming I could have a section called "Program constants" and dive into each one, referencing them with macros throughout the program. You do have to be careful, because the macros can make debugging confusing since you are now working with two languages instead of one.
You can also put stuff that is irrelevant to humans out of the way of the most important stuff.
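As a sketch of what the "tangle" step does with such macros (the chunk names and contents below are invented for illustration; only the <<...>> reference convention is borrowed from common literate tools):

```python
import re

# Named chunks, as a literate source might define them. A chunk is
# written once and spliced in wherever <<chunk-name>> appears.
chunks = {
    "program-constants": "MAX_RETRIES = 3\nTIMEOUT_S = 30\n",
    "main": "<<program-constants>>\nprint(MAX_RETRIES)\n",
}

def tangle(name):
    """Recursively expand <<...>> references into plain source code."""
    return re.sub(r"<<([\w-]+)>>",
                  lambda m: tangle(m.group(1)),
                  chunks[name])

source = tangle("main")  # plain Python, ready for the interpreter
```

Debugging the tangled output then means mapping lines back to their chunks by hand, which is exactly the two-languages confusion mentioned above.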
I realized something perhaps similar which is that math is not literature. The only way to really learn anything is to go back and forth between reading and doing. Reading through a giant piece of code feels a bit like reading a math book and not doing the exercises. You'll get some good ideas and overview, but you won't really get "it".
I recently wrote an ODE IVP integrator in C#. In the process I read the code in Numerical Recipes, SciPy, and DifferentialEquations.jl. That, combined with my own house style and some gyrations for allocation-free code, resulted in the final product. A lot of what I took was just how to abstract the code into a few different layers to make it a bit more pluggable, so that now I've got swappable BS5 and DP5 integrators. That layer structure came mostly from SciPy. Some of the details of DP5 and the PI stepsize control came from NR. Not a whole lot came from the Julia code, since it is honestly kind of a mess due to what needs to happen for performance there. I did dig into it enough to find out that they're just using bisection with left/right endpoint control for events, so I used that, which was much simpler than trying to sort out what endpoint control with Brent's method looks like. Oh, and I took some tips on the overall shape of the events API from the Julia code, so that the events get the integrator itself injected into them and can tweak the integrator. I've still got some details left to clean up, and I need to flesh out the events API, but it is way better code than the slightly OOP'd, wikipedia-quality code that I was using. It should be flexible enough to drop other RK methods into it, and even non-RK IVP solvers.
Reasonably happy with the result and it is a lot better for having looked at other code and not continued to pursue NIH syndrome.
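For a flavor of the event-location approach described above, here is a rough sketch in Python (not the actual C# code, and with a caller-supplied interpolant standing in for the integrator's dense output; all names are illustrative):

```python
def locate_event(g, y_at, t0, t1, tol=1e-10):
    """Find where the event function g(t, y) crosses zero inside a
    completed step [t0, t1], by bisection on the interpolant y_at."""
    g0, g1 = g(t0, y_at(t0)), g(t1, y_at(t1))
    if g0 * g1 > 0:
        return None  # no sign change: no event fired in this step
    while t1 - t0 > tol:
        tm = 0.5 * (t0 + t1)
        gm = g(tm, y_at(tm))
        if g0 * gm <= 0:
            t1 = tm           # crossing lies in the left half
        else:
            t0, g0 = tm, gm   # crossing lies in the right half
    return 0.5 * (t0 + t1)

# Toy usage: the "solution" is y(t) = t, and the event fires at y = 0.3.
t_event = locate_event(lambda t, y: y - 0.3, lambda t: t, 0.0, 1.0)
```

In a real integrator, `y_at` would be the RK dense-output interpolant for the step, and the endpoint handling (which side of the crossing to report) is where the left/right control mentioned above comes in.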
> Seibel: I’m thinking of the preface to SICP, where it says, “programs must be written for people to read and only incidentally for machines to execute.”
I’m wondering if the initial statement in SICP spoke from a time similar to that of a 1977 McGraw-Hill textbook, “Introduction to Computers”, that I picked up from a thrift store recently. A significant portion of the chapter on developing computer programs is dedicated to methodology: planning, the development life cycle, systems design, instructing users, processing methodology. The next chapter is about flowcharts and tables.
I mention all this to point out how intrigued I am by what looks like the emphasis that was placed back then on clarity from the human’s perspective, be it programmer, user or anyone else involved in the software’s development and use.
Another thing that’s interesting to me is how old code like Fortran looks like it’s made up of mostly just words, compared to modern languages I’ve looked at that use a lot of abbreviations and special characters.
My knowledge of programming and its history is excruciatingly minimal, but this blog post and that excerpt sort of remind me of what I just mentioned. I’m not sure why exactly, or if there’s any credence to it.
> I mention all this to point out how intrigued I am by what looks like the emphasis that was placed back then on clarity from the human’s perspective, be it programmer, user or anyone else involved in the software’s development and use.
I wasn't born yet in the 1970s, but I think the fact that "code is written for people to read" is something that people in earlier decades knew intimately, and that we have "forgotten" in recent years.
It hinges on a very simple fact: we know that machine code is for machines to execute, and machine code has existed for as long as machines have. And yet in subsequent decades, people spent so many resources designing and inventing programming languages, and so many resources writing interpreters and compilers -- it's got to be for a good reason, right?
Once you think of it that way, the reason is obvious. Code written in programming languages is for people to read (and write); otherwise we'd just work on the binary executable.
I guess these days in the stacks of abstraction, we don't even know what's running on the bare metal any more, and for novices it might feel like "abstractions all the way down", and the point that the abstractions were originally for human consumption might have been lost or forgotten.
The approach in 1977 assumed that the problems we address with computer programs were static. It was up to the system developers to understand a problem well enough to learn its requirements and then implement those requirements as code. Writing the code was the last thing you did, after the problem was entirely understood, documented, flowcharted, signed off by the client, etc.
Now we understand that the problems often change as the code is developed, or in response to the code we develop. Priorities and requirements change as the work proceeds. We start writing code early, solving small parts of the problem, and building up a solution almost organically, the way a tree will grow around obstacles in its pursuit of sunlight.
The name Fortran is a contraction of "Formula Translation"; it was developed to be usable by mathematicians and scientists. COBOL is another language that looks very wordy, as it was developed to be used by businesspeople.
I've spent a lot of time trying to make sure my code is commented and easily readable; and I haven't found any of that time to be wasted. I've often had to go look at code that I wrote two years, or five years, or twenty+ years previously, and I've greatly appreciated the work I put in when I wrote it.
Well, it's not wrong; it's just that it's premised on the definition of 'literature' being 'highbrow writing for the elite', in the sense of something that can be dissected and analyzed for the sole purpose of revealing something about the human condition. It's true that computer program code is not that kind of literature, and no fine arts major is likely to ever write a graduate-level thesis on it expounding its symbolic allusions to the Christian Bible and how it represents society's fear of death and desire for sex.
Source code is, however, literature in the sense that it is written for people to read. Not for computers to read. People. Sometimes other people, often the original author at a later date, but people nevertheless. By definition, that makes it literature.
Completely agree. I'm even writing a book on "software literacy".
A small side note - most of us will know a manager who "used to code but stopped when went into management". But none of us know anyone who stopped reading and writing English (German, Japanese) when they went into management.
The difference might be that we don't have many companies that can be managed with code, or that a lot of people in management don't need to be there.
Edit: VW is an interesting case in point. It does not matter what a manager says or what the design document lays out; what matters is what the code says. And the VW CTO knew this, so he made the commits to the code base that tricked the emissions tests. That is management-by-code.
It is a new kind of literature, with the key characteristic being "dynamically sequenced reading".
Unlike literature, code is supposed to be read in an order most conducive to the reader's goals, rather than the author's goals.
Also, unlike literature, often new code is built upon old in an explicit and direct way. Although even in literature, one builds on top of old ideas.
I think Peter Norvig once commented on why Knuth's Literate programming didn't catch on. His guess was that, people usually have a specific purpose when they are looking into a piece of programming, and they want to read it in an order most suitable to them.
With LLMs, there could be dynamic code exploration helpers, I think. They could take our goals, and then introduce parts of the code that are relevant to the reader, with extended guidance...
And Practical Common Lisp, another popular one on HN. The ___domain name took me by surprise and I struggled to remember why it seemed so familiar; it turns out that PCL can be found in its entirety here, and I had used it years ago to learn CL:
Seibel mentions here that of his Coders at Work interviewees, only Knuth and bradfitz did a lot of reading of other people's code, and later shares his insight that Knuth's reading style was not passive but more like a "scientific investigation". This post goes into more detail about a concrete example of "reading", very similar to the one in the post (about the Fortran compiler for the Bunker Ramo 300, which Knuth read without knowing the operation codes): this one is about a program where Knuth again reconstructs "a listing of Whirlwind opcodes based on the program listing". I think seeing the actual details of what "reading" code entails -- the specific margin notes, etc. -- is very illuminating.
It is also probably no coincidence that Knuth came up with literate programming (not only is he a writer at heart, he was a seasoned reader of other people's code, and machine code at that!). This also somewhat addresses the complaints about Knuth's own examples of literate programming seeming unreadable: his threshold for what counts as "readable code" is different; code is not expected to be read like a novel but more like (his way of) reading a technical paper.
What's interesting to me isn't code itself, it's APIs and user interfaces. For any given API there's multiple ways to implement it. But they often perform the same.
If my goal was "good code" I'd probably review and debate the value of each possibility. But I'd rather have any random trash behind a well-designed API with unit tests, and then optimize it, instead of making a beautiful implementation with no real abstraction or tests or profiling.
I'd rather have average code in Dart or Nim or Rust than great code in C, unless the performance was too critical for that.
I'd rather have a python stdlib call than some excellent original code.
And if I think the actual purpose of the code is stupid, I'm probably not going to care about how it's written.
Reading code seems in theory like a worthwhile activity. But without hard science how can we be sure the time wouldn't be better spent reading actual literature and reminding ourselves that there's more to the world than code?
Programmers write code a lot, and read it too; it's part of the job... Do we really need to do even more of that?
If there's something particularly interesting I'll read it... but most code is not written like literature. It's written to implement something without anything surprising or innovative. Literature generally makes you feel something other than "Yep, those are definitely words".
When it comes to reading code, the real question isn't just why—it's about when we do it, what we read, and how we go about it. As programmers, we all know that reading code is valuable, and we often tell ourselves we should do more of it. But let's be real, life happens and we don't always get around to it. One big reason? Time—or the lack of it. Unlike writers who can afford to read day in and day out, our job is to write code.
So, when do we actually read code? Usually, when we need to. It might be diving into an API's documentation or snagging a code snippet from Stack Overflow to fix something. But here's the catch: once we figure it out, we tend to forget it. That's why if you asked me what code I read last week, I'd probably draw a blank.
Sure, we do some reading here and there, but I think we should step up our game. When we solve a problem, we pat ourselves on the back and move on. It worked, right? But here's the thing: we can learn a lot by peeking at how the experts do it. Think of it like reading a book. We don't just skim the words; we dive into the plot, dissect the characters, and think about what makes it tick. Learning from others' code isn't all that different. When I started learning F#, I was blown away by the variety of solutions out there, all doing the same thing in completely different ways. Comparing them with my own attempts taught me a ton.
Bottom line, we should always keep that student mindset. Don't just glance at code—study it. Pick it apart, understand its structure, and figure out why it's built the way it is. Whether you're a seasoned pro or just starting out, approaching code with fresh eyes can unlock a world of ideas. So, let's make time for it, even if it's just a little bit. It's an investment in our growth, our skills, and our identity as programmers.
I think most code doesn't really stand on its own very well. In order to understand it, you need to know a number of other things that might not be obvious (especially if the code in question has bugs, or has been rewritten a number of times and has some historical relics from earlier designs that are no longer relevant):
- What is this code supposed to do?
- What is the general execution model?
- What are the rules the implementation must follow in order to maintain correctness?
The way code is written, it starts first as a collection of ideas, and then it takes concrete form as code. Reconstructing the ideas from the code may be possible, but sometimes it requires some extra help.
Some things are best explained in English (or some other preferred native language) as ordinary written text. Some things are best explained as state transition diagrams. And some things can be expressed most clearly and concisely as code.
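For instance, a state transition diagram can sometimes be carried into the code almost verbatim as a transition table, so the diagram and the implementation stay recognizably the same artifact (a toy sketch; the states and events here are made up):

```python
# Each (state, event) pair maps to the next state, mirroring the
# arrows of the diagram it was transcribed from.
TRANSITIONS = {
    ("idle",    "start"): "running",
    ("running", "pause"): "paused",
    ("paused",  "start"): "running",
    ("running", "stop"):  "idle",
    ("paused",  "stop"):  "idle",
}

def step(state, event):
    # Events with no matching arrow leave the state unchanged.
    return TRANSITIONS.get((state, event), state)
```

The table form keeps the "idea" legible in the code itself, while the prose or diagram can focus on why those transitions exist.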
The bad part is, as a young dev I was into Knuth and Uncle Bob and formatting code in a beautiful way, so that someone reading it is not amazed but instantly gets the ideas of what and why.
Worse for me was the realization that we don’t speak the same language: ‘a + b*c’ does not really mean much to people, and they would rather nag that I am not using spacing uniformly.
So I declared bankruptcy on intelligent code writing and became as brute-force as everyone else: I don’t even care how variables are named; I will just debug the code five or more times with different inputs and then know for sure what it does.
Keep in mind that Knuth and Bob Martin come from the old school, where you had to wait a long time to get your code to run. Nowadays I can run code locally multiple times and find out what it is doing on the spot… so my point is that clean or beautiful code means a lot less in daily work than it did 40 years ago.
What's interesting to me isn't code itself, it's APIs and user interfaces. For any given API there's multiple ways to implement it. But they often perform the same, and many of them are pretty OK.
If your implementation is bad sometimes it's because your API is bad. Or the whole project shouldn't be done at all. Or you're dealing with essential complexity, and no matter how many hours you refactor, nobody but a mathematician will think it's easy to understand; and if you really want to improve at it, shouldn't you be studying math instead of reading code?
I'm not here to write amazing code, I want to make applications that can be maintained. If you pay attention to code all day, you're doing "pure" coding more than "applied" coding.
After years of doing software maintenance of other people's code, my doc-comments have grown longer and longer. Software maintenance is often the kind of reverse-engineering challenge that Knuth describes in the linked article.
I hate ambiguities and mysteries in code. I make it my mission to provide a complete mental snapshot of the ___domain knowledge required to understand anything I write.
My doxygen doc-comments of C++ namespaces and classes have grown to the length of short blog posts containing a comprehensive introduction and overview to the topic, replete with links to external references.
Some programmers probably hate me for this. Maybe a few love me.
I read code for fun sometimes, but it's usually not whole codebases. I like to read PRs that I think are exciting in projects that I use, and then branch out to the context that they're invoking or changing as well.
I've never thought of it as something I do to advance myself as a programmer, though. I just kinda do it on impulse, and I'd still do it even if I found out it didn't really improve my skills.
Anyone else recreationally browse PRs in F/OSS that they use, kinda like reading the news?
> Since I had my epiphany we’ve had several meetings of the code reading group, now known as the Royal Society of Twitter for Improving Coding Knowledge, along the new lines. We’re still learning about the best ways to present code but the model feels very right.
Damn. Sounds like Twitter was a cool place to work, learn, and grow as a developer in 2014. It's wild to think how different it looks today.
One thing I've done in the past is put comment "tours" in the code so you can follow with your IDE's search. Each comment has the name of the specific tour, a number, and the tour guide's blurb. This lets the visitor follow a nice sequence, across different files if necessary... Sort of like the inverse of literate programming
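A minimal sketch of the idea (the TOUR(...) convention and every name below are invented for illustration): searching the codebase for "TOUR(request-lifecycle)" visits the numbered stops in order, even across files.

```python
# TOUR(request-lifecycle) 1: Every request enters here. We normalize
# the raw text first, then hand it to the dispatcher.
def handle(raw):
    return dispatch(parse(raw))

# TOUR(request-lifecycle) 2: Normalization is deliberately dumb --
# trim whitespace and lowercase -- so routing keys stay predictable.
def parse(raw):
    return raw.strip().lower()

# TOUR(request-lifecycle) 3: Routing is a plain table lookup; unknown
# commands fall through to a not-found response.
def dispatch(command):
    routes = {"ping": "pong"}
    return routes.get(command, "404")
```

Because each stop is just a comment, the tour costs nothing at runtime and can be collapsed or ignored by readers who don't want it.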
I'm all for using comments, but you have to be on a team that's on board with them. I've joined more than one team that adopted the "the code should be its own documentation bruh" attitude, as if comments get in the way such that your IDE can't just collapse them all if you don't want to see them. In that case, I'm usually forced to give up. Lesson learned – always ask for the engineering team's stance on comments when interviewing!
I tend to treat code more like poetry or music. It's shorthand for instructions telling the computer what to do. Depending on the language, naming, structure, and comments, the original thesis may be lost or hard to understand. And in isolation, short snippets often don't mean much, or can change their meaning without context.
Recipe sites seem to think so, as they include a multi-page backstory about how the recipe was handed down from someone's grandmother and all the notable people she prepared it for. Oh and please buy our kitchen gadgets.
I would say so (not the same as a recipe). In this way a recipe and the ten commandments look the same. Reductionist comparisons are pretty easy to make. Sometimes code and recipes look the same, if the codebase is small enough and both omit the deployment/build chain/tooling in favor of implied requirements.
As far as I can tell, literature is a form of explicit communication to the naive reader, which intrinsically provides context to interpret it.
The README is literature. The code is not (common case).
A cipher is not literature, even if that cipher is breakable, resulting in literature. It may be useful to consider it literature if you know the cipher, but it is not until that transformation has occurred. You may recognize this if you try to read a story written by a child or mentally disturbed individual. The story might skip or be interrupted/incomplete or is otherwise nonsensically mangled structurally and narratively. Literature implies literacy to access the medium both ways (reader and author). Broken manuscripts are not literature, per se.
Literature is a soft term, so what I think holds no value other than how it might well describe the populist zeitgeist surrounding the term.
Even when we have to read the code to sort out some problem, or to reuse the principle at the core of it, reading is hard enough. One egregious example that comes to mind was code released with all comments stripped out and largely obscure variable names.
With FOSS, the common desire is to build, fix, or modify, but frequently non-code explanations are sparse. That's one place where code reading is certainly desirable, along with the reverse engineering described.
Eh, I would propose that it is; it's just not literature for everyone. You wouldn't go around saying something written in Japanese is not literature because it is in a different language and follows different rules for reading it. The same applies to a programming language: you need to know how to read it to know what it is doing. You don't need inline comments unless there isn't enough clarity in the code itself, and nobody should bend over backwards to sacrifice performance for readability.
When writing was first invented, it was presumably mostly for functional purposes: keeping records, calculating taxes, government administration, etc. If one looked at writing at the early stages of its development, one could very well come to a similar conclusion, that "writing is not literature".
And that still is true, mostly: writing generally is not literature. Just look at the comments under a Donald Trump video, for example; not sure anyone would want to read those. In fact, I'm pretty sure <0.1% of the comments here on HN would qualify as "literature". But if you said "writing is not literature", despite it being true, it would probably be misleading.
The term "literature" is generally reserved to the finer styles of writing that in addition to whatever functional purpose it served, had an aesthetic to it that makes it worthwhile for others to read. Most literature comes from a long tradition where styles and techniques are developed to make the text more interesting and pleasing. The traditions and the corpus of literature grow stronger as time goes by. Of course, best works accumulate and are archived for future generations to study as well. This has gone on for thousands of years for "natural language literature". It has a head start of thousands of years.
Code has existed for less than 100 years. That's nothing in the greater historical context. Imagine digging up ancient clay tablets of which 99% talk about how many bales of barley we need to collect to feed workers next year, and concluding that this language/culture is not capable of literature. That's essentially what the article is arguing.
But do we have evidence otherwise? Is there code so aesthetically pleasing (or at least interesting) that it might be read just for pure joy? Yes, there is. Just not a lot. For example, check out the entries of the IOCCC. Some of them are insanely beautiful, and I have no idea how the authors could have created them. They're the equivalent of carefully written poems in C, except they're possibly even harder to write.
So yes, while I agree that the article is probably literally correct in saying (vast majority of) code is not literature, I think it's wrong to interpret that statement as "code can't be literature". It obviously can be in a substantial way.
It's just that the cultural norms haven't caught up yet. While people today somehow think they become more "refined" after reading, for example, Shakespeare (which I have no idea why, besides gatekeeping attempts from the elite class), the programmer class hasn't yet taken over schools to shove the fast inverse square root (0x5f3759df - ( i >> 1 )) down students' throats and call it a well-rounded education. (Tangent: yes, all the things you felt were useless in school are probably really as useless as understanding Q_rsqrt, i.e. not very useful unless you really need it, but 99% of people don't, except for showing class status.)
I grant you that I can't think of any code equivalent to long-form novels, since telling and listening to stories is somehow baked into the basic human psyche in a way code has no equivalent for, but literature is more than novels, so I'd claim that code can at least be a meaningful subset of literature.
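For the curious, the trick alluded to above (the famous fast inverse square root from Quake III) can be sketched in Python by reinterpreting the float's bits as an integer; this is a translation for illustration, not the original C:

```python
import struct

def q_rsqrt(x):
    """Approximate 1/sqrt(x) using the 0x5f3759df bit trick."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    # The magic constant minus a right shift yields a first guess.
    i = 0x5f3759df - (i >> 1)
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One Newton-Raphson iteration sharpens the estimate.
    return y * (1.5 - 0.5 * x * y * y)
```

With the single Newton-Raphson step, the result lands within a fraction of a percent of the true value; for example, q_rsqrt(4.0) comes out very close to 0.5.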
I'll present a different take. There are some basic things you can do to make your code more readable:
- avoid double negations
- use grep-pable names and avoid string interpolation
- keep code duplication if that version is more readable and maintainable.
I'd love to see a longer list. In my experience, engineers too often code as if keeping the code as short as possible is what matters, instead of optimizing for code that is easy to read.
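To illustrate the second point with a tiny invented example: when a message is assembled from fragments, grepping the codebase for the text a user reports turns up nothing; keeping the literal intact, even at the cost of a little duplication, makes it findable.

```python
class User:
    def __init__(self, name, external):
        self.name = name
        self.external = external

# Hard to grep: searching the source for "access denied" finds
# nothing, because the phrase is assembled at runtime.
def deny_hard_to_grep(user):
    kind = "access" if user.external else "login"
    raise PermissionError(f"{kind} denied for {user.name}")

# Grep-pable: the full phrases "access denied" and "login denied"
# appear verbatim in the source, at the cost of a little duplication.
def deny_greppable(user):
    if user.external:
        raise PermissionError(f"access denied for {user.name}")
    raise PermissionError(f"login denied for {user.name}")
```

Both functions behave identically; the second is simply easier to trace back from a bug report or a log line.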
And I will certainly do the same in my current project.
Some of the most delightful moments I've had are when someone emails me with questions that indicate they've read my code, such as mentioning specific comments or asking about something in that dissection manual.
I consider it a badge of honor if people think my code is good enough to study.
[1]: https://git.gavinhoward.com/gavin/bc/src/commit/75cf2e3358b5...