From the article: "I no longer think about code lines as an asset to be accumulated, but rather as an expenditure to be avoided."
The obvious Edsger Dijkstra reference:
[I]f we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.
Chiming in to say I agree with the sibling (dead) comment (not sure why it ended up dead).
Dijkstra's words (and the article's) are clearly correct, but hardly revolutionary or contrary to modern belief today.
The real question is: despite a prevailing belief in modern software engineering that red diffs are beautiful and cathartic things to be celebrated, and that the great spectre of technical debt is something to be excised, why do complex, overengineered systems still prevail?
Software-as-art has lost a lot of ground to software-as-business, and in software-as-business, somewhere up the chain there is a management-level individual who isn't really sure how to measure either progress or code quality.
There are myriad tools designed for these people though, and they all chose the simplest (and wrong) solutions to the problem: progress is SLOC added and bug reports closed per day.
The height of the software-as-art era was the shareware days of the late 90s. Development certainly moved a lot more slowly overall, but there also tended to be better relationships between developers and their users and developers were not yet all working for people who tried to measure their progress by the only metrics they could understand. The entire industry has been a victim of Goodhart's Law.
> There are myriad tools designed for these people though, and they all chose the simplest (and wrong) solutions to the problem: progress is SLOC added and bug reports closed per day.
Is the SLOC counting still a thing nowadays, though? I've worked at companies ranging from 4-person startups to an old worldwide behemoth (all in the EU, though) and nobody even tracked SLOC, let alone evaluated developers on it; all the dumb metrics came from the issue tracker. Though I have the impression your real evaluation is based on how well you work with others, assuming you clear a low technical competence bar.
I wish! If there's a good book out about it, I'd love a recommendation too. I lived it though, writing my first "software" as a youth in the mid-80s. I've seen a lot of change in computing. And not all "get off my lawn" bad, just different.
I see software now as having three different faces: software-as-art, software-as-business, and software-as-engineering. The 80s and 90s had a lot of activity in software-as-art. I was mostly following Mac culture at the time, so I saw this through Bungie, Ambrosia, BareBones, and hundreds of smaller indie developers. The environment at the time enforced the software-as-art discipline, because downloading a program happened over 14.4kbps or slower, or 28.8kbps if your parents had good jobs, and came along with yelling that often sounded like, "get off the phone!" "But I'm downloading!" Installation media was 700K or 1.4MB, and that had to have all your code, art, sound, and other resources.
That's mostly all gone now. Bungie of course got married to Microsoft, which pissed off Mac enthusiasts way more than when Jobs announced a partnership with MS to get Office on the Mac. They've done well. Panic are the only old-school indie commercial desktop software developers I can think of off the top of my head that are still pretty true to their roots.
A lot of software of course just became free. I enjoy so much more high quality software at no cost now, which is really only possible because of the massive benefits of scale that have come from all the tools that have trickled down from software-as-business.
Software-as-business really took off. Apple, Microsoft, Sun, Oracle and others were always kinda big, but not the impossibly large megacorps that they are now. Most of them were still vulnerable to serious mistakes, and that was good, because it meant the users still had some power. Now, mistakes in software development don't really matter to these companies unless they impact 8 figures of quarterly revenue, and that's a process that has zero room for software-as-art.
Software-as-engineering is mostly stillborn, languishing in academia or a few places with rigorous standards (like NASA) or still finding its footing in modern DevOps. I still hold out hope that eventually this aspect will get some love too. I think it will be necessary, eventually, but maybe not until after I've written my last line of code.
Two things: lack of having a clue and perverse incentives. The first leads to endless piles of barely functional code; the second rewards that.
The 'real' programmer who spends a month on a long-term maintainable chunk of code, with a well-defined API and some documentation to go with it, in a few hundred lines of craftsmanship, will be at a disadvantage to the person who creates an unmaintainable mess in half the time and is promoted away for being so great at 'getting things done' before they have to deal with the mess.
> why do complex overengineered systems still prevail despite all this?
Well for one, IDEs help hide a lot of the complexity by offering rapid auto-completion for even the most complex of systems.
For two, it's not something you can read and truly grasp. Complexity doesn't necessarily have an objective measurement. Even the attempts we have of measuring it, such as cyclomatic complexity, don't really tell a full story.
You could reduce a somewhat complex string manipulation down to a regex replace. But that regex replace may be far harder for someone to understand, even if they're experienced with that particular flavor of regex.
To be blunt, 100 lines of code could be easier to understand than 25 lines that do the exact same thing. It just depends on the reader & the task at hand.
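For illustration, here is a minimal sketch in C of that trade-off (the "digits.digits" check and both helper names are invented for this example): the regex version spends fewer lines but assumes fluency in ERE syntax, while the loop version spends more lines and spells every step out.

    #include <ctype.h>
    #include <regex.h>
    #include <string.h>

    /* One line of intent via POSIX regex: does s look like "digits.digits"? */
    static int looks_like_version_re(const char *s)
    {
        regex_t re;
        int ok;

        if (regcomp(&re, "^[0-9]+\\.[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0)
            return 0;
        ok = (regexec(&re, s, 0, NULL, 0) == 0);
        regfree(&re);
        return ok;
    }

    /* The longer version: more lines spent, but each step is explicit. */
    static int looks_like_version_loop(const char *s)
    {
        const char *dot = strchr(s, '.');
        const char *p;

        if (dot == NULL || dot == s || dot[1] == '\0')
            return 0;
        for (p = s; p < dot; p++)
            if (!isdigit((unsigned char)*p))
                return 0;
        for (p = dot + 1; *p != '\0'; p++)
            if (!isdigit((unsigned char)*p))
                return 0;
        return 1;
    }

Which of the two is "simpler" depends entirely on the reader, which is the point.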
Regarding the sibling dead comment: it is not uncommon for comments to be dead from the start, not because they were downvoted in this thread.
I don't know, but I imagine it is one of the HN automated moderation actions, based on previous downvotes of that user's comments.
There is "vouche" feature to minimize unfairness in these cases. If you do not agree that a particular comment should be dead from start, you can click on the timestamp (to go to the permanent link of the comment) and click on vouche. I do this sometimes. I did now for the mentioned comment (from draw_down ) because I agree with you that it is a valuable comment and it is not dead anymore.
“I didn't have time to write a short letter, so I wrote a long one instead.” - usually attributed to Mark Twain, though the line goes back to Blaise Pascal.
I think this applies to software as well. It's easier to come up with a complex solution; it's harder to create something simple that still meets the requirements.
IMO, it's due to the Action Bias. Human psychology prefers to take action rather than wait around. Making new code feels more active than deleting code (at least to me).
I can answer from experience. The reason is that generally people can find success by doing one or two things at a time, and then polish them through testing and bug reports. This often means that the wrong things are done for the right reasons, and the debt gets bigger and the value of the system increases. This makes big rewrites really hard.
This locks in the complexity, and inertia makes change expensive. I experienced this when I built a TCP socket on top of a request-response library; the reason was to capture the investments already made in the library, at the cost of suffering the pain of building TCP over it. This made the code complex, but it was reasonable to sell to stakeholders and achieve success with. Eventually I wanted to just use a socket, but that took 1.5 years of slowly iterating on the right value proposition. I'm a big fan of slow, massive rewrites, but they require consistent and solid leadership to pull off.
I like lines-spent as a measure, but I wish I could figure out a hybrid measure that didn't penalize lines which confirm or narrow the state space (such as the "unnecessary" asserts that show programmer intent).
Somehow being able to distinguish these would help avoid insanely complicated code (which usually could be much, much simpler), while not encouraging anyone to literally ignore all error checking and do everything as some kind of chained single-line-expression abomination [1].
[1] Something I often see in functional JS and regular python from time to time.
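To make the distinction concrete, a minimal sketch of such a state-narrowing line (the function and its contract are hypothetical): the assert costs a line under any lines-spent measure, yet it removes states the next reader has to consider instead of adding behaviour.

    #include <assert.h>

    /* Skip over a header name: stops either at the end of the buffer
     * or at the ':' separator. */
    static const char *skip_header_name(const char *p, const char *end)
    {
        while (p < end && *p != ':')
            p++;

        /* "Unnecessary" for correctness, but it pins down the only
         * two possible outcomes for the reader (and the compiler). */
        assert(p == end || *p == ':');
        return p;
    }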
Dijkstra wrote that in 1988. Things change. The prevailing wisdom changes over time.
Is anybody really proudly counting lines of code at this late date? Is there anyone out there who doesn't love an all-red diff? Is there anyone who really thinks the prolix version of a function is better than the concise one, all else equal? Who is getting paid by the line of code, or has to meet a quota of LOC?
The hell it does. Dijkstra's words are as important as, if not more important than, when they were first written. The whole problem is that we keep re-inventing the wheel and all the associated problems over and over again. We never put the lessons learned the hard way into practice in the long run. New generation -> rinse, repeat.
I get that you have axes to grind, but that all doesn’t actually have anything to do with what I was talking about, which is essentially that nobody thinks “lots of lines of code” is a good thing anymore.
I work in a startup full of twentysomethings. None of them have read “Mythical Man Month” but they all know the bit about how adding engineers to a late project makes it later. It takes time, but these things find their way into the conventional wisdom.
It’s not 1988 anymore. Not everything changes, but some things do.
Or do you think we all still need to be reminded that “goto” is bad and we shouldn’t use it?
Consider that your personal experience is no match for what we have encountered in about 140 tech reviews of companies in various industries. I'm super happy to hear that you and your crew are clued in and doing well, and that this is all 'old hat' to you.
Even so, there are lots of other companies out there, some older, some newer, both large and small in regulated industries, e-commerce and so on who could very well use some of the common sense that pervades you and your team.
It's definitely not 1988 anymore. Contrary to popular wisdom, the programmers from back then were usually clued in; they hadn't gone to a JavaScript bootcamp and started churning out reams of low-quality code. Plenty of the software from back then is still around today. The Mythical Man Month writes about a team of reasonably competent professionals and the pitfalls they encounter, not about a bunch of clueless newbies.
That adding engineers to a late project makes it later is now an established fact, you would hope. And yet, not a month goes by without encountering exactly that proposition.
So yes, maybe we do need that periodic reminder that 'goto' is bad and that we shouldn't use it. Fortunately, the number of languages that support that construct is dwindling, and in the cases where it is used, it is hidden quite well, without the nasty side effects that an uncontrolled jump into the middle of a bunch of conditionals could cause.
Old military saying that I repeat often because I have relearned so many times that shortcuts cause more work (read: problems and rigidity) in the long run.
Choose one: get it done right, or get it done right now.
Unfortunately this does not hold up to reality. In many cases, building the right thing from the beginning either takes too long, so you will be late to market, or is simply impossible, since you don't yet have enough information from customers using the product.
Among the many wrong reasons given for the Agile way of working, being able to change and adapt to reality is a valid and important one.
I don't know which army you are referring to, but my army reverted to heavy bombing when the smooth approach didn't work.
“I prefer asserts over comments for this, since the compiler can also see them. The good news is, the compiler can also see that they don't do anything so a lot fewer are present in the binary program. Interestingly, a couple of them allows the compiler to optimize much harder. No, I won't tell you which those are.”
I appreciate the pattern of having “comments” that are equally useful to compiler and reader. And I’m impressed that some compilers actually use them to optimise the output. And I’m slightly baffled that the examples of these are apparently a secret. :)
Go and C++ (when using checked accessors like std::vector::at) can do the same thing. A precondition that satisfies all bounds checks will also eliminate them. You could do this as a post-compile optimization for any language under certain conditions.
The form of "assert" is not important. It is isomorphic with if (a>b) {exit}. The compiler can assume that thereafter a<=b. until one of them is modified.
Both loops get entirely unrolled. It's 5 instructions for each iteration in the first example, and only 3 in the second example. (To say nothing of the fact that conditional jumps are (usually?) much more expensive than add/mov instructions.)
I believe the compiler can't do that, since the bounds check panic is a side effect that can be observed - the message tells which index that failed the check! For that reason, non-elided checks will not be reordered.
However, I believe a future Rust RFC could turn that around and validate the idea that in some cases such things could change execution order, even if it has noticeable side effects.
In practice a small function like this would be inlined, which gives room for further optimisations. At any point, if the compiler knows that all accesses are in bounds, it can remove the bounds checks. The trick is actually having it figure that out.
This was also used as a catch-phrase or motto in "Mr. Penumbra's 24-Hour Bookstore", which is a fun read, especially if you're familiar with the Silicon Valley tech scene.
10 LOC/h is spectacularly good over a long time period. Congratulations, from a varnish user.
I recently tried to set expectations in a coding interview that was scheduled for 3 hours. I told them that was enough time to read the spec, develop a simple test case, and begin or possibly complete an implementation of part of the spec. I also told them that I wasn't interested in a shop that hired people based on their ability to dash off broken, untested code in half a day. It seems to me that this kind of coding test (especially for someone with 25 years of open-source contributions) can only lead to bad things. I'd be glad to meet people like the author who have a practical view on Brooks and LOC/h.
Most coding interview challenges I've come across are of limited complexity. I think it is important to clarify that reading and discussing the spec and writing tests up front are things you would do in a real-world coding task, but interviewers are usually more interested in your ability to discuss different approaches with their expected pros and cons, explain why some approaches won't work, prototype one out with clean code, and explain your thinking while writing it. An experienced developer usually won't make many mistakes on simple problems that would require unit tests to catch. If they do, the interviewer will point them out and play the role of the tests, just to see how the candidate reacts and solves the problem.
I've been doing tech interviews professionally for a large tech interview platform that you would have heard of. We have a timed coding challenge, but code quality is something we explicitly look for. Making slow but steady progress with good testing is a great signal for us too, even if they don't get very far through our challenge within the allocated time.
That said, occasionally we have people who work very quickly and somehow still have great code. That's an even better signal.
I hope it's communicated that they're not expected to finish. If I was given a coding challenge with a time limit I would assume I was expected to finish inside the time limit and make any sacrifices that needed to be made to achieve that - including ignoring testing and hoping I just don't make too many mistakes.
> This is why Varnish is written in "pidgin C" style and lousy with asserts which don't do anything, except clarify programmer intent, and in case of mistakes, stop bad things before they get out of hand.
My C programs always start as a few skeleton .h files with include guards, and a few .c files that are basically an ever-growing list of assert()s on any pointer to a structure or variable, or on any value that is expected to be within a certain range.
Gradually I feed my structs and typedefs into the .h files and replace the asserts with (hopefully) working code.
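A minimal sketch of that starting point (the widget names and the 4096 limit are invented for the example): a skeleton header behind include guards, plus a .c file that is little more than asserts until real code grows around them.

    /* widget.h -- skeleton header with include guards */
    #ifndef WIDGET_H
    #define WIDGET_H

    #include <stddef.h>

    struct widget {
        char   *name;
        size_t  len;
    };

    void widget_init(struct widget *w, const char *name, size_t len);

    #endif /* WIDGET_H */

    /* widget.c -- starts as little more than asserts on every pointer and range */
    #include <assert.h>

    #include "widget.h"

    void widget_init(struct widget *w, const char *name, size_t len)
    {
        assert(w != NULL);
        assert(name != NULL);
        assert(len > 0 && len < 4096);   /* expected range, made explicit */

        /* working code gradually replaces the skeleton below */
        w->name = NULL;                  /* to be filled in later */
        w->len  = len;
        assert(w->len == len);
    }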
About the numbers: I feel like going through the commit diffs and counting all the + lines would be a better (in some sense) metric than counting the final lines of code, if we want some proxy for the amount of labor.
>I think the first copyright on this compiler is around 2011. That's 6 years for 750 LoC. That's about 125 lines of code per year.
>But that doesn't tell the whole story. If you look at the GitHub contributions that I've made, I've made 2967 of about 3000 commits to the compiler source over that time frame. In that time I've added roughly 4,062,847 lines of code to the code base, and deleted roughly 3,753,677 lines of code. And there's the real story...It means that for every one of those 750 lines, I've had to examine, rework, and reject around 5400 lines of code.
It's an APL compiler, written in another APL dialect, which is about as dense as you can get in terms of lines-of-code. APL is one of the few really high level languages.
It looks like a lot of the lines of diffs are to HTML/XML files related to the certification of the code, rather than executable code. So, definitely not very dense.
I notice that my discussion about the churning on the GitHub repository has been repeated a number of times. I can provide some historical clarification on why there was so much churning there.
The documentation for those 17 lines of code are in a hopefully soon to be published thesis of about 70k words that includes performance information and the like.
However, during the development of this project, I didn't start by writing it in APL on Dyalog. I explored a significant number of other architectural designs and programming methodologies. Some of the more popular ones I tried were Extreme Programming, Agile methods, Java, Scheme, Nanopass, SML, Isabelle, and even a custom extension of Hoare Logic on top of dfns. I believe that I also explored implementing the compiler in C++/Boost and prototyped some stuff (I don't know if it ended up in this Git repo), and in C.
In other words, the compiler has not been a single code base, but has been a series of complete rewrites using different methods, approaches, languages, techniques, and architectures. I have used heavyweight machine support (mostly around C++ with Visual Studio's infrastructure) as well as some very hardcore UNIX style low-level handiwork. Multiple different IDEs, text editors, and operating systems were all explored, as were multiple different backends, targets, and the like. At one time I had targeted LLVM, and another C, another OpenACC, and another ArrayFire.
The whole project has been a somewhat wide ranging exploration of the design space, to say the least.
What you are seeing of the XML stuff was from a particular design effort that was an attempt to apply strict Cleanroom Software Engineering as a methodology to the compiler design, to see what would happen. In the end, I abandoned the attempt, for what I hope will be obvious reasons, but during this time, I predominately worked on RHEL with the ed(1) text editor editing XML files for the DocBook publishing suite. Parts of the churning are the incorporation and removal of various dependencies that had to be brought in and out of the repository depending on what infrastructure I was relying on. In the case of DocBook, some of those files are large.
However, a significant amount of the work of Cleanroom Software Engineering is "coding via documented process." This includes the certification steps as well as the function specification, increment development, sequent analysis, and so forth.
Thus, for a very real portion of the Co-dfns work, I was literally programming in XML using ed(1) to model relatively complex state machines and function specifications that provided very fine-grained behaviors of the compiler. For example, a significant amount of work went into the following file:
This file is about 45k lines of XML, and was written and edited entirely by hand using ed(1). I had a video demonstration of this a while back which demonstrated how I did this, and particularly how I did a lot of this with ed(1), but I lost the script file recording.
Over time, as I continued to explore patterns and development approaches, I continued to discover that the code was faster, better, and easier to work with as I removed more and more "stuff" from the various processes and possibilities.
It wasn't until relatively late in the game that I actually realized not only that the compiler could be written well in dfns, but also that it could be written in dfns in a way that was fully data parallel, which is the core insight of my Thesis. This had significant ramifications for the source code, because it meant that the compiler could now be tackled not only as a self-hosting project (at least in theory) but also in a fundamentally idiomatic way.
The result is that the compiler has generally continued to be more featureful, less buggy, and more dense at each major stage, with the latest leading to 17 lines of code. This is accomplishing essentially the same result as the 750 lines of code in a previous HN discussion, but does so partly by recognizing some passes as irrelevant and unnecessary to the current needs.
I do expect that after the publication of the thesis, the compiler will grow a little bit to add some new things that need to go in. However, at this point, I have a fairly efficient methodology.
So, the GitHub repository is not just a record of the code, but a record of a lot of different approaches to how to do what I was trying to do. Much of that XML you see was very much "coding" in the sense that I was providing for the core behavior of the system and was the primary specification of its behaviors in a formal, rigorous manner.
Yes, the code is dense, but changes to it don't make up most of the lines of diffs, and I really doubt the XML making up the diffs was edited as text; it's presumably the output of a tool or generator.
I clarify this more above, but one of the reasons you don't see changes to that file is that much of the work of the repository was working on code that was deleted in favor of that file. And yes, the XML was edited by hand, as text, using ed(1). The vast majority of what I was exploring in the past I completely discarded in favor of that e.cd file that now represents the compiler.
While it is true that SLOC is not a good measure, this does not mean that no good measure uses SLOC. Much of the variance comes from a lack of information about coding style and algorithmic approach. But it's not the 90s anymore. In the presence of a style guide, and code reviews to ensure that the local style is actually being followed, SLOC counts do become locally comparable as a reasonable measure. I'd still use the churn rate (lines added + lines removed) as a rate of productivity, though, rather than just lines added.
I wonder if this project, at least VCL compiler part of it, could've benefited from the use of a parser generator. I understand going slow, but using a parser generator would've allowed for faster development of VCL freeing up time to add more features so it isn't so hard to write for.
The official Varnish repository (https://packagecloud.io/varnishcache) doesn't offer any release for Buster yet. 6.3 was added two days ago, but not for Buster.
The obvious Edsger Dijkstra reference:
[I]f we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.
http://plasmasturm.org/log/linesspent/