
Any suggestion for a handbook or compendium that you consider to be a worthy alternative?



The thing here is, this reads like a prissy textbook that no one can really disagree with but that still doesn't grip reality. More HR handbook than blood-red manual.

For example, project management. The book covers this, but in the usual wrong-headed way of imagining there are executives with clear-eyed Vision who lay down directives.

This is of course not how most projects in most companies are started. It's a mess - reality impinges on the organisation, and pain, loss, and frustration drive people to make fixes and adjustments. Some tactical fixes are put in place under cover of “business as usual”, usually more than one enthusiastic manager thinks their solution will be the best, and a mixture of politics and pragmatism produces a competition to be the one project that will solve the problem and get the blessed budget. By the time there is an official project plan, two implementations already exist and enough lessons have been learnt that the problem is easily solved - but with funding secured, all of that will be abandoned and rebuilt from scratch, at a furious pace, to meet expectations so unrealistic that corners will be cut, leading …

That manual needs to be written.


You know that you could be talking about mining operations or highway construction in your post rather than software, and everything would apply just the same?

I really don't see the argument against the book here in your comment.


There are three absolutely key differences here.

The first is that, if you get a four-year college degree in mining or civil engineering, you will not spend much of those four years studying management practices; you will spend them studying geology, the mechanical properties of rocks and soil, hydrology (how water flows underground), and existing designs that are known to work well. You probably will not build a mine or a highway, but you will design many of them, and your designs will be evaluated by people who have built mines and highways.

The second is related to why you will not build a mine or highway in those four years: those are inherently large projects that require a lot of capital, a lot of people, and at least months and often decades. Mining companies don't have to worry about getting outcompeted by someone digging a mine in their basement; even for-profit toll highway operators similarly don't have to worry about some midnight engineer beating them to market with a hobby highway he built on the weekends. Consequently, it never happens that the company has built two highways already by the time there is an official project plan, and I am reliably informed that it doesn't happen much with mines either.

The third is that the value produced by mining operations and highways is relatively predictable, as measured by revenue, even if profits are not guaranteed to exist at all. I don't want to overstate this; it's common for mineral commodity prices and traffic patterns to vary by factors of three or more by the time you are in production. By contrast, much software is a winner-take-all, hits-driven business, like Hollywood movies. There's generally no way that adding an extra offramp to a highway or an extra excavator to a mine will increase revenue by two orders of magnitude, while that kind of thing is commonplace in software. That means that you win at building highways and mining largely by controlling costs, which is a matter of decreasing variance, while you win at software by "hitting the high notes", which is a matter of increasing variance.

So trying to run a software project like a coal mine or a highway construction project is a recipe for failure.


And as a side note, this is why LLMs are such a huge sugar rush for large companies. The performance of LLMs is directly correlated to capital investment (in building the model and having millions of GPUs to process requests).

Software rarely has a system that someone cannot undercut from their bedroom. LLMs are one such (whereas computer vision was all about clever edge-finding algorithms, LLMs are brute force, for the moment).

Imagine being able to turn to your investors and say “the laws of physics mean I can take your money, and some open-source nerd absolutely cannot ruin us all next month”.


That's an interesting thought, yeah. But it also limits the possible return on that capital, I think.


You seem to have quite a bit of lived experience with that particular version of project management. Why not write it yourself?


Although any random bathroom-wall graffiti is better than the SWEBOK, I don't know what to recommend that's actually good. Part of the problem is that people still suck at programming.

“How to report bugs effectively” <https://www.chiark.greenend.org.uk/~sgtatham/bugs.html> is probably the highest-bang-for-buck reading on software engineering.

I hear The Pragmatic Programmer is pretty good, though I haven't read it. Code Complete was pretty great at the time. The Practice of Programming covers most of the same material but is much more compact and higher in quality; The C Programming Language, by one of the same authors, also teaches significant things. The Architecture of Open-Source Applications series isn't a handbook, but it offers some pretty good ideas: https://aosabook.org/en/

Here are some key topics such a handbook or compendium ought to cover:

- How to think logically. This is crucial not only for debugging but also for formulating problems in such a way that you can program them into a computer. Programming problems that are small enough to fit into a programming interview can usually be solved, though badly, simply by rephrasing them in predicate logic (with some math, but usually not much) and mechanically transforming the result into structured control flow; there's a worked sketch after this list. Real-world programming problems usually can't, but they do have numerous such subproblems. I don't know how to teach this, but that's just my own incompetence at teaching.

- Debugging. You'll spend a lot of your time debugging, and there's more to debugging than just thinking logically. You also need to formulate good hypotheses (out of the whole set of logically possible ones) and run controlled experiments to validate them. There's a whole panoply of techniques available here, including testing, logging, input record and replay, delta debugging (sketched after this list), stack trace analysis, breakpoint debuggers, metrics anomaly detection, and membrane interposition with things like strace.

- Testing. Though I mentioned this as a debugging technique, testing has a lot more applications than just debugging. Automated tests are crucial for finding and diagnosing bugs, and can also be used for design, performance profiling, and interface documentation. Manual tests are also crucial for finding and diagnosing bugs, and can also tell you about usability and reliability. There are a lot of techniques to learn here too, including unit testing, fuzzing, property-based testing, various kinds of test doubles (including mock objects), etc.; a property-based-testing sketch follows this list.

- Version tracking. Git is a huge improvement over CVS, but CVS is a huge improvement over Jupyter notebooks. Version control facilitates delta debugging, of course, but also protects against accidental typo insertion, overwriting new code with old code, losing your source code without backups, not being able to tell what your coworkers did, etc. And GitLab, Gitea, GitHub, etc., are useful in lots of ways.

- Reproducibility more generally. Debugging irreproducible problems is much more difficult, and source-code version tracking is only the start. It's very helpful to be able to reproduce your deployment environment(s), whether with Docker or with something else. When you can reproduce computational results, you can cache them safely, which is important for optimization; there's a caching sketch after this list.

- Stack Overflow. It's pretty common that you can find solutions to your problems easily on Stack Overflow and similar fora; twin pitfalls are blindly copying and pasting code from it without understanding it, and failing to take advantage of it even when it would greatly accelerate your progress.

- ChatGPT. We're still figuring out how to use large language models. Some promising approaches seem to be asking ChatGPT what some code does, how to use an unfamiliar API to accomplish some task that requires several calls, or how to implement an unfamiliar algorithm; and using ChatGPT as a simulated user for user testing. This has twin pitfalls similar to Stack Overflow. Asking it to write production-quality code for you tends to waste more time debugging its many carefully concealed bugs than it would take you to just write the code, but sometimes it may come up with a fresh approach you wouldn't have thought of.

- Using documentation in general. It's common for novice programmers to use poor-quality sites like w3schools instead of authoritative sites like python.org or MDN, and to be unfamiliar with the text of the standards they're nominally programming to. It's as if they think that any website that ranks well on Google is trustworthy! I've often found it very helpful to be able to look up the official definitions of things, and often official documentation has better ways to do things than outdated third-party answers. Writing documentation is actually a key part of this skill.

- Databases. There are a lot of times when storing your data in a transactional SQL database will save you an enormous amount of development effort, for several reasons: normalization makes invalid states unrepresentable; SQL, though verbose, can commonly express things in a fairly readable line or two that would take a page or more of nested loops (there's a sketch after this list), and many ORMs are about as good as SQL for many queries; transactions greatly simplify concurrency; and often it's easier to horizontally scale a SQL database than simpler alternatives. Not every application benefits from SQL, but applications that suffer from not using it are commonplace. Lacking data normalization, they suffer many easily avoidable bugs, and using procedural code where they could use SQL, they suffer not only more bugs but also difficulty in understanding and modification.

- Algorithms and data structures. SQL doesn't solve all your data storage and querying problems. As Zachary Vance said, "Usually you should do everything the simplest possible way, and if that fails, by brute force." But sometimes that doesn't work either. Writing a ray tracer, a Sudoku solver, a maze generator, or an NPC pathfinding algorithm doesn't get especially easier when you add SQL to the equation, and brute force will get you only so far (there's a sketch after this list). The study of algorithms can convert impossible programming problems into easy programming problems, and I think it may also be helpful for learning to think logically. The pitfall here is that it's easy to confuse the study of existing data structures and algorithms with software engineering as a whole.

- Design. It's always easy to add functionality to a small program, but hard to add functionality to a large program. But the order of growth of this difficulty depends on something we call "design". Well-designed large software can't be as easy to add functionality to as small software, but it can be much, much easier than poorly-designed large software. This, more than manpower or anything else, is what ultimately limits the functionality of software. Design has more to do with how the pieces of the software are connected together than with how each one of them is written, though ultimately it has a profound impact on how each one is written too. This is kind of a self-similar or fractal concern, applying at every level of composition that's bigger than a statement, and it's easy to have good high-level design and bad low-level design or vice versa. The best design is simple, but simplicity is not sufficient. Hierarchical decomposition is a central feature of good designs, but a hierarchical design is not necessarily a good design. (A tiny sketch follows this list.)

- Optimization. Sometimes the simplest possible way is too slow, and faster software is always better. So sometimes it's worthwhile to spend effort making software faster, though never to make it actually optimal. Picking a better algorithm is generally the highest-impact thing you can do here when you can, but once you've done that, there are still a lot of other things you can do to make your software faster, at many different levels of composition. (A measurement sketch follows this list.)

- Code reviews. Two people can build software much more than twice as fast as one person. One of the reasons is that many bugs that are subtle to their author and hard to find by testing are obvious to someone else. Another is that often they can improve each other's designs.

- Regular expressions. Leaving aside the merits of understanding the automata-theory background, regular expressions are, like SQL, in the category of things that can reduce a complicated page of code to a simple line of code, even if the most common syntax isn't very readable; an example follows this list.

- Compilers, interpreters, and ___domain-specific languages. Regular expressions are a ___domain-specific language, and it's very common to have a problem ___domain that could be similarly simplified if you had a good ___domain-specific language for it, but you don't. Writing a compiler or interpreter for such a ___domain-specific language is one of the most powerful techniques for improving your system's design. Often you can use a so-called "embedded ___domain-specific language" that's really just a library for whatever language you're already using; this has advantages and disadvantages, and there's a sketch after this list.

- Free-software licensing. If it works, using code somebody else wrote is very, very often faster than writing the code yourself. Unfortunately we have to concern ourselves with copyright law here; free-software licensing is what makes it legal to use other people's code most of the time, but you need to understand what the common licenses permit and how they can and cannot be combined.

- Specific software recommendations. There are certain pieces of software that are so commonly useful that you should just know about them, though this information has a shorter shelf life and is somewhat more ___domain-specific than the stuff above. But the handbook should list the currently popular libraries and analogous tools applicable to building software.
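
To make a few of these concrete, here are some sketches in Python, assuming only the standard library; they are illustrations, not the one true way. First, the logic item: the spec "the list contains a duplicate" is ∃i ∃j: i ≠ j ∧ a[i] = a[j], and each existential quantifier mechanically becomes a loop.

    def has_duplicate(a):
        # Spec: there exist i and j such that i != j and a[i] == a[j].
        # Each "exists" becomes a loop; the body of the formula becomes an if.
        for i in range(len(a)):
            for j in range(len(a)):
                if i != j and a[i] == a[j]:
                    return True
        return False  # no witnesses found, so the formula is false

    assert has_duplicate([3, 1, 4, 1, 5])
    assert not has_duplicate([2, 7, 1, 8])

As promised, the result is correct but bad: quadratic where sorting or a set would do.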
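
For the debugging item, a minimal delta-debugging sketch; still_fails is a stand-in for rerunning whatever test reproduces the bug, not any real library's API.

    def shrink(failing_input, still_fails):
        # Repeatedly drop chunks of the input, keeping any smaller
        # version that still reproduces the bug.
        chunk = len(failing_input) // 2
        while chunk >= 1:
            i = 0
            while i < len(failing_input):
                candidate = failing_input[:i] + failing_input[i + chunk:]
                if still_fails(candidate):
                    failing_input = candidate  # smaller input, same bug
                else:
                    i += chunk
            chunk //= 2
        return failing_input

    # Toy bug: the failure triggers whenever the byte b"X" is present.
    print(shrink(b"aaXbbbbcc", lambda s: b"X" in s))  # prints b'X'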
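
For the testing item, a hand-rolled property-based test; libraries like Hypothesis add input shrinking and smarter generation, but the core idea is this small.

    import random

    def rle_encode(s):
        # Run-length encode a string into [char, count] pairs.
        runs = []
        for ch in s:
            if runs and runs[-1][0] == ch:
                runs[-1][1] += 1
            else:
                runs.append([ch, 1])
        return runs

    def rle_decode(runs):
        return "".join(ch * n for ch, n in runs)

    # Property: decoding an encoding gives back the original, for any input.
    for _ in range(1000):
        s = "".join(random.choice("ab") for _ in range(random.randint(0, 30)))
        assert rle_decode(rle_encode(s)) == s, repr(s)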
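
For the reproducibility item, a content-addressed result cache; the file layout is invented for illustration, and the whole trick is only sound because the computation is deterministic.

    import hashlib, json, os

    def cached(fn, *args, cache_dir="cache"):
        # Key the cache on a hash of the function name and its arguments.
        os.makedirs(cache_dir, exist_ok=True)
        key = hashlib.sha256(repr((fn.__name__, args)).encode()).hexdigest()
        path = os.path.join(cache_dir, key + ".json")
        if os.path.exists(path):
            with open(path) as f:
                return json.load(f)  # reproducible, so safe to reuse
        result = fn(*args)
        with open(path, "w") as f:
            json.dump(result, f)
        return result

    def slow_square(n):
        return n * n  # stands in for an expensive computation

    print(cached(slow_square, 12))  # computed, then written to cache/
    print(cached(slow_square, 12))  # served from the cache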
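
For the databases item, the kind of two-line query that replaces a page of nested loops; sqlite3 ships with Python.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER NOT NULL REFERENCES customers,
                             amount REAL NOT NULL);
        INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 7.5);
    """)
    # Total order value per customer, highest first.
    for name, total in db.execute("""
            SELECT c.name, SUM(o.amount) AS total
            FROM customers c JOIN orders o ON o.customer_id = c.id
            GROUP BY c.id ORDER BY total DESC"""):
        print(name, total)  # alice 25.0, then bob 7.5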
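
For the algorithms item, the classic move: brute force first, then a better data structure when brute force runs out of steam.

    def two_sum_brute(xs, target):
        # Simplest possible way: try every pair. Fine for small inputs.
        for i in range(len(xs)):
            for j in range(i + 1, len(xs)):
                if xs[i] + xs[j] == target:
                    return xs[i], xs[j]
        return None

    def two_sum_fast(xs, target):
        # A hash set turns the quadratic scan into a single pass.
        seen = set()
        for x in xs:
            if target - x in seen:
                return target - x, x
            seen.add(x)
        return None

    xs = list(range(100_000))
    assert two_sum_fast(xs, 199_997) == (99998, 99999)  # instant
    # two_sum_brute on the same input needs billions of comparisons.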
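
For the design item, the smallest example I can manage of "connections matter more than code": the report depends on a narrow interface rather than on any concrete storage, so the storage can change without touching the report. All the names here are invented.

    from __future__ import annotations
    from typing import Protocol

    class OrderSource(Protocol):
        def amounts(self) -> list[float]: ...

    class InMemoryOrders:
        # One concrete source; a database-backed one could replace it
        # without any change to revenue_report below.
        def __init__(self, amounts: list[float]):
            self._amounts = amounts
        def amounts(self) -> list[float]:
            return self._amounts

    def revenue_report(orders: OrderSource) -> str:
        return f"total revenue: {sum(orders.amounts()):.2f}"

    print(revenue_report(InMemoryOrders([20.0, 5.0, 7.5])))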
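
For the optimization item, measure before and after; the only change below is the container, which turns each membership test from a linear scan into a hash lookup.

    import timeit

    needles = list(range(0, 20_000, 7))
    haystack_list = list(range(20_000))
    haystack_set = set(haystack_list)

    def count_hits(haystack):
        # "in" is O(n) on a list but O(1) on a set.
        return sum(1 for n in needles if n in haystack)

    for name, h in [("list", haystack_list), ("set", haystack_set)]:
        print(name, timeit.timeit(lambda: count_hits(h), number=1))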
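
For the regular-expressions item, one admittedly unreadable line doing the work of a page of manual scanning.

    import re

    line = "2024-05-03 14:22:07 ERROR disk /dev/sda1 is 97% full"
    m = re.match(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)", line)
    date, time, level, message = m.groups()
    assert level == "ERROR" and message.startswith("disk")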
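
And for the ___domain-specific-languages item, an embedded DSL in miniature: the host language supplies the syntax (nested tuples), and a ten-line interpreter supplies the semantics.

    OPS = {"+": lambda a, b: a + b,
           "*": lambda a, b: a * b,
           "max": max}

    def eval_expr(expr, env):
        if isinstance(expr, str):           # variable reference
            return env[expr]
        if isinstance(expr, (int, float)):  # literal
            return expr
        op, *args = expr                    # application
        return OPS[op](*(eval_expr(a, env) for a in args))

    # price = max(base * quantity, minimum) + shipping
    formula = ("+", ("max", ("*", "base", "qty"), "minimum"), "shipping")
    print(eval_expr(formula, {"base": 3, "qty": 4, "minimum": 10, "shipping": 2}))

One language for the whole system, which is the "embedded" advantage; the disadvantage is that your error messages are the host language's, not your ___domain's.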


There are some people (such as the SWEBOK team) who would claim that software engineering shouldn't concern itself much with considerations like my list above. Quoting its chapter 16:

> Software engineers must understand and internalize the differences between their role and that of a computer programmer. A typical programmer converts a given algorithm into a set of computer instructions, compiles the code, creates links with relevant libraries, binds†, loads the program into the desired system, executes the program, and generates output.

> On the other hand, a software engineer studies the requirements, architects and designs major system blocks, and identifies optimal algorithms, communication mechanisms, performance criteria, test and acceptance plans, maintenance methodologies, engineering processes and methods appropriate to the applications and so on.

The division of labor proposed here has in fact been tried; it was commonplace 50 or 60 years ago.‡ It turns out that to do a good job at the second of these roles, you need to be good at the stuff I described above; you can't delegate it to a "typical programmer" who just implements the algorithms she's given. To do either of these roles well, you need to be doing the other one too. So the companies that used that division of labor have been driven out of most markets.

More generally, I question the SWEBOK's attempt to make software engineering so different from other engineering professions, by focusing on project-management knowledge to the virtual exclusion of software knowledge; the comparison is in https://news.ycombinator.com/item?id=41918011.

______

† "Binds" is an obsolete synonym for "links with relevant libraries", but the authors of the SWEBOK were too incompetent to know this. Some nincompoop on the committee apparently also replaced the correct "links with relevant libraries" with the typographical error "creates links with relevant libraries".

‡ As a minor point, in the form described, it implies that there are no end users, only programmers, which was true at the time.


I wrote:

> Code Complete was pretty great at the time.

Unfortunately it seems that Steve McConnell has signed the IEEE's garbage fire of a document. If you decide to read Code Complete, maybe stick with the first edition.





