Laws of Performant Software

throwawayReply · on Sept 20, 2016

> a cache can be added in 10 LOCs

Yes, and then 10 months debugging edge cases where communicating parts are looking at different versions of the "same" data.

Caching is really important, but caching (and cache-invalidation) is really difficult, adding caching to an application that doesn't use caching is not "10 LOC and done".

creshal · on Sept 20, 2016

It really depends on what you're trying to cache. Caching expensive calculations e.g. is trivial to get right (use all arguments as cache key) and often you don't even need invalidation (unless your algorithm changes at runtime).

Klathmon · on Sept 20, 2016

Memoization works amazingly and can be very easy to implement, but it's not always that applicable.

In my experience, I just don't hit that many pure functions that don't do things like touch the database (which can change out from under the function), or are called enough with the same arguments that memoization is actually worth it.

But when it does work, it's like magic. I do a lot of work in javascript now, and it's great being able to wrap a function in a single line `memoize` function and instantly improve performance.

vinceguidry · on Sept 20, 2016

Caching is just another layer of complexity on top of your app. Sort of like HTML templates. If you're expecting it to be a no-maintenance drop-in solution, you're going to get burned. If you're willing to think carefully about how it fits into your ___domain, then you might be able to get away with a 10 LOC method on your base controller class.

jakub_h · on Sept 20, 2016

It's a pervasive element, though. Which to me means two things. First, I wonder if there isn't a way of adding it (really) transparently into a language. Basically, whenever there's a possibility that a value is a function of old values, chances are that a previously used result is still available. Second, strategies can have massive time and space implications, but that's exactly why being explicit about them in the application's code in any way should be avoided at all costs: they shouldn't change the semantics, only pragmatics. Given that compilers already do this with simpler things (are my local variables actually on stack or are they kept in registers?), one has to wonder if this isn't one of those things that computers could perhaps figure out on their own in the future, just like we don't allocate registers by hand anymore either. In a similar way that, say, ATLAS finds out by trial and error the best way to perform FP linear algebra within a system of parametric code solutions. I think the "here's what I mean, give me a piece of code that does this" approach could have massive impact in the future. Transparent caching, glue code/plumbing, automatic algorithm selection based on result constraints etc. all seem like possible applications.

jsingleton · on Sept 20, 2016

Seconded that cache-invalidation is hard. This may be a cliché, but it really is hard.

Implementing a cache may be easy but debugging a cache, or particularly the interactions of multiple caches (some outside of your control) certainly isn't. I've encountered these problems on many projects and also written about them in detail for my recent book.

jakub_h · on Sept 20, 2016

In fact, it is well known that the two hardest things in programming are cache invalidation, naming things, and off-by-one errors.

huhtenberg · on Sept 20, 2016

> caching (and cache-invalidation) is really difficult

That's an urban legend.

Needlessly complicated or over-abstracted general-purpose caching frameworks are difficult, but your dumbest imaginable linear LRU fast lookup is both exceptionally useful and can indeed be done in 10 LoC in a lot of cases.

throwawayReply · on Sept 20, 2016

It's not the algorithm that's difficult, it's the effect of adding caching to a system that wasn't built with caching in place at the start that is difficult.

This isn't an "urban legend" it's first hand experience working with companies trying to add caching. No, those companies aren't even trying to write caching algorithms, they're just bundling in a caching layer and hoping that the system behaves in the same way.

It only takes somewhere which writes data (perhaps in a way that bypasses the caching layer so the cache doesn't know it has changed) and re-reads it back quickly for software which used to work suddenly breaks.

Now you might look at that and go "omg refactor it! That's horrible code, that should never ship" etc, but not everywhere is the s.v. bubble with endless amounts of the best developers to throw at problems. Code which worked and solved a business problem ended up shipping, possibly without testers and probably without code reviews.

So adding a caching layer suddenly "breaks" those reports, now who's going to have to fix it, not the person who wrote those reports even if the very behavior of side-effected data changes and db re-reads is precisely a cause of data layer slowness that led to wanting to implement caching...

xixi77 · on Sept 20, 2016

I am not sure you and the OP are talking about the same thing.

Bundling a caching layer without even trying to write caching algorithms -- particularly if such a layer is serving multiple purposes -- sounds to me like what the parent is calling a "general-purpose caching framework", probably overly abstracted too as these things are wont to be.

I would be super cautious incorporating this kind of black-box stuff. Even if the docs have 10 LOC examples, there can be all kinds of unexpected quirks that you would need to be aware of before doing anything.

I read the OP as talking about specific, single-purpose caching techniques -- e.g. when you need to repeatedly compute a function of arbitrary parameters, it can help a lot to simply store values for the more common parameter combinations.

striking · on Sept 21, 2016

It all depends on the purity of the underlying computation. If you're multiplying XXL numbers together, that's different than accessing a database with dynamic or variable data. Math is totally pure. But you'll have to evaluate the constraints on the purity of that database access.

huhtenberg · on Sept 20, 2016

Nobody's saying that one can just throw in a caching layer, touching nothing else and it will just magically work. There's obviously some thought and due consideration required, but it is NOT "really difficult". And in a lot of cases it is in fact as simple as adding a handful lines of code.

PS. It is an urban legend, because "cache invalidation is hard" gets repeated a lot, initially as a joke, but it doesn't preclude people who aren't familiar with the subject from taking it as a fact and then repeating it as such.

Voice recognition from scratch is hard. Some lock-free data structures are hard. Caching is not hard. It's knowing what the heck you are actually doing and doing it well is what's hard. By the same measure, C macros would be hard, because some idiot can do #define true false and everyone else will spend the same 10 months trying to understand why the hell things break now and then. Caching is hard is when someone starts messing with other people' code without fully understanding it. But then anything is "hard" under these circumstances.

falcolas · on Sept 20, 2016

The difficulty in implementing voice recognition and lock free structures does not make caching easy. Yes, you can implement caching in 10 lines of code. Anybody can do that - it's the writing the correct 10 lines of code which is very hard.

Sure, if you're implementing Fibonacci, memoization is simple. But if you're trying to memoize the results out of a database, things are going to be a lot more complicated, really fast.

It requires knowledge of what can go wrong, what will go wrong, the use cases associated with a bit of data to be cached, how the application will be deployed, what other caching will be implemented across the system, and a dozen other bits of information unique to each use-case.

Knowing what questions to ask, and of whom to ask them, requires experience.

deong · on Sept 20, 2016

> It's knowing what the heck you are actually doing and doing it well is what's hard.

That sort of goes without saying. If doing something well is hard, then doing that thing is hard. No one is interesting in writing code that doesn't work -- we're always talking about "doing it well".

We don't all going around claiming that quantum mechanics is easy and then backing it up by demonstrating our ability to pull grossly wrong answers out of our asses.

sqeaky · on Sept 20, 2016

> It only takes somewhere which writes data (perhaps in a way that bypasses the caching layer so the cache doesn't know it has changed) and re-reads it back quickly for software which used to work suddenly breaks.

So those 10 lines need to be in the wrong place?

Why expose the uncached API?

Web or single system implement caching in a defined is easy if you don't have a defined API you probably don't have a good system. If you don't have a good system, why are you trying to implement caching? The not being good part is probably why its slow.

xixi77 · on Sept 20, 2016

Precisely -- it's the "general-purpose" part that is extremely complicated, which is what everyone here seems to be talking about when bringing up invalidation, but having simple, specific and localized caching in a performance-critical region is not that hard, and can be very effective.

falcolas · on Sept 20, 2016

I have to agree with the parent - cache invalidation is tough. When do you do it? What triggers an invalidation? When is it OK to use known-stale data? How long should a cache be valid for? If multiple copies of a program are brought up, what effect will it have on the caches? How do updates from one instance get populated to others? What are the impacts on up- and down-stream services?

I'm going to borrow from someone much more eloquent than I: How simple it is to declare a static hashtable, and yet how perilous!

http://thecodelesscode.com/case/148?topic=caching

rantanplan · on Sept 20, 2016

>That's an urban legend.

ROFL. What are you talking about?

You're talking as if it's a solved problem for all cases. Hint: it's not.

If it was an urban legend people wouldn't write dissertations on it.

slowmovintarget · on Sept 20, 2016

This is another one of those PLace-Oriented Programming (PLOP) problems. Rather than a fictional "now" read at a place, perceive a factual "then" which may be cached to your heart's content.

Granted, if you aren't taking advantage of immutable data in the first place, it can hurt to get there.

flukus · on Sept 21, 2016

In the real I world see caching mess things up more than they help with stupid implementations like caching entire database tables, n+1 problems being moved from the database to the cache, etc. Where I am now we have this absolutely retarded in memory cache that we write to (it will write to the DB several minutes later). At other places I've seen the cache stored in session variables.

Caching has it's place, but more often than not I see it used as a bandaid on a terrible design.

MrDosu · on Sept 21, 2016

> can be added in 10 LOCs

to me is kinda like you can easily make heavier elements by just adding a few electrons, protons and neutrons...

critium · on Sept 20, 2016

While this article is itself worth a read, remember the programming mantra:

  * Make it Work
  * Make it Right
  * Make it Fast

And this is the last step. While this does not apply to all projects but it does apply to the majority.

brianwawok · on Sept 20, 2016

This is really an overly simplistic way to code. I urge people to think deeper up front about performance.

Know your performance goals going in and code accordingly. If you require 100 micro average latency and you coded in Node.js, step 3 will be a rewrite.

Every single line of code I write, I can tell you my performance goals. If indeed it is a simple crud screen by a user, the goal may be "meh, document.ready called when viewed from 100ms browser lag within 1 second". Backend trading code would have different goals..

BatFastard · on Sept 20, 2016

It depends on how familiar the engineer is with the problem space.

If the engineer knows little about the problem space, then any type of optimization is not warranted until a working version is achieved.

The biggest secret to writing software is "not painting yourself in the corner".

brianwawok · on Sept 20, 2016

If an engineer is not familiar with the problem space, he should learn more before he writes more code.

bryanlarsen · on Sept 20, 2016

But the best way to learn is often to write some code.

Klathmon · on Sept 20, 2016

Yeah, when starting a new project I tend to write a POC, throw it out, make a ton of notes on the problem and possible solutions, write a second POC, throw it out again, refine my notes, make sure i really understand the problem and my solution is actually working, and if everything looks good, write it "for real".

coredog64 · on Sept 20, 2016

The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. […] Hence plan to throw one away; you will, anyhow.

- Fred Brooks, "Mythical Man Month"

sqeaky · on Sept 20, 2016

Then the problem is an academic endeavor and hopefully not something to be sold to a customer. If it is open source, then it is likely to fall into disuse like 99.9% of open source endeavors. If it is in the 0.1% of open source projects then the community will find the effort for a rewrite, but it could be painful like python 2/3 or perl 6/7

adrianN · on Sept 20, 2016

Have you never written a prototype to understand your customers needs?

sqeaky · on Sept 20, 2016

I usually iterate. Building upon what was already built.

All to often a 'prototype' becomes the finished product without the intervening iterations. If that was the plan from the start it works more smoothly.

sqeaky · on Sept 20, 2016

By academic, I mean any project for learning. Not something specifically realated to schools.

andrepd · on Sept 20, 2016

Exactly. Completely ignoring performance considerations until after you got everything working right potentially means a substantial or complete rewrite of much of the code.

critium · on Sept 20, 2016

My posit was that 80% of the software an engineer will write will not require optimization and based on the responses i've been getting I should have used "rule of thumb" vs mantra. My intention was to warn the ones that read this and say, oh i gotta do all of this for every piece of software that I write, which will undoubtedly lead to overly complex code when a simple solution would have worked just as well.

  "Know your performance goals going in and code accordingly."

I this is key sentence here and its worth repeating. Know your performance goals before your fingers touch the keyboard.

brianwawok · on Sept 20, 2016

> My intention was to warn the ones that read this and say, oh i gotta do all of this for every piece of software that I write, which will undoubtedly lead to overly complex code when a simple solution would have worked just as well.

No that is not true at all.

I am writing a crud app to be used by 1 person, some manager of a widget factory. I say "my perf goal is to have page loads in 10 seconds or less". How will that make my code more complex? If anything it will make my code MORE simple, as I can relax all kinds of constraints like "making 72 database queries per page is generally bad".

breischl · on Sept 20, 2016

I believe the GP's point was that if you know that your performance goals are very relaxed, then you can make the code appropriately simple from the beginning. Conversely, if your performance goals are stringent, then you can take an appropriate approach from the beginning.

kasey_junk · on Sept 20, 2016

I disagree completely with your 80% number. I haven't worked on a project that had 0 performance work ever.

Performance is a first class design constraint just like development time, budget & functionality, if you don't treat it as such from the beginning you are asking for trouble.

breischl · on Sept 20, 2016

But different parts of a project usually have different performance goals. Typically just the important/frequently used parts need to be optimized. Infrequently used things (maybe setup, admin interface, options dialog) can usually just be "good enough."

So one could say that within a project, 80% of the code isn't performance critical.

kasey_junk · on Sept 20, 2016

All parts of a system have performance requirements. Some of them you might meet naively, great!

But if you don't know what they are it is very hard to back into acceptable performance.

breischl · on Sept 20, 2016

No kidding. Who ever said anything about not knowing what the requirements are?

The point you were disagreeing with here was "80% of the software an engineer will write will not require optimization". ie, 80% of any given system hitting the performance requirements naively. So... why are you still arguing?

kasey_junk · on Sept 20, 2016

Because blind adherence to the "Make it work then make it fast" advice has been the bane of my career.

Generally speaking, I have not found it to be true that you can make something fast if you didn't think about performance first. If you thought about it, and came to the decision "it will be fast enough no matter what we do" bully for you, but for me that happens way less than 80% of the time.

dagw · on Sept 20, 2016

That's all well and good, but still "make it work" should always come before "make it fast". For example the code I'm writing right now in my other window is terribly inefficient, quite naive and may very well have to be re-written, but I don't care at the moment. All I care about right now is: Is what I'm trying to do actually practically possible in the general case. If so what approach will give the best/good enough results. And finally, if I manage to do what I'm trying to do, will this particular approach offer a better solution to the higher level problem I'm dealing with than my other approach.

Once I've answered "yes" to all those questions then I can think about heading back and trying to make it fast. Writing really high performance code that doesn't solve the problem you have or give you the results that you need is, of course, a waste of time.

kasey_junk · on Sept 20, 2016

You misunderstand "make it work" includes performance goals. The difference is if you consider them upfront it includes the explicitly not implicitly.

You'd never say that the you've made the code "work" if you knew it would take 3x of your budget to get there. Similarly with performance, it doesn't "work" if it doesn't meet the perf goals of the project, no matter how relaxed they might be.

dagw · on Sept 20, 2016

I guess we're arguing semantics at this point. For me "make it work" means that I'm able to write a piece of code that takes the input I want and returns the output I want eventually, doesn't matter if it takes a week instead of a minute to run. Before I get to that point then any optimization I do is probably not the best use of my time. Also if I can't get to that point then it means I really don't understand the underlying problem and I'm probably not in a position to start reasoning about making it fast.

Generally speaking I find it much easier to take slow, working code and making fast, as opposed to fast, broken code and making it work.

kasey_junk · on Sept 20, 2016

Generally speaking I don't. The only way I've ever been able to be successful on the performance front is to conclude that code that is slow is broken, not that it is working before then.

Backing into acceptable performance after the fact just doesn't work for the problem domains I work in (which on first blush have not been exclusively performance based).

oldmanjay · on Sept 20, 2016

If your attitude is shaped by narrow problem domains, why argue with people as if you were talking about the general case?

ska · on Sept 20, 2016

Even better is: Know your performance goals going in and choose your tools and approaches accordingly.

You still mostly want to follow the rough priority above. You absolutely may prototype to convince yourself a performance goal can be met early on, but if the goal is high performance code you are still far better off making the first pass for correctness. Skipping this step often leads to highly performant code that is wrong, and is a pain in the ass to debug.

The "make it work, then make it right, then make it fast" mantra is both overly simplistic and deeply true.

karmelapple · on Sept 20, 2016

How do you know the speed expectations? Do you always talk concrete numbers with the stakeholders before each change you make, or just use reasonable rules of thumb you determine?

brianwawok · on Sept 20, 2016

Depends on the project!

Asking stakeholders is good. Often then will have no idea, or say something generic like "make it fast".

Try to pin them to something more concrete.

"So how about adding this new feature makes X happen no more than 5% slower than it currently does"

kasey_junk · on Sept 20, 2016

Not before each change but before any major new initiative or refactor. Having those numbers up front is the only way to make appropriate trade offs.

Having this conversation with stakeholders often educates them on the costs of performance as well. Getting 100% of responses sub 200ms is frequently orders of magnitude more expensive than getting 99% of them there, and stakeholders usually get that fast when you show them budget info.

ansgri · on Sept 20, 2016

Replying to the second paragraph, there often is a real value in maintaining strong upper bound for the latency, especially in distributed real-time systems (which are most of the real systems, anyway).

E.g. (99% sub-200ms and 1% _unbounded_) vs (80% sub-200ms and _always_ sub-500ms) means 1% of potentially unanticipated crashes (a hell to debug and explain to customers!) vs a highly reliable system and happy customers.

kasey_junk · on Sept 20, 2016

For sure, thats the definition of a real time system after all. But having conversations about what the long tails do to the "normal" path and what the costs (both in money and performance in the "normal" path) is quite simply something that you can't back into.

ansgri · on Sept 20, 2016

Maybe I didn't understand you correctly, but you can at least have a "return error on timeout" and process that with a predictable logic. Or maybe you do have an architecture when any individual tardy request absolutely cannot impact others. After all, I come from stream processing systems where there's only few "users" with constant streams of requests, and these users are interdependent (think control modules in a self-driving car).

kasey_junk · on Sept 20, 2016

What I'm suggesting is the decision on what you do in the case of long tail performance problems, is not something you can back into.

If you are going to have timeouts with logic, that has down stream implications. If you are going to have truly independent event loops, that is a fundamental architectural decisions.

None of those things match the "make it work, then make it fast". You literally have to design that into the system from jump street as it is part of the definition of "works".

vinceguidry · on Sept 20, 2016

> I urge people to think deeper up front about performance.

That's part of "making it work." If you're writing backend trading code, then performance is among the first consideration.

brianwawok · on Sept 20, 2016

Except I also see crud apps that take 8 seconds to load, and you can't just fix it with a cache. EVERY app has performance considerations. Literally every app. Some may be very loose.. but then spell it out, and use that to think about it.

vinceguidry · on Sept 20, 2016

If you're following basic REST guidelines regarding idempotency then, yes, you can "just fix it with a cache." That's the whole point of the guideline. Updates can still take awhile, but you can fix that if you need to with a backend jobs server.

Performance should always be in your mind somewhere, but it doesn't always need to be at the forefront.

jschwartzi · on Sept 20, 2016

Unless it's taking 8 seconds to load because your user is on 2G. Then the only way you can avoid that hit is by not making the requests in the first place.

brianwawok · on Sept 20, 2016

Well, or sending very compressed binary data to the user and rendering more client side, vs sending fat HTML on every click.

svdree · on Sept 20, 2016

What you'll often find however, is that in the last step ("make it fast") you have limited room to maneuver, because of the stuff you did in the first 2 steps. If you really need high performance, you need to design for high performance, not just leave it as an afterthought.

zzzcpan · on Sept 20, 2016

Usually it's almost impossible to predict where your design flaws would be until you actually use the thing in production. Because you make a lot of assumptions and some of them will inevitably be false. So, make it fast is mostly about changing design.

critium · on Sept 20, 2016

absolutely agree with your last sentence. But thats not a majority of software written in my experience.

  "If you really need *high performance*, you need to design for high performance, not just leave it as an afterthought."

Emphasis, mine.

jackmott · on Sept 20, 2016

Then why do I encounter slow software every day? Why do we keep encouraging this?

brianwawok · on Sept 20, 2016

Because this entire Mantra is horrible advice. All 3 need to be in the spec, and all 3 need to be coded for.

arethuza · on Sept 20, 2016

"Make it fast" sometimes does not need to be coded for if the applications is fast enough after the "Make it right" step.

brianwawok · on Sept 20, 2016

Doesn't matter, every app has some speed requirement.. if it is trivial, then there isn't much to do for it.

It's not something you can just do later, because thats not how software works. Fast code needs designed to be fast. Not "fixed" at the end.

lostcolony · on Sept 20, 2016

Because they stopped at step one (make it work) or two (make it right). Or the first of my three (borrowed from Joe Armstrong), make it work, make it beautiful.

It's not simply that they failed to plan for performance, it's that after making it work -they never measured and removed obvious bottlenecks-.

Most pieces of software don't need high performance. Drawing some widgets on screen just isn't that demanding. But it -does- require you to go back afterwards and remove places you introduced inefficiencies. You don't need to code in C and optimize against cache misses for that; you just need to take time after things work to make them not suck, and thats something a lot of software development doesn't take the time to do.

JoeAltmaier · on Sept 20, 2016

Most client software doesn't need performance. Drawing widgets is a client-side thing. Scaling is really the hot point that discovers bottlenecks. Server have to scale. And nobody has time to plan/code very far for scaling when they haven't succeeded yet.

I think its perfectly normal to write slow, non-scalable code as proof of concept. Then continue to attack bottlenecks as you grow (if you grow). Its a lucky startup that has to deal with performance. They can afford to dedicate a couple smart developers just to that issue.

Michael Dell said every time your company doubles in size, you have to reinvent your processes. True for software too.

lostcolony · on Sept 20, 2016

Yeah, I just gave a random example of something that is very common in a generic software application; at some point, you draw some basic GUIs. Those can lead to performance bottlenecks as well (loading and displaying 10 data points in your test bed; easy. Loading and displaying 1 million data points in a production situation, a little harder). There's also basic network operations, some DB/disk writes, actual CPU usage for whatever processing has to occur, etc. Any of those may need to be super performant for some apps, some will be front end, some backend, but for many applications and uses, at least initially, it's not worth worrying about at first.

That said, I'd draw a distinction between vertical scaling and horizontal scaling. The former you should address as needed, as the gains are comparatively limited, and the bottlenecks are unknown (you think you're CPU bound; whoops, nope, I/O. Or whatever); the latter should be designed for if there's a chance you'll need it. Because oftentimes, things that are merely decisions early on (no difference in amount of work) can lead to savings of months of effort and churn down the line if you go for something that scales. Decisions like deciding what data needs to have strong consistency, versus what data can be eventually consistent (and choosing data stores based on that), trying to avoid shared state, thinking about "what happens if there are more than one of these?" and designing/implementing with that in mind (even if some aspects are super hard and you punt on them, there's plenty of low hanging fruit that you can address with minimal effort early, rather than massive later on).

rakoo · on Sept 20, 2016

Because there is no user story for "make it performant", and there is little thought put into dynamic design vs static design. If money comes in with crappy performance and it is hard to predict how much more money will come if the developer improved performance, then it won't be a business priority.

crdoconnor · on Sept 20, 2016

I've seen more performance problems caused by people not having clean code than I ever have from people not thinking about performance from the get go.

I've also seen plenty of performance issues ironically caused by performance hacks wedged in early on.

brianwawok · on Sept 20, 2016

You will often find clean code and fast code converge on very similar places. It is often a false dichotomy to think code needs to be either clean OR fast.

Now this does break down, if you need to get to the point where you are bit twiddling, it is not going to be clean as using something higher level.. but you can often put the nasty parts in a static method somewhere and still have the code be very easy to read.

critium · on Sept 20, 2016

this reminds me of the post about the jvm code cache a few days ago where if they had left the jvm to optimize the code cache by itself, the would not have ended up with the problem situation where the had to spend time to figure it out their optimization was the cause of the problem.

-- sorry for the run on sentence.

barrkel · on Sept 20, 2016

Sure; but a working version is still really really useful to compare against (e.g. build up a suite of test cases) even if you have to redesign large parts for performance later.

gabemart · on Sept 20, 2016

My personal approach is to do some order-of-magnitude performance testing as early as possible, to validate that the approach chosen to solve a particular problem is at least tenable.

If you ignore performance completely until late in the project, you can paint yourself into a corner. This includes cases like knowing that performance is 50x slower than will be acceptable throughout development, but saying "we can add X later for an easy performance win". If you don't actually test that X gets you within reach of your performance target at an early stage, you can end up with a fully built system that is unusable.

hawski · on Sept 20, 2016

One should put performance considerations under "Make it Work", but that is probably obvious. The definition of "It works" (or eventually "Its correct") should include certain latency or throughput requirements and sometimes can't be left out for "Make it Fast" phase.

On the one hand I understand that most programs don't need to be specially fast. But on the other it leads to such of waste of time for the users. Where it really matters we usually see some kind of rewrite or new program that is designed to be fast and it can take significant slice of the pie. At the same time there is also a case for ease of use - ease of use can make even slow programs not only feel fast, but also take shorter time from decision to install/run said program to achieving user's goal.

There is no silver bullet, but few (or many?) rules of thumb ;)

jsingleton · on Sept 20, 2016

As with any engineering problem, there is usually a trade-off involved. Premature optimization is bad but software does need to be fast enough to start with and flexible enough to be improved later.

You don't want to code yourself into a corner by accident. It's fine to knowingly take on technical debt, as long as you have a plan to fix it in the future. Even if this is never required.

I always try to take a pragmatic approach, as things are usually not as binary as these simple rules of thumb assume. The real world is typically very nuanced, which is why engineering can sometimes seem like more of an art than a science, and why experience is so valuable.

Shameless plug - I hope this attitude comes across in my book that focuses on web application performance issues: "ASP.NET Core 1.0 High Performance" (https://unop.uk/book/).

overgard · on Sept 20, 2016

Yes and no; we all know the Knuth quote etc, but there are a lot of design decisions you have to make up front (language, database, server, framework, etc.) that you can't change later, but which will set the floor and the ceiling for what your performance looks like.

For instance, if you're working in a resource constrained environment, garbage collected vs not garbage collected is a big decision. Or if you're working on a web app, how you layout your database tables or your nosql equivalents is going to have a huge effect, and is much harder to change later.

There's a huge difference between premature optimization vs making decisions that will have performance consequences down the road. If you wait till the end of a project/release cycle to think about performance, the amount you can do about it will usually be disappointing.

losvedir · on Sept 20, 2016

What's the difference between "work" and "right"?

sgift · on Sept 20, 2016

Edge cases. When something "works" it handles the requirements of your customers in the normal case, but may misbehave in edge cases. When it is right it won't.

arethuza · on Sept 20, 2016

Correct functionality and good implementation respectively.

jnordwick · on Sept 20, 2016

If this is your mantra, you aren't really working on code that require performance. If you where, the "make it fast" and "make it right" would be the same thing.

oldmanjay · on Sept 20, 2016

Yes, it is unfortunate that this approach only works 99% of the time.

adrianratnapala · on Sept 20, 2016

What is the difference between "performant" and "fast"?

If we were talking about cars, then I would say performance includes not only speed but all kinds of things to do with handling. But in computing it always seems to mean "speed"[1], and "performant" always seems to be a bizarre neologism for "fast".

I think people invented it because speed is a many faceted thing. There is latency, there is thoughput, there is user-visible responsiveness, all at multiple interacting levels. But these complexities and vaguenesses apply equally to the quasi-word "performant".

[1] Actually the main article is a partial exception as she inconsistently includes stability within "performance". In one sentence she says "...performance degradation, including crashes, and the unbounded use of resources." But later she says "Code that doesn’t perform, or that crashes."

TickleSteve · on Sept 20, 2016

fast == absolute.

performant == relative (efficient on the hardware available).

e.g. an algorithm may be performant on a little Cortex-M but certainly is not fast compared to the same code on an I7.

adrianratnapala · on Sept 20, 2016

The problem with this definition is that it isn't true.

17km/hour is an absolute speed. It is fast for a runner, OK for a cyclist and slow for a car.

"Fast" is relative.

TickleSteve · on Sept 21, 2016

I agree in that context....

but in the context the original sentence was used, the statement stands.

English is not the most precise language....

YZF · on Sept 20, 2016

- Programming language does matter. You will not be able to get the same performance in any programming language you choose.

- Abstractions are the enemy of performance. You can't get high performance through many layers of abstraction. This relates to my previous point.

- Algorithmic complexity matters when dealing with large data sets. Abstractions can hide the true complexity. (e.g. the famous string append example from IIRC Joel Spolsky).

So the key to high performance software is:

- As close as possible to the hardware.

- The right data structures/layout (taking into account the hardware).

- The right algorithms.

- Measure properly and optimize.

sshumaker · on Sept 20, 2016

I think your point around abstractions could be more nuanced. The wrong abstractions can be deleterious to performance, thoughtfully chosen ones can actually be beneficial. Every programming language is several layers of abstraction above the hardware it runs on, which gives the compiler opportunities for optimization, or even the superscalar CPU opportunities for parallelization / reordering. And so on.

YZF · on Sept 21, 2016

I've yet to see a compiler that can beat a human hand-optimizing. It's true that for zero effort the compiler can usually do better but if you really care about performance you need to get around those abstractions. I worked on a wavelet video decoder, after a while we had a pretty optimized C code that used SSE intrinsics for large portions of the decode. Some clever guy spent 3 months and rewrote the entire decoder in assembly. It involved careful data layout and optimizing the instruction sequences. It ran more than twice as fast than our hand optimized C code. Same algorithm, input and output.

taneq · on Sept 21, 2016

> Abstractions are the enemy of performance. You can't get high performance through many layers of abstraction. This relates to my previous point.

This depends on the kind of abstraction. Layers and layers of crap will never be fast. The correct stack of abstractions, which map well to your problem space, will usually make your code faster because they will make it easier for you to see higher-level optimizations.

> Algorithmic complexity matters when dealing with large data sets. Abstractions can hide the true complexity. (e.g. the famous string append example from IIRC Joel Spolsky).

Very true. Abstractions are a tool to help understand the problem space, not an excuse to avoid understanding it.

> Measure properly and optimize.

A+. If you can't measure it, you can't control it. It boggles me how many people don't get this. "I replaced all the floating point math with integer math because integer math is faster." Really? IS IT? How do you KNOW? Did you benchmark it? "No but..." ahem sorry, I've had this argument too many times. :P

cottonseed · on Sept 21, 2016

> Abstractions are the enemy of performance.

I like to say "abstractions + compilers = performance".

flukus · on Sept 20, 2016

I'd make that last key number 1.

overgard · on Sept 20, 2016

Rule #1 is bogus. If you're starting with a language that's 2-10x slower you're always going to be behind the curve. There's a reason C and fortran haven't gone anywhere.

OskarS · on Sept 20, 2016

You're not wrong, but in the real world it frequently happens that the choice of language is not the bottleneck. The bottlenecks are usually things like memory, network and file I/O, making the language you write in make very little difference.

klodolph · on Sept 20, 2016

Well... if memory is the bottleneck, and you use a language implementation where all the integers are heap-allocated, things are not looking good. I would say that choice of language can important if your bottlenecks are CPU, memory, or maximum latency.

squeaky-clean · on Sept 20, 2016

> If you're starting with a language that's 2-10x slower you're always going to be behind the curve.

I got involved in a huge discussion about optimizing a web app recently. Things that were learned from the conversation?

1) It turns out moving more frequent cases to the top of an 'if-else' chain in JavaScript offers a greater speedup than I expected.

2) It doesn't matter if you shave 4ms off a request by optimizing your if-else chain if a little later in your code you make 27,000 database queries when you could have made 2.

Knowing how to write a performant app will always be better than writing sloppy code in the fastest means possible.

overgard · on Sept 20, 2016

That's kind of besides the point though, yes your bottleneck may not be CPU, but if it is, then language matters a lot

squeaky-clean · on Sept 20, 2016

It's exactly the point. If you don't know what makes an app performant or not, moving to C++ is not going to magically make it any better if you're doing brute force operations everywhere.

> yes your bottleneck may not be CPU, but if it is, then language matters a lot

Even if your bottleneck is the CPU, it does not mean it's the programming language. A quicksort in python is faster than a "while not sorted randomly shuffle this list" in C++.

Sure, moving to another language may be faster. But that is beside the point. The article even says "The programming language doesn’t matter as much as the programmers’ awareness about the implementation of that language and its libraries."

So this whole discussion is under the assumption that a developer does not know what it takes to make an app fast. If you're using arrays in operations where you have to insert into the middle a lot, and never have to iterate over all of it sequentially, you probably should be using something like a linked list. Choosing the proper data structure (and learning why) should take priority over the faster language.

tomsmeding · on Sept 20, 2016

27000 instead of 2 database queries would seem to be more than just a performance problem...

sokoloff · on Sept 21, 2016

Yet it absolutely happens in Linq-to-SQL and other client-side join solutions. Within the last year, I've been in a post-mortem where this was the root cause of progressive slowness and eventual timeout (when the record set became progressively larger).

acomjean · on Sept 20, 2016

To some extent this is true. But built in datatypes that are performant are more likely to be used so thats an advantage some of the higher level languages have.

Truly optimized C code will be faster than perl/python/php. But given the same problem the perl/pyhton/php programs will tend to use hashed data types (dictionaries in python, arrays is php) and end up with fast code. Sometimes faster than C code because the writing/using hashing in C is more difficult and its not always used.

foxhop · on Sept 20, 2016

Performant is not an adjective, it is a noun. "One who performs". Software can perform poor or well. Performant does not mean it performs well.

http://weblogs.asp.net/jongalloway/performant-isn-t-a-word

This matters. Software can perform well at its job (not crash, get the correct answer) but may not perform efficiently (takes a long time, has unbounded resource use, uses a brute force pattern).

ulber · on Sept 20, 2016

If you subscribe to descriptive linguistics at all (which you must if you believe languages can naturally change) a quick google search reveals that performant is actually very much an adjective (i.e. it is commonly used as one to the point that one would be considered obtuse to refuse to recognize it as such).

To be fair one can also find a fair number of discussions about whether performant is a word in that google search, so "performant as an adjective" is clearly a newish use of the word.

Edit: after reading your reference I can comment that journal/book editors mostly do not subscribe to descriptive linguistics and it is probably good that they do not (to not take chances with parts of language that might turn out to be fads).

gsnedders · on Sept 20, 2016

There are plenty of published journals going back decades that accept performant as a term of art.

adrianratnapala · on Sept 20, 2016

Your edit is a good pont. The prescriptive/descriptive dichotomy always seems a bit odd.

If you are a linguist: i.e a social scientist who job it is to explain how languages work in this world. Then you must describe.

If you are a language teacher then your job is to prescribe. And you make a good point that journal editors have good reasons to do the same. (Especially when looking after authors from STEM fields!)

rossitter · on Sept 20, 2016

> Performant is not an adjective, it is a noun. "One who performs".

It has been used as a noun, but rarely. You are thinking of the -ant formation seen in informant: one who informs. But this is not the only -ant in English, and it is not the one used here.

That which is resistant, resists well; it offers a good amount of resistance. Those who are insistent, insist strongly; they make plenty of insistence. That which is compliant, complies fully; it is very much in compliance.

That which is performant, performs well; it offers (a) good performance.

As to what good performance is:

> Software can perform well at its job (not crash, get the correct answer) but may not perform efficiently (takes a long time, has unbounded resource use, uses a brute force pattern).

This is a reasonable objection—I agree with it, wanting speech to be plain—but here is something to oppose it: I suspect that just about everyone who clicked on this article, including the two of us, knew precisely what the author wanted the word to mean, even if they had an objection to that use of the word. Performant has sprung to life, and it describes an efficient performance.

As for what I think of the word: the English language is already rich with others which would do just as well, which is probably why this one seems so jargony. It is one of those technical words that sounds more like a social signal—"I know what I'm talking about"—than something precise: "This is what I'm talking about."

lucozade · on Sept 20, 2016

Unless it's being used an an adjective, in which case it's an adjective.

rubidium · on Sept 20, 2016

http://xkcd.com/1735/

p333347 · on Sept 20, 2016

Poor xkcd. Frowning upon people freely interchanging their, they're and there is not being grammar police (unlike complaining about split infinitives, dangling preposition etc). It seems such people that make this mistake are too lazy and arrogant to correct themselves and are too eager to call names. The placard ruined what would have been an average comic..

aslammuet · on Sept 20, 2016

This is in fact not a discussion about the semantics of performant.

JoeAltmaier · on Sept 20, 2016

It was made into one.

ryandrake · on Sept 20, 2016

By using "performant" in the title (which is known to set off the language nazis), the author is practically begging for the discussion to devolve into this tangent.

It's imprecise: What is it even supposed to mean? Because it's a made-up word, who knows for sure? Does it mean "better performing?" If so, why not just say that? It's not that many more keystrokes. Better performing in what way? CPU? Resource utilization? Say so. You've put a lot of thought and effort into writing something, why blow it by using an imprecise pseudo-word? The author is undermining his own credibility, telling us "I don't care enough about the topic to even pick an actual word, let alone summarize, in more detail, what I mean to discuss."

MaulingMonkey · on Sept 20, 2016

All words are "imprecise" and "made-up".

My understanding is that performant is totally legal French, meaning efficient or effective, with usage dating back at least 4 decades. If you're not into stealing random words from other languages, I have to question why you're into English in the first place.

c22 · on Sept 20, 2016

I wonder which words were not made up. The true name of God?

qsymmachus · on Sept 20, 2016

Fight the real enemy, like the word "learnings"

daemonk · on Sept 20, 2016

From a data analysis perspective (which seems like some of her rules are related to), my number one rule that I always try to follow is to: work with the data, not with the tools. It can be really easy to go down the rabbit hole of messing about with the perfect tool for your analysis and end up with no results.

Optimization is important so far as you need it. If I can launch a bunch of aws instances to get a single-run job done in an hour, then I'll throw hardware at the problem instead of worrying about my code. I care about the analysis results, not necessarily how performant my method is.

If I'll need to run the analysis multiple times or I plan to publish it as a tool, then thats another story.

gwbas1c · on Sept 20, 2016

A common recurring theme in my career are horribly-performing applications because the original programmers worked with an ORM instead of the database.

Database code isn't hard! An HBM file is just as complicated as handling a data reader! Stored procedures (or in-line SQL) are simpler in the long run!

ansgri · on Sept 20, 2016

I also don't get the ORM stuff: typically you work either (1) with lots of different objects organized in some kind of document (in NoSQL sense), and you want all these objects to be predictable, free from side effects and clearly serializable to e.g. JSON, or (2) with bulk data which you transform and aggregate to come with a small collection of fully constructed (also, denormalized) objects, and for this purpose SQL is beautiful. Or else they wouldn't've invented Linq to Collections.

tofflos · on Sept 20, 2016

I believe there should be a stronger distinction between decent performance and high performance. Most frameworks coupled with idiomatic code will give you decent performance.

I find controlled experiments / microbenchmarks to be a useful method for finding the initial bottlenecks of a system. Once I know the initial bottlenecks I can make an informed descision on whether that performance is sufficient.

Controlled experiments / microbenchmarks are also essential in establishing a ballpark number for the theoretical maximum performance of your system. From that point on, the performance is yours to lose.

catnaroek · on Sept 20, 2016

> It is critical that people question “how does this magic actually work?”

No. It's critical that people question “why do I have to rely on magic?” in the first place. Even in high-level languages, perhaps especially in high-level languages, it's a good idea to write straightforward code.

> there is an appalling lack of candy in the C/C++ ecosystem (...) performance isn’t hidden

C and C++ are very different languages, and there is no shortage of C++ libraries full of incomprehensible magic. When debugging template-heavy C++ code, it can be very hard to determine where expensive and unnecessary object copies are created. Thank you C++, for making copy construction implicit!

> Are you producing too much garbage unnecessarily?

The high-level programmer's best defense against creating too much garbage is programming with values (whose redundant representations in memory can be automatically deduplicated by the runtime system) rather than object identities.

> Are your dictionaries too big to the point of being inefficient?

The problem isn't the dictionary abstraction, but rather the implementor's choice of underlying data structure. If you have a really big dictionary whose keys are strings (a common use case), you want tries, not hashtables.

> string concats can be replaced by string builders in the same amount of lines,

Using string builders is a low-level chore, and defeats the point to using a high-level language. A better alternative is using a persistent list/string data structure that actually supports efficient concatenation.

> Does your program start ad-hoc threads? Use a threadpool with fixed size.

Again, too low-level. A programmer working with a high-level language should be able to spawn as many green threads as she wants to, and let the language's runtime system handle multiplexing those green threads over OS threads.

> Unless you are 100% sure the lines are always of reasonable size, do not use readline.

This is terrible advice. If readline is causing you problems, you are using the wrong string data structure.

---

Okay, I'm done bashing what could be bashed. The last two items in the OP are actually good.

perfmode · on Sept 21, 2016

> Again, too low-level. A programmer working with a high-level language should be able to spawn as many green threads as she wants to, and let the language's runtime system handle multiplexing those green threads over OS threads.

Do this in Go and you'll be surprised how quickly you run out of memory.

catnaroek · on Sept 21, 2016

Sounds like a reason not to use Go as a high-level concurrent language.

perfmode · on Sept 21, 2016

Sounds like infinite threads is wishful thinking.

catnaroek · on Sept 21, 2016

You understand the difference between “as many green threads as the programmer wants” and “infinitely many threads”, right?

perfmode · on Sept 21, 2016

Yes. Not finite. Unbounded. Finite == bounded == pool.

catnaroek · on Sept 21, 2016

Unbounded isn't the same as infinite. For example, if you have a fair coin, the number of times you have to flip it until you get 5 tails is unbounded but finite.

The problem with creating infinitely many threads isn't even space. Even if you had an infinite amount of memory, programs are supposed to complete their tasks in a finite amount of time, so you shouldn't spawn infinitely many threads, because it's an unreasonable thing to want. On the other hand, spawning 100k green threads is a perfectly reasonable thing to want. It's the language runtime's job to multiplex these 100k green threads over 4 or 8 or however many OS threads make sense.

dllthomas · on Sept 21, 2016

> bounded == pool

Pool implies pre-allocated.

marknadal · on Sept 20, 2016

Glad this submission is being upvoted but I think there is room for more practical and direct advice. Here are some resources for JS:

- https://github.com/petkaantonov/bluebird/wiki/Optimization-k...

- https://github.com/amark/gun/wiki/100000-ops-sec-in-IE6-on-2...

- http://danieltao.com/lazy.js/

sillysaurus3 · on Sept 20, 2016

Do threads pass data among themselves? Use blocking queues whose capacity is a function of the max amount of data that can potentially be waiting without exhausting the memory.

I.e. a ringbuffer. The Disruptor is an efficient and simple way to coordinate this. https://lmax-exchange.github.io/disruptor/

brianwawok · on Sept 20, 2016

Disruptor is nonblocking. I think the OP is talking about actual blocking queues which are generally bad in high performance code.

sillysaurus3 · on Sept 20, 2016

Disruptor blocks when the ringbuffer fills up.

brianwawok · on Sept 20, 2016

Touchè. Most people size it large enough to never block on insert, so it is usually considered a non-blocking queue. Reads never block for example.

MyNameIsFred · on Sept 20, 2016

While I agree with the author and like the article, I suspect that it would only make sense to people who already accept these things as true.

hackits · on Sept 20, 2016

If you chase two rabbits you will not catch either one.

Little bit of background here before I begin. I did my comp-sci and mathematics degree and even though they have been useful in some degree or another they both were mostly a complete waste of time in my experience. For the vast majority of the clients and projects that I've had since finishing my university they've all revolved around fixing framework problems or bugs within large code bases. Just today I fixed a massive Land Titling (Enterprise Java/Oracle) bug where the original underlying framework would leak connections/memory until it fell over after 3-4 hours of usage. Other bugs such as off by 1 problems, data translation, data encoding. The vast majority of the time its been incorrectly designed frameworks or replacing frameworks with another framework.

For most of the work out there (I would guess 90% of it) its maintaining and supporting clients to achieve their business goals. Completely un-interesting stuff but get a good reputation of getting stuff done and they don't even wink at your asking price.

gens · on Sept 20, 2016

gens's* law of writing software that performs well: Don't be smart.

Do simple data structures where you can. Arrays, AoA, SoA, AoS. Process data in bulk where you can. Put complex algorithms only where needed (kd-tree when doing something like raytracing, threading only when necessary and even then only with bulk processing, complex locking or timing mechanisms only when you have to, etc.). Don't follow paradigms blindly.

Basically data should be processed only in a functional way or in "waves". When there are only a handful of variables, a functional way will keep them in cpu registers. When there is more data, going over an array will make the cpu load the data that is to come next into the cache.

Oh, and write C.

[*] gens is a hobby programmer and possibly an idiot

kod · on Sept 20, 2016

Isnt item 4 basically encouraging microbenchmarks? Microbenchmarks can be misleading, and there's no guarantee you're benchmarking the important things. It mentions profilers in passing, but i find it more productive to start with a profiler.

cs702 · on Sept 20, 2016

I would add another law: Code organized in small, easy-to-read chunks >> code organized in long, dense chunks.

xixi77 · on Sept 20, 2016

I would even say code organized in small, easy-to-read chunks >> code organized in long, dense chunks >> code organized in lots of tiny, impossible to read (because calls go back and forth all over the place) chunks :)

cs702 · on Sept 20, 2016

OMG, Yes!

syngrog66 · on Sept 20, 2016

related to this you may want to check out my cheatsheet PDF on software performance & scalability:

http://synisma.neocities.org/perf_scale_cheatsheet.pdf

welcome feedback and ideas for things to add or change

signa11 · on Sept 20, 2016

but there is no such word as 'performant' ! wiktionary doesn't count :) note that efficiency and performance are _not_ the same thing.

improving efficiency implies doing less work for the same end-goal. thus, when a program is 'efficient' then it is doing the minimum amount of work that the computation demands. or in other words, we have the best algorithm around for some kind of complexity argument for the task at hand. an algorithm which is efficient is not wasting anything.

performance on the other hand, implies, how quickly the work that is to be done is actually done. basically performance improvement would allow you to do the same amount of work faster (in time).

is the program at any given point in time (during its execution) doing maximal speed of work ? that doesn't seem to make much sense.

infact, in practical terms, what is the 'maximal speed' of work ? <theoretical constructs like Bremermann's Limit doesn't count :)>

programs can perform 'well enough', but that doesn't mean we are all done with it...

carapace · on Sept 20, 2016

Yeah I hate 'performant'. It's used as a sloppy synonym for "high performance". It's like calling a high voltage circuit box "voltant". Which to me implies just that it "has a voltage" But you can only have a voltage relative to some other level of charge somewhere else. Volts are an interval between charges, everything has a voltage (), so everything is "voltant".
Performance is the same. Everything performs, you can only talk about higher or lower performance of one system relative to another. Just calling something "performant" implies that it is "high performance", but leaves out what it's performing better than.
I hate that word. lol

( except not: everything has a charge; we assume some zero level charge and we can say that everything has a voltage relative to that level. That's why there are two words: charge and voltage.)

wodenokoto · on Sept 20, 2016

Performance us actually much,much broader than the speed at which a task is completed.

The performance of a classifier can be anything from how fast it runs, to its F1 classification score.

signa11 · on Sept 20, 2016

> The performance of a classifier can be anything from how fast it runs, to its F1 classification score.

but classification performance would be the efficiency of a classifier algorithm right ? i.e. can i get better classifications / generalizations etc. in lesser time.

wodenokoto · on Sept 20, 2016

Yes and no. It can be better in less time, it can also be just better or just faster (but worse) all depending on what is important.

Classifier A gets F1 score of 0.93 on test set has a lower performance than classifier B, which gets an F1 score of 0.94, regardless of time.

However, you can also say that classifier C, which only achieves an F1 score of 0.929 has better performance, since it is four times as fast.

rahilb · on Sept 20, 2016

Forget _laws_, I just want to know how the dlang forum is so fast.