Lessons after five years of professional programming

markokocic · on April 3, 2013

The first piece of advice is horrible. One shouldn't do things in the application that are meant to be done in a (relational) database, like ordering, grouping, joining... I guess 66% of my application performance tuning was replacing iteration over large result sets in application code with a few lines of SQL. The other 30% was making sure the database is not hit with redundant queries.

I have rarely seen the case where minimizing the size of result set returned by database to the application is not the right choice.
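
A minimal sketch of that point, using Python's built-in `sqlite3` module. The `orders` table, its columns and both function names are made up for illustration. The first function iterates over the whole result set in application code; the second gets the same answer from a few lines of SQL and only receives `n` rows.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30), ("bob", 20), ("alice", 15), ("carol", 50), ("bob", 5)],
)

def top_customers_in_app(conn, n):
    # Pulls every row into the application, then groups and sorts in Python.
    totals = defaultdict(int)
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] += amount
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

def top_customers_in_sql(conn, n):
    # Lets the database group, order and trim; only n rows come back.
    query = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        ORDER BY total DESC, customer ASC
        LIMIT ?
    """
    return conn.execute(query, (n,)).fetchall()
```

Both functions return the same rows here; the difference that matters on a real system is how much data leaves the database.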

EDIT: I'm speaking about relational databases.

The other advice is good, though.

vidarh · on April 3, 2013

It is not horrible advice once your system hits hard limits of the database system. Depending on your database system you can hit those fairly quickly.

It is often far cheaper to scale an application over many boxes by extracting data from your canonical database into a set of in-memory read-only search structures, for example, and delta-index and merge changes regularly.

It is similarly often far cheaper to sort and group large datasets outside the database, because sorting and grouping are simple to parallelise over multiple machines working on in-memory subsets and doing cheap merges at the end.

If your system can run at reasonable speed in your RDBMS, sure, do that rather than reinvent the wheel.

But when you find yourself maintaining complex trees of replicas, it is often worth testing whether you can do better with specialised middleware. Such middleware can selectively drop guarantees that your RDBMS is obliged to provide, and can otherwise exploit specialized characteristics of your data.

E.g. you don't see people running large search engines out of RDBMSs, for a simple reason: while many RDBMSs provide full-text search, you can do it far faster once you accept that your full-text index is "always" going to be catching up. So once you exceed the threshold where a single RDBMS no longer serves your needs (and often before that), you can save a massive amount of resources by building small, frequent deltas of changes, distributing them to however many app servers you need, and gradually merging the deltas into larger chunks to keep the number manageable.
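
A rough sketch of the delta idea, under heavy simplifying assumptions: each delta is a tiny in-memory inverted index (term to set of document ids), documents are only ever added, and `build_delta`, `merge_deltas` and `search` are hypothetical names. A real system would also have to handle updates, deletes and distribution to the app servers.

```python
from collections import defaultdict

def build_delta(docs):
    """Build a small inverted index (term -> set of doc ids) from a batch of new docs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)

def merge_deltas(older, newer):
    """Merge two deltas into one larger chunk without mutating either input."""
    merged = {term: set(ids) for term, ids in older.items()}
    for term, ids in newer.items():
        merged.setdefault(term, set()).update(ids)
    return merged

def search(chunks, term):
    """Query every chunk and union the hits."""
    hits = set()
    for chunk in chunks:
        hits |= chunk.get(term.lower(), set())
    return hits

# Two small deltas arrive, are searchable right away, and are merged later.
chunks = [build_delta({1: "cheap merge sort", 2: "full text search"})]
chunks.append(build_delta({3: "text index delta"}))
before_merge = search(chunks, "text")
chunks = [merge_deltas(chunks[0], chunks[1])]
after_merge = search(chunks, "text")
```

Merging changes the number of chunks a query has to visit, not the answer it gets.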

There are a lot of scenarios like that where moving the logic out of your data store makes sense.

mfenniak · on April 3, 2013

I think the advice in point #1 only makes sense with a caveat: When performance is an issue, if you can calculate or process it at the application layer _without adding load to the database layer_, then take it out of the database layer.

For example, order-by w/ limit/offset? If you do that at the application layer, you've increased the I/O usage on the database server, and clearly this advice doesn't make sense.

On the other hand, group-by (assuming no HAVING)? If you do that at the application layer, you've reduced the load on the database server, increased the load on the network, and you've probably made a justifiable performance vs. scalability trade-off. If you measure it and back it up with data.
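
The order-by with limit/offset case can be sketched with Python's built-in `sqlite3` module. The `events` table and both function names are invented for the example. Paging in the application means the database has to hand over every row for each page; paging in SQL hands over only the page.

```python
import sqlite3

# Hypothetical table with 1000 rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany(
    "INSERT INTO events (id, score) VALUES (?, ?)",
    [(i, (i * 37) % 101) for i in range(1, 1001)],
)

def page_in_app(conn, page, per_page):
    # Fetches all 1000 rows, then sorts and slices in Python.
    rows = conn.execute("SELECT id, score FROM events").fetchall()
    rows.sort(key=lambda r: (-r[1], r[0]))
    start = page * per_page
    return rows[start:start + per_page]

def page_in_sql(conn, page, per_page):
    # The database sorts and returns only per_page rows.
    return conn.execute(
        "SELECT id, score FROM events "
        "ORDER BY score DESC, id ASC LIMIT ? OFFSET ?",
        (per_page, page * per_page),
    ).fetchall()
```

Whether the group-by case goes the other way is exactly the kind of thing to measure rather than assume.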

I think the advice here in the article is too broad, but I can see a kernel of wisdom in it that is non-intuitive.

markokocic · on April 3, 2013

Well, without a caveat the advice is plain wrong in 99% of cases, since that caveat changes the advice completely.

primitur · on April 3, 2013

I agree with you - I think this bit of advice addresses the effect, not the cause, of an ignored problem: design, design, design.

Design your database for your application - if you have some major hassle with your database after App 1.0 is developed, it's because you missed this very important step.

Don't treat the .db like it's an architectural sandbox, adding/tweaking/removing things 'to make it better' after the fact. Careful .db architecture means that, once your app reaches a functional state, your .db should already be pretty much immutable.

Maybe the solution for those who can't escape this flaw is to simply build the App first, then the .db, so that there is little chance for the .db to be screwed up in the first place, who knows ..

np422 · on April 3, 2013

If your performance bottleneck is at the database layer, it's a difficult problem to solve: scaling relational databases requires all kinds of witchcraft and magic such as read-only replicas, sharding, result-caching ...

But it is very simple to add another application server and update your load balancer's configuration.

Some large (internet) applications have moved operations such as joins to the application and only use the database for simple storage.
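
For illustration, a join moved into the application can be as small as an in-memory hash join. This is only a sketch: the `users` and `orders` records and the `hash_join` name are hypothetical, and it assumes both sides fit in memory after being fetched with simple key lookups.

```python
def hash_join(users, orders):
    """Inner-join orders to users on user id using an in-memory hash table."""
    users_by_id = {u["id"]: u for u in users}      # build side
    joined = []
    for order in orders:                            # probe side
        user = users_by_id.get(order["user_id"])
        if user is not None:                        # inner join: skip unmatched orders
            joined.append({"name": user["name"], "item": order["item"]})
    return joined

# Hypothetical rows, as they might come back from two simple SELECTs.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "ben"}]
orders = [
    {"user_id": 2, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 9, "item": "ghost"},
]
result = hash_join(users, orders)
```

The database is then only asked for two cheap scans or key lookups, at the cost of moving both sides over the network.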

I think it's reasonable advice.

hackerboos · on April 3, 2013

I agree. Why would you sort in memory? RDBMSs have been heavily optimised to do these things.

vidarh · on April 3, 2013

You would sort in memory when your dataset and/or number of clients is large enough that you would need to distribute your dataset over many servers and the overhead of replication with your preferred RDBMS is too high. E.g. large sorts can be reasonably well parallelized (split the dataset into chunks, sort each chunk, and zipper merge the sorted subsets on a machine collecting the result), and similarly caching the dataset on however many app servers you need in order to do sorts often makes it worthwhile even for much smaller datasets.
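
The split, sort and zipper-merge steps can be sketched in a few lines of Python. This is an illustration only: the threads here just stand in for separate machines (in CPython they will not make a pure-Python sort faster), and `split` and `distributed_sort` are made-up names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def split(data, n_chunks):
    """Cut data into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]

def distributed_sort(data, n_chunks=4):
    chunks = split(data, n_chunks)
    # Each worker sorts its own chunk independently.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # The collector does a cheap k-way merge of the already sorted chunks.
    return list(heapq.merge(*sorted_chunks))
```

`heapq.merge` reads its inputs lazily, so the collecting machine never needs to re-sort anything; it only compares the heads of the sorted streams.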

If you can run everything on one machine, or your RDBMS handles sharing and replication efficiently and easily, then by all means try that first. But you can often beat the pants off RDBMSs for specialized scenarios.

ccheever · on April 3, 2013

I think this is actually reasonable advice because in most systems, the application layer is pretty stateless and can easily be scaled out to more machines, whereas the db is usually a central bottleneck.

stiff · on April 3, 2013

I too think this was his point. It might only make sense in very specific scenarios, though; for day-to-day web dev this is terrible advice, as others have said.

rombdn · on April 3, 2013

I agree - on the majority of enterprise systems (ERP) I have worked on, this was the case. I can confirm that GROUP BYs are huge performance killers, and sometimes the simplest solution is to avoid them by slightly changing the business logic if possible.

Nursie · on April 3, 2013

Concurrency should not be avoided just because it's scary, especially in the modern world of multiple cores/processors everywhere. Threads can get you great performance gains. Yes, there are particular problems in this area, but they're not impossible to overcome or a guaranteed way to introduce horrible bugs.

Also I agree with the other comment on SQL layers.

I think the main lesson learned after 5 years ought to be that you still have a lot left to learn.

billN · on April 3, 2013

1. When performance is an issue, if you can calculate or process it at the application layer, then take it out of the database layer. order by/group by are classic examples. It’s almost always easier to scale out your application layer than your database layer. As true for MySQL on your server as it is on the sqlite in your handheld.

I disagree. RDBMSs are highly optimized for this kind of operation (including MS SQL in here as well) and you’d be better off retrieving in the application layer only what you actually need to work on. Think about paging: why would you want to retrieve millions of records in the application layer when you just need a few hundred to work on? You’re going to waste SELECT time, connection time and application layer processing time (as I believe your logic won’t be as optimized as the logic that resides in a well optimized RDBMS).

2. Concurrency, avoid it if you can. If not, then remember that with great power comes great responsibility. Avoid working directly with threads if you can. Work at a higher level of abstraction if possible. In iOS, for example: GCD, dispatch and operation queues are your friends. The human mind was not designed to reason about infinite temporal state - I get nauseous thinking about how I learned all this first hand.

Queue patterns are cool, but concurrency sometimes gives you a great deal of performance improvement. Sure, it’s dangerous, fragile and can mess things up real quick. But I wouldn’t just exclude it because of this. From great developers _is expected_ great responsibility.
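
As an illustration of "work at a higher level of abstraction", here is a small Python sketch in which `concurrent.futures` stands in for the GCD-style queues mentioned in the quoted point. The `word_count` and `count_all` names are hypothetical. The workers run a function that touches no shared state, so there are no threads to manage by hand and nothing to lock.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(text):
    # No shared state is read or written here, so no locking is needed.
    return len(text.split())

def count_all(texts, workers=4):
    # The pool owns thread creation, scheduling and shutdown.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whichever worker finishes first.
        return list(pool.map(word_count, texts))
```

The concurrency is still there, and so is the performance it can buy for I/O-heavy work; what goes away is the hand-written thread lifecycle and most of the opportunities to get it wrong.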

3. Minimize state as much as possible, and keep it as localized as possible. The functionalists were/are onto something.

Good point, without exaggeration.

4. Short composable methods are your friend.

Agree, as long as you don’t end up in a compulsive-composing behavior where you abstract and split every method into meaningless individual parts. I’ve seen projects containing 50 folders, each of them with one file and one method, from people trying to follow this pattern.

5. Comments are dangerous since they can get out of date and mislead, but so is not having them. Don’t comment the trivial, but strategically write paragraphs if needed in specific sections. Your memory will fail you, as soon as tomorrow morning, even after coffee

Code should be self readable, and if you get to the point where it isn’t, then you may have to rewrite a few bits. This is not always possible though, especially for temporary hotfixes or hacks. In this case, comments are a must.

6. If you feel one use-case scenario will “probably be ok”, that’s the one that’s going to lead to catastrophic failure a month in production. Trust your paranoid gut, test and verify.

I tend to be more concerned about the scenario you feel 100% confident about. In case of major failures, you’re going to be looking at the dubious parts of your system (scenarios, code, patterns) – making it more difficult to spot the issue, if it’s hidden inside a “safe” part.

7. When in doubt, over communicate all concerns with your team.

Communicate and discuss all concerns as well as proposed solutions.

8. Do the right thing—you usually know what that thing is.

Couldn’t agree more. This is very difficult to explain, but you may sometimes find yourself writing code knowing that the right thing takes a few hours more and requires more effort… don’t keep doing what you are doing just because “maybe you won’t need it”. Always go for the right thing. If you don’t, it is likely going to bite you back.

9. Your users aren’t stupid, they just don’t have the patience for your cut corners.

Or simply they don’t get your UX. User testing is key to success.

10. If an engineer is not tasked with the long term maintenance of the systems they build, view them with suspicion. 80% of the blood, sweat, and tears of software occurs after it’s been released - that’s when you become a world weary, but wiser “professional.”

I would expect every engineer in my team to write self readable, well planned, “long term” code. Even if it is for the stupidest internal tool that nobody else is ever going to upgrade. And most of the time this does not slow down development (short and long term). It just becomes automatic.

11. Checklists are your friends.

Kanban boards, checklists, basecamp, whatever helps your team to track stuff and get things done.

12. Take initiative to purposefully enjoy your work; sometimes this will take effort.

True, although sometimes you have to do the boring parts as well – you’ll have to do things that suck, to make sure your customers won’t.

13. Silent failures, I still have nightmares. Monitor, log, alert. But be wary of false positives and the inevitable desensitization it leads to. Keep your system senses clear and alert.

Spot on. Don’t end up with a notification system that sends thousands of exceptions to your mailbox every day. You’ll start to ignore them and it’ll be difficult to find the real issues. Keep your monitoring clean. Fix issues, or deprioritize them.
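
One way to keep an alert channel from drowning in repeats, sketched with Python's standard `logging` module. The `DedupFilter` class and its `limit` parameter are hypothetical, and capping repeats only reduces noise; it is no substitute for fixing or explicitly deprioritizing the underlying issue.

```python
import logging

class DedupFilter(logging.Filter):
    """Let each distinct (level, message) pair through at most `limit` times."""

    def __init__(self, limit=3):
        super().__init__()
        self.limit = limit
        self.seen = {}

    def filter(self, record):
        key = (record.levelno, record.getMessage())
        self.seen[key] = self.seen.get(key, 0) + 1
        # A falsy return value tells the handler to drop the record.
        return self.seen[key] <= self.limit
```

Attached to the handler that sends mail or pages someone (via `handler.addFilter(DedupFilter())`), this leaves the full log untouched while capping how often the same exception can reach a person.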

14. At the end of the day, we’re paid to manage complexity. Work accordingly.

…and make complexity simpler for our users.

pyre · on April 3, 2013

  > Code should be self readable, and if you get
  > to the point where it isn’t, then you may have
  > to rewrite a few bits. This is not always
  > possible though, especially for temporary
  > hotfixes or hacks. In this case, comments are
  > a must.

* Sometimes business decisions may not make logical sense, but someone says "do it this way." It makes sense to comment this in the code.

* The code might tell you what it's doing, but not necessarily why it's doing it.

* You may be making use of legacy components that you are unable to rewrite. It may make sense to comment on their use within newer code that interfaces with them, making it possible for people to bugfix the interface without needing to delve all the way into the legacy component.

billN · on April 5, 2013

Agree - but ideally these are just a few exceptions. If these practices take over, then we may have a bigger issue to deal with, and should look at some serious refactoring of the infrastructure. E.g. too many legacy components may just require some rewriting or, if that is not possible, some kind of wrapper or facade pattern that makes the behavior self-readable (to the point where the legacy code is used).
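
A tiny sketch of the wrapper/facade idea in Python. Everything in it is invented for illustration: `LegacyBilling` stands in for a component that cannot be rewritten, and `BillingFacade` gives newer code a self-describing way to call it.

```python
class LegacyBilling:
    """Stand-in for a legacy component we are unable to rewrite (hypothetical)."""

    def proc(self, a, b, f):
        # Undocumented in the "original": f == 1 means "add 20% tax"; amounts are in cents.
        return a * b + (a * b * 20 // 100 if f == 1 else 0)

class BillingFacade:
    """Readable front for LegacyBilling; the cryptic flag lives in one place only."""

    def __init__(self, legacy=None):
        self._legacy = legacy or LegacyBilling()

    def total_cents(self, unit_price_cents, quantity, *, include_tax):
        tax_flag = 1 if include_tax else 0
        return self._legacy.proc(unit_price_cents, quantity, tax_flag)
```

Callers write `BillingFacade().total_cents(250, 4, include_tax=True)` and never see `proc(a, b, f)`, so the explanatory comment is needed once, inside the facade, rather than at every call site.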

snake_plissken · on April 3, 2013

Totes agree. It's the "why" behind deciding to do something a certain way that's important.

rombdn · on April 3, 2013

1. I disagree, on many ERP systems the DB is totally overloaded and the middleware is sleeping (and/or easier to scale). I think this is the kind of case where the ideal architecture paradigm has to be bent.

Your other points are really informative, thanks.

sageikosa · on April 3, 2013

I'll give you a qualified concurrence. Order by and group by are often best left to the database, where (proper) indexing can provide streamed access to large sets of data. Relatively static lookup value substitution and security checks (via permissions/user tables) are best moved closer to the user.

I'm also a big fan of building transactional updates in "service-space" and using transactional coordination to ensure ACID, rather than making bulky stored procedures to churn over bits of procedural data.

facorreia · on April 3, 2013

I concur. Exadatas ain't cheap and they're selling like hot cakes. It's cheaper and quicker to scale horizontally than vertically and that is particularly true when the database becomes a huge performance bottleneck taking several DBAs' full attention trying to deal with I/O contention.

smoyer · on April 3, 2013

This "young'in" seems to be learning ... with a few more insights we'll have to promote him to management so he doesn't accidentally share this knowledge with anyone.

kbenson · on April 3, 2013

I may be interpreting them the wrong way, but whenever I see articles about wisdom accrued after X years programming where X < 10, I almost always come to the same conclusion. Wait another 5-10 years, and then see what you think.

What's the difference between an article like this and something by Carmack? The true veteran doesn't usually write pithy numbered lists to follow (and usually, they don't write articles at all, it's someone interviewing them). They teach by example. They explain a situation, what their problem was, and how they solved it. This allows the reader to really see why it was (or wasn't!) a better choice, and understand where it may or may not be appropriate in their own circumstances.

I guess it boils down to me thinking that anyone willing to write a list like this probably knows just enough to be dangerous (which items on this list could be, depending on how they are interpreted). So again, wait another 5-10 years and then see what you have to say on the subject.

akg · on April 3, 2013

I agree with #12 and although it is important to enjoy your work, I find it is very useful to take initiative and really think about "Why" you are doing something. The "Why" helps me get past many hurdles, keeps me motivated even during boring tasks, and almost always ends in a result I can be proud of.

_glass · on April 3, 2013

"7. When in doubt, over communicate all concerns with your team." I wish someone had given me this advice, and stressed it, right at the beginning of my career. Every time I communicated what felt to me like too much, it turned out to be very helpful. I know some individuals who really do communicate too much, but even a daily newsletter about your current state, sent to the other people in the process chain, seems to work just fine.