WTF Is Big O Notation?

roel_v · on March 26, 2019

(Warning: nerd sniping incoming)

The main problem with this article is that it doesn't stress (or even mention, as far as I can tell) that it's performance characteristics at the margins. Meaning, it's about a generalized operation on a very large data set. Your fancy hashing algorithm might be slower than just doing a linear scan over a 10-element table, not to mention the maintenance of your tree structure. In the article, this is most obvious in the following sentence:

"Let’s say we have 1000 records in our film table. To find “Academy Dinosaur” our database will need to do 1000 operations (comparing the title in each row)."

No you don't. If your table is sorted, you'll need only 1 operation to find 'Academy Dinosaur', because it's your first element. What does happen is that on average, to find any title, you will need n / 2 operations (so not 1000, but 500, in this example - this part is plain wrong as it's written). Of course, n / 2 is linear in n, so the complexity stays the same. But my point is that to understand big O notation for real world impact, you first need to understand the concepts the article mentions, and then you also need to be able to accurately assess the lower-dimensional terms that are left out, and estimate them.

(preempting counter sniping: this is assuming that titles are unique)

(to put this in other words, an algorithm or data structure operation described as being 'O(n)' could really be 'O(n) + x' where x is a constant; but we leave that out because at the margin, it doesn't matter (i.e. is dwarfed by the O(n) term. But when n is small and x is large, x becomes dominant, so assessing the effects of the complexity, in the real world, need to keep the expected magnitude of n in mind.)

opportune · on March 26, 2019

NO. BIG O notation is not average unless specified as such! It is an equivalence class of the provable upper bound on the execution time of the algorithm on an arbitrary input. There are algorithms like quick sort where average runtime != “big O of quicksort”

umanwizard · on March 26, 2019

This is completely wrong. Big O has nothing to do with algorithms. Big O notation can be used for any mathematical function from an ordered set to another ordered set, including average runtime of an algorithm as a function of data size, worst-case runtime of an algorithm as a function of data size, guaranteed amortized runtime of an algorithm as a function of data size, or anything else (including things with nothing to do with computation -- it's perfectly reasonable to say things like x^2 + 5x + 2 is O(x^2) and just be talking about pure math with no reference to algorithms).

If someone asked "what is the runtime of quicksort in big O terms", it is ambiguous whether the answer is O(n^2) or O(n*log(n)) , since it's not clear which function you're talking about.

enedil · on March 26, 2019

To be honest, this is false too. You need functions with values that can be multiplied by scalars. If you just have ordered set, then you can't account for this constant from the definition. Alternatively, you could say that values of the function need to be from a normed vector space (but I'm not sure if it can be extended furthermore. Maybe normed C-module?)

umanwizard · on March 27, 2019

Sure, but that doesn't change my point.

roel_v · on March 26, 2019

Sure that's what it is formally, but that's not the applied meaning the article is about. Take an hypothetical search algorithm that is O(1) except when the container you're searching is of length 50, then it's O(n). That makes the algorithm O(n), right?

(yes this is a stupid contrived example, and I know one can make any case when one can make up any reality etc. - but I think it's a valid style figure in this case, to contrast the theoretical definition with the application).

Would you toss out that search algorithm and say 'well we have algorithms that search in better than O(n) so forget about this one'? No, you'd do 'if (container.size() == 50) { return container.bisect_find(key); } else { container.lalaland_find(key); } And that's exactly the difference between the theoretical concept and the application.

(although I'll admit that I started out writing the comment trying really hard to avoid the word 'average' because it's so confusing and then it still slipped in)

joshuamorton · on March 26, 2019

No, that algorithm is O(1).

50 is a constant. As n -> infinity, the algorithm runs in constant time.

Specifically, I can pick some number C such that C * f(n) > the number of operations, then the algorithm is O(f(n)).

So I choose f(n) == 1, and C == 100.

Then the runtime of the algorithm is 1,1,1,1,1...., 50,1,1,1,1,.... For all values of n, this is less than C * f(n) == 100, a constant, so it is O(f(n)). f(n) is 1, so it is O(1).

roel_v · on March 26, 2019

Yes you're right - let me modify my contrived example and say 'at exactly midnight, the complexity is O(n)'.

(the cool part about contrived examples is that you can keep making things up until it fits - now I'm just hoping I didn't miss something again like the first time :) )

joshuamorton · on March 26, 2019

Still doesn't work. Waiting one minute is constant time. (to be clear, I'm going to be able to come up with a counterexample for anything you throw at me because you're assumption here is just false, in general an algorithm that is O(f) for all values except n is O(f), because you can pick a constant greater than n, no matter what the dimension n acts in is).

(If you reject that and claim that just because I can construct an algorithm that is O(1) from yours doesn't mean that your algorithm is O(1), I'll point out what you presented isn't technically a function, since its not mathematically pure, so the concept of Big-O is ill-formed)

roel_v · on March 26, 2019

Well but (I think) what you're talking about now is the complexity of the 'wrapper' function I posed, not the complexity of the hypothetical algorithm I started out with. Yes you can add an indirection that converts it back into O(1) but the original, but that wasn't the point. I think. But then again, maybe what your second paragraph is saying is exactly that? I'll admit that I lost track of things somewhere between writing my original comment, switching back and forth to my code between compiles and now being back here :)

Also, I don't understand what you mean by 'mathematically pure', but that's on my end.

joshuamorton · on March 26, 2019

So there's two ways of looking at it, the more handwavy is "my construction that treats your algorithm as a black box does only ever strictly more work than your algorithm. If mine runs in a constant number of operations, yours must as well." And certainly your algorithm runs in amortized constant time (which is a weird phrase).

The other way is to ask "what is a function"?

Normally, a sorting function is expressed as

    sort(List[Comparable]) -> List[Comparable]

your sorting function however has a different way of working. It has an additional input, the time. Now you might argue that you aren't taking the time as input, but you are...somehow. So in reality, your function has a signature

    sort(int, List[Comparable]) -> List[Comparable]

There's this extra int input which is weird, so you don't have. Computational complexity tools like O-notation don't concern themselves with such things. O notation only cares about how the runtime changes as the size of the input changes.

If the size of the int "time" is bounded (which it is, since the max is 86400), then one of the inputs is constant. It can be ignored.

Purity in this context is the idea that the function only acts on its inputs, there are no external forces that modify how it works. Pure functions are mathematical objects, impure functions aren't really.

So once you convert your construction to being pure, it becomes more clear why the function is ill-formed: you've got this extra constant size input (that runtime analysis can ignore) that you claim affects the runtime. Something is clearly amiss.

robconery · on March 26, 2019

The Big O of an algorithm does not change based on data size. Even if your set had precisely 1 record in it - the code you write, if it loops over every item in the set (even if it's only one item) - is still O(n).

roel_v · on March 26, 2019

Yes I know that - was this reply meant for another comment?

TheCoelacanth · on March 26, 2019

No, it's best-case O(1), worst-case O(n) (technically O(1) since there is a constant upper-bound, but let's pretend you said it's O(n) when length is divisible by 50 instead of equal to 50) and average-case O(1).

Each of those functions has its own big O, without specifying which function you are talking about big O isn't meaningful.

worik · on March 26, 2019

...an algorithm or data structure operation described as being 'O(n)' could really be 'O(n) + x'

No it should not. (O(N) + x) is O(N). That is how it works, is spoken about... What it means!

It is a very simple minded article, and IMO misleading, but it is better than "(O(N) + x) is bigger than O(N)". In the sense of order analysis, it is not.

roel_v · on March 26, 2019

Well yes, that's what I said, although I guess I could have used 'n + x operations' to keep it consistent with the article and still in line with the formal definition.

enedil · on March 26, 2019

Yeah, you really could have. It also grinds me when people will say "it's O(n), not O(n^(3/2))", because the former is a subset of later, not to mention talking about worst case/average case/best case. Big O notation has nothing to do with probabilities, literally nothing.

c3534l · on March 26, 2019

> What does happen is that on average, to find any title, you will need n / 2 operations (so not 1000, but 500, in this example - this part is plain wrong as it's written).

You're conflating theta and omega notation for big O. Big O is the worst case scenario. The average or even typical operation doesn't matter. If the algorithm is the most complex when sorting a list of items that were already sorted in descending order, then that is the use-case that is considered when figuring out Big O. If you want average, use Big theta.

teraflop · on March 26, 2019

No, this is also inaccurate. Big-theta and average-case complexity are entirely different concepts.

Big-O/theta/omega are all ways of categorizing functions. Those functions typically take as their argument some measure of the "size" of the input (which may be one variable or multiple). The output of the function could be:

* worst-case performance ("largest possible running time for all inputs of size N")

* average-case performance ("expected running time for an input of size N, chosen from some well-defined distribution")

* amortized performance ("largest possible running time among all possible sequences of N inputs, divided by N")

(We could also apply any of these definitions to something other than time, such as memory or I/O operations.)

For any of these definitions, we can talk about big-O, or big-theta, or big-omega in a well-defined way.

As an example, the worst-case time complexity of bubble sort is Ɵ(n^2). The average-case time complexity (assuming an input consisting of distinct elements, uniformly randomly permuted) is also Ɵ(n^2). Nevertheless, the best-case performance is Ɵ(n).

edflsafoiewq · on March 26, 2019

No, you're conflating Big O measuring an asymptotic upper bound of a function with worst-case analysis measuring the upper bound of resource use. These are different things. Asymptotic notation is totally oblivious to what is being measured, whether its a worst-case or average-case or anything else. You can give the Big O of the best case, the Big Theta of the worst case, etc.

umanwizard · on March 26, 2019

Best, average, and worst case (and any other mathematical function from an ordered set to another ordered set) can all be described with big omega, theta, and O notation. The two concepts are orthogonal.

For example:

Quicksort's average-case time complexity is Omega(1), also Omega(n log(n)), Theta(n log(n)), O(n log(n)), and also O(e^n).

Its worst-case time complexity is Omega(1), also Omega(n^2), Theta(n^2), O(n^2), and also O(e^n).

roel_v · on March 26, 2019

Fair enough, when looking at the academic origin and defintions; but googling a bit and looking at articles like https://medium.com/@.RT/total-n00bs-guide-to-big-o-big-%CF%8... (under 'The Big Caveat', third paragraph), I feel validated in my assertion that your 'Big theta' is what is in practice and colloquially is known as 'Big O', and is also what this article is talking about.

(But I'll admit that I had mostly forgotten about this distinction, and certainly couldn't tell which one is which without looking it up)

eatbitseveryday · on March 26, 2019

>> "Let’s say we have 1000 records in our film table. To find “Academy Dinosaur” our database will need to do 1000 operations (comparing the title in each row)."

> No you don't. If your table is sorted, [...] on average, to find any title, you will need n / 2 operations (so not 1000, but 500, in this example - this part is plain wrong as it's written). Of course, n / 2 is linear in n, so the complexity stays the same.

If something is sorted, search happens in O(log N) which is not linear: start with compare at N'=N/2. If "less than", go to N''=N'/2; if "greater than" go to N''=3N'/2. Etc.

If the data is not sorted, then yes, you will need N lookup operations.

victorNicollet · on March 26, 2019

I believe the point was that, if your linear search happens to return your first element, you did not perform 1000 operations.

The context is a linear scan, not alternative algorithms that could make use of a sorted data set.

roel_v · on March 26, 2019

Yes exactly.

anon946 · on March 26, 2019

Not only is 'O(n)' the same as 'O(n) + x', but also the same as 'c*O(n)', where c is a constant.

robconery · on March 26, 2019

OP here. Big O is indeed "worst case scenario" always, the size of the data set doesn't matter. An O(n) operation doesn't care if the data is sorted - even if it's the first item as you suggest. When you discuss Big O it's always worst case.

kadoban · on March 26, 2019

If Big O is always worst case, you're going to have a hard time saying anything interesting or useful about quicksort, for instance, or hash tables.

You're going to be limited to "quicksort is O(n^2)" and "hash table lookup is O(n)", both of which are quite misleading.

There are good reasons that we usually look at the worst case, but big-O does not fundamentally or necessarily mean that.

roel_v · on March 26, 2019

Well that's not what your article is saying - it says (well, not in these words, but it's at least what I got from it) that it's meant to be something practical to understanding algorithmic complexity and how that relates to code performance. And for that, the size of the data and the details of the algorithm very much do matter. Throughout these comments, people seem to be using two 'concepts' of big O and because of that, talking past each other: the academic 'provable upper boundary' concept and the applied 'what algorithm or data structure should I choose for my concrete problem, and how does complexity help me decide' concept. That last one is what is colloquially known as 'big O analysis', whereas technically that term is reserved for something else indeed. I'll readily admit that when I first made my GP comment, I didn't really clearly make that distinction in my mind either, which is probably what is the real underlying issue I was trying to point out.

umanwizard · on March 26, 2019

This is wrong. Big-O can be used to talk about any function you like, whether best case of an algorithm, worst case, average case for uniform inputs, average case over some different distribution of inputs, how many branch mispredicts a Skylake core hits while running the code on backwards sorted input, the median price of a Big Mac as a function of a country's GDP per capita, or anything else.

UncleMeat · on March 26, 2019

Big O is not "worst case scenario". It describes an upper bound on functions. You can absolutely compute the Big O of the average case runtime of an algorithm.

LifeLiverTransp · on March 26, 2019

Also not mentioned, hardware pyramid makes fools of people who just expect the O-Notation to be the outcome of measured performance. A cache optimal - Algorithm thats worser in the 0-Notation might still outperform a bad implementation of a O-Notation superior algorithm.

TL,DR; Knowing Big-O does not detach you from physics.

mannykannot · on March 26, 2019

This is an important consideration in practice, and I think it would be helpful if classes teaching O-notation would include an exercise that makes this point. It is also true, however, that O-notation considerations always dominate for large enough problems.

Anyone expecting the O-Notation to give the outcome of measured performance has made a category error, as it is never about run-time itself, it is about the change of run-time with problem size (and only in the asymptopic limit).

hackerpacker · on March 26, 2019

Nor the realities of various languages. I've seen hashes break down exponentially with "big data". (inb4 "then it wasn't a 'true' hash). There are countless variables beyond "+ x"

If it is important, then you test/profile it. Otherwise it isn't important, and this is mostly theory crafting.

andreareina · on March 26, 2019

And a lot of real-world implementations will use the asymptotically-inferior algorithm when the (sub)problem is small, for exactly this reason.

mixmastamyk · on March 26, 2019

Good information for a part II blog post. Believe it makes sense to leave it out of the first.

davismwfl · on March 26, 2019

That was a good write up, and it will help people. I come from the self taught side, and find that it is critical to know these things when you work in certain areas of a code base and once you reach a certain level. But I don't expect everyone needs to have the same level of understanding.

I also find interviewers asking detailed questions about Big O and specific different algorithms just idiotic, especially the esoteric ones I've been asked in the past and/or heard asked. When I interview someone, all I want to know is that you know there is a difference and know to look when it matters.

My method for learning if a candidate understands this all is to get people to choose data structures based upon different problem sets I will give them and then ask the pros and cons. The more questions they ask about usage usually the more they understand in my experience. And this is just way more informative as a hiring manager as you will learn more if they understand what they are saying, versus if they can regurgitate something they memorized from a book or website.

labster · on March 26, 2019

Having for the most part avoided the computer science department during my earth science education, this is helpful.

I have sort of an intrinsic idea of what's going on -- self-joins can easily compare a data set with every other member of the data set, and hit O(N²). And that if your N is 5, the DB will probably going spend more time parsing SQL than executing your query, so no point in optimizing. So I don't understand or care about what the definition is, but it's useful tool to think about how slow your code can be.

gpm · on March 26, 2019

This seems like a good time to bring up one of my pet peeves about big O notation.

Every theoretical model I've ever seen says that indexing into n bits of memory takes O(1) time. That's obviously impossible:

- The pointer you need to read is log(n) bits.

- The physical memory is at best O(n^(1/3)) distance away from the CPU, and thus takes that much time to get back to you. In reality it's probably O(n^(1/2)) because we build on flat planes (once you start talking about petabytes of data anyways).

Maybe this doesn't matter in practice, because the constants associated with these are small enough (but are they, how many memory/disk bound applications are there? How much extra performance could we squeeze out by using 16/32 bit pointers for small arrays?). It certainly doesn't matter in theory where things work like we say they do regardless of the physical reality. But it annoys me that it's so obviously wrong.

anon946 · on March 26, 2019

Well, it's "wrong", but there's the saying that all models are wrong, but some are useful. If time complexity analysis had to include physical distance, then it would perhaps be less wrong, but also much less useful.

That said, "time complexity" is a bit of a misnomer anyway. It's often implied, but underneath, there is a specific operation. For example, when we say that merge sort is O(N lg N), we mean in number of comparison operations.

deathanatos · on March 26, 2019

There was a decent article exploring the timing between memory accesses, and when they started taking longer b/c you hit L2 cache, L3 cache, main RAM, swap, etc., and eventually showed how if you really graphed the access times, the overarching "curve" they followed was square/cube root¹; that is, real world hardware obeys the laws of physics.

My googling dug up [1], but I'm not sure if that's it or not.

My university schooling was pretty straightforward about memory access not being O(1). We also went over adding two numbers together (also not O(1)), but these are just generally elided for brevity's sake; I don't think most of us want to start worrying/optimizing for CPU caches prior to it being a thing that perf testing says we need to worry about.

> 16/32 bit pointers for small arrays?

It's not about the size of the array, it's about the ___location in memory. If you're storing an index, then sure, knock yourself out.

[1]: http://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html

¹I'm a software engineer, not a physicist.

mwfunk · on March 26, 2019

Part of it is that there's very casual, conversational usages of Big O vs. much more formal usages when trying to actually quantify performance.

Conversational Big O tends to include a lot of spherical cows (https://en.wikipedia.org/wiki/Spherical_cow), because when Big O comes up in conversation, people are more likely to be talking in generalities about high level designs. More detail could be counterproductive in that context.

When someone is actually trying to quantify and predict performance of an algorithm, then the other extreme becomes desirable- the more detailed and specific the function, the better.

gpm · on March 26, 2019

Ya, I'm directly addressing non-conversational use of Big O notation here, including the areas of academia I'm familiar with and know people in.

jhanschoo · on March 26, 2019

In industry there is indeed a hand-waving of the size of addressing. IMO for industry this is not too important since a lg n multiplicative factor is peanuts.

With respect to the space problem, for some problems researchers don't care about it so much because it only means that it is a polynomial factor slower. For problems where the degree of the polynomial is important, theorists typically work with variants of Turing machines and need to specify which variant they are using, which typically do not allow random access, so this does not become important. Of course, random-access Turing machines are also studied that violate the physical constraint you mentioned.

umanwizard · on March 26, 2019

Actually the size of indices can be important in real life. For example, on Nvidia GPUs, a kernel that indexes data with 64-bit indices will usually be much slower than one that uses 32-bit indices.

(Hard to argue that this is related to big-O since there are only two data points, but I'm just pointing out that index size is in fact a real-world concern).

giucal · on March 27, 2019

I think they don't matter because:

1. they are all fixed or bounded quantities;

2. all algorithms are subject to them.

On the other hand, in practice, the input size is also bounded. You deal not with arbitrarily long arrays, for example, since memory is not infinite.

So, I think, asymptotic time complexity is meaningful as long as the inputs we are considering can grow so much -- while remaining bounded -- that a linearithmic algorithm indeed outperforms, say, a quadratic one for a large and relevant class of inputs.

And that may be why computational models make those assumptions; but I'm not remotely sure.

briankelly · on March 26, 2019

There's a nice series of articles on this here: http://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html

gpm · on March 26, 2019

Oh cool, hadn't seen that!

heinrich5991 · on March 26, 2019

In academia, you usually see O(log n) for addressing. Since it doesn't really seem to matter for real-world performance, we just assume it's O(1) in practice.

gpm · on March 26, 2019

You're probably seeing a different part of academia than I am. The portions of CS academia I'm familiar with don't do O(log n) addressing.

derekp7 · on March 26, 2019

My experience, as someone who isn't a professional full-time developer (systems engineer / devops focused), but who personally geeks out on CS theory, is I find it fairly rare for developers that I work with that think of Big O considerations when writing their code.

A prime example, was a piece of code that has to process incoming results that get stuck in a transnational database table. Instead of selecting, then processing, the records that the module cares about, it does the selection (which typically returns 1 - 10 records), sorts them, processes the record from the top of the list, then throws the rest of the initial select results away. It then goes back to the DB an does the same select, and repeats the process.

End result: if the queue gets backed up to 100, or 1000 records, the process never catches up. You have to temporarily change the status of the inbound records to something else, then put maybe 10 at a time back to pending.

There are countless other times I've seen similar types of issues where on small data sets the code runs fine, but the performance degrades to n^2, or even worse is when it degrades in a factorial manor. And the worst of it is that if I talk to the developers using common CS notation (such as, your algorithm should have logarithmic or at worse linear degradation as the input queue grows, and it is behaving exponentially), their eyes glaze over.

eindiran · on March 26, 2019

That's not my experience as a software engineer at all. In fact, I would say that I see people worrying about writing algorithms that are asymptotically optimal (well before that should ever be a consideration) 10x as often as I see people failing to consider the efficiency of the algorithms they write.

derekp7 · on March 26, 2019

Do you think this is due to a difference between working at a place that has software requiring a specific non-computer related skill set (such as medical diagnostics software), vs being at a place that hires primarily CS grads for regular software? Other developers that I've worked around at previous jobs were in manufacturing (Informix 4GL devs), and an IT services company, among others. The one common thread I've seen though is they tend to be impressed by what I can do in C, for example, or helping them get to the root of an issue they have (I will typically reverse engineer what they are doing from the outside looking in, using things like strace, then writing up an analysis of what I see from the systems side).

Another recent example is a case where the app needed to start a number of sub-processes. So the dev used an equiv of a system() call, which spawned a bash shell, to do a ps -ef |grep process_name |grep -v grep |grep -v vi |grep -v vim |grep -v less, to see if the process was running, then doing another system() call to start the sub process. The whole string would be repeated for each sub process, and then they wondered why it took 8 minutes to start all their sub interfaces when they deployed to a larger customer that needed 700 interfaces started (instead of the typical 20 - 50). Oh, and after they would start an interface, the code would do the "ps..." commands on all the other interfaces to update an internal table on their status.

eindiran · on March 26, 2019

That might very well be there difference. My coworkers are almost exclusively people who studied CS, and several wrote C professionally at some point.

mixmastamyk · on March 26, 2019

I don't think one needs to be a CS grad to recognize such a poor design.

Retric · on March 26, 2019

The example you gave is not really a Big O issue. It's the extra DB calls adding a huge constant that's the problem not the underlying algorithm. O(x) can be horrible when the constant is 1/10th of a second each record.

This may be a case of the developers not understanding your ORM tooling. They get something that works, and don't quite what's going on under the hood.

codingdave · on March 26, 2019

> their eyes glaze over.

Sounds like a communication problem to me. Non-academic developers might not know "Big O notation". But they absolutely know that nested loops are way less efficient than single loops. They know that some code performs better than other code. And while they might be writing inefficient code, that is an opportunity to talk to them and help them improve.

If you come at them with academic terms (right though you may be), and their eyes glaze over, try talking about the actual change you are seeking. Especially in the database world, as less experienced developers might not know that database performance can be tuned by how you use it. We all were new once. So educate them. Kindly.

imtringued · on March 26, 2019

It doesn't sound like teaching Big O Notation would help your coworkers write faster software. You would be better off by teaching them how to use databases properly.

deathanatos · on March 26, 2019

> I find it fairly rare for developers that I work with that think of Big O considerations when writing their code.

Every time I choose between a set and a vector in code, I'm essentially drawing on my knowledge of their big-O properties to choose the right one for whatever task I have at hand.

I wouldn't say that's the only piece of CS theory I use, either. I routinely run across problems that are best modeled as DAGs (any dependency relationship between items boils down to this quickly). I use DB theory any time I design a database table. I draw on language theory to write parsers.

Most of it is reflexive, I think, at this point, but it doesn't change that a "vector" is the right structure b/c I don't need O(1) lookup, or I'd like indexing.

roel_v · on March 26, 2019

" is I find it fairly rare for developers that I work with that think of Big O considerations when writing their code."

Do you mean that as in 'people I work with don't care about this, therefore it's not necessary' or as in 'people I work with are clowns who don't even know foundations'? Because (I think) I'm seeing replies to your post interpreting it both ways.

derekp7 · on March 26, 2019

I wouldn't use the word clowns, because they are really good at other areas. In this case, the developers focus on a development environment that isn't widely used in the US, and I believe that they haven't had a formal CS education (more of the same type of self taught that I am).

roel_v · on March 26, 2019

"I wouldn't use the word clowns"

Nah it was just an expression from my part, didn't mean to say that anyone not thinking about algorithmic complexity when writing any piece of software is a hack :)

p1necone · on March 26, 2019

I don't think knowing about big O notation (or really much cs theory) is a requirement for reasoning about this particular case. "network/sql calls are slow" and "don't do the same work multiple times when it isn't necessary" would be enough.

chrisseaton · on March 26, 2019

> I find it fairly rare for developers that I work with that think of Big O considerations when writing their code

Do you not hear people talk about 'accidentally quadratic' quite a bit? I hear practising developers talk about that a lot. They seem to have a good idea of what to avoid.

pvg · on March 26, 2019

'Accidentally quadratic' happens and is talked about precisely because much of the time, there's not much reason to think about Big O in many practical software development situations.

User23 · on March 26, 2019

One of the things I like about the C++ STL is once you learn the idioms, the API tells you whether an operation is "fast." I don't really care if a line programmer can give me the formal definition of O(x), but I sure want him to know if an operation is appropriate to call in a loop.

That's speaking as a professional. As someone who finds CS fascinating, I'm appalled by the lack of interest many professional developers have in what is one of the most rigorous and elegant fields of mathematical inquiry. Even so I don't expect others to care about the things I do.

sidlls · on March 26, 2019

Why are you appalled? As a physicist I'm not appalled by the lack of interest most engineers have in fundamental theory that I know inside out. The reason is precisely because I understand there's a difference between practical applications and theoretical inquiry.

ykevinator · on March 26, 2019

It's a great post and everyone stop flexing. Learn basic complexity if you don't know it, it has huge dividends.

bogomipz · on March 26, 2019

>"My rule of thumb here is that if I have to use a loop within a loop, that’s O(n^2)."

This might potentially do someone a disservice. The whole point of analysis is just that - to "analyze" the runtime and not resort to "rules of thumb."

The presence of nested for loops itself is not a good reason to declare something quadratic. Consider the following which is most certainly not O(N^2):

for (int i = 0; i < 1000; i++)

  for (int j = n; j >= 1; j = j / 2)

    // do something

EDIT: initialized j to the val of n on the inner loop per the comments below.

gridspy · on March 26, 2019

For those who missed it, the absence of "n" in these loops makes "do something" happen a constant number of times. So as written by the OP the loops are currently O(1)

if for (int i = 0; i < 1000; i++) were instead

for (int i = 0; i < n; i++) - You'd have O(n)

If also for (int j = 1000; j >= 1; j = j / 2) were instead

for (int j = n; j >= 1; j = j / 2) - Then you'd have O(n log n)

OR for (int j = n; j >= 1; j -= 1) - O(n^2)

OR for (int j = n * 1000; j >= 1; j -= 1) - this is STILL O(n^2)

anon946 · on March 26, 2019

Well, as written, it is 'O(1)'. Or did you mean to put N instead of 1000?

bogomipz · on March 26, 2019

The inner loop is log(N), it halves the value of J after every iteration. The "do something" was meant to stand for a constant time operation so that it's N log N. But my point was really that if you told an interviewer that it was O(N^2) simply because it contained nested for loops it would be a very clear sign that you didn't fully grasp Big O.

vecter · on March 26, 2019

It's a subtle point, but anon946 was pointing out that since your loops are fixed at 1000, this entire thing is O(1) (granted, with a large constant). You should change all of the "1000" to "n". That would make it O(n lg n).

edit: you've now edited the inner loop to be n, but the outer loop is still 1000. This is still wrong, as the entire algorithm is now just O(lg n).

enedil · on March 26, 2019

Nah, every algorithm that is O(N logN) is also O(n^2). That would be a sign, that you as an interviewer didn't fully grasp Big O.

bogomipz · on March 26, 2019

Nah "f(n) is said to be in O( g(n) )" formally speaking but practically speaking an algorithm in some production code that runs in O(n log n) vs one that runs in O(n^2) are not the same thing at all.

That would be a sign that you as a candidate are being needlessly pedantic and usually a red flag.

enedil · on March 26, 2019

You're confusing O with Theta.

bogomipz · on March 26, 2019

No I am most certainly not. Nowhere did anything I said in the comment above indicated I was referring to both a lower AND upper bound which is Big Theta.

Honestly it sounds as if you might not understand the difference between Theta and O. I understand that a function that is O(N^2) grows no faster than O(N^3) asymptotically speaking however the intention of my original comment and example is quite clear about their context.

Please stop, you are adding nothing to the discussion.

vecter · on March 26, 2019

You most certainly are confusing O with Theta. O is just an asymptotic upper bound. Of course a function that’s bounded by n lg n is also bounded by n^2.

bogomipz · on March 26, 2019

No I am not. The other commenter inexplicably and needlessly decided to introduce Theta and assert that I was confused.

Maybe you might reread my original comment. This whole side discussion is of no value to my original comment which has a very clear and very narrow context.

I am not sure why the both of you want to belabor some ancillary talking point that you yourselves decided to introduce. It is of exactly no value to the context of my original comment and not in the least bit productive.

robconery · on March 26, 2019

OP here - O(N log N) is not O(n^2). If n is 1000 then log n is 10, which is 1000 * 10 which is 10,000. That's a bit less than 1000 * 1000.

vecter · on March 26, 2019

You're wrong. n lg n grows asymptotically slower than n^2, and is thus is O(n^2). n is also O(n^2), as is 1, or even sin(n). All of this follows directly from the definition [0].

Also the fact that you're plugging numbers in for n indicates to me that you don't actually understand big O notation, which is surprising since the original article is decent.

[0] https://en.wikipedia.org/wiki/Big_O_notation#Formal_definiti...

anon946 · on March 26, 2019

Yes, but my point is that, as written, there is no dependency on N. Thus, O(1).

robconery · on March 26, 2019

OP here. Agree that thinking about code is better than using a rule of thumb, but we need to start somewhere don't we? I tried to make it clear in the post that looping over n items within an n loop is n * n.

Grue3 · on March 26, 2019

Big-O is an upper bound. Something that's O(n) is also O(n^2).

mbell · on March 26, 2019

> I don’t want to turn this into a Redis commercial, but I will say that it (and systems like it) have a lot to offer when you start thinking about things in terms of time complexity, which you should! It’s not premature optimization to think about Big O upfront, it’s programming and I don’t mean to sound snotty about that! If you can clip an O(n) operation down to O(log n) then you should, don’t you think?

The thing is, I don't care about 'time complexity', I care about performance. Big O can serve as a useful datapoint for what the performance may be, but it's only that, one data point. e.g. It's not uncommon to find that a 'worse' algorithm/data structure in Big O terms will out perform a 'better' one because the 'worse' one has better cache locality. So no I don't think you should use an O(log n) operation in place of an O(n) operation just because of Big O, what matters is which one is faster.

robconery · on March 26, 2019

OP here - Big O notation is simply shorthand math. When you're discussing things in this way, time complexity and performance are the same thing. When you care about resource usage (memory etc) that's space complexity, which is different. Either way, they're good things to understand.

>So no I don't think you should use an O(log n) operation in place of an O(n) operation just because of Big O, what matters is which one is faster.

Mathematically the log n is always faster :). Realistically... well that would be a tough one to prove, even with caching, but I say go for it.

mbell · on March 26, 2019

> time complexity and performance are the same thing

By definition, they are not.

> Mathematically the log n is always faster :)

No, it's not. Time complexity only gives you an asymptotic bound on the number of 'operations', it tells you nothing about what the actual run time will be.

robconery · on March 26, 2019

I believe we're talking past each other. Big O has nothing to do with "actual run time*. It doesn't care what about the number of inputs you have - just that you have them.

Mathematically, if n=1000 then log n is 10. 10 operations vs. 1000 is, theoretically and time complexity wise, faster.

Our disconnect is "actual" vs. "theoretical" and I want to stress again that Big O is purely theoretical. It's a technical adjective.

js2 · on March 26, 2019

Formal definition from my ancient copy of Sedgewick:

A function g(N) is said to be O(f(N)) if there exist constants c₀ and n₀ such that g(N) is less than c₀f(N) for all N > N₀.

Continuing: "Informally, this encapsulates the notion of “is proportional to” and frees the analyst from considering the details of particular machine characteristics. Furthermore, the statement that the running time of an algorithm is O(f(N)) is independent of the algorithm's input. Since we're interested in studying the algorithm, not the input or the implementation, the O-notation is a useful way to state upper bounds on running time which are independent of both inputs and implementation details."

I'll also repeat this bit from the end of the analysis chapter:

Perspective:

Many of the algorithms in this book have been subjected to detailed mathematical analysis and performance studies far too complex to be discussed here. Indeed, it is on the basis of such studies that we are able to recommend many of the algorithms we discuss.

Not all algorithms are worthy of such intense scrutiny; indeed during the design process, it is preferable to work with approximate performance indicators to guide the design process without extraneous detail. As the design becomes more refined, so must the analysis, and more sophisticated mathematical tools need to be applied. Often, the design process leads to detailed complexity studies that lead to "theoretical" algorithms rather far from any particular application. It is a common mistake to assume that rough analyses from complexity studies will translate immediately to efficient practical algorithms: this often leads to unpleasant surprises. On the other hand, computational complexity is a powerful tool for suggesting departures in design upon which important new methods can be based.

One should not use an algorithm without some indication of how it will perform. [...]

yxmayt · on March 26, 2019

A precise way of being imprecise.

User23 · on March 26, 2019

The formal definition is a fairly gentle introduction to logical math. Here is a decent enough write-up: https://justin.abrah.ms/computer-science/understanding-big-o...

toadi · on March 26, 2019

I mostly don't think about it too much. If I have a piece of code that seems slow I make it less complicated and mostly look up the big-O. Reason is that I only had a few time in my 20 year career that I really had issues with this.

mixmastamyk · on March 26, 2019

True, after a few years of experience you recognize pitfalls to avoid. I remember figuring this out on my own as a kid writing a chat client, years before learning O terminology. As the number of clients reached ten performance dropped and quickly realized I needed to handle the situation more efficiently and did.

Doesn't provide the smug satisfaction of using the academic notation/jargon given in tech interviews and hn threads however.

auraham · on March 28, 2019

I really like the post, I think it is very intuitive. However, it was a bit confusing at the end. Consider these scenarios: (1) looking for an item in you cart using a hash, and (2) looking for an item in the whole databse using a hash. From the article, I assume that (1) is O(1) because of the hash, but for (2) it is O(log n). Maybe I am confusing a hash and an index. If so, what is the difference?

gaze · on March 26, 2019

Oh ffs another article about Big-O without the actual definition anywhere in it. Put it at the end after motivating it, put it at the beginning and then explain it, I don’t care. Just actually give it at some point.

ratling · on March 26, 2019

This is legitimately a good post.