The power of two random choices

atq2119 · on Feb 8, 2024

The underlying mathematical driver of the power of two random choices is a pretty amazing fact about the "n balls into n bins" random experiment.

If the balls are thrown independently and uniformly at random, the most loaded bin will have theta(log n) balls in it (almost certainly).

If you instead pick two bins uniformly and independently at random for each ball and throw the ball into the bin which has fewer balls in it so far, then the most loaded bin will have theta(loglog n) balls in it, i.e a massive asymptotic improvement.

mjb · on Feb 8, 2024

The balls-into-bins problem comes up often in distribute systems engineering, and so it's super important to have the right intuition. I think a lot of people come in to the space assuming that random balls-into-bins is much more even than it really is (and so underestimating the value of better-than-random placement).

Another interesting example is hash-based partitioning of data. It doesn't spread data out nearly as evenly as one might expect!

(I wrote a blog post about that too: https://brooker.co.za/blog/2018/01/01/balls-into-bins.html)

nyrikki · on Feb 8, 2024

I have found remembering that both the law of large numbers and the central limit theorem are dependent on the independent and identically distributed assumption.

As an example, many computing systems are Markovian, which is memoryless with an exponential distribution.

So the second part of I.I.D. doesn't hold. But may be close enough for your needs.

mjb · on Feb 8, 2024

However, in many real systems the aggregate behavior follows the asymptotic trend of the central limit theorem even though the underlying distributions may not fit the strict requirements.

nyrikki · on Feb 8, 2024

Yes, I always start with the IID, but try watch for evidence that those assumptions need to change.

Too many convenient properties are lost once you lose IID to prematurely drop it.

littlestymaar · on Feb 8, 2024

[flagged]

orlp · on Feb 8, 2024

On the very page you linked it is explained what the theta means. Ironic.

littlestymaar · on Feb 8, 2024

No need to be snarky.

I simply didn't know the notation and that's why I wanted to know if that was a mistake or intentional.

Dylan16807 · on Feb 8, 2024

> I simply didn't know the notation and that's why I wanted to know if that was a mistake or intentional.

The problem is that "aren't you confusing" comes across as very snarky. That's WTF is with the downvotes, and getting upset about those downvotes doesn't help.

littlestymaar · on Feb 8, 2024

> The problem is that "aren't you confusing" comes across as very snarky.

Does it really? My bad then, English being hard once again…

> and getting upset about those downvotes doesn't help.

It did actually, the comment isn't at record breaking level anymore ;)

Dylan16807 · on Feb 8, 2024

> Does it really? My bad then, English being hard once again…

So in a situation like this, "are you confusing" sounds somewhat presumptuous, but "aren't you confusing" sounds like you're quite sure you're right and they're wrong.

"are you maybe confusing" is probably the best way to use that phrasing, but better still would be "do you mean", so What's “θ” in your comment? Do you mean the letter “O” in “big O notation”[1]?

Though ideally when you're linking a page like that you ctrl+F it first.

orlp · on Feb 8, 2024

I didn't mean for it to be snarky, I just found it funny.

ericpauley · on Feb 8, 2024

Big Theta describes a both and upper and lower asymptotic bound.

littlestymaar · on Feb 8, 2024

TIL; Thanks!

OJFord · on Feb 8, 2024

It's usually what people mean when they use O, really, colloquially anyway. Theta's like you actually identified the right 'magnitude curve'; what big-O actually says is much weaker, it's only an upper bound. You can say something's O(n) when actually it's constant, etc. But generally we don't, if someone says casually it's O(n) they mean 'and also Omega(n)', i.e. it's Theta(n).

(Or at least, it is when talking about something specific - how does this function behave, say. If you talk about needing something to happen linearly then that's a fine use of O - of course you're also happy with constant time if linear is ok.)

O is short for Omicron btw, lest you think it's O and then we suddenly went Greek for no apparent reason!

lcnPylGDnU4H9OF · on Feb 8, 2024

I was so confused by this when I learned it after first hearing about big-O notation. Like why are we only talking about the upper bound? Isn't the lower bound at least as, if not more, important? (spoiler: yes) I'm glad you wrote this since it reinforces that understanding.

Dylan16807 · on Feb 8, 2024

> Isn't the lower bound at least as, if not more, important? (spoiler: yes)

I can't think of any situations where I care about the lower bound.

I mainly care about the upper bound and the average. Beyond that some percentile statistics are good.

And by "the" upper bound, I mean a tight upper bound for it to be useful. But you want a tight lower bound too, or it'll just be O(1) and that has no value at all.

And many algorithms have quick exit situations that give them a tiny lower bound and make big-theta invalid.

(Unless by "lower bound" you actually mean "lowest/tightest upper bound"?)

OJFord · on Feb 8, 2024

Yes what you're describing is theta - both an upper and lower bound (within constant factor).

Unless we're talking about 'we need to do this within O(...)', it is generally what we mean (as software engineers, not math/CS academics) even when we say O.

Because as you point out, it would be stupid to, say, file a bug report that some function is O(n^100) and too slow if it's also O(n^2) - both are correct (any time the latter is) but it's not really helpful to say the former. Theta effectively says you've found the smallest O.

Dylan16807 · on Feb 9, 2024

> Yes what you're describing is theta - both an upper and lower bound (within constant factor).

No, I'm not.

A tight upper bound is only sometimes the same thing as theta.

Insertion in a vector or a hash table is best case 1, average case 1, worst case n. Quicksort is average case nlogn, worst case n^2. Bubble sort is best case n, average case n^2, worst case n^2.

None of those algorithms are bound above and below by the same function.

When Theta exists, it does imply that the upper bound is tight. Which is the feature that's actually important. But Theta talks even more about the lower bound, which rarely matters. Theta is not what you're actually looking for the vast majority of the time.

I still can't think of a single situation where I care about "lower bound within constant factor". I care about average time, I care about upper bounds, occasionally I want an algorithm where the time is always exactly the same for a particular n. But not asymptotic lower bound.

Preventing smartass answers for "upper bound" is a good goal, but the way you do that is not by switching to Theta.

OJFord · on Feb 9, 2024

For this purpose, when you say 'best case'/'average case'/'worst case', those are three different functions. Or normally/in general we'd talk about worst case.

Zanni · on Feb 8, 2024

Nice animated visualization of this: https://twitter.com/GrantSlatton/status/1754912113246798036

koolba · on Feb 8, 2024

That’s neat. An easy concept to understand here is that with a pair to consider you will never target the worst node. At worst it’s the second most loaded node.

dahart · on Feb 8, 2024

The most interesting tidbit, I think, is cut off from the right side of the plot -- eventually with enough delay, 1 random choice is the winner, for exactly the same reason that 3 random choices wins for small delay, and 2 random choices wins for medium delay.

I kinda wonder now if the crossover points between best, 3, 2, and 1 are equidistant on a log-scale plot. If I squint a little, I could maybe see 4, 16, and 64 being the approximate crossover points, especially if there was a little less noise.

mjb · on Feb 8, 2024

Yes, it's super interesting when systems move into the regime where stale information becomes worse than no information at all. Part of what makes best-of-2 interesting is how long it resists falling into this pit, but it does eventually happen (as, I suspect, it must with any distributed placement/load balancing algorithm).

zellyn · on Feb 8, 2024

The most interesting question this has always raised for me is whether it implies you should always try two doctors/dentists/therapists/etc. before making a choice. Seems like it would be just as powerful “in real life“.

mjb · on Feb 8, 2024

There's a whole area of math in "optimal stopping" which aims at exactly this question: how many dentists should you try before you pick your favorite?

For example: https://en.wikipedia.org/wiki/Secretary_problem

vikingerik · on Feb 8, 2024

That problem is always sort of intuitively misleading. The success criterion is defined as picking the single best option from the field. But that's not how real life works. You don't have to pick the single best option for a job or spouse or dentist or whatever; you'll likely be satisfied with anything in the top ten or quintile or somewhere around there. It's interesting mathematically, but it's seldom really applicable to real life.

taeric · on Feb 8, 2024

Well, this strategy would only help you find ones that are not over loaded? And probably only works if you can change choices over time?

That said, sampling, in general, works surprisingly well in all things.

chrisweekly · on Feb 8, 2024

That reminds me of a great book -- "Algorithms to Live By" -- which examines precisely this kind of question. Useful, educational, and entertaining = highly recommended.

amluto · on Feb 8, 2024

> Using stale data for load balancing leads to a herd behavior, where requests will herd toward a previously quiet host for much longer than it takes to make that host very busy indeed.

This phenomenon is common with cars using traffic-aware navigation, too.

ckcheng · on Feb 8, 2024

Another place this technique was used and discussed recently:

Caches: LRU vs. Random (2014) (danluu.com)

https://news.ycombinator.com/item?id=39093109

Night_Thastus · on Feb 8, 2024

It feels subjectively like Caching and Load Balancing both belong to the same 'class' of problems at least at some level. I'm curious if there's a term that encompasses both.

deterministic · on Feb 11, 2024

> dedicated load balancers are often undesirable. They add cost, latency, complexity, and may introduce a single point of failure.

Not my experience at all. Centralised load balancer are simple, easy to maintain, and just works.

My rule of thumb is to always always always avoid distributed computations as much as possible. Trying to avoid a “single point of failure” always in my experience leads to more complex failure scenarios, making your system more likely to fail in hard to fix ways.

mkhnews · on Feb 8, 2024

I used this method for a cache eviction/replacement scheme. For our use case it was better than traditional lru.