Next iteration: But what if you want to count fiction and non-fiction separately? There's just a slightly different "reduce" step, because people have to add up both counts. You could also break it down by subject, or author, or pretty much anything else.
Next iteration: Of course, you don't have to just add. You can do anything that has mathematical properties similar to addition (my mom teaches high-school math). You could collect the set of all offensive words found, since addition and set union have similar mathematical properties. You couldn't take the average number of pages directly, because some people will count more books than others, and you wouldn't get a fair average. But you could keep track of a couple of attributes that can be summed up, and use them to calculate the average after the job is done (for example, total number of pages and total number of books).
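To make that concrete, here's a rough Python sketch (the shelf data is invented): each mapper emits a summable (total_pages, book_count) pair instead of an average, and the average falls out at the end.

    from functools import reduce

    # Invented data: each shelf is a list of page counts, one per book.
    shelves = [[300, 150, 420], [95, 210], [510, 330, 270, 180]]

    # Map: each worker summarizes its shelf as a summable pair.
    def summarize(shelf):
        return (sum(shelf), len(shelf))  # (total pages, number of books)

    partials = [summarize(shelf) for shelf in shelves]

    # Reduce: the pairs add component-wise, just like plain numbers.
    total_pages, total_books = reduce(
        lambda a, b: (a[0] + b[0], a[1] + b[1]), partials)

    # The average is derived only after the job is done.
    print(total_pages / total_books)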
Now if only everyone's mom understood the concept of a linear set operation... :)
A good example that would also help someone appreciate the power of map/reduce might be something like "how many times does each letter of the alphabet appear in the library?" For the map step I take shelf 1 and you take shelf 2, but now for each book we'll have to write down something like "a:300, b:50, c:90, ..." and so on. But now there are more things to "reduce" than there are books, since we have a count for every letter of the alphabet for every book. Well, there's nothing stopping us from splitting up the reduce step as well: we could split it by shelf again (you add the totals for each letter on your shelf, I'll do mine, and then we do one final reduce at the end), or you can add up letters a-m and I'll do n-z.
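A rough Python sketch of the second way of splitting the reduce, with made-up book texts:

    from collections import Counter

    # Made-up "shelves" of book texts.
    shelf_1 = ["aardvarks abound", "banana bread"]
    shelf_2 = ["cats and canaries", "zebras zigzag"]

    # Map: per-book letter counts, one Counter per book.
    mapped = [Counter(c for c in book if c.isalpha())
              for book in shelf_1 + shelf_2]

    # Reduce, split by key range: you take a-m, I take n-z.
    def reduce_range(counts, letters):
        total = Counter()
        for c in counts:
            total.update({k: v for k, v in c.items() if k in letters})
        return total

    yours = reduce_range(mapped, set("abcdefghijklm"))
    mine = reduce_range(mapped, set("nopqrstuvwxyz"))

    # One final merge at the end.
    print(yours + mine)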
I remember a Stanford lecture video that said "well, if we called it Map, Sort, Group-reduce then no one would use it :P". That's a minor naming issue, but it captures exactly the subtlety you're describing: a MapReduce is not entirely arbitrary in the specifics of how it executes.
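For what it's worth, the "Sort, Group" part is easy to show; this is just a toy Python illustration of that hidden middle step, not how any real framework implements it:

    from itertools import groupby
    from operator import itemgetter

    # Made-up mapper output: (key, value) pairs from several workers.
    mapped = [("fiction", 1), ("sci-fi", 1), ("fiction", 1), ("sci-fi", 1)]

    # Sort, then group by key: the middle step the short name hides.
    mapped.sort(key=itemgetter(0))
    grouped = {k: [v for _, v in g]
               for k, g in groupby(mapped, key=itemgetter(0))}

    # Group-reduce: each key's values are reduced independently.
    print({k: sum(vs) for k, vs in grouped.items()})  # {'fiction': 2, 'sci-fi': 2}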
Oh, I see. I was struck by the direct phrasing of "You couldn't take the average number of pages", which made me parse the rest of the comment incorrectly.
I have used a similar explanation using fields of wheat, combine harvesters and grain trucks.
The combine harvesters perform the Map: they do the heavy work and actually modify the contents of the fields. The more combines you have, the quicker the job gets done.
The reduce step is the trucks that collect the grain and combine it into one pile (back at the silo).
I like this analogy as it shows that some natural jobs are linearly divisible as they need no synchronisation between workers while others are more complicated due to needing to combine results.
Each combine has no need to synchronise with the others - it can be told which fields to harvest and be left to do it.
During the grain collection you can have many trucks, but it soon becomes clear that there are loads of ways to organise them depending on the size of the trucks, the size of the fields and the distance to the depot. Trucks could combine loads in the fields before returning to the grain silo. There is always a danger that one truck might be left waiting for another.
This analogy also lets us discuss whether we need to wait for all the grain to be harvested before we send out the trucks to start gathering it in, and whether we need the same number of trucks as combine harvesters.
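Trucks combining loads in the fields is roughly what Hadoop calls a combiner: a local reduce run before anything travels to the silo. A toy Python sketch with invented yields:

    # Invented yields: each field is a list of loads (tonnes of grain).
    fields = {"north": [2.5, 3.1, 2.8], "south": [4.0, 3.6]}

    # Combiner: each truck merges the loads from its own field first...
    local_piles = {name: sum(loads) for name, loads in fields.items()}

    # ...so the final reduce at the silo sees one value per field,
    # not one value per load.
    print(sum(local_piles.values()))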
The example I came up with when trying to explain the concept was pretty similar, but I framed it as the problem of finding the total value of a huge pile of coins.
Sure: any iterative process can be implemented in terms of reduce, so the mapping itself is a reduction, and each counting instance within the map is a reduction as well (unless you map over books rather than shelves, though even that can be described as a reduction over a single-item set).
I think so, too, but it does not matter. The function you use on the elements in your list when doing a map can be anything, so it may even be something like counting up.
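In Python terms, a toy illustration of map expressed as a reduction, where the accumulator is the output list:

    from functools import reduce

    # map implemented via reduce: each step appends one transformed element.
    def map_via_reduce(fn, items):
        return reduce(lambda acc, x: acc + [fn(x)], items, [])

    # e.g. counting up the books on each shelf.
    print(map_via_reduce(len, [["b1", "b2"], ["b3"]]))  # [2, 1]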
I still don't get it. It would have been clearer if it was something like:
"we have a lot of processing to handle. we share it among computers to get the job done."
Like all good lessons, this one starts with a little lie and tells you the truth later on...
"Supriya : But how I will create different types of Chutneys?
Shekhar : Now you will see the missing phase of MapReduce — Shuffle phase. MapReduce will group all the outputs written by every Map based on the key. This will be automatically done for you. You can assume key as just a name of ingredient like Onion. So all the onion keys will be grouped together and will be transferred to a grinder which will just process onions. So, you will get onion Chutney. Similarly all the tomatoes will be transferred to the grinder marked for tomato and will produce tomato Chutney."
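Stripped of the chutney, the shuffle phase is just grouping map output by key before the reducers run. A toy Python sketch (the ingredient keys are from the quoted article):

    from collections import defaultdict

    # Made-up mapper output: (ingredient, amount) pairs from the choppers.
    mapped = [("onion", 1), ("tomato", 1), ("onion", 1), ("garlic", 1)]

    # Shuffle: the framework groups values by key automatically.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce: each "grinder" processes exactly one key's values.
    print({key: sum(values) for key, values in groups.items()})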
You see, women love cooking. The article is kind of hilarious in a chauvinistic sense, but it _does_ do a decent job of describing MapReduce. The only problem I see with simple overviews like this is that they leave out the fact that, oftentimes, the reduce step is far more complicated than adding numbers together. This explanation pretty much applies to any basic framework for parallelism.
On a side note, I was surprised when MapReduce made such waves, as it's nothing _new_, per se. I still can't believe Google was granted a patent for it. I wrote "MapReduce"-esque algorithms in early high school when I made a distributed system to compute Pi. All it really does is provide a framework for creating, dispatching and joining jobs.
Suppose you want to count the number of books for each author.
Mappers take books from the shelves and bring them to shufflers; shufflers make piles of books, one pile per author; reducers compute the size of each pile.
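A rough Python sketch of that pipeline, with invented titles and authors:

    from collections import defaultdict

    # Invented shelves of (title, author) pairs.
    shelves = [
        [("Book A", "Austen"), ("Book B", "Tolstoy")],
        [("Book C", "Austen"), ("Book D", "Austen")],
    ]

    # Map: emit (author, 1) for every book on every shelf.
    mapped = [(author, 1) for shelf in shelves for _, author in shelf]

    # Shuffle: pile the books up, one pile per author.
    piles = defaultdict(list)
    for author, one in mapped:
        piles[author].append(one)

    # Reduce: the size of each pile.
    print({author: len(pile) for author, pile in piles.items()})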
The most valuable idea in that article is to ask for permission twice before launching into an overcomplicated, bad analogy to explain something your wife doesn't really care about.
There is no value in explaining a solution to someone who does not yet care about the problem.
In this case, someone asking "what is MapReduce?" is just looking for an explanation of why these words are being used outside their common meaning, not for the whole algorithm.
I think the analogy was "meh" at best. For those of us with any technical background, the Wikipedia explanation is not bad:
"Map" step: The master node takes the input, partitions it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.
"Reduce" step: The master node then takes the answers to all the sub-problems and combines them in some way to get the output – the answer to the problem it was originally trying to solve.
This is nice. It's always been tough to explain quickly to friends what MapReduce does and how it extends to things like Hadoop, without getting into the abstractions with keys and values and the like.
I use an explanation based on elections, where cities, counties, states, etc. (whatever applies in your $country) count and aggregate their tallies upwards until it's clear who has won at the national level. :)
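A rough Python sketch of that idea, with invented precinct tallies:

    # Invented precinct tallies, aggregated upwards level by level.
    counties = {
        "county_a": [{"red": 120, "blue": 90}, {"red": 40, "blue": 75}],
        "county_b": [{"red": 300, "blue": 310}],
    }

    def merge(tallies):
        total = {}
        for t in tallies:
            for candidate, votes in t.items():
                total[candidate] = total.get(candidate, 0) + votes
        return total

    # Each county reduces its precincts, then the country reduces the counties.
    county_totals = {name: merge(ps) for name, ps in counties.items()}
    national = merge(county_totals.values())
    print(max(national, key=national.get))  # the national winner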
It's not notable that he was able to explain it to his wife because she's a woman, it's notable because she's not in the field. This is really pretty obvious (and explicitly stated in the article).
We want to count all the books in the library. You count up shelf #1, I count up shelf #2. That's map. The more people we get, the faster it goes.
Now we get together and add our individual counts. That's reduce.
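The whole thing in a few lines of Python, with invented shelf contents:

    # Map: each of us counts one shelf (these could run in parallel).
    shelves = [["b1", "b2", "b3"], ["b4", "b5"]]
    counts = [len(shelf) for shelf in shelves]

    # Reduce: get together and add our individual counts.
    print(sum(counts))  # 5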
[edit: created a blog post containing this explanation. Maybe it will be useful after this HN conversation vanishes. http://crazybear.posterous.com/mapreduce-explained-in-41-wor... ]