My guess is that this stack is getting outdated, or at least being vastly improved, as Google tries to do index updates in near real time. The major problem with MapReduce is that the results aren't real-time: you collect the data, then run analytics over it, and that analysis can take a long time depending on the data set and the hardware in play. PageRank is calculated via MapReduce, and since the data set is huge, that job can take a very long time, which has resulted in slow index updates. I don't know how Google has solved this problem; my guess is that they have either thrown an enormous amount of hardware at it or improved their stack.
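To make the batch nature of this concrete, here is a toy sketch of a single PageRank iteration written as map and reduce steps over a tiny in-memory graph. This is my own illustration, not Google's code; the damping factor and graph are made up. The point is that a real run repeats passes like this over the entire web graph, which is where the latency comes from.

    from collections import defaultdict

    DAMPING = 0.85  # illustrative damping factor

    def map_step(page, rank, out_links):
        # Emit this page's rank share to each page it links to,
        # and re-emit its link list so the reducer can carry it forward.
        share = rank / len(out_links) if out_links else 0.0
        for target in out_links:
            yield target, share
        yield page, out_links

    def reduce_step(page, values, num_pages):
        # Sum the incoming rank shares and apply the damping factor.
        incoming = sum(v for v in values if isinstance(v, float))
        new_rank = (1 - DAMPING) / num_pages + DAMPING * incoming
        return page, new_rank

    def pagerank_iteration(graph, ranks):
        # graph: {page: [out_links]}, ranks: {page: current rank}
        grouped = defaultdict(list)
        for page, out_links in graph.items():
            for key, value in map_step(page, ranks[page], out_links):
                grouped[key].append(value)
        return dict(reduce_step(p, vals, len(graph)) for p, vals in grouped.items())

    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    ranks = {p: 1.0 / len(graph) for p in graph}
    for _ in range(20):  # a real job repeats this over billions of pages
        ranks = pagerank_iteration(graph, ranks)
    print(ranks)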
What's a better way to do it? I think it's an algorithm that can be updated in real time, where you don't have to recalculate the rank of every page on each update. Such an algorithm would require a very different stack than the one Google currently uses, and my guess is that their architecture will move in this direction as they try to make search real-time (which, from what I have read and experienced, they are trying to do).
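For illustration, here is a toy sketch of the kind of incremental approach I mean: when one page's rank changes (say a new link points to it), push only that change outward through its out-links until it becomes negligible, instead of recomputing the whole graph. The damping factor, threshold, and the size of the rank bump are all illustrative assumptions, not any real production algorithm.

    from collections import deque

    DAMPING = 0.85     # illustrative damping factor
    THRESHOLD = 1e-6   # illustrative cutoff: stop once a change is negligible

    def propagate_change(graph, ranks, page, delta):
        # Apply a rank change to one page and push the damped change
        # through its out-links, instead of recomputing every page.
        queue = deque([(page, delta)])
        while queue:
            current, change = queue.popleft()
            ranks[current] = ranks.get(current, 0.0) + change
            out_links = graph.get(current, [])
            if not out_links:
                continue
            passed_on = DAMPING * change / len(out_links)
            if abs(passed_on) > THRESHOLD:
                for target in out_links:
                    queue.append((target, passed_on))
        return ranks

    graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
    ranks = {"a": 0.33, "b": 0.33, "c": 0.33}
    graph["d"] = ["a"]                         # a newly crawled page links to "a"
    propagate_change(graph, ranks, "a", 0.05)  # push the (assumed) rank bump to "a"
    print(ranks)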
It's great that people are looking at Google's tech as a source of teaching material. But other big companies have equally interesting tech, and some have been around longer, yet it can be very hard to learn anything about what they do.