My guess is that this stack is getting outdated, or at least being vastly improved, as Google tries to do index updates in near real time. The major problem with MapReduce is that the results aren't real-time: you collect the data, then run analytics over it, and that analysis can take a long time depending on the data set and the hardware in play. PageRank is calculated via MapReduce, and since the data set is huge, that job can take a very long time, which has resulted in slow index updates. I don't know how Google has solved this problem; my guess is that they have either thrown an enormous amount of hardware at it or improved their stack.
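To make the batch nature of this concrete, here is a toy sketch of a single PageRank iteration written as map and reduce steps over a tiny in-memory graph. This is my own illustration, not Google's code; the damping factor and graph are made up. The point is that a real run repeats passes like this over the entire web graph, which is where the latency comes from.

    from collections import defaultdict

    DAMPING = 0.85  # illustrative damping factor

    def map_step(page, rank, out_links):
        # Emit this page's rank share to each page it links to,
        # and re-emit its link list so the reducer can carry it forward.
        share = rank / len(out_links) if out_links else 0.0
        for target in out_links:
            yield target, share
        yield page, out_links

    def reduce_step(page, values, num_pages):
        # Sum the incoming rank shares and apply the damping factor.
        incoming = sum(v for v in values if isinstance(v, float))
        new_rank = (1 - DAMPING) / num_pages + DAMPING * incoming
        return page, new_rank

    def pagerank_iteration(graph, ranks):
        # graph: {page: [out_links]}, ranks: {page: current rank}
        grouped = defaultdict(list)
        for page, out_links in graph.items():
            for key, value in map_step(page, ranks[page], out_links):
                grouped[key].append(value)
        return dict(reduce_step(p, vals, len(graph)) for p, vals in grouped.items())

    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    ranks = {p: 1.0 / len(graph) for p in graph}
    for _ in range(20):  # a real job repeats this over billions of pages
        ranks = pagerank_iteration(graph, ranks)
    print(ranks)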
What's a better way to do it? I think it's an algorithm that can be updated in real time, where you don't have to recalculate the rank of every page on each update. Such an algorithm would require a very different stack than the one Google currently uses, and my guess is that their architecture will move in this direction as they try to make search real-time (which, from what I have read and experienced, they are trying to do).
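For illustration, here is a toy sketch of the kind of incremental approach I mean: when one page's rank changes (say a new link points to it), push only that change outward through its out-links until it becomes negligible, instead of recomputing the whole graph. The damping factor, threshold, and the size of the rank bump are all illustrative assumptions, not any real production algorithm.

    from collections import deque

    DAMPING = 0.85     # illustrative damping factor
    THRESHOLD = 1e-6   # illustrative cutoff: stop once a change is negligible

    def propagate_change(graph, ranks, page, delta):
        # Apply a rank change to one page and push the damped change
        # through its out-links, instead of recomputing every page.
        queue = deque([(page, delta)])
        while queue:
            current, change = queue.popleft()
            ranks[current] = ranks.get(current, 0.0) + change
            out_links = graph.get(current, [])
            if not out_links:
                continue
            passed_on = DAMPING * change / len(out_links)
            if abs(passed_on) > THRESHOLD:
                for target in out_links:
                    queue.append((target, passed_on))
        return ranks

    graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
    ranks = {"a": 0.33, "b": 0.33, "c": 0.33}
    graph["d"] = ["a"]                         # a newly crawled page links to "a"
    propagate_change(graph, ranks, "a", 0.05)  # push the (assumed) rank bump to "a"
    print(ranks)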
It's great that people are looking at Google's tech as a source of teaching material. But other big companies have equally interesting tech, and some have been around longer, yet it can be very hard to learn anything about what they do.