How Facebook keeps 100 petabytes of Hadoop data online (gigaom.com)
34 points by iProject on June 13, 2012 | 5 comments



Probably better to link to the original article instead of a summary: https://www.facebook.com/notes/facebook-engineering/n/101508...


It's not 100PB of data; it's 100PB of physical disk space. Given the (standard) 3x replication and filesystem overhead, you're looking at about 30PB of data. Certainly a ginormous amount of data, that's for sure.
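
(To make that arithmetic concrete, here's a back-of-the-envelope sketch in Python. The 3x factor is HDFS's default replication; the ~10% filesystem-overhead figure is an assumption for checksums and metadata slack, not a number from the article:)

    raw_capacity_pb = 100        # physical disk space from the headline
    replication_factor = 3       # HDFS default: every block stored 3 times
    fs_overhead = 0.10           # assumed slack for checksums, metadata, etc.

    logical_pb = raw_capacity_pb * (1 - fs_overhead) / replication_factor
    print(f"~{logical_pb:.0f} PB of actual data")  # -> ~30 PB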


If only the secondary name node did what one would assume a secondary name node is supposed to do, Facebook wouldn't have needed to invent avatar nodes, and even Hitler wouldn't have gotten so upset and fired that intern: http://www.youtube.com/watch?v=hEqQMLSXQlY
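
(For anyone unfamiliar with the misconception: Hadoop's SecondaryNameNode only checkpoints, i.e. it periodically folds the edit log into a fresh fsimage so restarts are faster; it is not a hot standby. Facebook's AvatarNode adds an actual standby that tails a shared edit log. A minimal sketch of the distinction, with hypothetical class names rather than Hadoop's real API:)

    class SharedEditLog:
        """Stand-in for a shared edit log (e.g. on NFS, as AvatarNode used)."""
        def __init__(self):
            self.entries = []
        def append(self, path, meta):
            self.entries.append((path, meta))
        def entries_after(self, n):
            return self.entries[n:]

    class CheckpointNode:
        """What the SecondaryNameNode actually does: merge the edit log
        into a fresh fsimage snapshot. It cannot serve requests if the
        primary dies -- it is not a standby."""
        def checkpoint(self, fsimage, log):
            merged = dict(fsimage)
            merged.update(dict(log.entries))
            return merged

    class HotStandby:
        """What an AvatarNode-style standby provides: it tails the shared
        edit log so its namespace stays nearly current, and it can take
        over in seconds on failover."""
        def __init__(self, log):
            self.log, self.seen, self.namespace = log, 0, {}
        def tail(self):
            for path, meta in self.log.entries_after(self.seen):
                self.namespace[path] = meta
                self.seen += 1
        def become_active(self):
            self.tail()  # replay any remaining entries, then start serving
            return self.namespace

    log = SharedEditLog()
    log.append("/user/alice/part-0000", {"blocks": 3})
    standby = HotStandby(log)
    standby.tail()
    log.append("/user/alice/part-0001", {"blocks": 2})
    print(standby.become_active())  # standby is current at failover time
    print(CheckpointNode().checkpoint({}, log))  # a snapshot, nothing more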


This doesn't actually discuss how Facebook manages and maintains their 100PB of storage; rather, it just talks a bit about the namenode SPoF issue. Disappointing article, for sure.


This gigaom article is horrible.



