ryancox's comments

ryancox · on Aug 5, 2009

One thing to keep in mind: Disco does not include a distributed filesystem (Hadoop has HDFS). So depending on your workload, you may need to choose a DFS to run alongside Disco.

ryancox · on Jan 22, 2009

This is the algorithm used in Hadoop's record-setting TeraSort benchmark:

http://perspectives.mvdirona.com/2008/07/08/HadoopWinsTeraSo...

http://svn.apache.org/viewvc/hadoop/core/trunk/src/examples/...
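
For context, a minimal single-process sketch of the sampling idea commonly associated with Hadoop's TeraSort example: sample the keys, pick split points so that every key in partition i sorts before every key in partition i+1, sort each partition independently, and concatenate. The function names (`pick_split_points`, `partition_index`, `sample_sort`) and the default parameters are illustrative assumptions, not code from the Hadoop source linked above.

```python
import bisect
import random


def pick_split_points(sample, num_partitions):
    """Choose num_partitions - 1 boundary keys from a sample of the input."""
    ordered = sorted(sample)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def partition_index(key, split_points):
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(split_points, key)


def sample_sort(records, num_partitions=4, sample_size=100, seed=0):
    """Sort `records` by range-partitioning on sampled split points."""
    if not records:
        return []
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    split_points = pick_split_points(sample, num_partitions)

    # Route every key to its range partition (the "map" side).
    partitions = [[] for _ in range(num_partitions)]
    for key in records:
        partitions[partition_index(key, split_points)].append(key)

    # Sort each partition on its own (one "reducer" per partition);
    # because the ranges are ordered, concatenation is globally sorted.
    result = []
    for part in partitions:
        result.extend(sorted(part))
    return result
```

In a real cluster each partition would be sorted on a different machine; the quality of the sample only affects how evenly the work is balanced, not the correctness of the output.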

ryancox · on June 13, 2008

A couple suggestions for importing:

1) I would recommend creating a scratch account to test the import with. You can always nuke that account and retry the import process.

2) You will want to include more fields than the sample does. Definitely date. Maybe things like tags, though their tag support for reading is pretty limited at present.

jonnytran · on June 13, 2008

Good idea. It imports tags/labels. But perhaps I'll add an option to tag everything that was imported with something new.

ryancox · on June 13, 2008

FWIW: I've also recently ported to Tumblr. The Python code I cooked up for the import grew into:

http://code.google.com/p/python-tumblr/

jonnytran · on June 13, 2008

Cool. Yours looks more like something with long-term developer value. Mine was just a quick tool designed to be used once by people who don't want to code.