
Can I ask, do you really need Hadoop and "big data" for this? There have gotta be substantially fewer than 10k courts in the United States. What needs to be processed that SQL can't accommodate?

Meta-note: It may be wise to make a rule on what's appropriate to leave as a comment on these hiring posts. I can see some companies shying away if they feel like it's going to turn into a "critique my stack and/or hiring process" thing.




We're processing the opinions rather than the courts, so we're dealing with millions of documents. Since we're building a network of their citations, it winds up being way too much data to hold in memory on a single node, hence the need for Spark.
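For a sense of the shape of that job, here is a minimal single-machine sketch in Python. It is not the poster's actual pipeline: the citation regex, the function names, and the three toy opinions are all made up for illustration. The point is only that each opinion can be mapped to (citing, cited) edges independently, which is the kind of step a framework like Spark can spread across nodes once the edge list no longer fits in one machine's memory.

```python
import re
from collections import Counter

# Hypothetical pattern for a reporter-style cite such as "100 U.S. 1".
# Real citation formats are far more varied; this is an assumption.
CITE_RE = re.compile(r"\b\d+ U\.S\. \d+\b")


def citation_edges(doc_id, text):
    """Map step: one (citing, cited) pair per distinct cite in an opinion."""
    cited = sorted(set(CITE_RE.findall(text)))
    return [(doc_id, c) for c in cited if c != doc_id]


def in_degree(edges):
    """Reduce step: count how many opinions cite each target."""
    return Counter(cited for _, cited in edges)


# Toy corpus, invented for illustration only.
opinions = {
    "300 U.S. 3": "We follow 200 U.S. 2 and discuss 100 U.S. 1.",
    "200 U.S. 2": "See 100 U.S. 1.",
    "100 U.S. 1": "No earlier cases cited here.",
}

# Each opinion is processed independently, so this loop is what would
# be parallelized across a cluster.
edges = [e for doc_id, text in opinions.items()
         for e in citation_edges(doc_id, text)]
counts = in_degree(edges)
```

With millions of opinions the per-document map step stays cheap, but the combined edge list and any graph computation over it are what would grow past a single node.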



