Can I ask, do you really need Hadoop and "big data" for this? There have gotta be substantially fewer than 10k courts in the United States. What needs processed that SQL can't accommodate?
Meta-note: It may be wise to make a rule on what's appropriate to leave as a comment on these hiring posts. I can see some companies shying away if they feel like it's going to turn into a "critique my stack and/or hiring process" thing.
We're processing the opinions rather than the courts, so we're dealing with millions of documents. Since we're building a network of their citations, it winds up being way too much data to hold in memory on a single node, hence the need for Spark.
Meta-note: It may be wise to make a rule on what's appropriate to leave as a comment on these hiring posts. I can see some companies shying away if they feel like it's going to turn into a "critique my stack and/or hiring process" thing.