I see so many cool visualizations built with D3.js these days. All the libraries I used to build visualizations in the past were very limited and using them involved many tradeoffs, but D3.js seems just about limitless in what it can do. I can't wait to be able to give it a try. Too bad my company standardized on visualization toolkits some time before D3.js became a viable option.
My finding with D3 is that it chooses to give you power at the expense of ease. So you can do almost anything, but very little is simple. For most of it, you pretty much have to make all the shapes yourself with SVG, then use D3 to tie their dimensions and location to the data. So you have to know SVG from the get-go. That's not a problem. It's a little weird with what it considers CSS styling vs node properties, but is otherwise straightforward enough. As long as you go into learning D3 with this in mind, and don't come at it as equivalent to other user-friendly graphing libraries, it's amazing.
A year or two ago, I would have agreed with you. However, I've been working on a new project in D3 over the past month and I've realized that it now has so many mature, built-in visualization layouts that almost anything you'd want to do can be done out of the box with a little bit of tweaking. See, for example, the layouts in https://github.com/mbostock/d3/wiki/Layouts (which include a pretty tree layout) as well as the axis/timeseries handling built into d3.scale/time/svg. Granted, if you're doing a completely new visualization, you're going to have to do a lot of custom stuff, but there are a LOT of helpers built in these days.
The biggest learning hump for me was understanding the way data is bound to DOM elements, and how a mismatch between the data and the selected DOM elements is handled. I never really understood until I read Mike Bostock's post "Thinking With Joins", at which point I attained d3 enlightenment: http://bost.ocks.org/mike/join/
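For anyone who hasn't read the post yet, here is a rough sketch of the join idea in plain JavaScript. It is not D3's implementation and uses no D3 calls; the `join` function, the `key` property on the stand-in nodes, and the sample data are all made up for illustration. It only shows how a mismatch between data and existing elements splits into the three groups the post calls enter, update and exit.

```javascript
// Conceptual sketch of a keyed data join. NOT D3's actual code.
// `nodes` stands in for already-selected DOM elements, each with a `key`
// (a hypothetical property used here only for illustration).
function join(nodes, data, keyOf) {
  const nodeByKey = new Map(nodes.map((node) => [node.key, node]));
  const dataKeys = new Set(data.map(keyOf));

  return {
    // Data with no matching element: new elements would be created for these.
    enter: data.filter((d) => !nodeByKey.has(keyOf(d))),
    // Data matched to an existing element: these are updated in place.
    update: data.filter((d) => nodeByKey.has(keyOf(d))),
    // Elements with no matching datum: these are typically removed.
    exit: nodes.filter((node) => !dataKeys.has(node.key)),
  };
}

const existing = [{ key: "a" }, { key: "b" }, { key: "c" }];
const incoming = [{ id: "b" }, { id: "c" }, { id: "d" }];
const result = join(existing, incoming, (d) => d.id);
```

With this sample data, "d" lands in enter, "b" and "c" in update, and the "a" node in exit.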
How does it decide which word to use initially? From what I can tell it picks the first one. I think the experience would be greatly enhanced if it did just a little extra processing and picked the most used word first. Or if it showed a list of the most used words that was selectable.
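Something like the following would probably be enough for the "extra processing" I have in mind. This is only a sketch under my own assumptions, not how the site works: `mostUsedWords` is a made-up name, words are lowercased, and only ASCII letters and apostrophes count as word characters.

```javascript
// Count word frequencies and return the `limit` most frequent words.
// Assumption: words are runs of ASCII letters/apostrophes, case-insensitive.
function mostUsedWords(text, limit = 5) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z']+/g) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent first; ties keep first-seen order
    .slice(0, limit)
    .map(([word, count]) => ({ word, count }));
}

const top = mostUsedWords("the cat and the hat and the bat", 2);
// top[0] is the suggested starting word; the rest could fill a selectable list.
```

In practice the top entries will mostly be words like "the" and "and", so a stop-word filter may be worth adding before picking the default.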
Very cool use of transitions, one of the many fields where d3 really rocks.
However, I've always wondered if word trees are really useful... Sure, it makes nice things with the Luther King speech or an extract from the Bible.
Any way to shift the text processing to the client? I'd like to use the bookmarklet on some academic papers (many behind a paywall) and the few I've tried only seem to parse the abstract...I assume this is because the text processing is happening server-side, but I could be wrong.
Alternatively, could you release your backend code as well? I'd like to run this on larger corpora.
It attempts to access URLs directly but this only works if the server sends the appropriate CORS headers (hardly ever).
Otherwise, it falls back to using a proxy, which means the client only sees what the proxy sees. However, you can also paste raw text on the main page.
I could imagine modifying the bookmarklet so it lifts the text directly from the browser instead of just copying the URL. This would solve the proxy issue neatly and would also work for local-only or intranet sites, for which the proxy also fails.
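For the curious, the direct-then-proxy behaviour described above can be sketched roughly like this. It is an illustration rather than the site's actual code: `loadText` and the injectable `fetchFn` parameter are hypothetical, and the `/xhr?url=` path is simply the one visible in the error log elsewhere in this thread.

```javascript
// Try the URL directly; if the request is blocked (e.g. missing CORS headers)
// or returns a non-OK status, retry through a same-origin proxy.
// `fetchFn` is injectable so the logic can be exercised without a network.
async function loadText(url, fetchFn = globalThis.fetch, proxyPath = "/xhr?url=") {
  try {
    const direct = await fetchFn(url);
    if (direct.ok) return await direct.text();
  } catch (err) {
    // A cross-origin request without CORS headers rejects here; fall through.
  }
  const proxied = await fetchFn(proxyPath + encodeURIComponent(url));
  if (!proxied.ok) {
    throw new Error("Proxy request failed with status " + proxied.status);
  }
  return proxied.text();
}
```

Since the proxy makes the request itself, anything that needs the reader's cookies or network (paywalls, intranet pages) still fails in the fallback path.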
I tend to agree with Wattenberg and Viégas that it's interesting to treat all words and punctuation equally, but it's certainly a matter of opinion and it would be simple enough to tokenise the input data differently.
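To show how small the difference is, here is a sketch of both tokenisation choices. It is not the tokeniser the site uses; the `tokenise` name, the `keepPunctuation` flag and the regular expressions are my own assumptions about what "words" and "punctuation" mean.

```javascript
// Split text into tokens. With keepPunctuation = true, every punctuation
// character becomes its own token, on equal footing with words.
// Assumption: a word is a run of Unicode letters, digits or apostrophes.
function tokenise(text, keepPunctuation = true) {
  const pattern = keepPunctuation
    ? /[\p{L}\p{N}']+|[^\s\p{L}\p{N}']/gu
    : /[\p{L}\p{N}']+/gu;
  return text.match(pattern) || [];
}

const withPunctuation = tokenise("I have a dream, a dream!");
const wordsOnly = tokenise("I have a dream, a dream!", false);
```

With punctuation kept, "dream" is followed by "," in one place and "!" in another, so the tree branches there; with words only, those branches disappear.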
Yes, I saw that reference after posting it here. It makes sense. I also see that it understands combinations of two and three words together. Brilliant!
Recently I have dabbled in d3 and used your site for a lot of inspiration. I created something very similar to analyze text from web pages, but using bubbles.
XMLHttpRequest cannot load http://www-958.ibm.com/software/data/cognos/manyeyes/datasets/alice-in-wonderland-by-lewis-carroll/versions/1.txt. Origin http://www.jasondavies.com is not allowed by Access-Control-Allow-Origin.
Failed to load resource: the server responded with a status of 504 (Gateway Time-out) http://www.jasondavies.com/xhr?url=http%3A%2F%2Fwww-958.ibm.com%2Fsoftware%2Fdata%2Fcognos%2Fmanyeyes%2Fdatasets%2Fobama-war%2Fversions%2F1.txt