I used the Safari library subscription for several years while I was in school. While it was really handy to have all those books available within several clicks, it was slow, and their search needed help. If you could index their library on a paragraph level, I'd be willing to subscribe to their service again.
I for one would like to see more documentation on the bang notation. I've been using DDG for a couple of months now (I think I first heard about it on HN), and this was the first I'd even heard of the bang stuff, much less which ones are available!
The arrow key nav reminds me of http://keyboardr.com/. It would be good if you made the page denser, with content on the left and right as well. Good stuff though!
Zero-click info for error messages would be a great resource. There are already some sites that do this for specific kinds of errors (e.g. ora-code.com for Oracle errors), but a general "put in this error and I'll tell you what it means" site would be fantastic.
If I were to add up all the searches I do in a day, searches for programming advice (e.g. how do I fix X, what's the syntax for Y) would outnumber all my other searches put together, so I definitely think there's a big opportunity here.
I think it'd be cool if you had a setting to make DDG look roughly identical to Google. It's hard to judge it fairly when I'm so biased in favor of Google, and DDG is constantly reminding me that it's not Google.
I'm curious: what do you mean by "casual research"? Do you see Duck Duck Go as the free alternative for individuals who don't have a spare $10k lying around for a subscription to LexisNexis, ProQuest, etc.?
I meant either a) you quickly want to know what something is; or b) you want to sort of explore a topic and its related topics. Zero-click info solves a, and that plus related topics/category pages helps greatly with b. A large percentage of queries fall into these two buckets.
I use Solr for the Wikipedia paragraph index. Let me know if you want to know any more about my setup. Also, if it turns out well, I'd love to add this source to DDG :)
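A paragraph-level index mostly comes down to treating each paragraph as its own document. Here's a minimal sketch in Python, assuming articles arrive as plain text with blank lines between paragraphs. The field names (`id`, `title`, `paragraph`, `text`) and the `min_chars` cutoff are hypothetical, not the actual schema used above:

```python
import re
from typing import Iterator


def paragraph_docs(title: str, text: str, min_chars: int = 40) -> Iterator[dict]:
    """Split one article into paragraph-level documents for indexing.

    Hypothetical schema: each paragraph becomes its own document whose id
    combines the article title and the paragraph's position.
    """
    # Blank lines (possibly containing whitespace) separate paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text)]
    n = 0
    for p in paragraphs:
        if len(p) < min_chars:
            continue  # skip headings, stubs, and empty fragments
        yield {"id": f"{title}#p{n}", "title": title, "paragraph": n, "text": p}
        n += 1  # only kept paragraphs are numbered
```

The resulting dicts could be serialized as a JSON array and posted to a Solr core's `/update` handler, though whether the setup described here does it that way isn't stated.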
I'd like to know how you do precision/recall testing. I've spent a lot of time doing pairwise testing of Solr, Sphinx, and Ferret (haven't tried Xapian): checking which hits one indexer misses that the other gets, stopwords (in increments of 0, 50, 100, 175), combinations of token separators, etc. I don't really have to worry about UTF-8 to ASCII conversion with my data.
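The pairwise check described above (which hits one indexer gets that the other misses) can be sketched as a per-query set difference. This assumes each engine's results have already been collected as a mapping from query string to a list of document ids; the Jaccard overlap score is an added convenience, not something mentioned in the thread:

```python
from typing import Dict, List


def pairwise_diff(
    hits_a: Dict[str, List[str]], hits_b: Dict[str, List[str]]
) -> Dict[str, dict]:
    """Compare two engines' result sets query by query."""
    report = {}
    for query in sorted(set(hits_a) | set(hits_b)):
        a = set(hits_a.get(query, []))
        b = set(hits_b.get(query, []))
        union = a | b
        report[query] = {
            "only_a": sorted(a - b),  # hits engine B missed
            "only_b": sorted(b - a),  # hits engine A missed
            # Jaccard overlap; two empty result sets count as full agreement.
            "jaccard": len(a & b) / len(union) if union else 1.0,
        }
    return report
```

Sorting queries by low overlap is one way to find the cases worth eyeballing first.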
I haven't really arrived at a rigorous methodology or a good number that says precision and recall are acceptable, aside from lists of a few hundred test queries per topic where I record the number of hits I expect.
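The expected-hit-count lists described above work naturally as a regression check. Here's a minimal sketch, assuming `search` is any callable that returns a list of hits for a query. The relative `tolerance` is a hypothetical knob for counts that drift slightly as the data changes:

```python
from typing import Callable, Dict, List, Tuple


def check_expected_counts(
    expected: Dict[str, int],
    search: Callable[[str], List[str]],
    tolerance: float = 0.1,
) -> List[Tuple[str, int, int]]:
    """Return (query, expected, actual) for every query outside tolerance."""
    failures = []
    for query, want in expected.items():
        got = len(search(query))
        # Allow a relative margin; max() keeps a zero expectation workable.
        if abs(got - want) > tolerance * max(want, 1):
            failures.append((query, want, got))
    return failures
```

Running this after each indexer or stopword change shows which topics moved, though it measures hit counts rather than true precision or recall.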
Do you mean the precision/recall evaluation process, or specific tweaks within Solr itself to achieve it? The former works like yours, through test cases, though I have real data streaming in all the time. The latter is incremental, but includes multiple fields and boost functions, as well as query and result manipulation.
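If a test case records which documents are actually relevant (not just how many hits to expect), precision and recall fall out directly. A sketch, assuming ranked result ids without duplicates and a hand-judged relevant set per query. The optional cutoff `k` is an assumption, for scoring only the first page of results:

```python
from typing import Iterable, Optional, Set, Tuple


def precision_recall(
    retrieved: Iterable[str], relevant: Set[str], k: Optional[int] = None
) -> Tuple[float, float]:
    """Precision and recall of a ranked result list against judged relevant ids."""
    ranked = list(retrieved)
    if k is not None:
        ranked = ranked[:k]  # score only the top k results
    hits = sum(1 for doc in ranked if doc in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, retrieving `d1 d2 d3 d4` when `d1`, `d3`, and `d9` are relevant gives precision 0.5 and recall 2/3.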