Observation: Lucene rocks

tlrobinson · on March 16, 2008

Agreed. I was able to set up full text indexing/searching in a few hours.

The longest part of the process was trying to figure out which versions of Java Lucene and Zend PHP Lucene were compatible. FYI:

Lucene 2.1 index format support (which is also used in Lucene 2.2) is included in the current "trunk" branch. It is available via SVN in current nightly snapshots.

We hope to include Lucene 2.1 index format support in ZF 1.5.0. The current release (ZF V1.0.4) works with Lucene 1.9-2.0 index formats.

http://framework.zend.com/manual/en/zend.search.lucene.html#...

henning · on March 16, 2008

And the biggest difference between my ghetto system and Lucene is that searches with lots of results are very, very fast.

thorax · on March 16, 2008

I've heard good things about Lucene, but we use Sphinx: http://www.sphinxsearch.com/

For our tests, it indexed much faster than the common Lucene implementations, and for our needs was also a tad faster overall. I haven't tried the newest version, though.

nickb · on March 16, 2008

I don't know what kind of testing you've done but nothing even approaches the speed of Lucene. It's by far the fastest open source search engine currently available. If you're using Rails, I cannot recommend Solr enough. It's amazing.

Cutting's a genius.

thorax · on March 16, 2008

Do you have references for "nothing even approaches"? Specifically compared to Sphinx? The only comparisons I've found show Sphinx coming out ahead in many indexing/search cases (if only slightly). See my other comment on this thread with links to benchmarks where Sphinx clearly "comes close". We did a good bit of research on this, so it does feel odd that you'd say "nothing even approaches".

It was also ridiculously easy to get Sphinx up and going. Lucene is a killer engine, no doubt, but Sphinx's ROI alone won us over.

alice · on March 18, 2008

http://blog.evanweaver.com/articles/2008/03/17/rails-search-...

nreece · on March 16, 2008

I've used Sphinx in one of my PHP/MySQL projects, and it's much faster than any other (free/open) data indexing platform I've used. Configuring Sphinx and getting it to run takes a bit of effort, but it's worth it.

henning · on March 16, 2008

If you're working in Rails and you just want simple search, Sphinx is the fastest path to getting started.

I think of Lucene as more implementor-neutral than Sphinx; Lucene is an API as well as a Java library.

thorax · on March 16, 2008

Some random benchmarks: http://jayant7k.blogspot.com/2006/06/benchmarking-results-of... http://pagetracer.com/2008/02/15/sphinx-and-lucene-search-en...

azsromej · on March 16, 2008

I second your observation, though my recent foray into Lucene was far simpler. I used the RAMDirectory feature to build an index in memory for a large list of names (and our queries go through a thick OR/M). The user of the application needs to be able to filter the list by keywords and doing the query each time was taking too long (2 or 3 seconds). It's now near instantaneous.

I think for 10,000 documents (two fields: name and id) it takes 20 seconds to build the index in Lucene .NET.

I had always heard of using Lucene for really large datasets and thought it might be overkill for speeding up a somewhat small part of one application dialog. In reality it took a single reference to the Lucene .NET dll and a few functions to build the documents and add them to the index.
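To make the idea concrete, here is a minimal sketch of the same approach: build an in-memory index over (id, name) records once, then answer keyword filters from it instead of re-running the slow query. This is not Lucene's API or RAMDirectory; it is a toy inverted index in plain Python, and the class name `InMemoryNameIndex`, its methods, and the sample names are all hypothetical.

```python
from collections import defaultdict


class InMemoryNameIndex:
    """Toy in-memory inverted index over (id, name) records.

    Hypothetical illustration only; this is not Lucene's API.
    """

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of record ids
        self._names = {}                   # record id -> original name

    @staticmethod
    def _tokenize(text):
        # Lowercase and split on whitespace; a real analyzer does much more.
        return text.lower().split()

    def add(self, record_id, name):
        """Index one record under every token of its name."""
        self._names[record_id] = name
        for token in self._tokenize(name):
            self._postings[token].add(record_id)

    def search(self, keywords):
        """Return sorted (id, name) pairs whose name contains every keyword as a whole token."""
        tokens = self._tokenize(keywords)
        if not tokens:
            return []
        # Intersect the posting sets, smallest first, so the work stays small.
        sets = sorted((self._postings.get(t, set()) for t in tokens), key=len)
        ids = set.intersection(*sets)
        return sorted((i, self._names[i]) for i in ids)


# Build the index once (the expensive step), then filter repeatedly.
index = InMemoryNameIndex()
for rid, name in [(1, "Ada Lovelace"), (2, "Alan Turing"), (3, "Ada Yonath")]:
    index.add(rid, name)

print(index.search("ada"))
print(index.search("ada yonath"))
```

The trade-off is the same one described above: pay the build cost up front, and each filter afterwards is a few set lookups rather than a round trip through the database layer.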

asjo · on March 16, 2008

Has anyone compared Lucene to Xapian? I have never tried Lucene, but have been very happy with Xapian.

http://xapian.org/

initself · on March 16, 2008

Plucene - Perl port of Lucene

http://search.cpan.org/~tmtm/Plucene-1.21/lib/Plucene.pm

bluelu · on March 16, 2008

Plucene is slow as hell.

You'd better use KinoSearch, which also uses the same index format as Lucene.

Some benchmarks are on this site: http://marvinhumphrey.com/kinosearch/benchmarks.html

chaostheory · on March 16, 2008

Lucene is pretty cool and it's a lot better than anything I've seen so far (including Ferret). The only problem I've experienced with it was index corruption, which is fairly common and frustrating (though in fairness it could have been due to my sysadmin skills).