i work with computers

19 Nov, 2008

My Lucene.Net experience (thus far)

Posted by: Carl | In: asp.net, c#, development, sql

After SQL Server 2005 did not return the results I expected from my full-text search queries, I decided to move to another search method: an open source search engine, Lucene.Net (the .NET port of Apache Lucene).  In their own words, “Apache Lucene is a high-performance, full-featured text search engine library…” So it’s an indexing and search library, not a place to store your data.  It works in conjunction with your data source, whether that’s a folder on your file system, a database, etc.

So far I’ve created simple methods to index data (each entry in a Lucene index is called a “Document”).  In my case, each Document holds an ID column and a Title column, and I want my searches to run on the Title column.  The next step is a scheduled process, of sorts, that scans my data source (my database table, in this case) and indexes all new content.  I’m going to create a Windows Service that does just that once per hour.  It’s as simple as creating a service, adding a .NET Timer to check whether it’s time to update, and then running my indexer class.
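Here’s a rough sketch of the “index only the new rows” idea that the hourly job would run. It is plain Python with a toy inverted index, not Lucene.Net’s actual API. The names `TitleIndex` and `index_new_rows`, the sample rows, and the assumption that IDs only ever increase are all made up for illustration.

```python
import re
from collections import defaultdict


class TitleIndex:
    """Toy inverted index over (ID, Title) rows; not Lucene's real API."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of row IDs
        self.last_indexed_id = 0          # highest ID indexed so far

    def add_document(self, row_id, title):
        # Lowercase, then split at anything that is not an ASCII letter.
        for token in re.findall(r"[a-z]+", title.lower()):
            self.postings[token].add(row_id)
        self.last_indexed_id = max(self.last_indexed_id, row_id)


def index_new_rows(index, rows):
    """Index rows with an ID above the last indexed one; return how many."""
    # Assumption: IDs are ever-increasing, so "new" means "higher ID".
    new_rows = [(row_id, title) for row_id, title in rows
                if row_id > index.last_indexed_id]
    for row_id, title in new_rows:
        index.add_document(row_id, title)
    return len(new_rows)


# Hypothetical table contents standing in for the database.
table = [(1, "Intro to Lucene.Net"), (2, "Full-Text Search in SQL Server")]
index = TitleIndex()
first_pass = index_new_rows(index, table)    # indexes both rows
table.append((3, "Building a Windows Service"))
second_pass = index_new_rows(index, table)   # indexes only the new row
```

In the real service, the timer callback would play the role of the two `index_new_rows` calls, and the rows would come from a query against the table rather than a list.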

Now that I’ve got my data, or at least some data, indexed, I can run searches on that data.  There are a few Lucene classes needed to perform a search: a Searcher, an Analyzer, a QueryParser, and a Query object.

  • The Searcher object is responsible for executing the search based on the other objects passed to it. You’ll see what I mean in a second.
  • The Analyzer object turns text into the tokens that get indexed and searched.  When indexing a “Document”, you can supply an Analyzer object to tokenize the indexed data in different ways, and the same Analyzer is normally applied to the query text so that both sides match.  I chose to use the SimpleAnalyzer, which does not throw out “stop words” such as “a”, “the”, “an”, etc.  It splits the inputted data, in my case the Title column data, at non-letter characters (so punctuation and digits are dropped) and lowercases each token before indexing it.
  • The QueryParser takes in parameters such as the Analyzer to use and the default column to search (again, in my case, I want searches to be done on the “Title” column).
  • The Query object comes from the QueryParser object, which takes the actual query text as a parameter.  This is where I’d input the user’s search text.
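To show how those pieces fit together, here’s a minimal sketch in plain Python. It is not Lucene.Net’s API: `analyze`, `build_index`, and `search` are hypothetical stand-ins for the Analyzer, the index, and the Searcher, and the sample titles are invented. I’ve also assumed that every query term must match, which is a simplification and not necessarily how a real QueryParser combines terms.

```python
import re


def analyze(text):
    """Lowercase and split at non-letters (ASCII only), keeping stop words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(rows):
    """Map each token to the set of row IDs whose title contains it."""
    index = {}
    for row_id, title in rows:
        for token in analyze(title):
            index.setdefault(token, set()).add(row_id)
    return index


def search(index, query_text):
    """Return sorted IDs whose titles contain every query token."""
    # The query goes through the same analyzer as the indexed titles.
    tokens = analyze(query_text)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)


# Hypothetical (ID, Title) rows.
rows = [
    (1, "The Art of Search"),
    (2, "Search Engines: A Primer"),
    (3, "The Primer"),
]
index = build_index(rows)
results = search(index, "the SEARCH")
```

The point of the sketch is that the query text and the titles pass through the same `analyze` function, so “SEARCH” in a query still lines up with “Search” in a title, and a stop word like “the” stays searchable.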

Now that it’s all set up and I can search, what do I think about it?  I really like it, actually.  It has given me a lot of flexibility, and the results have been more relevant than their Full-Text query counterparts.  It also removes countless hits to the database for searches as costly as non-cached, auto-complete lookups. Plus, it’s fast!  Granted, right now I’ve only got a small portion of my data indexed, maybe 25,000 records, with only two columns per record (ID, Title). I’m excited to see how it holds up once I’ve got all the data rows indexed and it’s running on the live site.

About

I'm an ASP.NET developer who loves learning new frameworks and methodologies, and I absolutely love simple, yet elegant solutions (don't we all?). Since I'm constantly picking up new things, I'm always asking myself how I can use the new knowledge in my current app to make it better, or more user friendly (or even more developer friendly). In my free time I'm typically coding, reading tech books, or spending time with my beautiful bride. And that's about it. Hope I didn't bore you too much.