Hello! Today I want to share a short tutorial on a simple index-and-search operation, using Lucene as the indexer and Maven for the project setup.

Setup

  1. Create an empty Maven sample project using the Eclipse Maven Plugin or use the following console command:

    mvn archetype:generate -DgroupId=com.hascode.demo.search -DartifactId=lucene-sample -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
  2. Here is my pom.xml, which declares the Lucene dependencies:

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.hascode.demo.search</groupId>
      <artifactId>lucene-sample</artifactId>
      <version>0.0.1-SNAPSHOT</version>
      <name>My Lucene Search Sample</name>
      <description>Lucene Search Sample</description>
      <dependencies>
        <dependency>
          <groupId>org.apache.lucene</groupId>
          <artifactId>lucene-core</artifactId>
          <version>2.4.1</version>
        </dependency>
        <dependency>
          <groupId>lucene</groupId>
          <artifactId>lucene</artifactId>
          <version>1.4.3</version>
        </dependency>
      </dependencies>
    </project>

Index Example

I put everything into a single class, Main, in the package com.hascode.demo.search (hey, it's just a demo):

package com.hascode.demo.search;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;

public class Main {

    public static void main(String[] args) throws CorruptIndexException,
            LockObtainFailedException, IOException, ParseException {
        // typed list, so the for-each loop below can iterate over Strings
        List<String> l = new ArrayList<String>();
        l.add("you all");
        l.add("visit");
        l.add("some blog");
        l.add("sometimes");

        // create the index on disk
        // we could also keep the index in RAM instead ...
        // Directory index = new RAMDirectory();
        Directory index = FSDirectory.getDirectory("/tmp/ourtestindex/");
        StandardAnalyzer analyzer = new StandardAnalyzer();
        IndexWriter w = new IndexWriter(index, analyzer, true,
                IndexWriter.MaxFieldLength.UNLIMITED);

        // index some data
        for (String i : l) {
            System.out.println("indexing " + i);
            Document doc = new Document();
            doc.add(new Field("title", i, Field.Store.YES,
                            Field.Index.ANALYZED));
            doc.add(new Field("name", i, Field.Store.YES,
                            Field.Index.ANALYZED));
            w.addDocument(doc);
        }

        // loop and index a large batch of generated filler documents
        for (int i = 1; i < 40000; i++) {
            Document doc = new Document();
            doc.add(new Field("title", "xyz" + i, Field.Store.YES,
                    Field.Index.ANALYZED));
            doc.add(new Field("name", "" + i, Field.Store.YES,
                    Field.Index.ANALYZED));
            w.addDocument(doc);
        }
        w.close();
        System.out.println("index generated");
        // parse query over multiple fields
        Query q = new MultiFieldQueryParser(new String[]{"title", "name"},
                analyzer).parse("s*");

        // searching ...
        int hitsPerPage = 10;
        IndexSearcher searcher = new IndexSearcher(index);
        TopDocCollector collector = new TopDocCollector(hitsPerPage);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        // output results
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("name") + ": "
                    + d.get("title"));
        }
        // release the index files held by the searcher
        searcher.close();

    }

}

Running the Example

Running the code produces the following output:

indexing you all
indexing visit
indexing some blog
indexing sometimes
index generated
Found 2 hits.
1. sometimes: sometimes
2. some blog: some blog
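
To make the two hits less mysterious, here is a minimal sketch of what happens conceptually: each text is split into terms, and the query s* matches every document that contains at least one term starting with "s". The class PrefixSearchSketch and its methods are hypothetical and use only the JDK; this is a simplification under those assumptions, not how Lucene is actually implemented, and it does no relevance scoring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for a Lucene index; none of this is Lucene API.
final class PrefixSearchSketch {

    // term -> ids of the documents containing it, kept sorted by term
    private final TreeMap<String, TreeSet<Integer>> postings =
            new TreeMap<String, TreeSet<Integer>>();
    private final List<String> docs = new ArrayList<String>();

    // Crude "analyzer": lower-case the text and split it on whitespace.
    void add(String text) {
        int docId = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            TreeSet<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(term, ids);
            }
            ids.add(docId);
        }
    }

    // Returns the stored text of every document that has a term starting
    // with the prefix, in insertion order (no relevance scoring).
    List<String> searchPrefix(String prefix) {
        TreeSet<Integer> hits = new TreeSet<Integer>();
        // For ordinary text, the keys between prefix (inclusive) and
        // prefix + Character.MAX_VALUE (exclusive) are the terms that
        // start with the prefix.
        for (TreeSet<Integer> ids
                : postings.subMap(prefix, prefix + Character.MAX_VALUE).values()) {
            hits.addAll(ids);
        }
        List<String> result = new ArrayList<String>();
        for (int id : hits) {
            result.add(docs.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        PrefixSearchSketch index = new PrefixSearchSketch();
        for (String text : new String[] {"you all", "visit", "some blog", "sometimes"}) {
            index.add(text);
        }
        System.out.println(index.searchPrefix("s"));
    }
}
```

Running the sketch prints [some blog, sometimes]. The order differs from the Lucene output because the sketch returns documents in insertion order, whereas Lucene ranks hits by score.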

Troubleshooting

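If we change the query string passed to parse(...) from "s*" to "xyz*", we get a nifty exception at runtime. The wildcard is expanded into one clause per matching term, and the almost 40,000 generated titles exceed the default limit of 1024 clauses. More about this situation can be found in the Lucene FAQ: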

Exception in thread "main" org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
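
The arithmetic behind the exception can be sketched in a few lines of plain Java. The class WildcardExpansionSketch and its methods are hypothetical and only count matching terms; the sketch assumes that the analyzer keeps each generated title ("xyz" followed by a number) as a single term. In Lucene itself the limit can be raised with the static method BooleanQuery.setMaxClauseCount(int), at the cost of more memory per query, so narrowing the wildcard is often the better choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of this is Lucene API.
final class WildcardExpansionSketch {

    // Same value as the default limit quoted in the exception message.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Title terms of the demo index, assuming each "xyz" + i stays one term.
    static List<String> demoTitleTerms() {
        List<String> terms = new ArrayList<String>(
                Arrays.asList("you", "all", "visit", "some", "blog", "sometimes"));
        for (int i = 1; i < 40000; i++) {
            terms.add("xyz" + i);
        }
        return terms;
    }

    // A prefix query needs one clause for every term that starts with the prefix.
    static int countExpandedClauses(List<String> terms, String prefix) {
        int clauses = 0;
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                clauses++;
            }
        }
        return clauses;
    }

    static boolean exceedsLimit(int clauses, int limit) {
        return clauses > limit;
    }

    public static void main(String[] args) {
        List<String> terms = demoTitleTerms();
        for (String prefix : new String[] {"s", "xyz"}) {
            int clauses = countExpandedClauses(terms, prefix);
            System.out.println(prefix + "* -> " + clauses + " clauses, over limit: "
                    + exceedsLimit(clauses, MAX_CLAUSE_COUNT));
        }
    }
}
```

For the demo data this reports 2 clauses for s* and 39999 clauses for xyz*, which is why only the second query trips the limit.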

Article Updates

  • 2015-03-02: Structure and table of contents added, links to Lucene tutorials added