Hello, today I want to share a short tutorial on a simple index and search operation using the Lucene indexer, with Maven for the project setup.
Setup
- Create an empty Maven sample project using the Eclipse Maven Plugin or use the following console command:
mvn archetype:create -DgroupId=com.hascode.demo.search -DartifactId=lucene-sample
- Here is my pom.xml; it defines the dependencies for Lucene:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.hascode.demo.search</groupId>
    <artifactId>lucene-sample</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>My Lucene Search Sample</name>
    <description>Lucene Search Sample</description>
    <dependencies>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>2.4.1</version>
        </dependency>
        <dependency>
            <groupId>lucene</groupId>
            <artifactId>lucene</artifactId>
            <version>1.4.3</version>
        </dependency>
    </dependencies>
</project>
Index Example
I put everything into a single class, Main, in the package com.hascode.demo.search. Hey, it’s just a demo:
package com.hascode.demo.search;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
public class Main {
    public static void main(String[] args) throws CorruptIndexException,
            LockObtainFailedException, IOException, ParseException {
        // typed list, so that the for-each loop below compiles
        List<String> l = new ArrayList<String>();
        l.add("you all");
        l.add("visit");
        l.add("some blog");
        l.add("sometimes");

        // create some index
        // we could also create an index in our ram ...
        // Directory index = new RAMDirectory();
        Directory index = FSDirectory.getDirectory("/tmp/ourtestindex/");
        StandardAnalyzer analyzer = new StandardAnalyzer();
        // "true" creates a new index and overwrites an existing one
        IndexWriter w = new IndexWriter(index, analyzer, true,
                IndexWriter.MaxFieldLength.UNLIMITED);

        // index some data
        for (String i : l) {
            System.out.println("indexing " + i);
            Document doc = new Document();
            doc.add(new Field("title", i, Field.Store.YES,
                    Field.Index.ANALYZED));
            doc.add(new Field("name", i, Field.Store.YES,
                    Field.Index.ANALYZED));
            w.addDocument(doc);
        }

        // loop and index some random data
        for (int i = 1; i < 40000; i++) {
            Document doc = new Document();
            doc.add(new Field("title", "xyz" + i, Field.Store.YES,
                    Field.Index.ANALYZED));
            doc.add(new Field("name", "" + i, Field.Store.YES,
                    Field.Index.ANALYZED));
            w.addDocument(doc);
        }
        w.close();
        System.out.println("index generated");

        // parse query over multiple fields
        Query q = new MultiFieldQueryParser(new String[]{"title", "name"},
                analyzer).parse("s*");

        // searching ...
        int hitsPerPage = 10;
        IndexSearcher searcher = new IndexSearcher(index);
        TopDocCollector collector = new TopDocCollector(hitsPerPage);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        // output results
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("name") + ": "
                    + d.get("title"));
        }
        // release the index files held by the searcher
        searcher.close();
    }
}
Running the Example
Running the code produces the following output:
indexing you all
indexing visit
indexing some blog
indexing sometimes
index generated
Found 2 hits.
1. sometimes: sometimes
2. some blog: some blog
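To see why s* produces exactly these two hits, it can help to look at a toy version of what an index does with such a query. The following plain-Java sketch does not use Lucene at all: the class PrefixSearchSketch and its methods are made up for illustration, and its tokenizer is only a rough stand-in for StandardAnalyzer (it just lower-cases and splits on non-alphanumeric characters). It also does no scoring, so hits come back in insertion order rather than in the ranked order shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index (hypothetical illustration, not Lucene code).
class PrefixSearchSketch {
    // term -> ids of the documents containing it, sorted by term
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<String, Set<Integer>>();
    private final List<String> documents = new ArrayList<String>();

    // Rough stand-in for an analyzer: lower-case, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<String>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Store the document and register each of its terms in the postings map.
    void add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            Set<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(term, ids);
            }
            ids.add(docId);
        }
    }

    // A prefix query like "s*" matches every document that contains
    // at least one term starting with the prefix.
    List<String> searchPrefix(String prefix) {
        Set<Integer> ids = new TreeSet<Integer>();
        for (Map.Entry<String, Set<Integer>> e : postings.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            ids.addAll(e.getValue());
        }
        List<String> result = new ArrayList<String>();
        for (int id : ids) {
            result.add(documents.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        PrefixSearchSketch index = new PrefixSearchSketch();
        index.add("you all");
        index.add("visit");
        index.add("some blog");
        index.add("sometimes");
        System.out.println(index.searchPrefix("s")); // prints [some blog, sometimes]
    }
}
```

Only the terms "some" and "sometimes" start with s, so only the two documents containing them are returned; "visit" contains an s but does not start with one.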
Troubleshooting
If we change the query string passed to parse(...) from s* to xyz*, we get a nifty exception at runtime. More about this situation can be found in the Lucene FAQ:
Exception in thread "main" org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
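The background, as far as this Lucene version is concerned: a prefix or wildcard query is rewritten into one boolean clause per matching term, and the second loop in the example created 39,999 distinct title terms starting with xyz, far more than the default limit of 1024. The plain-Java sketch below imitates that expansion without using Lucene. The class ClauseLimitSketch, its expand method and the exception type it throws are hypothetical stand-ins; only the limit of 1024 is taken from the message above. In Lucene itself the limit can be raised via the static method BooleanQuery.setMaxClauseCount(int), at the cost of more memory and slower queries; a more selective query is usually the better fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration of clause expansion, not Lucene code.
class ClauseLimitSketch {
    static final int MAX_CLAUSE_COUNT = 1024; // default limit from the error message

    // Expand a prefix into one "clause" per matching term and
    // fail as soon as the limit would be exceeded.
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<String>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted set: no further matches possible
            }
            if (clauses.size() >= maxClauses) {
                throw new IllegalStateException("maxClauseCount is set to " + maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Same terms as the title field in the example above.
        SortedSet<String> terms = new TreeSet<String>();
        for (int i = 1; i < 40000; i++) {
            terms.add("xyz" + i);
        }
        terms.add("some");
        terms.add("sometimes");

        System.out.println(expand(terms, "s", MAX_CLAUSE_COUNT).size()); // prints 2
        try {
            expand(terms, "xyz", MAX_CLAUSE_COUNT);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints maxClauseCount is set to 1024
        }
    }
}
```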
Other Lucene Articles
If you’re interested in some other Lucene articles of mine, please feel free to have a look at the following list:
- Lucene by Example: Specifying Analyzers on a per-field-basis and writing a custom Analyzer/Tokenizer
- Creating elegant, typesafe Queries for JPA, mongoDB/Morphia and Lucene using Querydsl
- Content Detection, Metadata and Content Extraction with Apache Tika
- JPA Persistence and Lucene Indexing combined in Hibernate Search
- Hibernate Search Faceting: Discrete and Range Faceting by Example
Article Updates
- 2015-03-02: Structure and table of contents added, links to Lucene tutorials added