Elasticsearch Integration Testing with Java

When building search engines or indexing large amounts of data into a schema-less, distributed data store, Elasticsearch has always been a favourite tool of mine. In addition to its core features, it also offers tools and documentation for us developers when we need to write integration tests for our Elasticsearch-powered Java applications. In the following tutorial I’d like to demonstrate how to implement a small sample application using Elasticsearch under the hood and, afterwards, how to write integration tests for this application with these tools. ...

August 23, 2016 · 12 min · 2532 words · Micha Kops

Lucene by Example: Specifying Analyzers on a per-field-basis and writing a custom Analyzer/Tokenizer

Lucene is my favourite search engine library, and the more often I use it in my projects, the more features I find that were unknown to me. Two of those features I’d like to share in the following tutorial are, on the one hand, the possibility to specify different analyzers on a per-field basis and, on the other hand, the API to create a simple character-based tokenizer and analyzer in a few steps. ...

July 6, 2014 · 7 min · 1468 words · Micha Kops
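
Both ideas from the post above can be modelled in a few lines of plain Java. Lucene ships its own classes for this (PerFieldAnalyzerWrapper and CharTokenizer); the sketch below does not use them. CharTokenizerSketch and PerFieldAnalyzerSketch are hypothetical names that only illustrate the concepts: a predicate decides which characters belong to a token, and a per-field map picks the analyzer, falling back to a default for all other fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.IntPredicate;

// Plain-Java illustration of a character-based tokenizer; NOT Lucene's CharTokenizer.
// A predicate decides whether a code point belongs to a token; all other code points separate tokens.
class CharTokenizerSketch {
    private final IntPredicate isTokenChar;

    CharTokenizerSketch(IntPredicate isTokenChar) {
        this.isTokenChar = isTokenChar;
    }

    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // walk code points so characters outside the BMP are handled correctly
        text.codePoints().forEach(cp -> {
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        });
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Plain-Java illustration of per-field analysis; NOT Lucene's PerFieldAnalyzerWrapper.
// Fields without an explicitly registered analyzer fall back to the default one.
class PerFieldAnalyzerSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> fieldAnalyzers = new HashMap<>();

    PerFieldAnalyzerSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    PerFieldAnalyzerSketch with(String field, Function<String, List<String>> analyzer) {
        fieldAnalyzers.put(field, analyzer);
        return this;
    }

    List<String> analyze(String field, String text) {
        return fieldAnalyzers.getOrDefault(field, defaultAnalyzer).apply(text);
    }
}
```

For example, a "tags" field can be split on commas only, while every other field is split on anything that is not a letter or digit.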

Content Detection, Metadata and Content Extraction with Apache Tika

If you want to extract metadata or content from a file, be it an office document, a spreadsheet or even an MP3 or an image, or you’d like to detect the content type of a given file, then Apache Tika might be a helpful tool for you. Apache Tika supports a variety of document formats and has a nice, extensible parser and detection API with a lot of built-in parsers available. ...

December 2, 2012 · 11 min · 2222 words · Micha Kops
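
One common strategy behind content-type detection is checking a file’s leading "magic" bytes. The following is a simplified plain-Java sketch of that idea only, not the Tika API: MagicDetectorSketch is a hypothetical class that recognises just four well-known signatures (PDF, PNG, MP3 with an ID3v2 tag, ZIP) and falls back to application/octet-stream. A real detector such as Tika covers far more formats and can also take file names and container contents into account.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified magic-number detection for a handful of formats; NOT the Apache Tika API.
class MagicDetectorSketch {

    // true if data begins with exactly the bytes in prefix
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    static String detect(byte[] data) {
        if (startsWith(data, ascii("%PDF-"))) {
            return "application/pdf";
        }
        // PNG signature: 89 50 4E 47 0D 0A 1A 0A
        if (startsWith(data, new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'})) {
            return "image/png";
        }
        if (startsWith(data, ascii("ID3"))) {
            return "audio/mpeg"; // MP3 starting with an ID3v2 tag
        }
        // local file header of a ZIP archive (also the container of .docx/.xlsx/.odt)
        if (startsWith(data, new byte[] {'P', 'K', 3, 4})) {
            return "application/zip";
        }
        return "application/octet-stream"; // unknown binary content
    }
}
```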

Lucene Snippets: Index Stats

In Lucene 4.x there is an API to fetch index statistics for specific document fields. The following example shows how to create an index with some random documents and fetch some statistics for a field afterwards.
Lucene Dependencies
Just one dependency is needed here: lucene-core. I’ve added the declarations needed for Maven and SBT here; if you’re using Gradle or Buildr, you shouldn’t have a problem creating your build file either. ...

September 8, 2012 · 3 min · 560 words · Micha Kops
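
To make the kind of numbers such field statistics contain more concrete, here is a plain-Java model computed over a tiny in-memory corpus. It is not the Lucene API: FieldStatsSketch is a hypothetical class whose fields loosely mirror values Lucene 4 reports through Terms (getDocCount, getSumDocFreq, getSumTotalTermFreq), and whitespace splitting plus lower-casing stands in for a real analyzer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Plain-Java model of per-field index statistics; NOT the Lucene API.
class FieldStatsSketch {
    long docCount;         // documents with at least one token in the field
    long sumDocFreq;       // sum of docFreq over all distinct terms of the field
    long sumTotalTermFreq; // total number of tokens in the field across all documents
    final Map<String, Long> docFreq = new HashMap<>(); // term -> number of documents containing it

    // Assumption: whitespace tokenization plus lower-casing as a stand-in for an analyzer.
    static FieldStatsSketch compute(List<Map<String, String>> docs, String field) {
        FieldStatsSketch stats = new FieldStatsSketch();
        for (Map<String, String> doc : docs) {
            String text = doc.get(field);
            if (text == null || text.trim().isEmpty()) {
                continue; // document has no value for this field
            }
            String[] tokens = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
            stats.docCount++;
            stats.sumTotalTermFreq += tokens.length;
            Set<String> distinct = new HashSet<>(Arrays.asList(tokens));
            for (String term : distinct) {
                stats.docFreq.merge(term, 1L, Long::sum);
            }
            stats.sumDocFreq += distinct.size();
        }
        return stats;
    }
}
```

With the titles "Lucene in Action" and "Lucene lucene search" plus one document without a title, the title field has a docCount of 2, a sumTotalTermFreq of 6 and a sumDocFreq of 5, and the term "lucene" has a docFreq of 2.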

Lucene Snippets: Faceting Search

The latest snippet from my Lucene examples demonstrates how to implement a faceted search using the Lucene 4.0 API and how easy it is to define multiple category paths to aggregate search results for different facets. In the following example we’re indexing some books, the classic example, and create multiple category paths for author, publication date and category afterwards.
Lucene Dependencies
We simply need two dependencies here: lucene-core, of course, and in addition the lucene-facet library. I’ve added the declarations needed for Maven and SBT here; if you’re using Gradle or Buildr, you shouldn’t have a problem transferring the information needed ;) ...

August 28, 2012 · 4 min · 837 words · Micha Kops
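
The aggregation over category paths described above can be modelled without Lucene to show what the counts mean. This is a plain-Java sketch, not the lucene-facet API: FacetCounterSketch is a hypothetical class, and the book data in the usage example is made up. Each document carries paths such as "author/Jane Doe" or "publication/2012/08", and every prefix of a path is counted once per document, so "publication" and "publication/2012" receive aggregated counts as well.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java model of hierarchical facet counting; NOT the lucene-facet API.
class FacetCounterSketch {

    static Map<String, Integer> count(List<List<String>> categoryPathsPerDoc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> paths : categoryPathsPerDoc) {
            Set<String> prefixes = new HashSet<>(); // avoids double counting within one document
            for (String path : paths) {
                StringBuilder prefix = new StringBuilder();
                for (String component : path.split("/")) {
                    if (prefix.length() > 0) {
                        prefix.append('/');
                    }
                    prefix.append(component);
                    prefixes.add(prefix.toString()); // "publication", "publication/2012", ...
                }
            }
            for (String p : prefixes) {
                counts.merge(p, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```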

Hibernate Search Faceting: Discrete and Range Faceting by Example

In today’s tutorial we’re exploring the world of faceted search, like the one we’re used to seeing when we search for an item on Amazon.com or other websites. We’re using Hibernate Search here, which offers an API to perform discrete as well as range faceted searches on our persisted data.
Maven Dependencies Needed
For simplicity’s sake I am going to use an HSQL database for persistence; in addition, the dependencies for hibernate-entitymanager and hibernate-search (of course) should be added to your pom.xml ...

March 26, 2012 · 5 min · 986 words · Micha Kops
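
The difference between the two facet types mentioned in the Hibernate Search post can be shown in plain Java. Hibernate Search provides its own faceting request builder; the sketch below does not use it. FacetSketch and its Range class are hypothetical names: discrete faceting yields one bucket per distinct value, while range faceting counts hits in predefined half-open ranges [from, to), listing empty ranges with a count of 0.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of discrete and range faceting; NOT the Hibernate Search API.
class FacetSketch {

    // A labelled half-open range [from, to).
    static final class Range {
        final String label;
        final double from; // inclusive
        final double to;   // exclusive

        Range(String label, double from, double to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }

        boolean contains(double value) {
            return value >= from && value < to;
        }
    }

    // Discrete faceting: one bucket per distinct value with its hit count.
    static Map<String, Integer> discrete(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    // Range faceting: one bucket per predefined range, in the order given.
    static Map<String, Integer> ranges(double[] values, List<Range> ranges) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Range r : ranges) {
            counts.put(r.label, 0); // empty ranges still appear with a count of 0
        }
        for (double v : values) {
            for (Range r : ranges) {
                if (r.contains(v)) {
                    counts.merge(r.label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Using Double.NEGATIVE_INFINITY and Double.POSITIVE_INFINITY as outer bounds gives "below x" and "x and above" buckets without special cases.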

Extending the Confluence Search Index

When developing plugins for the Confluence wiki, a developer sometimes needs to save additional metadata to a page object using Bandana or the ContentPropertyManager. Wouldn’t it be nice if this metadata were available in the built-in Lucene index? That is where the Confluence Extractor Module comes into play.
Overview
An extractor allows the developer to add new fields to the Lucene search index. Creating a new extractor is quite simple: just implement the interface com.atlassian.bonnie.search.Extractor, or extend bucket.search.lucene.extractor.BaseAttachmentContentExtractor if you want to build a new file extractor. ...

May 23, 2010 · 4 min · 713 words · Micha Kops
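
The general shape of what an extractor does can be sketched in plain Java. The real com.atlassian.bonnie.search.Extractor interface has its own signature with Lucene and Confluence types; the MetadataExtractor interface and PrefixingMetadataExtractor class below are hypothetical stand-ins that use plain maps. The sketch copies each metadata entry into its own prefixed index field and also appends the value to the default searchable text, so it can be found by a plain full-text query.

```java
import java.util.Map;

// Hypothetical stand-in for an extractor contract; NOT the Confluence API.
interface MetadataExtractor {
    void addFields(Map<String, String> indexDocument,
                   StringBuilder defaultSearchableText,
                   Map<String, String> pageMetadata);
}

// Copies every metadata entry into a prefixed index field and into the default full-text field.
class PrefixingMetadataExtractor implements MetadataExtractor {
    private final String fieldPrefix; // hypothetical naming convention, e.g. "meta-"

    PrefixingMetadataExtractor(String fieldPrefix) {
        this.fieldPrefix = fieldPrefix;
    }

    @Override
    public void addFields(Map<String, String> indexDocument,
                          StringBuilder defaultSearchableText,
                          Map<String, String> pageMetadata) {
        for (Map.Entry<String, String> entry : pageMetadata.entrySet()) {
            indexDocument.put(fieldPrefix + entry.getKey(), entry.getValue());
            defaultSearchableText.append(' ').append(entry.getValue());
        }
    }
}
```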

How to build a quick Lucene Search

Hello! Today I want to post a short tutorial on a simple index and search operation using the Lucene indexer, with Maven for the project setup.
Setup
Create an empty Maven sample project using the Eclipse Maven Plugin or use the following console command:
mvn archetype:generate -DgroupId=com.hascode.demo.search -DartifactId=lucene-sample -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
Here is my pom.xml with the Lucene dependency defined:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.hascode.demo.search</groupId>
    <artifactId>lucene-sample</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>My Lucene Search Sample</name>
    <description>Lucene Search Sample</description>
    <dependencies>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>2.4.1</version>
        </dependency>
    </dependencies>
</project>
...

March 25, 2010 · 3 min · 540 words · Micha Kops
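
Conceptually, the index-then-search round trip from the quick Lucene tutorial rests on an inverted index that maps each term to the documents containing it. The following plain-Java sketch models only that core idea; it is not the Lucene API, which adds analyzers, scoring, segments and much more. TinyInvertedIndex is a hypothetical class; it splits on non-word characters, lower-cases terms and treats a multi-term query as a boolean AND.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java model of an inverted index; NOT the Lucene API.
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // term -> document ids
    private final List<String> documents = new ArrayList<>();

    // Indexes the text and returns its document id.
    int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Returns the ids of documents containing all query terms (boolean AND).
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Integer> hits = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }

    String get(int id) {
        return documents.get(id);
    }
}
```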

Docker Snippets

Inspect Docker Image with dive
Install dive:
brew install dive
Now we can run dive against any Docker image we wish to inspect.
Run dive:
dive confluentinc/cp-kafka:5.4.3
[Figure 1. Screenshot of dive analyzing the Kafka Docker image]
Resources: dive on GitHub
Introspect Private Docker Registry
List images:
curl -s https://the-registry-url/v2/_catalog
Get tags for an image:
curl -s https://the-registry-url/v2/the-image-name/tags/list
An example:
curl -s https://registry.local/v2/alpine/rabbitmq/tags/list
{"name":"alpine/rabbitmq","tags":["3.9.17"]}
Run Trivy Scan for Docker Image
docker run aquasec/trivy image IMAGE:TAG ...

March 1, 2010 · 2 min · 310 words · Micha Kops
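
The two registry calls from the Docker snippets can also be issued from Java with the JDK 11+ HttpClient. This is a minimal sketch assuming an unauthenticated registry that serves the /v2/_catalog and /v2/<name>/tags/list endpoints used with curl in the snippets; RegistryClientSketch is a hypothetical class, the base URL is a placeholder, and JSON parsing of the returned body is left out.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of listing images and tags in a Docker registry; assumes no authentication is required.
class RegistryClientSketch {
    private final HttpClient client = HttpClient.newHttpClient();
    private final String baseUrl; // placeholder such as "https://registry.local"

    RegistryClientSketch(String baseUrl) {
        // drop a trailing slash so paths can be appended uniformly
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    URI catalogUri() {
        return URI.create(baseUrl + "/v2/_catalog");
    }

    URI tagsUri(String imageName) {
        return URI.create(baseUrl + "/v2/" + imageName + "/tags/list");
    }

    // Performs a GET request and returns the raw JSON body.
    String get(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode() + " for " + uri);
        }
        return response.body();
    }
}
```

Calling get(tagsUri("alpine/rabbitmq")) against such a registry would return the same JSON document that the curl example prints.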

Firefox Snippets

Configure the address bar to return search results
Enter about:config in the address bar, search for the key keyword.url and set its value to the search engine of your choice, e.g. for Google search: http://www.google.com/search?q=

March 1, 2010 · 1 min · 34 words · Micha Kops