Content Detection, Metadata and Content Extraction with Apache Tika

If you need to extract metadata or content from a file, be it an office document, a spreadsheet, an MP3 or an image, or if you'd like to detect the content type of a given file, then Apache Tika might be a helpful tool for you. Apache Tika supports a wide variety of document formats and offers a nice, extensible parser and detection API with many built-in parsers available. ...
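The two use cases from this summary, content type detection and metadata/text extraction, can be sketched roughly as follows. This is a minimal sketch, not the post's own code: it assumes tika-core (for detection) and a Tika parsers package (for extraction) are on the classpath, and the class TikaSketch and its helper methods are hypothetical names chosen for this example. It uses the Tika facade for detection and AutoDetectParser with a BodyContentHandler for extraction.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration; not part of Tika itself.
final class TikaSketch {
    // The Tika facade wraps the default detector and parser configuration.
    private static final Tika TIKA = new Tika();

    // Detects a media type from the leading bytes of a document (magic numbers).
    static String detectFromBytes(byte[] prefix) {
        return TIKA.detect(prefix);
    }

    // Detects a media type from the file name alone (glob patterns such as *.mp3).
    static String detectFromName(String fileName) {
        return TIKA.detect(fileName);
    }

    // Parses a file with AutoDetectParser, filling in its metadata and returning the plain-text body.
    static String extract(Path file, Metadata metadata)
            throws IOException, SAXException, TikaException {
        AutoDetectParser parser = new AutoDetectParser();
        // -1 disables BodyContentHandler's default write limit of 100,000 characters.
        BodyContentHandler handler = new BodyContentHandler(-1);
        try (InputStream in = Files.newInputStream(file)) {
            parser.parse(in, handler, metadata, new ParseContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No file given: show name-based and byte-based detection only.
            System.out.println(detectFromName("song.mp3"));
            System.out.println(detectFromBytes("%PDF-1.4".getBytes(StandardCharsets.US_ASCII)));
            return;
        }
        Metadata metadata = new Metadata();
        String text = extract(Paths.get(args[0]), metadata);
        for (String name : metadata.names()) {
            System.out.println(name + ": " + metadata.get(name));
        }
        System.out.println(text);
    }
}
```

Name-based detection only looks at the file extension, while byte-based detection inspects magic numbers, so the two can disagree for misnamed files. Tika.detect also has overloads that take a File or an InputStream and combine both hints.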

December 2, 2012 · 11 min · 2222 words · Micha Kops

Creating an LDAP server for your development environment in 5 minutes

I am currently working on a plugin that needs to read some information from an LDAP/Active Directory server using JNDI. That's why I needed to set up a directory server quickly, without spending much effort on it. Luckily, Apache Directory Studio saved my day and let me set up everything I needed in a few minutes. Short and sweet: in this tutorial I'm going to show you how to configure everything you need in your Eclipse IDE and, finally, how to query the newly created LDAP server with a tiny Java client using JNDI. ...
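A tiny JNDI client of the kind this summary mentions might look roughly like this minimal sketch, which uses only the JDK's built-in LDAP provider (com.sun.jndi.ldap.LdapCtxFactory). It is not the post's own client: the connection values are assumptions (port 10389 with uid=admin,ou=system and password secret are the usual ApacheDS defaults, so adjust them to your server), and LdapQuerySketch with its helper methods are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Hypothetical helper class for illustration.
final class LdapQuerySketch {
    // Builds the JNDI environment for the JDK's LDAP provider with simple (DN + password) authentication.
    static Hashtable<String, String> environment(String url, String bindDn, String password) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, bindDn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        return env;
    }

    // Runs a subtree search below baseDn and returns the full DNs of all matching entries.
    static List<String> search(Hashtable<String, String> env, String baseDn, String filter)
            throws NamingException {
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        DirContext ctx = new InitialDirContext(env);
        try {
            List<String> dns = new ArrayList<>();
            NamingEnumeration<SearchResult> results = ctx.search(baseDn, filter, controls);
            try {
                while (results.hasMore()) {
                    dns.add(results.next().getNameInNamespace());
                }
            } finally {
                results.close();
            }
            return dns;
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        // Assumed ApacheDS defaults; change URL, bind DN and password for your server.
        Hashtable<String, String> env =
                environment("ldap://localhost:10389", "uid=admin,ou=system", "secret");
        for (String dn : search(env, "ou=system", "(objectClass=person)")) {
            System.out.println(dn);
        }
    }
}
```

SUBTREE_SCOPE matches entries at any depth below the base DN; ONELEVEL_SCOPE would restrict the search to its direct children.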

June 13, 2011 · 5 min · 914 words · Micha Kops

Dependency management in Grails 1.2

Sometimes I get the impression that there are many Maven haters in the Groovy/Grails community. Now, with version 1.2 of the Grails framework, they are able to abandon the evil satanic Grails Maven Plugin and embrace the never-ending joys of a slim, nice, sexy dependency resolution DSL .. here we go .. let's define some dependencies, wheee … Our dependency configuration is defined in grails-app/conf/BuildConfig.groovy as a property named grails.project.dependency.resolution:

    grails.project.dependency.resolution = {
        // here will be some dependencies
    }

...

May 23, 2010 · 2 min · 336 words · Micha Kops

Apache Webserver Snippets

Deny all methods except POST and GET

    RewriteEngine On
    RewriteCond %{REQUEST_METHOD} !^(GET|POST)$
    RewriteRule .* - [F]

Rewrite all aliases for a domain to a single domain (the last RewriteCond must not carry the OR flag, otherwise the rule also fires for the target domain itself and causes a redirect loop)

    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^(www\.)?mydomain1\.com [NC,OR]
    RewriteCond %{HTTP_HOST} ^(www\.)?mydomain2\.com [NC,OR]
    RewriteCond %{HTTP_HOST} ^(www\.)?mydomain3\.com [NC]
    RewriteRule (.*) http://mydomain.com/$1 [R=301,L]

March 1, 2010 · 1 min · 42 words · Micha Kops

Kafka Snippets

Start an Image with kcat / kafkacat for Debugging

    kubectl -n NAMESPACE run "$(whoami)-debug" -it --rm \
      --image=confluentinc/cp-kafkacat:6.1.9 \
      --restart=Never \
      -- bash

Dockerfile for a Kafka Analysis Container with Different Tools

With jq, the Kafka console tools, the Schema Registry tools and kafkacat installed …

Dockerfile

    FROM confluentinc/cp-kafka:6.2.1 as cp-kafka
    FROM confluentinc/cp-schema-registry:6.2.1 as cp-schema-registry
    FROM debian:10-slim
    ARG DEBIAN_FRONTEND=noninteractive
    # Install necessary tools
    RUN apt-get update && apt-get install -y \
        curl \
        jq \
        yq \
        && rm -rf /var/lib/apt/lists/*
    # Install kafkacat binary
    RUN apt-get update && apt-get install -y kafkacat && rm -rf /var/lib/apt/lists/*
    # Copy Kafka binaries
    COPY --from=cp-kafka /usr/bin/kafka-* /usr/bin/
    COPY --from=cp-schema-registry /usr/bin/schema-registry* /usr/bin/
    # Copy entrypoint script
    COPY entrypoint.sh /usr/bin/entrypoint.sh
    RUN chmod +x /usr/bin/entrypoint.sh
    ENTRYPOINT ["/usr/bin/entrypoint.sh"]

...

March 1, 2010 · 7 min · 1333 words · Micha Kops