Random Thoughts on Coding

Whatever comes to mind at the moment.

Kafka Streams - the Processor API

If you work on systems delivering large quatinties of data, you have probably heard of Kafka if you aren’t using it already. At a very high level, Kafka is a fault tolerant, distributed publish-subscribe messaging system that is designed for speed and the ability to handle hundreds of thousands of messages. Kafka has many applications, one of which is real-time processing. Real time processing typically involves reading data from a topic (source) doing some analytic or transformation work then writing the results to another topic (sink). Currently to do this type of work your choices are:

  1. Using your own custom code by using a KafkaConsumer to read in the data then writing out data via a KafkaProducer.
  2. Use a full fledged stream-processing framework such as Spark Steaming,Flink, Storm.

While either of those approaches are great, in some cases it would be nice to have a solution that was somewhere in the middle of those two. To that end, a Processor API was proposed via the KIP – Kafka Improvement Proposals process. The aim of the Processor API is to introduce a client to enable processing data consumed from Kafka and writing the results back into Kafka. There are two components of the processor client:

  1. A “lower-level” processor that providea API’s for data-processing, composable processing and local state storage.
  2. A “higher-level” stream DSL that would cover most processor implementation needs.

Java 8 CompletableFutures Part I

When Java 8 was released a while ago, a great concurrency tool was added, the CompletableFuture class. The CompletableFuture is a Future that can have it’s value explicity set and more interestingly can be chained together to support dependent actions triggered by the CompletableFutures completion. CompletableFutures are analogous to the ListenableFuture class found in Guava. While the two offer similar functionality, there won’t be any comparisons done in this post. I have previously covered ListenableFutures in this post. While the coverage of ListenableFutures is a little dated, most of the information should still apply. The documentation for the CompletableFuture class is comprehensive, but lacks concrete examples of how to use them. My goal is show how to use CompletableFutures through a series of simple examples in unit tests. Originally I was going to cover the CompleteableFuture in one post, but there is so much information, it seems better to break up coverage into 3 parts –

  1. Creating/combining tasks and adding listeners for follow on work.
  2. Handling errors and error recovery
  3. Canceling and forcing completion.

Scripting Tmux for Kafka

I’ve known about tmux for some time now, but I kept putting off working with it. Lately I’ve started looking at the Processor API available in Kafka. After opening up 4-5 terminal windows (Zookeeper, Kafka, sending messgaes, recieving messages) and toggling in between them, I quickly realized I needed a way to open up these terminal sessions at one time and have them centralized. I had the perfect use case to use tmux! This post is about scripting tmux to save time during devlopment, in this case working with kafka.

Working With Java 8 Optionals

In this post we are going to cover working with the Optional class introduced in Java 8. The introduction of Optional was new only to Java. Guava has had a version of Optional and Scala has had the Option type for some time. Here’s a description of Optional from the Java API docs:

A container object which may or may not contain a non-null value.

Since so many others have done a good job of describing the Optional type, we won’t be doing so here. Rather for this post, we are going to cover how to use the Optional type without resorting to directly accessing the value contained or doing explicit checks if a value is present.

Guava ImmutableCollections, Multimaps and Java 8 Collectors

In this post we are going to discuss creating custom Collector instances. The Collector interface was introduced in the java.util.stream package when Java 8 was released. A Collector is used to “collect” the results of stream operations. Results are collected from a stream when the terminal operation Stream.collect method is called. While there are default implementations available, there are times we’ll want to use some sort of custom container. Our goal today will be to create Collector instances that produce Guava ImmutableCollections and Multimaps.