Random Thoughts on Coding

Whatever comes to mind at the moment.

Creating a Yelling App in Kafka Streams

This blog post will quickly get you off the ground and show you how Kafka Streams works. We’re going to make a toy application that takes incoming messages and upper-cases the text of those messages, effectively yelling at anyone who reads the message. This application is called the “Yelling Application”.
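
To make that concrete, here is a minimal sketch of what the Yelling Application can look like with the Kafka Streams DSL. The topic names (“src-topic”, “out-topic”) and the configuration values are assumptions for illustration, and the code uses the current StreamsBuilder-based API, which may differ slightly from the version available when this post was written.

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class KafkaStreamsYellingApp {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "yelling-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();

            // Read each incoming message, upper-case its text, and write the
            // yelled version to the output topic.
            KStream<String, String> simpleFirstStream = builder.stream("src-topic");
            simpleFirstStream.mapValues(String::toUpperCase).to("out-topic");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
        }
    }

The entire transformation reduces to a single mapValues call; everything else is configuration and wiring.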

Applying Kafka Streams to the Purchase Transaction Flow

Maybe you know a bit about Kafka and/or Kafka Streams (and maybe you don’t and are burning up with anticipation…). Rather than tell you about how Kafka Streams works and what it does, I would like to jump straight into a practical example of how you can apply Kafka Streams directly to the purchase transaction flow, so you can see Kafka Streams in Action for yourself!

Machine Learning With Kafka Streams

The last two posts on Kafka Streams (Kafka Processor API, KStreams DSL) introduced Kafka Streams and described how to get started using the API. This post will demonstrate a use case that, prior to the development of Kafka Streams, would have required running a separate cluster with another framework. We are going to take a live stream of data from Twitter and perform language analysis to identify tweets in English, French, and Spanish. The library we are going to use for this is LingPipe. Here’s a description of LingPipe directly from the website:

LingPipe is a tool kit for processing text using computational linguistics. LingPipe is used to do tasks like:

  • Find the names of people, organizations or locations in news
  • Automatically classify Twitter search results into categories
  • Suggest correct spellings of queries

The Hosebird Client, Twitter’s client for its Streaming API, will be used to create the stream of data that will serve as the messages sent to Kafka.
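
As a rough sketch of that plumbing, the snippet below drains tweets from the Hosebird Client’s queue and forwards each one to a Kafka topic with a plain producer. The credential placeholders, topic name, and queue size are assumptions for illustration, not values from the post.

    import java.util.Properties;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import com.twitter.hbc.ClientBuilder;
    import com.twitter.hbc.core.Client;
    import com.twitter.hbc.core.Constants;
    import com.twitter.hbc.core.endpoint.StatusesSampleEndpoint;
    import com.twitter.hbc.core.processor.StringDelimitedProcessor;
    import com.twitter.hbc.httpclient.auth.Authentication;
    import com.twitter.hbc.httpclient.auth.OAuth1;

    public class TwitterToKafka {

        public static void main(String[] args) throws InterruptedException {
            // Hosebird hands each raw tweet (a JSON string) to this queue.
            BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

            Authentication auth = new OAuth1("consumerKey", "consumerSecret", "token", "tokenSecret");
            Client hosebirdClient = new ClientBuilder()
                    .hosts(Constants.STREAM_HOST)
                    .endpoint(new StatusesSampleEndpoint())
                    .authentication(auth)
                    .processor(new StringDelimitedProcessor(queue))
                    .build();
            hosebirdClient.connect();

            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // Publish each tweet to Kafka for downstream language analysis.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                while (!hosebirdClient.isDone()) {
                    String tweet = queue.take();
                    producer.send(new ProducerRecord<>("tweets", tweet));
                }
            } finally {
                hosebirdClient.stop();
            }
        }
    }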

Kafka Streams - the KStreams API

The last post covered the new Kafka Streams library, specifically the “low-level” Processor API. This time we are going to cover the “high-level” API, the Kafka Streams DSL. While the Processor API gives you greater control over the details of building streaming applications, the trade-off is more verbose code. In most cases, however, the level of detail provided by the Processor API is not required, and the KStream API will get the job done. Compared to the imperative approach of the Processor API, the KStream DSL uses a more functional, declarative style. You’ll find building an application is more a matter of stating “what” you want to accomplish rather than “how”. Additionally, since many interfaces in the Kafka Streams API are Java 8 compatible (method handles and lambda expressions can be substituted for concrete types), the KStream DSL allows for building powerful applications quickly with minimal code. This post won’t be as detailed as the previous one, as the general description of Kafka Streams applies to both APIs. Here we’ll focus on how to implement the same functionality presented in the previous post with the KStream API.
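
As a quick illustration of that functional style, the fragment below (with made-up topic names) declares a small pipeline entirely with Java 8 lambdas; a method handle such as String::toUpperCase could stand in for the mapValues lambda.

    StreamsBuilder builder = new StreamsBuilder();

    // Declarative style: state the transformations, not the mechanics of
    // consuming, threading, and producing.
    builder.<String, String>stream("text-input")
           .filter((key, value) -> value != null && !value.isEmpty())
           .mapValues(value -> value.toUpperCase())   // or: mapValues(String::toUpperCase)
           .to("text-output");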

Kafka Streams - the Processor API

If you work on systems delivering large quantities of data, you have probably heard of Kafka, if you aren’t using it already. At a very high level, Kafka is a fault-tolerant, distributed publish-subscribe messaging system designed for speed and the ability to handle hundreds of thousands of messages per second. Kafka has many applications, one of which is real-time processing. Real-time processing typically involves reading data from a topic (source), doing some analytic or transformation work, then writing the results to another topic (sink). Currently, to do this type of work your choices are:

  1. Write your own custom code, using a KafkaConsumer to read in the data and a KafkaProducer to write out the results (a minimal sketch of this approach follows the list).
  2. Use a full-fledged stream-processing framework such as Spark Streaming, Flink, or Storm.
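
To make option 1 concrete, here is a minimal sketch of such a hand-rolled consume-transform-produce loop, written against the current consumer and producer clients; the topic names and the trivial upper-casing transform are placeholders.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ConsumeTransformProduce {

        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("group.id", "custom-processing");
            consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092");
            producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

                consumer.subscribe(Collections.singletonList("source-topic"));

                // You own the whole loop: polling, transforming, producing, and
                // (not shown) offset management, retries, and failure recovery.
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                    for (ConsumerRecord<String, String> record : records) {
                        String transformed = record.value().toUpperCase();
                        producer.send(new ProducerRecord<>("sink-topic", record.key(), transformed));
                    }
                }
            }
        }
    }

Everything a framework would otherwise handle (partition rebalancing, state, error handling) becomes your responsibility here, which is exactly the gap the Processor API aims to fill.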

While either of those approaches is great, in some cases it would be nice to have a solution that sits somewhere in the middle of the two. To that end, a Processor API was proposed via the KIP (Kafka Improvement Proposal) process. The aim of the Processor API is to introduce a client that enables processing data consumed from Kafka and writing the results back into Kafka. There are two components of the processor client:

  1. A “lower-level” processor that provides APIs for data processing, composable processing, and local state storage (see the sketch after this list).
  2. A “higher-level” stream DSL that would cover most processor implementation needs.
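
As a minimal sketch of component 1, the code below implements a custom processor that upper-cases each record and wires it between a source and a sink. It uses the current processor API, whose interfaces have evolved considerably since this proposal; the topic and node names are made up for illustration.

    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.processor.api.Processor;
    import org.apache.kafka.streams.processor.api.ProcessorContext;
    import org.apache.kafka.streams.processor.api.Record;

    public class UppercaseProcessor implements Processor<String, String, String, String> {

        private ProcessorContext<String, String> context;

        @Override
        public void init(ProcessorContext<String, String> context) {
            this.context = context;
        }

        @Override
        public void process(Record<String, String> record) {
            // Transform each record and forward it to the downstream node (the sink).
            context.forward(record.withValue(record.value().toUpperCase()));
        }

        public static Topology buildTopology() {
            // Wire source -> processor -> sink; the last argument of each call
            // names the parent node(s) the new node reads from.
            Topology topology = new Topology();
            topology.addSource("Source", "source-topic");
            topology.addProcessor("Process", UppercaseProcessor::new, "Source");
            topology.addSink("Sink", "sink-topic", "Process");
            return topology;
        }
    }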