If you work on systems that deliver large quantities of data, you have probably heard of Kafka, if you aren't using it already. At a very high level, Kafka is a fault-tolerant, distributed publish-subscribe messaging system designed for speed and the ability to handle hundreds of thousands of messages. Kafka has many applications, one of which is real-time processing. Real-time processing typically involves reading data from a topic (source), doing some analytic or transformation work, then writing the results to another topic (sink). Currently, to do this type of work your choices are:
- Writing your own custom code: use a KafkaConsumer to read the data in, then write the results out via a KafkaProducer.
- Using a full-fledged stream-processing framework such as Spark Streaming, Flink, or Storm.
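The first option above is the classic consume-transform-produce loop. A minimal sketch of that pattern is shown below; since the actual KafkaConsumer/KafkaProducer calls require a running broker, they appear as comments, and the transformation step (here just uppercasing values, a stand-in for real analytic work) is kept as a plain function:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the "custom code" option: a KafkaConsumer poll loop feeding a
// KafkaProducer. The broker-dependent calls are shown as comments so the
// transformation logic itself stays self-contained and testable.
public class ConsumeTransformProduce {

    // Example transformation step: uppercase each value.
    // In a real application this is where the analytic work would go.
    static String transform(String value) {
        return value.toUpperCase();
    }

    // Apply the transformation to a batch of consumed values.
    static List<String> processBatch(List<String> values) {
        return values.stream()
                     .map(ConsumeTransformProduce::transform)
                     .collect(Collectors.toList());
    }

    // In a real application the surrounding loop would look roughly like:
    //
    //   try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
    //        KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
    //       consumer.subscribe(Collections.singletonList("source-topic"));
    //       while (true) {
    //           for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(100))) {
    //               producer.send(new ProducerRecord<>("sink-topic", rec.key(),
    //                                                 transform(rec.value())));
    //           }
    //       }
    //   }
}
```

The drawback of this approach is that everything beyond the transformation itself, such as offset management, error handling, and scaling out, is left for you to build.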
While both of those approaches work well, in some cases it would be nice to have a solution somewhere in the middle of the two. To that end, a Processor API was proposed via the KIP (Kafka Improvement Proposals) process. The aim of the Processor API is to introduce a client that enables processing data consumed from Kafka and writing the results back into Kafka. There are two components of the processor client:
- A “lower-level” processor that provides APIs for data processing, composable processing, and local state storage.
- A “higher-level” stream DSL that would cover most processor implementation needs.
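To give a feel for the “lower-level” processor idea, here is an illustrative sketch: a callback that receives one record at a time and keeps local state between records. The `SimpleProcessor` interface and the `HashMap` standing in for a local state store are simplifications for this example, not the actual proposed API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a lower-level processor: per-record processing
// combined with local state. The interface below is a simplified stand-in
// that only mirrors the general shape of the proposed Processor API.
public class WordCountSketch {

    // Minimal processor contract: handle one key/value record at a time.
    interface SimpleProcessor<K, V> {
        void process(K key, V value);
    }

    // Counts word occurrences across records, with a HashMap standing in
    // for the local state store the processor client would provide.
    static class WordCountProcessor implements SimpleProcessor<String, String> {
        final Map<String, Long> store = new HashMap<>();

        @Override
        public void process(String key, String line) {
            for (String word : line.toLowerCase().split("\\s+")) {
                store.merge(word, 1L, Long::sum);
            }
            // A real processor would also forward results downstream to the
            // next processor or a sink topic, rather than only updating state.
        }
    }
}
```

Because the state lives alongside the processor, aggregations like this word count need no external database, which is exactly the kind of middle-ground convenience the proposal is after.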