• Scripting Tmux for Kafka

    I’ve known about tmux for some time now, but I kept putting off working with it. Lately I’ve started looking at the Processor API available in Kafka. After opening up 4-5 terminal windows (Zookeeper, Kafka, sending messgaes, recieving messages) and toggling in between them, I quickly realized I needed a...


  • Working With Java 8 Optionals

    In this post we are going to cover working with the Optional class introduced in Java 8. The introduction of Optional was new only to Java. Guava has had a version of Optional and Scala has had the Option type for some time. Here’s a description of Optional from the...


  • Guava ImmutableCollections, Multimaps and Java 8 Collectors

    In this post we are going to discuss creating custom Collector instances. The Collector interface was introduced in the java.util.stream package when Java 8 was released. A Collector is used to “collect” the results of stream operations. Results are collected from a stream when the terminal operation Stream.collect method is...


  • Learning Scala Implicits with Spark

    A while back I wrote two posts on avoiding the use of the groupBy function in Spark. While I won’t re-hash both posts here, the bottom line was to take advantage of the combineByKey or aggreagateByKey functions instead. While both functions hold the potential for improved performance and efficiency in...


  • Spark And Guava Tables

    Last time we covered Secondary Sorting in Spark. We took airline performance data and sorted results by airline, destination airport and the amount of delay. We used id’s for all our data. While that approach is good for performance, viewing results in that format loses meaning. Fortunately, the Bureau of...