Tweet We continue with our series on implementing MapReduce algorithms found in Data-Intensive Text Processing with MapReduce book. Other posts in this series: Working Through Data-Intensive Text Processing with MapReduce Working Through Data-Intensive Text Processing with MapReduce – Local Aggregation Part II Calculating A Co-Occurrence Matrix with Hadoop MapReduce Algorithms – Order Inversion This post [...]
MapReduce Algorithms – Order Inversion
Tweet This post is another segment in the series presenting MapReduce algorithms as found in the Data-Intensive Text Processing with MapReduce book. Previous installments are Local Aggregation, Local Aggregation PartII and Creating a Co-Occurrence Matrix. This time we will discuss the order inversion pattern. The order inversion pattern exploits the sorting phase of MapReduce to [...]
Calculating A Co-Occurrence Matrix with Hadoop
Tweet This post continues with our series of implementing the MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book. This time we will be creating a word co-occurrence matrix from a corpus of text. Previous posts in this series are: Working Through Data-Intensive Text Processing with MapReduce Working Through Data-Intensive Text Processing with [...]
Testing Hadoop Programs with MRUnit
Tweet This post will take a slight detour from implementing the patterns found in Data-Intensive Processing with MapReduce to discuss something equally important, testing. I was inspired in part from a presentation by Tom Wheeler that I attended while at the 2012 Strata/Hadoop World conference in New York. When working with large data sets, unit [...]
Working Through Data-Intensive Text Processing with MapReduce – Local Aggregation Part II
Tweet This post continues with the series on implementing algorithms found in the Data Intensive Processing with MapReduce book. Part one can be found here. In the previous post, we discussed using the technique of local aggregation as a means of reducing the amount of data shuffled and transferred across the network. Reducing the amount [...]
Event Programming Example: Google Guava EventBus and Java 7 WatchService
Tweet This post is going to cover using the Guava EventBus to publish changes to a directory or sub-directories detected by the Java 7 WatchService. The Guava EventBus is a great way to add publish/subscribe communication to an application. The WatchService, new in the Java 7 java.nio.file package, is used to monitor a directory for [...]
What’s New in Java 7: WatchService
Tweet Of all the new features in Java 7, one of the more interesting is the WatchService, adding the capability to watch a directory for changes. The WatchService maps directly to the native file event notification mechanism, if available. If a native event notification mechanism is not available, then the default implementation will use polling. [...]
Creating An Asynchronous, Recursive DirectoryStream in Java 7
Tweet Continuing with my series on the Java 7 java.nio.file package, this time covering the DirectoryStream interface. In this post we are going implement our own DirectoryStream that will iterate over the files in an entire directory tree, not just a single directory. Our goal in the end is to have something that works similar [...]
What’s New In Java 7: Copy and Move Files and Directories
Tweet This post is a continuation of my series on the Java 7 java.nio.file package, this time covering the copying and moving of files and complete directory trees. If you have ever been frustrated by Java’s lack of copy and move methods, then read on, for relief is at hand. Included in the coverage is [...]
What’s new in Java 7 – The (Quiet) NIO File Revolution
Tweet Java 7 (“Project Coin”) has been out since July of last year. The additions with this release are useful, for example Try with resources – having closable resources handled automatically from try blocks, Strings in switch statements, multicatch for Exceptions and the ‘‘ operator for working with generics. The addition that everyone was anticipating [...]



