Random Thoughts on Coding - page 9

MapReduce Algorithms - Secondary Sorting

We continue with our series on implementing MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book. Other posts in this series: Working Through Data-Intensive Text Processing with MapReduce, Working Through Data-Intensive Text Processing with MapReduce – Local Aggregation Part II, Calculating A Co-Occurrence Matrix with Hadoop, MapReduce Algorithms –...

January 14, 2013

in General, Hadoop, Java, Mapreduce

MapReduce Algorithms - Order Inversion

This post is another segment in the series presenting MapReduce algorithms as found in the Data-Intensive Text Processing with MapReduce book. Previous installments are Local Aggregation, Local Aggregation Part II and Creating a Co-Occurrence Matrix. This time we will discuss the order inversion pattern. The order inversion pattern exploits the sorting...

December 13, 2012

in Hadoop, Java, Mapreduce

Calculating A Co-Occurrence Matrix with Hadoop

This post continues our series on implementing the MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book. This time we will be creating a word co-occurrence matrix from a corpus of text. Previous posts in this series are: Working Through Data-Intensive Text Processing with MapReduce, Working Through...

November 30, 2012

in Hadoop, Java

Testing Hadoop Programs with MRUnit

This post will take a slight detour from implementing the patterns found in Data-Intensive Text Processing with MapReduce to discuss something equally important: testing. I was inspired in part by a presentation by Tom Wheeler that I attended while at the 2012 Strata/Hadoop World conference in New York. When working with...

November 1, 2012

in Hadoop, Java, Testing

Working Through Data-Intensive Text Processing with MapReduce - Local Aggregation Part II

This post continues the series on implementing algorithms found in the Data-Intensive Text Processing with MapReduce book. Part one can be found here. In the previous post, we discussed using the technique of local aggregation as a means of reducing the amount of data shuffled and transferred across the...

October 16, 2012

in Hadoop, Java