Random Thoughts on Coding
A technical blog on whatever comes to mind
-
MapReduce Algorithms - Secondary Sorting
We continue with our series on implementing MapReduce algorithms found in Data-Intensive Text Processing with MapReduce book. Other posts in this series: Working Through Data-Intensive Text Processing with MapReduce Working Through Data-Intensive Text Processing with MapReduce – Local Aggregation Part II Calculating A Co-Occurrence Matrix with Hadoop MapReduce Algorithms –...
-
MapReduce Algorithms - Order Inversion
This post is another segment in the series presenting MapReduce algorithms as found in the Data-Intensive Text Processing with MapReduce book. Previous installments are Local Aggregation, Local Aggregation PartII and Creating a Co-Occurrence Matrix. This time we will discuss the order inversion pattern. The order inversion pattern exploits the sorting...
-
Calculating A Co-Occurrence Matrix with Hadoop
This post continues with our series of implementing the MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book. This time we will be creating a word co-occurrence matrix from a corpus of text. Previous posts in this series are: Working Through Data-Intensive Text Processing with MapReduce Working Through...
-
Testing Hadoop Programs with MRUnit
This post will take a slight detour from implementing the patterns found in Data-Intensive Processing with MapReduce to discuss something equally important, testing. I was inspired in part from a presentation by Tom Wheeler that I attended while at the 2012 Strata/Hadoop World conference in New York. When working with...
-
Working Through Data-Intensive Text Processing with MapReduce - Local Aggregation Part II
This post continues with the series on implementing algorithms found in the Data Intensive Processing with MapReduce book. Part one can be found here. In the previous post, we discussed using the technique of local aggregation as a means of reducing the amount of data shuffled and transferred across the...