Random Thoughts on Coding

Whatever comes to mind at the moment.

FlatMap in Guava

This is a short post about a method I recently discovered in Guava.

The Issue

I had a situation at work where I was working with objects structured something like this:

Sample Object Structures
public class Outer {
    String outerId;
    List<Inner> innerList;
    // ... (other fields omitted)
}

public class Inner {
    String innerId;
    Date timestamp;
}

public class Merged {
    String outerId;
    String innerId;
    Date timestamp;
}

My task was to flatten a list of Outer objects (along with each one's list of Inner objects) into a single list of Merged objects. Since I’m working with Java 7, using streams is not an option.

The First Solution

Instead I turn to the FluentIterable class from Guava. My first instinct is to go with the FluentIterable.transform method (which is essentially a map function):

An Iterable of Iterables
List<Outer> originalList = getListOfObjects();

Function<Outer,List<Merged>> flattenFunction //Details left out for clarity

//returns an Iterable of Lists!
Iterable<List<Merged>> mergedObjects = FluentIterable.from(originalList).transform(flattenFunction);

But I really want a single collection of Merged objects, not an iterable of lists! The missing ingredient here is a flatMap function. Since I’m not using Scala, Clojure or Java 8, I feel that I’m out of luck.

A Better Solution

I decide to take a closer look at the FluentIterable class and I discover the FluentIterable.transformAndConcat method. The transformAndConcat method applies a function to each element of the fluent iterable and appends the results into a single iterable instance. I have my flatMap function in Guava! Now my solution looks like this:

FlatMap Solution in Guava
List<Outer> originalList = getListOfObjects();

Function<Outer,List<Merged>> flattenFunction //Details left out for clarity

Iterable<Merged> mergedObjects = FluentIterable.from(originalList).transformAndConcat(flattenFunction);
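
For comparison, here is a rough sketch of the same flattening on Java 8, where Stream.flatMap plays the role that transformAndConcat plays in Guava. It uses only the JDK. The constructors, the toMerged helper and the flatten method are my own illustrative names; they stand in for the details left out above and are not code from the actual project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.stream.Collectors;

final class FlatMapSketch {

    static final class Inner {
        final String innerId;
        final Date timestamp;
        Inner(String innerId, Date timestamp) {
            this.innerId = innerId;
            this.timestamp = timestamp;
        }
    }

    static final class Outer {
        final String outerId;
        final List<Inner> innerList;
        Outer(String outerId, List<Inner> innerList) {
            this.outerId = outerId;
            this.innerList = innerList;
        }
    }

    static final class Merged {
        final String outerId;
        final String innerId;
        final Date timestamp;
        Merged(String outerId, String innerId, Date timestamp) {
            this.outerId = outerId;
            this.innerId = innerId;
            this.timestamp = timestamp;
        }
    }

    // Hypothetical per-element step: one Outer becomes one Merged per Inner.
    static List<Merged> toMerged(Outer outer) {
        List<Merged> merged = new ArrayList<>();
        for (Inner inner : outer.innerList) {
            merged.add(new Merged(outer.outerId, inner.innerId, inner.timestamp));
        }
        return merged;
    }

    // Maps each Outer to a list of Merged, then concatenates the lists in order.
    static List<Merged> flatten(List<Outer> outers) {
        return outers.stream()
                .flatMap(outer -> toMerged(outer).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Date now = new Date();
        List<Outer> outers = Arrays.asList(
                new Outer("o1", Arrays.asList(new Inner("i1", now), new Inner("i2", now))),
                new Outer("o2", Arrays.asList(new Inner("i3", now))));
        for (Merged m : flatten(outers)) {
            System.out.println(m.outerId + "/" + m.innerId);
        }
    }
}
```

With Guava on Java 7, the body of toMerged is what would go inside the apply method of the Function passed to transformAndConcat.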

Conclusion

While this is a very short post, it goes to show how useful the Guava library is and how functional programming concepts can make our code more concise.

Sql for Lucene

A short time ago, I started a side project to learn the latest version of ANTLR. I decided to build something that has always interested me: a SQL parser for the Lucene search engine. Even though the parser is a learning exercise, I thought someone else might find it useful. This post covers the functionality of the LuceneSqlParser. Building the parser with ANTLR 4 will be covered in later posts.

Introduction and Examples

The LuceneSqlParser supports a subset of standard SQL. Here are some examples:

Sample SQL queries handled
Select last_name from '/path/to/index/' where first_name='Foo' and age <=30 and city='Boston' limit 25

Select * from 'path/index/' where age in (31, 30, 50)

Select first_name, last_name from '/path/index/' where city in ('Cincinatti', 'New York', 'Boyds')

Select first_name from '/path/index/' where age between 35 and 50 and first_name like 'Br*'
-- Also takes paths from Windows OS
Select first_name from 'C:/path/index/' where first_name='John' and (age<=45 and city not in ('New York', 'Boston', 'Atlanta'))

The LuceneSqlParser returns a BooleanQuery. The BooleanQuery will contain different types of Lucene query objects depending on the predicates used. There is also a Searcher class available for use with the LuceneSqlParser. The Searcher abstracts away opening a Lucene IndexSearcher, iterating over the ScoreDoc array and extracting results. Next, we’ll take a look at the rules used to parse the SQL.

Java 8 Functional Interfaces and Checked Exceptions

The Java 8 lambda syntax and functional interfaces have been a productivity boost for Java developers. But there is one drawback to functional interfaces: none of them, as currently defined in Java 8, declare any checked exceptions. This leaves the developer unsure how best to handle checked exceptions. This post will present one option for handling checked exceptions in functional interfaces. We will use Function in our example, but the pattern should apply to any of the functional interfaces.

Example of a Function with a Checked Exception

Here’s an example from a recent side-project using a Function to open a directory in Lucene. As expected, opening a directory for writing/searching throws an IOException:

Create a Lucene Directory
private Function<Path, Directory> createDirectory = path -> {
    try {
        return FSDirectory.open(path);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
};
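
One common way to avoid repeating that try/catch in every lambda is to move it into a small adapter. The sketch below is one possible shape for it, not necessarily the option the full post presents. IOFunction, unchecked and open are hypothetical names of my own, and open is only a stand-in for a call such as FSDirectory.open(path), so the example runs without Lucene.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Function;

final class CheckedFunctions {

    // Hypothetical interface: like Function, but apply may throw IOException.
    @FunctionalInterface
    interface IOFunction<T, R> {
        R apply(T input) throws IOException;
    }

    // Adapts an IOFunction to a plain Function, rethrowing IOException unchecked.
    static <T, R> Function<T, R> unchecked(IOFunction<T, R> fn) {
        return input -> {
            try {
                return fn.apply(input);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    // Stand-in for a library call that declares IOException.
    static String open(String path) throws IOException {
        if (path.isEmpty()) {
            throw new IOException("empty path");
        }
        return "opened:" + path;
    }

    public static void main(String[] args) {
        Function<String, String> opener = unchecked(CheckedFunctions::open);
        System.out.println(opener.apply("/tmp/index"));
    }
}
```

The try/catch now lives in one place, and the original IOException is still reachable through getCause() on the unchecked exception.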

I/O With Files That Aren’t Files

Recently at work I needed to search through our archived files and provide the results by the end of the day. Here are the parameters of the request:

  1. The archive files are encrypted and stored in HDFS (Don’t ask why we store them in HDFS).
  2. The files vary in size from 3-9 GB.
  3. The total number of files to search was 300+.
  4. It takes between 1 – 2 minutes to decrypt each file.

In the past there have been requests to search one archived file. In those cases we would copy the file out of HDFS to a server, then run a shell script to decrypt the file and perform the search. The decrypting program requires 2 arguments: an encrypted file and a file to write the decrypted data to. This means the decrypted and encrypted files are on disk at the same time.

At an average of 1.5 minutes to decrypt a single file, it was going to take 450 minutes (7.5 hours) for 300 files. To add to my dilemma, there wasn’t enough time to write a custom RecordReader. The only solution would be to stream the files in parallel. But there were 2 problems with that approach:

  1. The server does not have enough space for 20 (10 encrypted and 10 decrypted) files at a time.
  2. The decrypting code does not read from stdin or write to stdout.

What to do? Use named pipes of course!
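
A named pipe (FIFO) has a path like a regular file, but data written to it is handed straight to whoever is reading it instead of being stored on disk. That suits a program that insists on file arguments. Below is a minimal sketch of the idea, not the script actually used at work. It assumes a POSIX system with the mkfifo command on the PATH, and a producer thread stands in for the decrypting program. All class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class NamedPipeSketch {

    // Creates a FIFO using the POSIX mkfifo command (assumed to be on the PATH).
    static Path makeFifo(Path dir, String name) throws IOException, InterruptedException {
        Path fifo = dir.resolve(name);
        Process p = new ProcessBuilder("mkfifo", fifo.toString()).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IOException("mkfifo failed for " + fifo);
        }
        return fifo;
    }

    // Stand-in for the decrypting program: it writes to a path it believes is a file.
    static Thread startProducer(Path target, String payload) {
        Thread t = new Thread(() -> {
            try (Writer w = new OutputStreamWriter(
                    new FileOutputStream(target.toFile()), StandardCharsets.UTF_8)) {
                w.write(payload);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.start();
        return t;
    }

    // The "search": counts lines containing the needle while streaming from the path.
    static long countMatches(Path source, String needle) throws IOException {
        long hits = 0;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new FileInputStream(source.toFile()), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.contains(needle)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    // Wires producer and search together through a FIFO, then cleans up.
    static long searchThroughPipe(String payload, String needle)
            throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("fifo-demo");
        Path fifo = makeFifo(dir, "decrypted.pipe");
        try {
            Thread producer = startProducer(fifo, payload);
            long hits = countMatches(fifo, needle);
            producer.join();
            return hits;
        } finally {
            Files.deleteIfExists(fifo);
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws Exception {
        String payload = "alpha\nneedle one\nbeta\nneedle two\n";
        System.out.println(searchThroughPipe(payload, "needle"));
    }
}
```

Opening a FIFO for writing blocks until a reader opens the other end, which is why the producer runs on its own thread here.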

What's New in Java 8 - Date API Part II

This post continues our review of the Date API that came with the release of Java 8. We are going to keep concentrating on classes that make working with dates and times very easy. Working with date objects in previous releases of Java was very challenging when it came to adding time or getting the difference between dates. Hopefully after looking at the classes we present here, your opinion of working with dates and times in Java will change. Specifically, we are going to take a look at the following:

  • Other classes to represent dates/times ZonedDateTime and OffsetDateTime
  • Getting the current snapshot in time with Instant
  • Using the Clock class to get system time but specify different time zones
  • Representing an arbitrary number of days with the Period class
  • Representing an arbitrary number of hours with the Duration class
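
As a quick taste of the classes listed above, here is a small sketch that touches each one. It is my own illustration rather than code from the full post. The helper method names are hypothetical, and it pins the clock to a fixed instant (an arbitrary date I picked) so the results are repeatable. A real program would typically use Instant.now() or Clock.system(zone) instead.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.Period;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

final class DateApiSketch {

    // Arbitrary fixed point on the time line, used instead of Instant.now().
    static final Instant FIXED = Instant.parse("2014-03-18T12:00:00Z");

    // ZonedDateTime: an instant viewed through a region's time zone rules.
    static ZonedDateTime inZone(Instant instant, String zoneId) {
        return ZonedDateTime.ofInstant(instant, ZoneId.of(zoneId));
    }

    // OffsetDateTime: an instant viewed at a fixed offset from UTC.
    static OffsetDateTime atOffset(Instant instant, int hours) {
        return OffsetDateTime.ofInstant(instant, ZoneOffset.ofHours(hours));
    }

    // Clock: supplies both "now" and the zone used to interpret it.
    static LocalDateTime localNow(Clock clock) {
        return LocalDateTime.now(clock);
    }

    // Period: a date-based amount (years, months, days).
    static LocalDate addPeriod(LocalDate start, Period period) {
        return start.plus(period);
    }

    // Duration: a time-based amount (hours, minutes, seconds).
    static Duration between(Instant start, Instant end) {
        return Duration.between(start, end);
    }

    public static void main(String[] args) {
        System.out.println(inZone(FIXED, "America/New_York"));
        System.out.println(atOffset(FIXED, 9));
        System.out.println(localNow(Clock.fixed(FIXED, ZoneId.of("Asia/Tokyo"))));
        System.out.println(addPeriod(LocalDate.of(2014, 1, 31), Period.ofMonths(1)));
        System.out.println(between(FIXED, FIXED.plus(Duration.ofHours(36))).toHours());
    }
}
```

Note how adding a Period of one month to January 31 lands on the last valid day of February rather than overflowing into March.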