Syntax coloring

sábado, 2 de fevereiro de 2013

The beauty of Querydsl: calling database functions

It's a common pattern to have a self-referencing table, in order to model an hierarchy tree. Now, sorting results by hierarchy, is an entirely different subject. Some databases, like Oracle, has the start with / connect by clauses. But to bring that to the JPA world is another different story. And imagine that with the über ugly JPA 2 criteria queries. I was already using Querydsl with JPA (using EclipseLink), and it allowed a very clean solution.

First, I brought database functions to the rescue, and wrote a function which receives an entity id, a table name, the name column and the parent id column. So, the function can be reused on any type of entity. As the DB is Postgres, here comes the code:

create or replace function name_hierarchy
    (p_id bigint, 
     p_table varchar, 
     p_name_col varchar, 
     p_parent_id_col varchar)
    returns varchar
    as $$
        declare
            sql text;
            current_id bigint;
            v_name varchar;
            v_parent_id bigint;
            path varchar[];
        begin
            current_id := p_id;
            while current_id is not null loop
                sql := 'select id, ' 
                     || p_name_col || ', ' 
                     || p_parent_id_col
                     || ' from ' || p_table 
                     || ' where id = ' || current_id;
                execute sql into current_id, v_name, v_parent_id;
                path := array_prepend(v_name, path);
                current_id := v_parent_id;
            end loop;
            return array_to_string(path, ' > ');
        end;
    $$ language plpgsql
    stable;

Then, I needed to create a Querydsl operator to represent the function. It contains a name and the argument types.

public class CustomOperators {
    public static final Operator<String>
        NAME_HIERARCHY = new OperatorImpl<String>(
            "name_hierarchy", 
            Long.class, String.class, String.class, String.class);
}

There was also a custom Querydsl templates class to actually convert that operator into JPQL code (note that EclipseLink uses FUNC('name', args...) to invoke native database functions):

    public static class CustomTemplates
        extends EclipseLinkTemplates {

        private static final CustomTemplates INSTANCE = 
            new CustomTemplates();

        public static CustomTemplates getInstance() {
            return INSTANCE;
        }

        private CustomTemplates() {
            add(CustomOperators.NAME_HIERARCHY, 
                "FUNC('name_hierarchy', {0}, {1}, {2}, {3})");
        }
    }

As the example entity is Configuration, the final plumbing needed is a method annotated with @QueryDelegate(Configuration.class), so the extension method can be created in any class:

    @QueryDelegate(Configuration.class)
    public static StringExpression 
        nameHierarchy(EntityPath<Configuration> configuration) {
        NumberPath<Long> id = (NumberPath<Long>)
            FieldUtils.readField(configuration, "id");
        return StringOperation.create(
            CustomOperators.NAME_HIERARCHY, 
            id,
            StringTemplate.create("'configurations'"),
            StringTemplate.create("'name'"),
            StringTemplate.create("'parent_id'"));
    }

Finally, I can use the nameHierarchy method anywhere on queries, like:

QConfiguration c = QConfiguration.configuration;
List<Configuration> configs = new JPAQuery()
    .from(c)
    .orderBy(c.nameHierarchy.asc())
    .list(c);

Looks like that nameHierarchy method was always there, doesn't it? And the same idea can be reused on any other function, seamlessly blending them on the query metamodel. Now try to make something similar with JPA's criteria api!

domingo, 20 de janeiro de 2013

Replaced Hibernate as JPA provider... To never look back!!!

Hibernate is probably the most well-known ORM tool for Java. I first used it on version 1.X back on 2002. It even influenced the JPA (Java Persistence API), which is a standard ORM API.
The problem is: the application (has about 200 entities) was taking up +- 350MB of heap size on startup right after forcing a garbage collect (using jvisualvm).
That was too much. But things would improve. There was a setting which I've always mislooked as a batch size equivalent, called hibernate.default_batch_fetch_size, which we had with value 20.
After some investigation, I found it was used to load several records at once, at the expense of memory. So, just to test out, I changed it to 1 and, surprise... The same application was now taking up +- 150MB! What a change for something misunderstood!
But I was not satisfied, and decided to try another JPA provider. From some researches, I decided to go with EclipseLink. Result? The same application now starts up (after a garbage collection) with +- 35 MB!!!
Ok, what a huge difference! But performance should be worst, shouldn't it? No!!! On the load tests I did, EclipseLink was actually 2.5x faster than Hibernate!
There were some bumps, several queries were done with some non-standard (from JPA's point of view) elements, and so on. But they all could be resolved, one at a time.
Conclusion: After being a loyal Hibernate user for several years (well, not that loyal, as some projects I did with plain JDBC using Querydsl SQL module), I'll now try to avoid it as much as possible, and use EclipseLink instead. Even a future possibility is Batoo JPA, which claims to be 15-20x faster than Hibernate. However, as it cannot be used with Spring's LocalContainerEntityManagerFactoryBean (at least for now, as it requires a persistence.xml, and I like bootstrapping things programmatically), I'll stick with EclipseLink for now.