

Friday, March 27, 2015

Java / GWT development with tmpfs (or, why is my development slower because I've switched to Arch Linux)

I've recently switched from Ubuntu to Arch Linux (Antergos, really, as it provides a decent installer and nice defaults, and is just Arch in the end).
Arch uses systemd (the controversial init system that has caused lots of debate, but which I have personally loved), and starting with the upcoming 15.04 release, Ubuntu will use it too.
By default, systemd mounts /tmp as a tmpfs, an in-memory filesystem limited by default to half of the available physical memory. Also, in my installation (I don't know whether this is Arch's or Antergos' default), /etc/fstab itself declared /tmp to be mounted as tmpfs.
This might be nice for most people, but if you run applications that store huge amounts of data in /tmp, it can be terrible.
In my case it is GWT which writes hundreds of megabytes to the temporary folder.
Result? When GWT is compiling Java to JavaScript code, things get SLOOOOOW, because:
  • Eclipse uses 1.0-1.5GB of RAM
  • The GWT SuperDevMode, together with the running application, takes another 1.0-1.5GB
  • Chrome uses 1.0-2.0GB of RAM with several open tabs (and when dev tools is open, it eats a lot of RAM too)
Counting up to 4GB (max) of tmpfs, plus the desktop environment (Cinnamon in my case), Skype, Dropbox and the rest, my 8GB quickly runs short.
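Before changing anything, it helps to confirm that /tmp really is RAM-backed and see how large it is. A quick sketch with standard tools (output varies per system):

```shell
# Filesystem size and usage behind /tmp (type "tmpfs" means RAM-backed)
df -h /tmp

# On systemd-based systems, findmnt shows the mount source and options
findmnt /tmp || true
```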

At least I found that I could improve things by no longer mounting /tmp as tmpfs.
To do that I've resorted, as usual, to the excellent Arch Wiki (by the way, one of the best pieces of documentation for any project I've ever seen): https://wiki.archlinux.org/index.php/Tmpfs.

I have removed the line in /etc/fstab which declares /tmp as tmpfs and created the /etc/tmpfiles.d/tmp.conf file with the following content:

# see tmpfiles.d(5)
# always enable /tmp folder cleaning
D! /tmp 1777 root root 0

# remove files in /var/tmp older than 10 days
D /var/tmp 1777 root root 10d

# namespace mountpoints (PrivateTmp=yes) are excluded from removal
x /tmp/systemd-private-*
x /var/tmp/systemd-private-*
X /tmp/systemd-private-*/tmp
X /var/tmp/systemd-private-*/tmp

Then we need to tell systemd not to mount /tmp as tmpfs automatically, with the following command:
systemctl mask tmp.mount

Afterwards, I just rebooted the system and... magic! I can work nicely with GWT compilation again.

As Ubuntu 15.04 will switch to systemd, and systemd by default mounts /tmp as tmpfs even without anything defined in /etc/fstab, this might affect Ubuntu too in the future. The same is true for other major distributions.

Wednesday, November 12, 2014

Generating DDL with EclipseLink JPA and PostgreSQL

I normally say that either the project I work on (http://www.cyclos.org) is too special, or we're just unlucky with the default behavior of most libraries we use.
This time the problem was streaming BLOBs: we don't want to load entire images into memory, and EclipseLink doesn't handle streaming by default.

So I had to write a subclass of org.eclipse.persistence.platform.database.PostgreSQLPlatform, implementing the following methods:

    import java.sql.Blob;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Types;
    import java.util.Hashtable;

    import org.eclipse.persistence.internal.databaseaccess.FieldTypeDefinition;
    import org.eclipse.persistence.internal.helper.DatabaseField;
    import org.eclipse.persistence.internal.sessions.AbstractSession;
    import org.eclipse.persistence.platform.database.PostgreSQLPlatform;

    // The class name is illustrative; register whatever name you choose
    // as the target database platform
    public class StreamingPostgreSQLPlatform extends PostgreSQLPlatform {

        @Override
        public Object getObjectFromResultSet(ResultSet resultSet, int columnNumber, int type, AbstractSession session) throws SQLException {
            String name;
            if (type == Types.BIGINT) {
                // May be a number or an OID
                name = resultSet.getMetaData().getColumnTypeName(columnNumber);
                if ("OID".equalsIgnoreCase(name)) {
                    return resultSet.getBlob(columnNumber);
                }
            }
            return super.getObjectFromResultSet(resultSet, columnNumber, type, session);
        }

        @Override
        public void setParameterValueInDatabaseCall(Object parameter, PreparedStatement statement, int index, AbstractSession session) throws SQLException {
            if (parameter instanceof DatabaseField) {
                DatabaseField field = (DatabaseField) parameter;
                if (Blob.class.equals(field.getType())) {
                    statement.setBlob(index, (Blob) null);
                } else {
                    super.setParameterValueInDatabaseCall(parameter, statement, index, session);
                }
            } else if (parameter instanceof Blob) {
                statement.setBlob(index, ((Blob) parameter));
            } else {
                super.setParameterValueInDatabaseCall(parameter, statement, index, session);
            }
        }

        @Override
        public boolean shouldUseCustomModifyForCall(DatabaseField field) {
            if (Blob.class.equals(field.getType())) {
                return true;
            }
            return super.shouldUseCustomModifyForCall(field);
        }

        @Override
        @SuppressWarnings({ "rawtypes", "unchecked" })
        protected Hashtable buildFieldTypes() {
            Hashtable types = super.buildFieldTypes();
            types.put(Blob.class, new FieldTypeDefinition("OID", false));
            return types;
        }
    }

This way we control the mapping: small binary data is mapped in entities as byte[], and large binary data as java.sql.Blob.

Then, to generate the schema:

    EntityManagerFactoryImpl emf = (EntityManagerFactoryImpl) realEMF;
    DatabaseSessionImpl databaseSession = emf.getDatabaseSession();

    StringWriter sw = new StringWriter();
    SchemaManager schemaManager = new SchemaManager(databaseSession);
    schemaManager.outputDDLToWriter(sw);

    DefaultTableGenerator tableGenerator = new DefaultTableGenerator(databaseSession.getProject()) {
        @Override
        protected void resetFieldTypeForLOB(DirectToFieldMapping mapping) {
            // Hack to avoid the workaround for oracle 4k thin driver bug
        }
    };
    TableCreator tableCreator = tableGenerator.generateDefaultTableCreator();
    tableCreator.createTables(databaseSession, schemaManager);

    String script = sw.toString();

That DefaultTableGenerator anonymous subclass took me some hours of debugging EclipseLink to figure out. The comment in the overridden method says it is there to work around a 4k limitation in the Oracle thin driver. But it messed up the other use cases, as Blob was being handled as Byte[], and we want the OID type specifically for Blobs.

Congratulations, Oracle! (facepalm)

Saturday, February 2, 2013

The beauty of Querydsl: calling database functions

It's a common pattern to have a self-referencing table in order to model a hierarchy tree. Sorting results by hierarchy, however, is an entirely different subject. Some databases, like Oracle, have the start with / connect by clauses. But bringing that to the JPA world is another story. And imagine doing it with the über-ugly JPA 2 criteria queries. I was already using Querydsl with JPA (on EclipseLink), and it allowed a very clean solution.

First, I brought database functions to the rescue and wrote a function which receives an entity id, a table name, the name column and the parent id column, so the function can be reused with any type of entity. As the DB is PostgreSQL, here comes the code:

create or replace function name_hierarchy
    (p_id bigint, 
     p_table varchar, 
     p_name_col varchar, 
     p_parent_id_col varchar)
    returns varchar
    as $$
        declare
            sql text;
            current_id bigint;
            v_name varchar;
            v_parent_id bigint;
            path varchar[];
        begin
            current_id := p_id;
            while current_id is not null loop
                sql := 'select id, ' 
                     || p_name_col || ', ' 
                     || p_parent_id_col
                     || ' from ' || p_table 
                     || ' where id = ' || current_id;
                execute sql into current_id, v_name, v_parent_id;
                path := array_prepend(v_name, path);
                current_id := v_parent_id;
            end loop;
            return array_to_string(path, ' > ');
        end;
    $$ language plpgsql
    stable;
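Once created, the function can also be exercised straight from SQL. A quick sketch, using the same configurations table and the name and parent_id columns that the Querydsl plumbing further below passes in:

```sql
-- Builds paths like 'Root > Child > Grandchild' for every row
select id,
       name_hierarchy(id, 'configurations', 'name', 'parent_id') as path
from configurations
order by path;
```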

Then, I needed to create a Querydsl operator to represent the function. It contains a name and the argument types.

public class CustomOperators {
    public static final Operator<String>
        NAME_HIERARCHY = new OperatorImpl<String>(
            "name_hierarchy", 
            Long.class, String.class, String.class, String.class);
}

There was also a custom Querydsl templates class to actually convert that operator into JPQL code (note that EclipseLink uses FUNC('name', args...) to invoke native database functions):

    public static class CustomTemplates
        extends EclipseLinkTemplates {

        private static final CustomTemplates INSTANCE = 
            new CustomTemplates();

        public static CustomTemplates getInstance() {
            return INSTANCE;
        }

        private CustomTemplates() {
            add(CustomOperators.NAME_HIERARCHY, 
                "FUNC('name_hierarchy', {0}, {1}, {2}, {3})");
        }
    }

As the example entity is Configuration, the final plumbing needed is a method annotated with @QueryDelegate(Configuration.class), so the extension method can be created in any class:

    @QueryDelegate(Configuration.class)
    @SuppressWarnings("unchecked")
    public static StringExpression 
        nameHierarchy(EntityPath<Configuration> configuration) {
        NumberPath<Long> id;
        try {
            // FieldUtils.readField declares a checked IllegalAccessException
            id = (NumberPath<Long>)
                FieldUtils.readField(configuration, "id");
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return StringOperation.create(
            CustomOperators.NAME_HIERARCHY, 
            id,
            StringTemplate.create("'configurations'"),
            StringTemplate.create("'name'"),
            StringTemplate.create("'parent_id'"));
    }

Finally, I can use the nameHierarchy method anywhere in queries, like:

QConfiguration c = QConfiguration.configuration;
List<Configuration> configs = new JPAQuery()
    .from(c)
    .orderBy(c.nameHierarchy().asc())
    .list(c);

It looks like the nameHierarchy method was always there, doesn't it? And the same idea can be reused for any other function, seamlessly blending them into the query metamodel. Now try to do something similar with JPA's criteria API!

Sunday, January 20, 2013

Replaced Hibernate as JPA provider... To never look back!!!

Hibernate is probably the best-known ORM tool for Java. I first used it at version 1.x, back in 2002. It even influenced JPA (the Java Persistence API), the standard ORM API.
The problem: the application (which has about 200 entities) was taking up roughly 350MB of heap on startup, measured right after forcing a garbage collection (using jvisualvm).
That was too much. But things would improve. There was a setting I had always misread as a batch-size equivalent, called hibernate.default_batch_fetch_size, which we had set to 20.
After some investigation, I found it is used to load several records at once, at the expense of memory. So, just to test, I changed it to 1 and, surprise... the same application was now taking up roughly 150MB! What a change for something misunderstood!
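For reference, this is where the setting lives: a sketch of the relevant persistence.xml fragment (the property name is the real Hibernate one; the unit name and surrounding file are abbreviated and illustrative):

```xml
<persistence-unit name="app">
  <properties>
    <!-- how many lazy proxies/collections to fetch per SQL query;
         larger values trade memory for fewer round trips -->
    <property name="hibernate.default_batch_fetch_size" value="1"/>
  </properties>
</persistence-unit>
```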
But I was not satisfied, and decided to try another JPA provider. After some research, I went with EclipseLink. The result? The same application now starts up (after a garbage collection) with roughly 35MB!!!
Ok, what a huge difference! But performance should be worse, shouldn't it? No!!! In the load tests I ran, EclipseLink was actually 2.5x faster than Hibernate!
There were some bumps: several queries used elements that are non-standard from JPA's point of view, and so on. But they could all be resolved, one at a time.
Conclusion: after being a loyal Hibernate user for several years (well, not that loyal, as I did some projects with plain JDBC using the Querydsl SQL module), I'll now avoid it as much as possible and use EclipseLink instead. A future possibility is even Batoo JPA, which claims to be 15-20x faster than Hibernate. However, as it cannot be used with Spring's LocalContainerEntityManagerFactoryBean (at least for now, since it requires a persistence.xml, and I like bootstrapping things programmatically), I'll stick with EclipseLink.

Sunday, February 6, 2011

Using Querydsl SQL to handle persistence in Java programs

After several years working with Hibernate (since version 1.x, around 2001/2002) and then JPA, I'm quite convinced that for new projects I'd try a new approach: Querydsl SQL. Why? Well, even though full ORM solutions like Hibernate have several advantages (managing relationships, an easier query language and so on), they also have their drawbacks. I found out that:
  • What I really wanted was an easier way to manipulate databases / resultsets;
  • It always selects all attributes when dealing with entities. I know you CAN select individual attributes, but that is more the exception than the rule. People tend to just read the entire record and then access the needed attributes. Some argue that this doesn't impact performance, but after some fine tuning on my current project, I realized that every gain matters;
  • Pure OO in data manipulation is nice, but the impedance mismatch just can't be neglected. It will bite you sooner or later;
  • You always end up with a few cases where a native query is needed, or the performance is just not acceptable. Programs are coded by developers, but those who really need to be pleased are the end users, and poor performance just puts users in a bad mood;
  • Even though JPA 2 has most of the features Hibernate has, it brings a problem: just like most (all?) JCP specifications, it leaves points out of the specification. So taking a (relatively complex) system working with one JPA provider (say, Hibernate) and migrating it to another (EclipseLink, OpenJPA, ...) is not failproof. This just leads to frustration...

Ok, I know no framework / library / technology is perfect, but I think Querydsl SQL is quite promising. Here are a few points:
  • You get the full power of native queries, with type safety. Java classes are generated from the database tables, so you get the full power of the IDE (autocomplete, finding references, code analysis...). This is a boost in productivity;
  • Queries can return several types of data, like iterators, lists, maps or single objects. The projection type can be beans, arrays, tuples or custom expressions. Querydsl is very easy to extend;
  • It can also handle data manipulation (inserts, updates and deletes). This pretty much removes all cases where one would need to touch the connection directly;
  • Besides generating the Q-types (Java classes representing the database tables), it can optionally generate beans (DTOs) for the tables. This is nice for cases where you want all columns of a table, and using them boosts productivity, as it avoids creating each bean by hand.

So, enough talking! Let's take a look at some code. The example here is a simple blog: we have users, who can create posts and comment on existing posts. So, here is the DDL for MySQL:
drop table if exists comment;
drop table if exists post;
drop table if exists user;

create table user (
    id bigint not null,
    name varchar(100) not null,
    username varchar(20) not null,
    password varchar(20) not null,
    primary key (id)
) engine innodb;

create table post (
    id bigint not null,
    user_id bigint not null,
    title varchar(250) not null,
    date datetime not null,
    contents text not null,
    primary key (id),
    constraint fk_post_user foreign key (user_id) references user(id)
) engine innodb;

create table comment (
    id bigint not null,
    user_id bigint not null,
    post_id bigint not null,
    date datetime not null,
    comments text not null,
    primary key (id),
    constraint fk_comment_user foreign key (user_id) references user(id),
    constraint fk_comment_post foreign key (post_id) references post(id)
) engine innodb;

So, we need to invoke Querydsl to read the database tables and generate the Java classes. Beans will be generated as well:
Configuration configuration = new Configuration(new MySQLTemplates());
NamingStrategy namingStrategy = new DefaultNamingStrategy();
MetaDataExporter exporter = new MetaDataExporter();
exporter.setConfiguration(configuration);
exporter.setNamePrefix("Q");
exporter.setTargetFolder(new File("generated"));
exporter.setSerializer(new MetaDataSerializer("Q", namingStrategy));
exporter.setBeanSerializer(new BeanSerializer());
exporter.setNamingStrategy(namingStrategy);
exporter.setPackageName("demo.blog");
        
Connection connection = ... //Get connection
exporter.export(connection.getMetaData());

If you are in Eclipse, just refresh the project and add the generated folder as a source folder. There you will find the QUser, QPost and QComment classes, as well as the beans: User, Post and Comment.

Before showing some data manipulation code, here are some methods used by the examples (the configuration can be created the same way as in the example above):
SQLDeleteClause delete(RelationalPath<?> path) {
    return new SQLDeleteClause(
        getConnection(), getConfiguration(), path);
}

SQLQuery from(Expression<?> from) {
    SQLQueryImpl query = new SQLQueryImpl(
        getConnection(), getConfiguration());
    query.from(from);
    return query;
}

SQLInsertClause insert(RelationalPath<?> path) {
    return new SQLInsertClause(
        getConnection(), getConfiguration(), path);
}

SQLUpdateClause update(RelationalPath<?> path) {
    return new SQLUpdateClause(
        getConnection(), getConfiguration(), path);
}

So, here are some examples for manipulating data:
QUser user = QUser.user; //Generated Q-type

// Create a user
User john = new User();
john.setName("John Smith");
john.setUsername("jsmith");
john.setPassword("john_secret");
Long johnId = insert(user)
    .populate(john)
    .executeWithKey(user.id);
john.setId(johnId);

// Create a post
QPost post = QPost.post;
Post newPost = new Post();
newPost.setDate(new Date());
newPost.setUserId(john.getId());
newPost.setTitle("A very interesting Java post!");
newPost.setContents("For more posts, visit http://freeit.inf.br");
Long postId = insert(post)
    .populate(newPost)
    .executeWithKey(post.id);
newPost.setId(postId);

// Without using generated beans
Long maryId = 10L;
QComment comment = QComment.comment;
insert(comment)
    .set(comment.date, new Date())
    .set(comment.postId, newPost.getId())
    .set(comment.userId, maryId)
    .set(comment.comments, "Love your post... Keep on!")
    .execute();

// Then, john decides to edit the post title
update(post)
    .set(post.title, "Using Querydsl...")
    .set(post.contents, post.contents.concat("\\n\\n[updated]"))
    .where(post.id.eq(newPost.getId()))
    .execute();

// And Mary removes all her comments on all posts!
delete(comment)
    .where(comment.userId.eq(maryId))
    .execute();

Enough DML examples. Let's perform some queries (using the same user, post and comment variables from above):
//Listing comments using the generated bean
List<Comment> postComments = 
    from(comment)
    .where(comment.postId.eq(postId))
    .list(comment);

//Iterating through all users with comments
CloseableIterator<User> usersWithComments = 
    from(user)
    .rightJoin(comment.commentUserFk, user)
    .where(comment.id.isNotNull())
    .iterateDistinct(user);

In the last example, you can see that even the foreign keys are imported into the model and can be used in joins. You can also use subqueries, factory expressions to invoke custom SQL functions, and so on. Visit www.querydsl.com for documentation and downloads.

So here is my tip: if you are looking for an alternative for data access in Java, give Querydsl SQL a try.

Saturday, January 8, 2011

My [terrible] experiences with EJB

In the past (around 2004), I worked with EJB 2. I hated it. Too much XML, too much complexity. Home / remote / local interfaces... So, at the time, as a workaround, I built a framework which implemented the command pattern, having a single EJB deployed and passing the command and parameters to it. Not good.
Then, in my current project, we started with EJB 3 (in late 2008), mostly because it's a standard. It is surely much easier, with annotations and such. However, the project has security requirements way beyond what standard JAAS can handle. Besides roles (the only concept JAAS handles in EJBs), we have permission sets, which can be applied to either groups or individual users and can be dynamically changed by the application admins. So we had to create a custom mechanism to check permissions. However, we still need JAAS to propagate the user identity (we use remote EJB interfaces).
Ok, but why am I so disappointed with EJBs? Here are some points:

  • To propagate the caller identity we need JAAS. It is insufficient for the application I'm working on (and would be for some others I have worked on too, so either I'm too unlucky or the standard is weak).
  • Again on JAAS: on the application side, it's standard. However, every application server out there has a distinct way of configuring it. Ok, for applications which look up users and plain or hashed passwords in a DB table, there is probably an easy way to configure some sort of login module with an SQL query. In our application, however, the credentials are dynamic as well; they depend on the application configuration and on the channel being accessed, and, as the logic to validate all this lives in the application, I'd like to use the application itself to validate users. Yet in some containers it's quite complicated to invoke the application to validate users. Go figure...
  • Once again on JAAS: because the JAAS configuration is specific to each application server, it's virtually impossible to deploy an application on more than one application server without headaches. If you code your application in a standard way and cannot reliably deploy that same application on distinct standard-compliant application servers, then that standard is void and defective by design. On the other hand, a self-contained web application can be deployed on ANY web container (as long as you don't use that little friend, JAAS).
  • Application servers take a long time to start up and deploy the application. Even though they are faster than some years ago, they are still slow. Compare the startup time with a simple web application running on Tomcat or Jetty - just a (very) few seconds.
  • JPA has its share of guilt in the slow startup times. In the current project, Hibernate alone takes about 30-40 seconds to map every entity (about 200 tables).
  • The runtime performance of EJBs is also likely to be slower than that of a regular web application. You can cluster, I know. But there are way too many proxies, interceptors, lookups, injections... I can't prove it with numbers, but common sense tells me so.
  • The standard API for type-safe queries in JPA (the criteria API) is an abomination. But I have already discussed that, and solved it by using Querydsl.

Conclusions? For future projects I'll avoid EJBs as much as possible. Spring is likely to always be in the game for me, as it's a fantastic piece of software. Also, after so many years working with Hibernate (since 1.2.x) and later JPA, I'm pretty sure I would choose the Querydsl SQL module, which has no ORM but has type-safe queries and DML (inserts / updates / deletes). Also, as its metamodel is generated at compile time, it adds almost zero overhead to application initialization (Hibernate, as already stated, takes 30-40 seconds for us just to initialize the persistence layer).
That is the beauty (but also, for me as a software architect, a frustration) of Java development: you have thousands of frameworks and libraries to choose from, and by the time most non-trivial projects are finished and put into production, they already use legacy technologies.

Monday, March 3, 2008

Programming languages - from a Java programmer to Java programmers

I learned to program in Visual Basic 3 (yuck!) around 1995. From there I moved to ASP (bah) in 2000, then to Java in 2001, and there I remain to this day.
But since last year I have been studying dynamic languages, such as Ruby and Groovy. I have known JavaScript for years (and find it quite nice), a language much hated among web programmers (which I consider an injustice). But I mention Ruby and Groovy because they are much talked about these days and have a fairly complete infrastructure.
First came Ruby. The language deserves a 10. It makes things like this possible:

class String
  def bold
    "<b>#{self}</b>"
  end
end

The result: every String now has a bold method, and invoking "teste".bold returns <b>teste</b>. To the horror of the more conservative Java programmers, we have reopened the definition of an existing class (and one as important as String), added a method that doesn't even have a return statement, and invoked it without parentheses!
And what about code like:

[1, 2, 3].collect {|x| x + 2}

The result? An array containing [3, 4, 5]. And there are many other interesting examples along these lines.
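The open-class trick works for any method you care to add. A minimal runnable sketch (the shout method is my own illustrative example, not from the original post):

```ruby
# Reopen the core String class and add a method, Ruby-style
class String
  def shout
    "#{upcase}!" # the last expression is the return value; no explicit return
  end
end

puts "teste".shout  # → "TESTE!"
```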
Ruby has lately been propelled by the Ruby on Rails framework, which drastically changed the concept of a web framework (Struts, Spring MVC...).
Although very interesting, the language is quite distant from Java, and those who have invested heavily in Java training and technology (let alone application servers) won't want to throw it all away...
Of course there is JRuby, a very good implementation of Ruby for the JVM which allows easy integration with Java classes, but the gap is still large.
Then I met Groovy, which tries to stay as close as possible to Ruby while keeping a syntax very similar to Java's. And there is also the Grails framework, which tries to bring the conveniences of Ruby on Rails to Groovy.
Groovy, together with Grails, is my bet for a web framework, and certainly my choice for the next Java project. It was "born" integrated with Spring and Hibernate, and can be deployed on any Java web server.
It's worth a look.
Claro que muitos ainda resistem à idéia de linguagens dinâmicas, pois se "perde o controle" que a tipagem estática oferece (apesar do Groovy permitir usar tipagem estática) e que "efeitos colaterais" de adicionar métodos dinâmicos em classes. Mas assim foi quando se passou do assembly para o C. Assim foi quando se passou do C para o Java. E assim está sendo para tornar o próprio Java obsoleto. Muito se fala hoje que o Java como linguagem está condenado. Mas a JVM é um ambiente sólido, performático e confiável, e esse sim, vai durar muito tempo.