Fetching all the data in JPA?

I haven’t been blogging much recently, but I couldn’t not share this.

ORM is cool. No, it’s really cool! In Java EE it’s the standard for data persistence. It is so cool that by using it people start to forget that there is a database underneath the wonderful and extremely beautiful object model that gracefully expresses the business needs of the app. People don’t have to think about all the chemistry happening behind the curtain. Yes, in an RDBMS the data lives in a totally different world, with absolutely different rules and absolutely different mathematics, but who cares! Yes, it is all so automatic! You don’t have to care about low-level ResultSets, mapping the data, their relations, etc. And modern ORMs are so smart that the data is loaded lazily and only when it’s needed, making the app so optimized.

And the app works perfectly while it’s being developed. Even when it’s accepted it works beautifully and as required.

But suddenly something goes wrong. Unexpectedly there is more than one user connected to the app. How could this happen? And the server treacherously crashes with an OutOfMemoryError! OMG!

No problem! That can be easily fixed, just by adding some more RAM. For several weeks it works wonderfully. But then the OutOfMemoryError is back! No, that can’t be possible!

That’s the bloody Java EE! It’s very heavy and bad! Should have used Spring instead! Just because it’s cool! Ah.. the app should be rewritten! And of course there is no budget for that!

But this production issue still has to be fixed. So some more RAM is added just to keep it alive. Several iterations like that lead back to the same situation. Then, to keep the app running, the cleverest decision is made: a limit on the number of concurrent users is put on the app. And yay! The scalability problem is solved! Well, not exactly: after another period of time the app crashes the same way even with one user!

Common story? I believe so. But it’s not very popular to talk about issues like that, since we are all cool programmers and never make such mistakes.

This story happened to me. But luckily I was not the one who coded this app. During my freelance period I was hired as a consultant to solve this issue on a live, relatively (not so much) legacy system.

Imagine you have quite an ordinary Java EE webapp, with JSF in front, some business logic expressed mostly with stateless beans, and Hibernate underneath. Nothing special.

The user logs in and navigates to a certain page which should display a table with some information. One of the columns has links to some downloadable items.

So I logged in, clicked on the required link… and got the error. It was so easy to reproduce the bug. So the first action was to look at the heap dump with MAT… and the results were terrifying! A list with several thousand objects, each having a string field with several megabytes of XML inside!!! That’s a WOW!

The JSF page itself is designed in the simplest way as well: all the several thousand items are displayed on one page, and the user has to be able to download the XML directly from the object in the table. And you understand it yourself: that’s so wrong! With 10 test items during development this approach worked perfectly, and it’s hard to see where it leads. Yep, for a developer an XML document is just a string, which can be a member of an object. What can go wrong…

So imagine the following approximation: two very easy classes, a Parent and a Child, with a bidirectional OneToMany relation between them.
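A minimal sketch of what these two classes might look like. Only the class names ParentEntity and ChildEntity and the field hugeXMLPayload come from this post; the table names, the join column and the other field names are assumptions made up for illustration, and which field sits on which class is a guess.

```java
// ParentEntity.java
import java.util.ArrayList;
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import javax.persistence.Table;

@Entity
@Table(name = "PARENT") // table name is an assumption
public class ParentEntity {

    @Id
    private Long id;

    private String importantInfo; // hypothetical field, shown on the page

    // Inverse side of the bidirectional relation; ChildEntity.parent owns it.
    @OneToMany(mappedBy = "parent")
    private List<ChildEntity> children = new ArrayList<>();

    // getters and setters omitted
}
```

```java
// ChildEntity.java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.Lob;
import javax.persistence.ManyToOne;
import javax.persistence.Table;

@Entity
@Table(name = "CHILD") // table name is an assumption
public class ChildEntity {

    @Id
    private Long id;

    private String veryImportantInfo; // hypothetical field, shown on the page
    private String shownInfo;         // hypothetical field, shown on the page
    private String notShownInfo;      // hypothetical field, not on this page

    @Lob
    private String hugeXMLPayload;    // several megabytes of XML per row

    // Owning side of the relation.
    @ManyToOne
    @JoinColumn(name = "PARENT_ID")   // column name is an assumption
    private ParentEntity parent;

    // getters and setters omitted
}
```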

The field names are self-explanatory. Everything which is important, very important and shown has to be displayed to the user in the JSF page. The other stuff is not on this page.

Looks ok for now… but have you spotted the problem?

Of course it’s the hugeXMLPayload, which is always loaded although never displayed…

But, haha, this is so easy to fix! With just a few annotations that mark the field as lazily loaded.
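A sketch of what such annotations might look like, assuming the standard JPA `@Lob` and `@Basic(fetch = FetchType.LAZY)` are what is meant. The table name is again an assumption, and the rest of the class is left out.

```java
import javax.persistence.Basic;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.Lob;
import javax.persistence.Table;

@Entity
@Table(name = "CHILD") // table name is an assumption
public class ChildEntity {

    @Id
    private Long id;

    // FetchType.LAZY is only a hint to the persistence provider.
    // For a basic (non-relationship) attribute it has an effect only
    // when the provider's bytecode enhancement is enabled.
    @Lob
    @Basic(fetch = FetchType.LAZY)
    private String hugeXMLPayload;

    // other fields, getters and setters omitted
}
```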

To some extent you are right. This should work if code instrumentation/enhancement is supported and enabled in the JPA persistence provider.

But let’s rethink the way CLOBs and BLOBs are stored. Should they live directly in an object field which is then mapped to some table column? I don’t think that’s a good idea. Big data should be accessed rarely, and the access should be really targeted. You may say: but I have annotated them, and according to StackOverflow the code should be instrumented and all this data should be loaded lazily. But just annotating them the correct way will not help in Hibernate versions lower than 5, as the code instrumentation was not properly ready yet. After all, this is just a usual String getter on an object field, not even a relationship which could be expressed with a join underneath.

This didn’t work for us.

So the first quick fix that was applied was, naturally, dynamic pagination in the JSF table. Thus we had only a small subset of the data in memory. Still, loading XML CLOBs into memory every time the page is loaded is a little bit stupid.
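On the query side, pagination might be sketched roughly like this with the standard JPA `setFirstResult` and `setMaxResults` calls. The bean name `ChildPageService` and the method are hypothetical, and how the JSF table asks for a page depends on the component library in use, so that part is left out.

```java
import java.util.List;
import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class ChildPageService { // hypothetical service name

    @PersistenceContext
    private EntityManager em;

    // Returns a single page of children; pageIndex is zero-based.
    // The explicit ordering keeps the pages stable between requests.
    public List<ChildEntity> findPage(int pageIndex, int pageSize) {
        return em.createQuery(
                    "select c from ChildEntity c order by c.id", ChildEntity.class)
                 .setFirstResult(pageIndex * pageSize)
                 .setMaxResults(pageSize)
                 .getResultList();
    }
}
```

Each page still drags the XML of its rows into memory, which is exactly the remaining problem.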

What other option is there to effectively NOT load all the data? And without changing the data model… and on a live system? It is not possible to create another “Attachment” object and give it a OneToOne relationship to the Child to keep it separated.

So how to do it in the most natural and portable way?

I found the answer in the book “High-Performance Java Persistence” by Vlad Mihalcea. I believe this book is an absolute must-read for every full stack developer.

So the answer to our problem was unexpectedly simple. The first idea was to create a projection and fetch only the desired fields. But the results of those fetches are arrays of objects. And that’s a rather low-level approach. There should be something more civilized.
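To see why the projection feels raw, here is a small sketch in plain Java, with no database involved, that only imitates the shape of such a result. The list of `Object[]` stands in for what a query like `select c.id, c.importantInfo from ChildEntity c` would return; the class `ChildRow` and the field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ProjectionRows {

    // Hypothetical typed holder for the two selected columns.
    static final class ChildRow {
        final Long id;
        final String importantInfo;

        ChildRow(Long id, String importantInfo) {
            this.id = id;
            this.importantInfo = importantInfo;
        }
    }

    // Converts raw projection rows into typed objects.
    // Each Object[] is assumed to hold {id, importantInfo} in that order,
    // matching the order of the select clause.
    static List<ChildRow> toRows(List<Object[]> raw) {
        List<ChildRow> rows = new ArrayList<>();
        for (Object[] columns : raw) {
            // Positional access and casts: nothing is checked at compile time.
            rows.add(new ChildRow((Long) columns[0], (String) columns[1]));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Stand-in for the result list of a projection query.
        List<Object[]> raw = new ArrayList<>();
        raw.add(new Object[] {1L, "first"});
        raw.add(new Object[] {2L, "second"});

        for (ChildRow row : toRows(raw)) {
            System.out.println(row.id + " -> " + row.importantInfo);
        }
    }
}
```

Every change to the select clause has to be mirrored by hand in the indexes and casts, which is what makes this approach fragile.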

And the civilized approach is called subentities. In JPA it is absolutely legal to map different objects to the same table in the DB. These objects may contain a subset of the data of the original object. And JPA/Hibernate will always fetch only the fields relevant to the object. So we have created a parallel read-only structure with subentities and made just one additional service to serve our JSF.

ParentEntity was transformed into ParentEntitySummary, and ChildEntity into ChildEntitySummary.
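A sketch of how the two summary classes might look. The class names come from this post; the table names, the join column and the field names are assumptions, and they have to match whatever the full entities are mapped to.

```java
// ParentEntitySummary.java
import java.util.ArrayList;
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import javax.persistence.Table;

@Entity
@Table(name = "PARENT") // same table as ParentEntity; name is an assumption
public class ParentEntitySummary {

    @Id
    private Long id;

    private String importantInfo; // hypothetical field, shown on the page

    @OneToMany(mappedBy = "parent")
    private List<ChildEntitySummary> children = new ArrayList<>();

    // getters only: this structure is meant for reading
}
```

```java
// ChildEntitySummary.java
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.Table;

@Entity
@Table(name = "CHILD") // same table as ChildEntity; name is an assumption
public class ChildEntitySummary {

    @Id
    private Long id;

    private String veryImportantInfo; // hypothetical field, shown on the page
    private String shownInfo;         // hypothetical field, shown on the page

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "PARENT_ID")   // column name is an assumption
    private ParentEntitySummary parent;

    // hugeXMLPayload is deliberately not mapped here,
    // so it never appears in the generated SELECT.

    // getters only: this structure is meant for reading
}
```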

We have just thrown away everything unnecessary and kept what’s important! So good! If the JPA provider is Hibernate, we could add an @Immutable annotation to declare our objects as read-only.

The magic here is in the @Table annotation. It’s easy to spot that each summary class points to the same table in the DB as the full entity it mirrors: ParentEntitySummary to the table of ParentEntity, and ChildEntitySummary to the table of ChildEntity.

And we can absolutely legitimately work with them as with any other entities.
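For example, the additional read-only service might be sketched like this. The bean name `SummaryService` and its methods are hypothetical; the calls themselves are the ordinary JPA `EntityManager` API.

```java
import java.util.List;
import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class SummaryService { // hypothetical service name

    @PersistenceContext
    private EntityManager em;

    // Loads the parents through the summary mapping, so only the
    // columns mapped in ParentEntitySummary are selected.
    public List<ParentEntitySummary> findAllSummaries() {
        return em.createQuery(
                    "select p from ParentEntitySummary p", ParentEntitySummary.class)
                 .getResultList();
    }

    // A lookup by primary key works as for any other entity.
    public ParentEntitySummary findSummary(Long id) {
        return em.find(ParentEntitySummary.class, id);
    }
}
```

When the user actually asks for a download, the full ChildEntity can still be loaded by id for that single row.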

So much awesome! And it worked!

This of course is not a silver bullet. I would rather call it a good patch.

So if it is impossible to redesign the DB schema and you have to work with live data, this approach may be a nice escape, especially for read-only data.

As a result, RAM consumption went from 8 GB down to only 89 MB on a JBoss machine (server memory footprint included). And it’s obvious that Java EE is very lightweight and quick!

So the idea of this blog post is to remind you that the developer is not the only user of the system. And “works on my machine” doesn’t mean it’s production ready!

And once again: the way an RDBMS represents data is different from the way objects do. JPA/Hibernate is an awesome tool. It saves a great deal of effort! But it is necessary to make correct decisions on how the data is stored!

Have fun with JPA!