Fetching all the data in JPA?

I haven’t been much into blogging recently, but I couldn’t not share this)

ORM is cool. No, it’s really cool! In Java EE it’s the standard for data persistence. It is so cool that by using it people start to forget that there is a database underneath the wonderful and extremely beautiful object model that gracefully expresses the business needs of the app. People don’t have to think about all the chemistry happening behind the curtains. Yes, in an RDBMS the data lives in a totally different world, with absolutely different rules and absolutely different mathematics, but who cares! It is all so automatic! You don’t have to care about low-level ResultSets, mapping the data, their relations, etc. etc… And modern ORMs are so smart that the data is loaded lazily and only when it’s needed, making the app oh so optimized.

And the app works perfectly while it’s being developed. Even when it’s accepted it works beautifully and as required.

But suddenly something goes wrong. Unexpectedly, there is more than one user connected to the app. How could this happen? And the server treacherously crashes with OutOfMemory errors! OMG!

No problem! That can be easily fixed – just add some more RAM. For several weeks it works wonderfully. But then the OutOfMemory error is back! No, that can’t be possible!

That’s the bloody Java EE! It’s heavy and bad! Should have used Spring instead! Just because it’s cool! Ah… the app should be rewritten! And of course there is no budget for that!

But still, this production issue has to be fixed. And some more RAM is added just to keep it alive. Several iterations like that lead back to the same situation. Then, to keep the app running, the cleverest decision is made: a limit on the number of concurrent users is put on the app. And yay! The scalability problem is solved! Well, not exactly: after another period of time the app crashes with the same error even with one user!

Common story? I believe so. But it’s not very popular to talk about issues like that, since we are all cool programmers and never make such mistakes.

This story happened to me. But luckily I was not the one who coded this app. During my freelance period I was hired as a consultant to solve this issue on a live, somewhat legacy system.

Imagine you have quite an ordinary Java EE webapp, with JSF in front, some business logic expressed mostly with stateless beans, and Hibernate underneath. Nothing special.

The user logs in and navigates to a certain page which should display a table with some information. One of the columns has links to some downloadable items.

So I logged in, clicked on the required link… and got the error. The bug was that easy to reproduce. So the first action was to look at the heap dump with MAT… and the results were terrifying: a list with several thousand objects, each holding a string field with several megabytes of XML inside!!! That’s a WOW!

The JSF itself is designed in the simplest way as well: all the several thousand items are displayed on one page, and the user has to be able to download the XML directly from the object in the table. And you see it yourself: that’s just wrong! With 10 test items during development this approach worked perfectly, and it’s hard to see where it leads. Yep, for a developer an XML is just a string, which can be a member of an object. What could go wrong…

So imagine the following approximation:


Two very easy classes.

Parent:
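A minimal sketch of what it looked like (the class shape and field names are my approximation, not the production code):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class ParentEntity {

    @Id
    @GeneratedValue
    private Long id;

    private String importantInfo;

    private String veryImportantInfo;

    // getters and setters omitted for brevity
}
```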

And Child:
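Again an approximation; the crucial detail is the CLOB mapped straight onto a String field:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Lob;

@Entity
public class ChildEntity {

    @Id
    @GeneratedValue
    private Long id;

    private String shownInfo;

    // several megabytes of XML per row, fetched together with the entity
    @Lob
    private String hugeXMLPayload;

    // getters and setters omitted for brevity
}
```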

There is a bidirectional OneToMany relation between them:
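Roughly like this (property and column names are assumed):

```java
// In ParentEntity:
@OneToMany(mappedBy = "parent")
private List<ChildEntity> children;

// In ChildEntity:
@ManyToOne
@JoinColumn(name = "PARENT_ID")
private ParentEntity parent;
```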

The field names are self-explanatory. Everything which is important, very important, or shown has to be presented to the user in the JSF front end. The other stuff – not on this page.

Looks ok for now… but have you spotted the problem?

Of course it’s the hugeXMLPayload, which is always loaded although never displayed…

But, haha, this is so easy to fix! With just a few annotations like:
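Something along these lines (the commonly suggested mapping; a sketch, not the actual code):

```java
@Lob
@Basic(fetch = FetchType.LAZY)
private String hugeXMLPayload;
```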

To some extent you are right. This should work if bytecode instrumentation/enhancement is supported and enabled in the JPA persistence provider.

But let’s rethink the way CLOBs and BLOBs are stored. Should they live directly in an object field which is then mapped to some table column? I don’t think that’s a good idea. Big data should be accessed rarely, and the access should be really targeted. You may say: but I have annotated them, and according to StackOverflow the code should be instrumented and all this data should be loaded lazily. But just annotating them the correct way will not help in Hibernate versions lower than 5, as bytecode instrumentation was not properly ready yet. After all, this is just a usual String field with a getter – it is not even a relationship which could be expressed with a join underneath.

This didn’t work for us.

So the first quick fix that was applied was, naturally, dynamic pagination in the JSF table. Thus we had only a small subset of the data in memory. Still, loading XML CLOBs into memory every time the page is loaded is a little bit stupid.

What other option is there to effectively NOT load the entire data? Without changing the data model… and on a live system? It was not possible to create another “Attachment” entity with a OneToOne relationship to the Child to keep the payload separated.

So how to do it in the most natural and portable way?

The answer I found in the book “High-Performance Java Persistence” by Vlad Mihalcea. I believe this is an absolute must-read for every full stack developer.

So the answer to our problem was unexpectedly simple. The first idea was to create a projection and fetch only the desired fields. But the results of such fetches are arrays of objects, and that’s a rather low-level approach. There should be something more civilized.
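For illustration, a projection query could look like this (entity and field names are illustrative):

```java
// Fetch only the needed columns; the result is a list of Object[]
// rows, which then must be unpacked by index – workable, but raw:
List<Object[]> rows = em.createQuery(
        "select c.id, c.shownInfo from ChildEntity c")
    .getResultList();

for (Object[] row : rows) {
    Long id = (Long) row[0];
    String shownInfo = (String) row[1];
}
```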

And the civilized approach is called subentities. In JPA it is absolutely legal to map different entities to the same table in the DB. These entities may contain a subset of the columns of the original entity, and JPA/Hibernate will fetch only the fields relevant to each of them. So we created a parallel read-only structure with subentities and added just one additional service to serve our JSF.

ParentEntity transformed to ParentEntitySummary:
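Something like this (table and column names are assumptions):

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "PARENT") // the same table the original ParentEntity maps to
public class ParentEntitySummary {

    @Id
    private Long id;

    private String importantInfo;

    private String veryImportantInfo;

    // getters omitted; no setters needed for a read-only view
}
```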

And the ChildEntity transformed to ChildEntitySummary:
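The same trick for the child (again, names are my guesses):

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "CHILD") // the same table the original ChildEntity maps to
public class ChildEntitySummary {

    @Id
    private Long id;

    private String shownInfo;

    // no hugeXMLPayload field here, so the CLOB column is never selected
}
```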

We have just thrown away everything unnecessary and kept what’s important! So good! If the JPA provider is Hibernate, we could also add an @Immutable annotation to declare our objects read-only.

The magic here is in the @Table annotation. It’s easy to spot that the two classes actually point to the same table in the DB.

For Parent it is:
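Assuming the original table was named PARENT, both classes would carry:

```java
@Table(name = "PARENT") // on ParentEntity and on ParentEntitySummary
```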

For Child it is:
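And, under the same naming assumption:

```java
@Table(name = "CHILD") // on ChildEntity and on ChildEntitySummary
```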

And we can absolutely legally work with them like with any other entities:
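For example (a sketch; the method and query are mine, not the production service):

```java
public List<ParentEntitySummary> findSummaries(EntityManager em) {
    // an ordinary typed JPQL query – the provider selects only the
    // columns mapped on the summary entity
    return em.createQuery(
            "select p from ParentEntitySummary p",
            ParentEntitySummary.class)
        .getResultList();
}
```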

So much awesome! And it worked!

This of course is not a silver bullet. I would rather call it good patching.

So if it is impossible to redesign the DB schema and you have to work with live data, this approach may be a nice escape hatch, especially for read-only data.

As a result, RAM consumption went from 8 GB down to just 89 MB on a JBoss machine (server memory footprint included). So it turns out Java EE is very lightweight and quick!

So the idea of this blog post is to remind you that the developer is not the only user of the system, and “works on my machine” doesn’t mean it’s production ready!

And once again: the way an RDBMS represents data is different from the way objects do. JPA/Hibernate is an awesome tool. It saves great effort! But it is still necessary to make correct decisions about how the data is stored!

Have fun with JPA!

Some tips on TomEE development environment setup

I recently got interested in taking a look at what’s inside the TomEE app server.

We’re using it “heavily” in production, and I was curious how some things are done.

The great thing is that it’s just a usual Maven project. Actually all the info can be found here – http://tomee.apache.org/dev/source-code.html – but a step-by-step instruction may be of help.

So I started by getting the sources:
git clone https://git-wip-us.apache.org/repos/asf/tomee.git tomee

Before importing the project into an IDE, it is a very good idea to make a full build by calling

mvn -Dsurefire.useFile=false -DdisableXmlReport=true -DuniqueVersion=false -ff -Dassemble -DskipTests -DfailIfNoTests=false clean install

It will take a while: it downloads all the dependencies and builds TomEE without executing the tests. Something that pleased me a lot: there is no need for any other special setup.

The integration with IDE is also quite seamless.

In IntelliJ IDEA, for example, just import the project from the POM:


Then check ALL:


and in about 30 minutes you will get a fully indexed working environment:


For Eclipse it’s almost the same. First import the existing Maven project into the workspace:


Eclipse finally supports hierarchical Maven projects:


And we are done!


A good idea may be to select the hierarchical representation of the project 🙂

The TomEE team has done a great job making the development setup extremely simple! Within just a few actions you can start developing for TomEE.

KUDOS to the team!

 

Nashorn Eclipse Development Environment Setup

Yo fellows!

I recently had some time, so I played with Eclipse to set it up for Nashorn. This article is adapted from the previous one regarding IntelliJ IDEA.

The current setup is made on OS X Yosemite and the latest Eclipse (Luna). But the setup for other OSs should run almost the same way.

1. Verify you have JDK8 installed.

2. Using Mercurial, clone the repository http://hg.openjdk.java.net/jdk9/dev to the folder you want and execute get_sources.sh

3. Now switch to Eclipse and create a project called Nashorn directly in the folder “<JDK9_SOURCES>/nashorn”


4. Eclipse is clever enough to find the source folders:


5. A bit tricky part: the compilation, development and debugging are currently done against JDK8, since the IDE still does not support JDK9 and its jimage distribution mechanism. So, in order for the JDK’s own Nashorn not to interfere with the one we build, it’s a good idea to make a fresh copy of JDK8 and remove its nashorn.jar, located under <JDK8_ROOT>/jre/lib/ext/nashorn.jar:


6. Now add this JDK to the IDE:


7. Almost done. Another tricky part: in Nashorn the so-called “JavaScript” classes are generated. There is a special tool, “nasgen”, for that, and it is located in the “buildtools/nasgen” directory.


8. Before running Nashorn itself, the nasgen “all” ant target should be run.

9. Add the resulting “nasgen.jar” to the “Build Path” if it’s not there already.


10. Navigate to “<nashorn>/make” folder and run the “all” ant target.

11. Add the resulting classes in “<nashorn>/build/classes/” to the “Build Path”:


12. Now it is possible to run the Shell.java to explore and debug the code:


Debugging is available directly in the IDE:


Warning: keep in mind that some of the classes – the so-called “JavaScript” classes – are generated. Their “bootstrapping” classes are annotated with @ScriptObject. Take some time to explore them. They cannot be debugged from that perspective. But “System.out.println” might help :)

Have fun developing the Nashorn!

Nashorn IntelliJ Idea development environment setup

Yo fellows!

As a part of the Bulgarian JUG I’m interested in contributing to the Nashorn project!

In this post I’ll try to describe how to set up the IntelliJ-based development environment for Nashorn.

The current setup is made on OS X Yosemite and IntelliJ Idea version 14. But the setup for other OSs should run almost the same way.

1. Verify you have JDK8 installed.

2. Using Mercurial, clone the repository http://hg.openjdk.java.net/jdk9/dev to the folder you want and execute get_sources.sh

3. Create an empty Java project somewhere in your system but NOT in the folder where the pulled JDK9 sources are.

4. Make a project module with root “<JDK9_SOURCES>/nashorn”, and assign sources to “src/jdk/scripting/nashorn/share/classes”


5. A bit tricky part: the compilation, development and debugging are currently done against JDK8, since IDEA 14 does not support JDK9 and its jimage distribution mechanism. So, in order for the JDK’s own Nashorn not to interfere with the one we build, it’s a good idea to make a fresh copy of JDK8 and remove its nashorn.jar, located under <JDK8_ROOT>/jre/lib/ext/nashorn.jar:


6. Now add this JDK to the IDE:


7. Almost done. Another tricky part: in Nashorn the so-called “JavaScript” classes are generated. There is a special tool, “nasgen”, for that, and it is located in the “buildtools/nasgen” directory.


8. Before running Nashorn itself, the nasgen “all” ant target should be run.

9. Add the resulting “nasgen.jar” to the module dependencies.

10. Navigate to “<nashorn>/make” folder and run the “all” ant target.

11. Add the resulting classes in “<nashorn>/build/classes/” to the module dependencies:


12. Now it is possible to run the Shell.java to explore and debug the code:


Debugging is available directly in the IDE:


Warning: keep in mind that some of the classes – the so-called “JavaScript” classes – are generated. Their “bootstrapping” classes are annotated with @ScriptObject. Take some time to explore them. They cannot be debugged from that perspective. But “sout” might help 🙂

 

Have fun developing the Nashorn!