Opened 6 years ago

Last modified 6 years ago

#2000 closed enhancement

Batch API for annotation handling — at Version 2

Reported by: Nicklas Nordborg
Owned by: everyone
Priority: critical
Milestone: BASE 3.8
Component: core
Version:
Keywords:
Cc:

Description (last modified by Nicklas Nordborg)

When updating a large number of annotations on a large number of items, it is easy to run into problems:

  • The first-level cache in Hibernate can easily use up all available memory
  • Dirty-checking and SQL execution by Hibernate takes a long time
  • If the change history logging is enabled, this also takes a long time

The current annotation importer plug-in has been used for testing. It was used to import values for 140+ different annotation types to 4900+ items (samples).

The data file is 4MB. The work done by the annotation importer can be divided into the following steps. JConsole was used to check the memory usage and debug output to check the time.

Action                                                      Time     Memory
Parse the file and find the item to update (loaded by ID)   7 sec    ~500MB
Update annotations                                          5 min    ~500MB -> 1.5GB
Commit - Hibernate                                          12 min   ~1.5GB
Commit - Change log                                         13 min   ~1.5GB -> 1.9GB

CPU usage may also be interesting. This is usually below 10% (less than a full single core). The CPU usage for Postgres is in the same range.
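For repeatable runs, the per-step time and memory figures could also be captured from inside the JVM instead of being read off JConsole. The following is only a minimal sketch and not part of BASE: `StepMeter` and `Reading` are hypothetical names, and the heap figure comes from the standard `MemoryMXBean` in `java.lang.management`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/** Hypothetical helper (not part of BASE) for timing a step and sampling heap usage. */
final class StepMeter {

    /** One measurement: wall-clock time of the step and heap in use right after it. */
    static final class Reading {
        public final String step;
        public final long elapsedMillis;
        public final long heapUsedBytes;

        Reading(String step, long elapsedMillis, long heapUsedBytes) {
            this.step = step;
            this.elapsedMillis = elapsedMillis;
            this.heapUsedBytes = heapUsedBytes;
        }

        @Override
        public String toString() {
            return String.format("%s: %d ms, heap used %d MB",
                    step, elapsedMillis, heapUsedBytes / (1024 * 1024));
        }
    }

    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Runs the given work and returns how long it took and the heap in use afterwards. */
    static Reading measure(String step, Runnable work) {
        long start = System.nanoTime();
        work.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        // Heap in use includes garbage that has not been collected yet,
        // so this is an upper bound on the live data.
        long heapUsed = MEMORY.getHeapMemoryUsage().getUsed();
        return new Reading(step, elapsedMillis, heapUsed);
    }
}
```

Each of the four steps in the table would be wrapped in its own call, for example `StepMeter.measure("Update annotations", () -> ...)`, and the returned `Reading` written to the debug log.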

The main problems here are that the memory usage grows in the second step and that the last two steps take a long time.

In theory it should be possible to improve the second step a lot, since at this stage the annotation importer is only working with a single item at a time. We do not need Hibernate to keep things in the first-level cache. If we can manage this, the Hibernate commit step may be solved automatically as well. The change log step may be harder, since we are already using the stateless session here. However, it may be possible to replace this with our own batch SQL implementation, as we have already done for reporters and raw data.
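The idea of not keeping items in the first-level cache can be sketched as a loop that flushes and clears the session after every chunk of items. This is only an illustration under stated assumptions, not the BASE implementation: `ChunkedUpdater`, `CacheSession` and `updateInChunks` are hypothetical names, and `CacheSession` is a stand-in for the `flush()` and `clear()` methods of `org.hibernate.Session` so that the example has no dependencies.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of chunked updates; not taken from the BASE code base. */
final class ChunkedUpdater {

    /**
     * Stand-in for the two org.hibernate.Session methods used here.
     * In real code the Hibernate session itself would be used.
     */
    interface CacheSession {
        void flush(); // execute pending SQL for the entities changed so far
        void clear(); // evict all entities from the first-level cache
    }

    /**
     * Applies the work to every item, flushing and clearing the session after
     * each full chunk and once more for a trailing partial chunk.
     * Returns the number of flush/clear cycles that were performed.
     */
    static <T> int updateInChunks(List<T> items, int chunkSize,
                                  Consumer<T> work, CacheSession session) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int cycles = 0;
        int pending = 0;
        for (T item : items) {
            work.accept(item); // e.g. set the annotation values for one item
            if (++pending == chunkSize) {
                session.flush();
                session.clear();
                cycles++;
                pending = 0;
            }
        }
        if (pending > 0) {
            session.flush();
            session.clear();
            cycles++;
        }
        return cycles;
    }
}
```

With this pattern the number of entities held in memory is bounded by the chunk size rather than by the number of items in the file. The catch is that clearing the session detaches every loaded entity, so the work done for one item must not rely on objects loaded for an earlier chunk.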

Using the annotation importer to delete the existing annotations turned out to be much worse. The initial parsing and updating of items used about the same time and amount of memory as when creating annotations. When committing, BASE needs to go through relations that may point to the deleted items and either delete them as well or nullify the reference (for example, any-to-any links and inherited/cloned annotations). This consumed more and more memory until most of the time was spent doing GC. After 1.5 hours (60 minutes of it in GC) I gave up and killed Tomcat. I'll see what happens if Tomcat gets more memory...

Giving Tomcat 4GB of memory instead of 2GB helped. The highest low point of the memory usage (the level remaining after GC) was near 3GB. The steps outlined in the table above took more or less the same time as when inserting annotations. An additional hour was spent checking/removing references to the deleted annotations. Total time was over 1 hour 20 minutes.

Change History (2)

comment:1 Changed 6 years ago by Nicklas Nordborg

Component: web → core
Description: modified (diff)
Priority: major → critical

comment:2 Changed 6 years ago by Nicklas Nordborg

Description: modified (diff)