Opened 8 years ago

Closed 8 years ago

#2009 closed defect (fixed)

PostgreSQL seems to be loading entire result set into memory

Reported by: Nicklas Nordborg Owned by: Nicklas Nordborg
Priority: critical Milestone: BASE 3.9
Component: core Version:
Keywords: Cc:

Description (last modified by Nicklas Nordborg)

This was investigated for MySQL a long time ago and resulted in setting some parameters on the connection string (useCursorFetch=true&defaultFetchSize=100&useServerPrepStmts=true, see #953).

I guess the corresponding investigations were never made for PostgreSQL because it seems like the JDBC driver is loading the entire resultset into memory (this was seen when testing the Reggie release exporter http://baseplugins.thep.lu.se/ticket/887).

We need to investigate and make sure that PostgreSQL is using server-side cursors for streaming the result.

Attachments (2)

loading-142-out-of-memory.png (29.2 KB ) - added by Nicklas Nordborg 8 years ago.
Trying to load 142 x 100k raw data entries
loading-142-by-100.png (35.5 KB ) - added by Nicklas Nordborg 8 years ago.
Load 142 x 100k raw data entries in batches by 100

Download all attachments as: .zip

Change History (10)

comment:1 by Nicklas Nordborg, 8 years ago

Description: modified (diff)

by Nicklas Nordborg, 8 years ago

Trying to load 142 x 100k raw data entries

comment:2 by Nicklas Nordborg, 8 years ago

The test case is now a list with 142 raw bioassays containing 100k+ raw data entries each. Using the new DynamicRawDataQuery implemented in #2004 to load the data results in query that runs for a few minutes. Then, the memory usage increases very steeply until 100% CPU is used for garbage collection and the server hangs.

Trying to load 142 x 100k raw data entries

comment:3 by Nicklas Nordborg, 8 years ago

The main suspect is that the JDBC fetch size is not set. The default value of 0 means that the PostgreSQL JDBC driver will try to load the entire result into memory: https://jdbc.postgresql.org/documentation/head/connect.html (see the defaultRowFetchSize option a bit down on the page).

Trying to set this option in the db.url setting in base.config unfortunately seems to have no effect.

comment:4 by Nicklas Nordborg, 8 years ago

It should be possible to create the setting hibernate.jdbc.fetch_size in hibernate.cfg.xml (http://docs.jboss.org/hibernate/orm/5.0/userguide/html_single/Hibernate_User_Guide.html#configurations-database-fetch).

However, it seems like it is only used internally by Hibernate and that it is not applied to SQL statements that we use for the *dynamic* part were we bypass Hibernate (not so hard to guess that!).

by Nicklas Nordborg, 8 years ago

Attachment: loading-142-by-100.png added

Load 142 x 100k raw data entries in batches by 100

comment:5 by Nicklas Nordborg, 8 years ago

Using the JDBC API to programmatically set the fetch size to 100. About the same time as before to execute the query (13:22:00-13:25:30). The next two minutes is loading/processing the raw data and writing it to the file. Memory usage is going up and down as expected. The important thing is that the "Old generation" (=left bar in the "Heap" section) is not filling up at all, which means that the memory is used only by temporary objects that can be garbage collected as soon as a data row has been processed. The total time for the exporter was less than 6 minutes.

Load 142 x 100k raw data entries in batches by 100

comment:6 by Nicklas Nordborg, 8 years ago

(In [7156]) References #2009: PostgreSQL seems to be loading entire result set into memory

Adding jdbc.fetch_size configuration property to hibernate.cfg.xml. Hopefully Hibernate will pick it up wherever possible.

We are also picking it up in the BASE core API and use the same setting for queries were we bypass Hibernate.

It is verified to work for the raw data query in #2004 and http://baseplugins.thep.lu.se/ticket/887

comment:7 by Nicklas Nordborg, 8 years ago

Owner: changed from everyone to Nicklas Nordborg
Status: newassigned

comment:8 by Nicklas Nordborg, 8 years ago

Resolution: fixed
Status: assignedclosed

Seems to work now.

Note: See TracTickets for help on using tickets.