Opened 14 years ago

Closed 14 years ago

#1070 closed defect (fixed)

Excess memory usage when creating root bioassayset with different array designs

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: critical
Milestone: BASE 2.7.2
Component: core
Version:
Keywords:
Cc:

Description

Although the implementation is correct, I am tagging this as a defect: for larger experiments the memory usage of IntensityCalculatorUtil.createRootBioAssaySet() is so high that it has no chance of working.

The root of the problem is that we don't use Hibernate's cache for reporters. The reason for this was to decrease memory usage! This normally works very well, since spots are usually processed one by one. In the scenario where there are multiple array designs we must, however, create a new position->reporter mapping for the root bioassayset to make sure that each position has a unique reporter. In the current implementation this is done by keeping a Map of used positions and reporters. In the worst case we have to re-map every position on all raw bioassays, which means that the Map can grow to the same size as the total number of spots.

Since Hibernate's cache is not used, this means that we get a new reporter object for every spot. This is not good!

How can this problem be solved?

  1. We still don't want to use the cache in Hibernate. This would create problems elsewhere.
  2. We actually only need to store the internal reporter id for each position. It should be possible to use the same technique with a proxy for the reporter. There will still be a lot of reporter objects, but each will only need to store the ID and not all annotations.
  3. I think we can make the position->reporter mapping smarter. As it is now, it may just create a new position for every spot, since we are ignoring the array design information. It should be possible to create a mapping based on the features on each of the used array designs.
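
A minimal sketch of how points 2 and 3 could be combined, under stated assumptions. It is not the code that was committed for this ticket. The class RootPositionMapper, its methods and the firstFreePosition parameter are hypothetical names introduced for illustration and are not part of the BASE API. The sketch stores only reporter ids (no reporter objects) and builds the mapping once per array design from its features, so the bookkeeping is bounded by the number of features rather than by the total number of spots.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of BASE. */
final class RootPositionMapper {
    // Root position -> internal reporter id (ids only, no reporter objects).
    private final Map<Integer, Integer> reporterAtRoot = new HashMap<>();
    // Reporter id -> extra root position handed out after a conflict.
    private final Map<Integer, Integer> extraPositionOf = new HashMap<>();
    private int nextFree;

    /**
     * Assumption: firstFreePosition is larger than every position used on
     * any of the array designs, so extra positions never collide with them.
     */
    RootPositionMapper(int firstFreePosition) {
        this.nextFree = firstFreePosition;
    }

    /**
     * Maps the features of one array design (position -> reporter id) to
     * root positions. Returns original position -> root position.
     */
    Map<Integer, Integer> mapDesign(Map<Integer, Integer> features) {
        Map<Integer, Integer> toRoot = new HashMap<>();
        // Extra positions already taken by this design; a position may be
        // used only once within a single design.
        Set<Integer> extrasUsedHere = new HashSet<>();
        for (Map.Entry<Integer, Integer> feature : features.entrySet()) {
            int position = feature.getKey();
            int reporterId = feature.getValue();
            Integer existing = reporterAtRoot.get(position);
            int root;
            if (existing == null || existing == reporterId) {
                // No conflict: the feature keeps its own position.
                root = position;
            } else {
                // Conflict: reuse the extra position for this reporter if
                // possible, otherwise allocate a new one.
                Integer extra = extraPositionOf.get(reporterId);
                if (extra == null || extrasUsedHere.contains(extra)) {
                    extra = nextFree++;
                    extraPositionOf.put(reporterId, extra);
                }
                extrasUsedHere.add(extra);
                root = extra;
            }
            reporterAtRoot.put(root, reporterId);
            toRoot.put(position, root);
        }
        return toRoot;
    }

    /** Reporter id stored at a root position, or null if unused. */
    Integer reporterAt(int rootPosition) {
        return reporterAtRoot.get(rootPosition);
    }

    /** Number of root positions in use. */
    int size() {
        return reporterAtRoot.size();
    }
}
```

With this approach, every raw bioassay that uses the same array design can share the map returned by mapDesign() for that design, and a reporter proxy only needs to be created from the stored id when a spot is actually written.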

Change History (4)

comment:1 Changed 14 years ago by Nicklas Nordborg

Owner: changed from everyone to Nicklas Nordborg
Status: changed from new to assigned

comment:2 Changed 14 years ago by Nicklas Nordborg

I think this may be fixed now. The memory usage is much lower. Here is a table with approximate values for before/after the fix. The test data consisted of 3 raw data sets for each of 3 different array designs with 32K, 33K and 38K features respectively. The total number of spots was 339120 and the number of unique reporters was 36746. The data files used have been checked in to the testdata repository (http://dev.thep.lu.se/basetestdata/svn/trunk/3designs/).

Memory usage of intensity calculator plug-in (MB)

Progress    Before fix    After fix
25%         110           50
35%         130           50
45%         170           50
55%         200           50
65%         220           50
75%         260           55
85%         240           60
95%         290           60
100%        300           60
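
As a rough way to read these figures, the small sketch below computes the relative reduction at 100% progress and the average number of spots per unique reporter, using only the numbers given in this comment. The class and method names are hypothetical and exist only for this illustration.

```java
/** Hypothetical helper for illustration only; not part of BASE. */
final class MemoryFigures {
    /** Relative reduction in percent between two memory readings (MB). */
    static double reductionPercent(double beforeMb, double afterMb) {
        return 100.0 * (beforeMb - afterMb) / beforeMb;
    }

    /** Average number of spots that share one unique reporter. */
    static double spotsPerReporter(long spots, long uniqueReporters) {
        return (double) spots / uniqueReporters;
    }

    public static void main(String[] args) {
        // Values at 100% progress, taken from the table.
        System.out.printf("reduction at 100%%: %.0f%%%n",
                reductionPercent(300, 60));
        // Spot and reporter counts, taken from the test data description.
        System.out.printf("spots per unique reporter: %.1f%n",
                spotsPerReporter(339120, 36746));
    }
}
```

The second figure gives an idea of how many reporter objects were created per unique reporter when a new object was loaded for every spot.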

I need to run TestAll, and maybe some other tests, before I close this ticket.

comment:3 Changed 14 years ago by Nicklas Nordborg

(In [4353]) References #1070: Excess memory usage when creating root bioassayset with different array designs

comment:4 Changed 14 years ago by Nicklas Nordborg

Resolution: fixed
Status: changed from assigned to closed

(In [4354]) Fixes #1070: Excess memory usage when creating root bioassayset with different array designs
