Opened 17 years ago
Closed 17 years ago
#1070 closed defect (fixed)
Excess memory usage when creating root bioassayset with different array designs
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | critical | Milestone: | BASE 2.7.2 |
Component: | core | Version: | |
Keywords: | | Cc: | |
Description
Although the implementation is correct, I am tagging this as a defect because, for larger experiments, the memory usage of IntensityCalculatorUtil.createRootBioAssaySet() is so high that it has no chance of working.
The root of the problem is that we don't use Hibernate's cache for reporters. The reason for this was to decrease memory usage! This normally works very well since spots are usually treated one by one. In the scenario where there are multiple array designs we must, however, create a new position->reporter mapping for the root bioassayset to make sure that each position has a unique reporter. In the current implementation this is done by keeping a Map of used positions and reporters. In the worst case we have to re-map every position on all raw bioassays, which means that the Map can grow to the same size as the total number of spots.
Since Hibernate's cache is not used, this means that we get a new reporter object for every spot. This is not good!
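To make the worst case concrete, here is a minimal Java sketch of the kind of re-mapping described above. It is not the BASE code: `NaiveRootPositionMapper`, `Reporter`, `assignPosition` and the collision rule (hand out the next free position) are all hypothetical and assumed for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not BASE code): root position -> full reporter object. */
public class NaiveRootPositionMapper {

    /** Hypothetical stand-in for a fully loaded reporter (id plus annotations). */
    public static final class Reporter {
        final int id;
        final String annotations;

        public Reporter(int id, String annotations) {
            this.id = id;
            this.annotations = annotations;
        }
    }

    // One entry per used root position; each value is a separate object
    // because no cache hands back a shared instance.
    private final Map<Integer, Reporter> used = new HashMap<>();
    private int nextFree;

    /** firstFreePosition is assumed to be larger than any original position. */
    public NaiveRootPositionMapper(int firstFreePosition) {
        this.nextFree = firstFreePosition;
    }

    /** Returns the root position for one spot, re-mapping on a conflict. */
    public int assignPosition(int position, Reporter reporter) {
        Reporter existing = used.get(position);
        if (existing == null) {
            used.put(position, reporter);   // position is free: keep it
            return position;
        }
        if (existing.id == reporter.id) {
            return position;                // same reporter: no conflict
        }
        int newPosition = nextFree++;       // conflict: take a new position
        used.put(newPosition, reporter);
        return newPosition;
    }

    public int size() {
        return used.size();
    }
}
```

In this sketch every conflicting spot adds one more entry holding a complete reporter object, so if all spots conflict the map ends up with one entry per spot.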
How can this problem be solved?
- We still don't want to use the cache in Hibernate. This would create problems elsewhere.
- We actually only need to store the internal reporter id for each position. It should be possible to use the same technique with a proxy for the reporter. There will still be a lot of reporter objects, but each will only need to store the ID and not all annotations.
- I think we can make the position->reporter mapping smarter. As it is now, it may just create a new position for every spot, since we are ignoring the array design information. It should be possible to create a mapping based on the features on each of the used array designs.
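The last two ideas can be sketched together in Java. This is only an illustration under stated assumptions, not the fix that was committed: `FeatureBasedPositionMapper` and `map` are hypothetical names, a "feature" is assumed to be an (original position, reporter id) pair, and only integer ids are stored instead of reporter objects.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not BASE code): id-only, feature-based re-mapping. */
public class FeatureBasedPositionMapper {

    // root position -> reporter id (no reporter objects, no annotations)
    private final Map<Integer, Integer> reporterAtPosition = new HashMap<>();
    // packed (original position, reporter id) -> root position already chosen
    private final Map<Long, Integer> rootPositionOfFeature = new HashMap<>();
    private int nextFree;

    /** firstFreePosition is assumed to be larger than any original position. */
    public FeatureBasedPositionMapper(int firstFreePosition) {
        this.nextFree = firstFreePosition;
    }

    /** Returns the root position for a feature, reusing earlier decisions. */
    public int map(int position, int reporterId) {
        long key = ((long) position << 32) | (reporterId & 0xFFFFFFFFL);
        Integer known = rootPositionOfFeature.get(key);
        if (known != null) {
            return known;                   // same feature seen before
        }
        Integer occupant = reporterAtPosition.get(position);
        // Free position: keep it. Occupied: hand out a new position.
        int root = (occupant == null) ? position : nextFree++;
        reporterAtPosition.put(root, reporterId);
        rootPositionOfFeature.put(key, root);
        return root;
    }

    public int size() {
        return reporterAtPosition.size();
    }
}
```

Because the decision is remembered per feature, raw bioassays that share an array design reuse the same root positions, and the maps grow with the number of distinct features rather than the number of spots.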
Change History (4)
comment:1 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 17 years ago
comment:3 by , 17 years ago
comment:4 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I think this may be fixed now. The memory usage is much lower. Here is a table with approximate values before and after the fix. The test data consisted of 3 raw data sets for each of 3 different array designs with 32K, 33K and 38K features respectively. The total number of spots was 339120 and the number of unique reporters was 36746. The data files used have been checked in to the testdata repository (http://dev.thep.lu.se/basetestdata/svn/trunk/3designs/).
Memory usage of intensity calculator plug-in (MB)
I need to run TestAll, and maybe some other tests, before I close this ticket.