Opened 16 years ago

Closed 16 years ago

Last modified 16 years ago

#797 closed enhancement (fixed)

Enhance performance for LOWESS and Medin-ratio plug-ins

Reported by: Nicklas Nordborg Owned by: Nicklas Nordborg
Priority: critical Milestone: BASE 2.5
Component: coreplugins Version:
Keywords: Cc:

Description

The performance tests (#294) shows that the LOWESS plug-in has bad performance. It is does not gain much if the indexes are dropped (#796). I think the problem is that the number of selects grows when the "blockgroup size" parameter get lower. Below are the SQL that is executed as part of the LOWESS plug-in. We should also check if the Median-ratio plug-in has the same behaviour.

-- Get min and max block - one time only ~ 1s
SELECT MAX(`raw`.`block`) AS `max`, MIN(`raw`.`block`) AS `min` 
FROM `base2dynamic`.`D4Spot` `spt` 
INNER JOIN `base2dynamic`.`D4Filter` AS `flt` ON `flt`.`cube` = `spt`.`cube` 
  AND `flt`.`column` = `spt`.`column` AND `flt`.`position` = `spt`.`position`  
LEFT JOIN `base2dynamic`.`D4RawParents` AS `rawmap` ON `rawmap`.`cube` = `spt`.`cube` 
  AND `rawmap`.`column` = `spt`.`column` AND `rawmap`.`position` = `spt`.`position`  
LEFT JOIN `RawDataGenePix` AS raw ON raw.id = rawmap.`rawdata_id`  
WHERE `spt`.`cube` = 1 AND `spt`.`layer` = 1 AND `flt`.`filter` = 2 AND `spt`.`column` = 1 
  AND ((`spt`.`ch1` > 0.0) AND (`spt`.`ch2` > 0.0))

-- Count number of spots in block group - one time for each block group ~ 0,5 s
SELECT COUNT(*)  FROM `base2dynamic`.`D4Spot` `spt` 
INNER JOIN `base2dynamic`.`D4Filter` AS `flt` ON `flt`.`cube` = `spt`.`cube` 
  AND `flt`.`column` = `spt`.`column` AND `flt`.`position` = `spt`.`position`  
LEFT JOIN `base2dynamic`.`D4RawParents` AS `rawmap` ON `rawmap`.`cube` = `spt`.`cube` 
  AND `rawmap`.`column` = `spt`.`column` AND `rawmap`.`position` = `spt`.`position`  
LEFT JOIN `RawDataGenePix` AS raw ON raw.id = rawmap.`rawdata_id`  
WHERE `spt`.`cube` = 1 AND `spt`.`layer` = 1 AND `flt`.`filter` = 2 AND `spt`.`column` = 1 
  AND ((`spt`.`ch1` > 0.0) AND (`spt`.`ch2` > 0.0)) 
  AND (1 <= `raw`.`block` AND `raw`.`block` < 2)

-- Load A and M for each spot in spot group - one time for each block group ~ 2s
SELECT 
  (LOG((`spt`.`ch1` / `spt`.`ch2`)) / 0.6931471805599453) AS `m`, 
  (LOG(SQRT((`spt`.`ch1` * `spt`.`ch2`))) / 2.302585092994046) AS `a`, 
  `raw`.`block` AS `block` 
FROM `base2dynamic`.`D4Spot` `spt` 
INNER JOIN `base2dynamic`.`D4Filter` AS `flt` ON `flt`.`cube` = `spt`.`cube` 
  AND `flt`.`column` = `spt`.`column` AND `flt`.`position` = `spt`.`position`  
LEFT JOIN `base2dynamic`.`D4RawParents` AS `rawmap` ON `rawmap`.`cube` = `spt`.`cube` 
  AND `rawmap`.`column` = `spt`.`column` AND `rawmap`.`position` = `spt`.`position`  
LEFT JOIN `RawDataGenePix` AS raw ON raw.id = rawmap.`rawdata_id`  
WHERE `spt`.`cube` = 1 AND `spt`.`layer` = 1 AND `flt`.`filter` = 2 AND `spt`.`column` = 1 
  AND ((`spt`.`ch1` > 0.0) AND (`spt`.`ch2` > 0.0)) 
  AND (1 <= `raw`.`block` AND `raw`.`block` < 2) 
ORDER BY a ASC

-- Load ch(1), ch(2) and A for each spot in spot group - one time for each block group ~ 1s
SELECT `spt`.`column` AS `column`, `spt`.`position` AS `position`, 
  `spt`.`ch1` AS `ch1`, `spt`.`ch2` AS `ch2`, 
  (LOG(SQRT((`spt`.`ch1` * `spt`.`ch2`))) / 2.302585092994046) AS `a` 
FROM `base2dynamic`.`D4Spot` `spt` 
INNER JOIN `base2dynamic`.`D4Filter` AS `flt` ON `flt`.`cube` = `spt`.`cube` 
  AND `flt`.`column` = `spt`.`column` AND `flt`.`position` = `spt`.`position`  
LEFT JOIN `base2dynamic`.`D4RawParents` AS `rawmap` ON `rawmap`.`cube` = `spt`.`cube` 
  AND `rawmap`.`column` = `spt`.`column` AND `rawmap`.`position` = `spt`.`position`  
LEFT JOIN `RawDataGenePix` AS raw ON raw.id = rawmap.`rawdata_id`  
WHERE `spt`.`cube` = 1 AND `spt`.`layer` = 1 AND `flt`.`filter` = 2 AND `spt`.`column` = 1 
  AND ((`spt`.`ch1` > 0.0) AND (`spt`.`ch2` > 0.0)) 
  AND (1 <= `raw`.`block` AND `raw`.`block` < 2) 
ORDER BY a ASC

Attachments (1)

perftest.txt (740 bytes ) - added by Nicklas Nordborg 16 years ago.
Results of performance test with 10-40 bioassays

Download all attachments as: .zip

Change History (7)

comment:1 by Nicklas Nordborg, 16 years ago

Owner: changed from everyone to Nicklas Nordborg
Status: newassigned

comment:2 by Nicklas Nordborg, 16 years ago

(In [3877]) References #797: Enhance performance for LOWESS and Medin-ratio plug-ins

LOWESS has now been optimized to use fewer database queries. Initial test results indicates at least 80% reduced execution time, but I have only tested on a 1-4 bioassays.

comment:3 by Nicklas Nordborg, 16 years ago

(In [3878]) References #797: Enhance performance for LOWESS and Median-ratio plug-ins

Median-ratio has now been optimized to use fewer database queries. Initial test results indicates at least 90% reduced execution time, but I have only tested on a 1-4 bioassays.

comment:4 by Nicklas Nordborg, 16 years ago

Here is a useful SQL statement that can be used to compare intensity values in two bioassay sets. The must be located in the same datacube:

SELECT 
orig.position,
orig.ch1 as origCh1, mf.ch1 as ch1, orig.ch1-mf.ch1 as dCh1, 
orig.ch2 as origCh2, mf.ch2 as ch2, orig.ch2-mf.ch2 as dCh2

FROM `D1Spot` as orig
JOIN  D1Spot as mf ON orig.cube=2 and orig.position = mf.position and orig.column = mf.column 
WHERE orig.layer = 3 and mf.layer = 4

comment:5 by Nicklas Nordborg, 16 years ago

Resolution: fixed
Status: assignedclosed

(In [3879]) Fixes #797: Enhance performance for LOWESS and Medin-ratio plug-ins

by Nicklas Nordborg, 16 years ago

Attachment: perftest.txt added

Results of performance test with 10-40 bioassays

comment:6 by Nicklas Nordborg, 16 years ago

(In [3887]) References #797: Enhance performance for LOWESS and Median-ratio plug-ins

Must sort by 'A' or 'ratio' in memory instead of in the database to avoid interference with the sorting on 'block' when blockgroupsize is > 1

Note: See TracTickets for help on using tickets.