Opened 17 years ago
Closed 17 years ago
#837 closed defect (fixed)
Lowess plugin. Error: Sum of weigths in line_fit is not positive
Reported by: | base | Owned by: | everyone |
---|---|---|---|
Priority: | trivial | Milestone: | BASE 2.6 |
Component: | coreplugins | Version: | |
Keywords: | Cc: |
Description
Sorry if this should go to baseplugins trac, but as a core plugin maybe it should go here.
I just ran BASE2 lowess on 54 bioassays and got the error below (right at the end of the job). I think the problem could relate to having three different array designs in the same experiment. When I run on a subset with the same array design it works fine (I just ran it again to check).
cheers, Bob.
View job -- Run plugin: Normalization: Lowess
Name: Run plugin: Normalization: Lowess
Description:
Priority: 5 (1 = highest, 10 = lowest)
Status: Error: Sum of weigths in line_fit is not positive
Percent complete: 100%
Created: 2007-11-22 11:58:24
Started: 2007-11-22 11:58:39
Ended: 2007-11-22 12:10:50
Server: bio-iisrv1.bio.ic.ac.uk
User: Bob MacCallum
Experiment: - none -
Plugin: Normalization: Lowess
Configuration: - none -
net.sf.basedb.core.BaseException: Sum of weigths in line_fit is not positive
    at net.sf.basedb.plugins.LowessNormalization.weightedLeastSquaresRegression(LowessNormalization.java:574)
    at net.sf.basedb.plugins.LowessNormalization.lowess(LowessNormalization.java:487)
    at net.sf.basedb.plugins.LowessNormalization.run(LowessNormalization.java:297)
    at net.sf.basedb.core.PluginExecutionRequest.invoke(PluginExecutionRequest.java:89)
    at net.sf.basedb.core.InternalJobQueue$JobRunner.run(InternalJobQueue.java:421)
    at java.lang.Thread.run(Thread.java:619)
Job parameters
Blockgroup size: 1
Child description:
Child name: All hybs - bg subtracted - no bad spots - lowess
Minimum log(intensity) step: 0.1
Window size (fraction of points): 0.33
Iterations: 4
Source bioassay set: All hybs - bg subtracted - no bad spots
BASE version info:
Version: BASE 2.4.6pre (build #3938; schema #40)
Web server: Apache Tomcat/5.5.20
Database Server: MySQL 5.0.21-max-log
Database Dialect: org.hibernate.dialect.MySQLInnoDBDialect
JDBC Driver: com.mysql.jdbc.Driver (version 5.0)
Java runtime: Java(TM) SE Runtime Environment (1.6.0-b105), Sun Microsystems Inc.
Operating system: Linux amd64 2.6.16.53-0.16-smp
Memory: Total: 359.1 MB Free: 99.1 MB Max: 910.3 MB
Attachments (1)
Change History (11)
comment:1 by , 17 years ago
comment:2 by , 17 years ago
Lines 279-280 make sure that a block is ignored if it has no data points. Maybe the limit should be 1?
comment:3 by , 17 years ago
This is not a defect but a limitation in the algorithm. The reason is that the dataset that weightedLeastSquaresRegression is working on is too small, and it is too small because you have set the parameter 'Blockgroup size' to 1. That means that Lowess is working on one block at a time. When I use it in my analysis I would set that value to 16, 24, or 48 when using an ArrayDesign that contains 48 blocks. In your case, using different ArrayDesigns, I would set that value to the largest number of blocks in the designs.
Having 1 as a default value is not good and I have opened a ticket on that, #838.
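To illustrate why a small 'Blockgroup size' can starve the fit, here is a rough sketch of how the number of points in a lowess window might follow from the block grouping. The class and method names are hypothetical and not part of BASE. The even spread of spots over blocks, the truncating rounding rule, and the spot count of 480 are all assumptions made for illustration; only the 48 blocks and the 0.33 window fraction come from this ticket.

```java
// Hypothetical helper, not part of BASE: estimates how many data points
// end up in one lowess window for a given 'Blockgroup size'.
class WindowSizeEstimate {

    // Assumption: spots are spread evenly over the blocks of the array.
    static int pointsPerBlockGroup(int totalSpots, int totalBlocks, int blockGroupSize) {
        if (totalBlocks <= 0 || blockGroupSize <= 0) {
            throw new IllegalArgumentException("block counts must be positive");
        }
        // Ceiling division: number of block groups the array is split into.
        int groups = (totalBlocks + blockGroupSize - 1) / blockGroupSize;
        return totalSpots / groups;
    }

    // Assumption: the window holds fraction * points, truncated to an integer.
    static int windowSize(int points, double fraction) {
        return (int) (points * fraction);
    }

    public static void main(String[] args) {
        int spots = 480;        // made-up spot count, for illustration only
        int blocks = 48;        // block count mentioned in this comment
        double fraction = 0.33; // 'Window size (fraction of points)' from the job
        for (int groupSize : new int[] {1, 16, 24, 48}) {
            int points = pointsPerBlockGroup(spots, blocks, groupSize);
            System.out.println("group size " + groupSize + ": " + points
                + " points per group, window of " + windowSize(points, fraction));
        }
    }
}
```

Under these assumptions the window shrinks in proportion to the block group size, so heavy filtering combined with a group size of 1 can easily leave a window with fewer than two points.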
comment:4 by , 17 years ago
I think it is a defect since it obviously produces a result for some bioassays in the bioassay set but not all. I think the plug-in should filter out/ignore data sets that are too small to be usable, whether the algorithm is fed too few data points because of splitting into blocks, because of heavy filtering, or for any other reason.
I checked the BASE 1 plug-in code and it seems to behave differently: it only outputs a warning if that happens. It still produces a result for the cases that work. I attach the BASE 1 code.
follow-up: 6 comment:5 by , 17 years ago
I have investigated the code a bit more and done some test runs. I have found that the calculation fails if there are not enough data points in a block group to make the windowSize > 1. Even I, with limited statistical knowledge, can understand that doing a least squares fit with only one data point is rather useless.
The calculations seem to work if windowSize >= 2, but... How many data points are really needed in a window to make Lowess useful? With only two data points in a window the least squares fit will not change anything?
How should we handle this in the plug-in? It is easy to check the window size and filter out block groups that don't contain enough data points.
Why does the block group parameter exist in the first place? Shouldn't we always use all spots in a bioassay?
comment:6 by , 17 years ago
Replying to nicklas:
The calculations seem to work if windowSize >= 2, but... How many data points are really needed in a window to make Lowess useful? With only two data points in a window the least squares fit will not change anything?
If we make a filter I think that we are only obligated to make sure that the algorithm runs. Set the filter to windowSize >= 2. To help the user I think that we should print an error if windowSize is less than 100.
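The two thresholds proposed here could be sketched roughly as follows. This is only an illustration: the class, enum, and constant names are hypothetical and not taken from LowessNormalization.java, and the sample window sizes are made up.

```java
// Hypothetical sketch of the proposed check; names are illustrative, not BASE API.
class WindowSizeCheck {
    static final int MIN_WINDOW_SIZE = 2;    // below this the fit cannot run at all
    static final int WARN_WINDOW_SIZE = 100; // threshold suggested in this comment

    enum Verdict { SKIP, WARN, OK }

    // Classifies a block group by the size of its lowess window.
    static Verdict check(int windowSize) {
        if (windowSize < MIN_WINDOW_SIZE) {
            return Verdict.SKIP; // filter out: too few points to fit a line
        }
        if (windowSize < WARN_WINDOW_SIZE) {
            return Verdict.WARN; // run, but tell the user the window is small
        }
        return Verdict.OK;
    }

    public static void main(String[] args) {
        int[] windowSizes = {1, 2, 57, 100, 990}; // made-up window sizes per block group
        for (int w : windowSizes) {
            System.out.println("windowSize=" + w + " -> " + check(w));
        }
    }
}
```

Keeping the hard limit (does the algorithm run?) separate from the soft limit (is the result likely to be meaningful?) lets the plug-in continue with the remaining block groups while still reporting the questionable ones.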
Why does the block group parameter exist in the first place? Shouldn't we always use all spots in a bioassay?
In most cases you want to run lowess on all spots. That's why I opened #838. But in some cases your slides have been affected by some technical issues and you don't want that to affect the normalization. This is of course a critical decision because it affects the algorithm.
comment:7 by , 17 years ago
Bob here...
You are right, it's not the different array designs.
I've narrowed it down to a set of 14 hybs (same design). None of them look particularly strange - indeed they all lowessed fine in BASE1. Some of them look very centred around zero already, could this be causing the problem?
I'm just running lowess on already-lowessed bioassays to see if this triggers the bug. No, that worked fine.
A little more info: The bioassayset has 14 hybs, 41216 spots and 3386 reporters. The overview plots all have roughly the same number of spots (around 3000).
Will look into this more later today.
comment:8 by , 17 years ago
Milestone: | → BASE 2.6 |
---|
A more informative printout is needed explaining why the normalization was aborted. The default block number should be 0 (zero).
comment:9 by , 17 years ago
Priority: | major → trivial |
---|
comment:10 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I don't think it has anything to do with the array designs being different. Lowess works on a single bioassay at a time and the algorithm doesn't depend on the array design information. I am pretty sure that the bug is triggered by the specific data in one (or more) of the bioassays.
I took a quick look at the code and I think there is a risk that "Sum of weigths" can be 0. The weights are calculated on line 605 and if 'distance < 1' for all values in the list then all weights will also be 0. I don't know enough about the Lowess algorithm to say if it is possible that all distances are < 1 or not. Maybe if there is only a single value in the list... There seems to be code doing medians, least squares, etc. and as far as I know that doesn't work very well if there are not enough data points.
Does that give you any clue? Can you find any bioassay which has few data points? The Lowess plug-in works on each block separately so you should look for a bioassay with few data points in a single block.
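As a rough illustration of how the weights in a local fit can sum to zero, here is a minimal sketch of a weighted line fit using the textbook tricube weight, (1 - d^3)^3 for d < 1 and 0 otherwise, where d is the distance from the focus point scaled by the largest distance in the window. This is an assumption about the general technique, not a copy of what line 605 of LowessNormalization.java does, and the class and method names are hypothetical. With a single point in the window, that point is also the farthest one, so its scaled distance is 1 and its weight is 0.

```java
// Hypothetical sketch of a tricube-weighted line fit; not the BASE implementation.
class TricubeFit {

    // Textbook tricube weight: (1 - d^3)^3 for 0 <= d < 1, otherwise 0.
    static double tricube(double d) {
        if (d >= 1.0) {
            return 0.0;
        }
        double t = 1.0 - d * d * d;
        return t * t * t;
    }

    // Returns {intercept, slope} of a weighted line fit around x0,
    // or null when the weights do not sum to a positive value.
    static double[] weightedLineFit(double[] x, double[] y, double x0) {
        double h = 0.0; // largest distance from x0 within the window
        for (double xi : x) {
            h = Math.max(h, Math.abs(xi - x0));
        }
        if (h == 0.0) {
            return null; // empty window, or every point sits exactly on x0
        }
        double sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;
        for (int i = 0; i < x.length; i++) {
            double w = tricube(Math.abs(x[i] - x0) / h);
            sw += w;
            swx += w * x[i];
            swy += w * y[i];
            swxx += w * x[i] * x[i];
            swxy += w * x[i] * y[i];
        }
        if (sw <= 0.0) {
            return null; // the "sum of weights is not positive" case: skip instead of throwing
        }
        double denom = sw * swxx - swx * swx;
        if (denom == 0.0) {
            return new double[] {swy / sw, 0.0}; // degenerate: flat line through the weighted mean
        }
        double slope = (sw * swxy - swx * swy) / denom;
        double intercept = (swy - slope * swx) / sw;
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        // A single point away from x0 gets scaled distance 1, hence weight 0.
        double[] single = weightedLineFit(new double[] {4.0}, new double[] {1.0}, 5.0);
        System.out.println("single point fit possible: " + (single != null));

        // Five points on the line y = 2x + 1, fitted around x0 = 3.
        double[] fit = weightedLineFit(
            new double[] {1, 2, 3, 4, 5}, new double[] {3, 5, 7, 9, 11}, 3.0);
        System.out.println("intercept=" + fit[0] + " slope=" + fit[1]);
    }
}
```

Returning null instead of throwing leaves it to the caller to skip the block (or block group) and carry on with the rest, which matches the behaviour described for the BASE 1 plug-in.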