Opened 17 years ago

Last modified 17 years ago

#573 closed defect

Trim whitespace when checking for unique values — at Initial Version

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: BASE 2.4
Component: core
Version:
Keywords:
Cc:

Description

From the mailing list by Bob MacCallum:
http://sourceforge.net/mailarchive/forum.php?thread_name=17964.59444.651455.264772%40bio-iisrv1.bio.ic.ac.uk&forum_name=basedb-users

  1. I think there's some inconsistent handling of trailing spaces in the "reporter ID" column of a genepix .gpr file. For example I can import reporters, and create an array design from the file pasted below, but I can't then import the raw data!

(the following is just 8 lines long - if the long lines get mangled, I'll send a copy by mail on request)

> ATF    1.0
> 27    43    Type=GenePix Results 1.4
> "Block"    "Column"    "Row"    "Name"    "ID"    "X"    "Y"    "Dia."    "F635 Median"    "F635 Mean"    "F635 SD"    "B635 Median"    "B635 Mean"    "B635 SD"    "% > B635+1SD"    "% > B635+2SD"    "F635 % Sat."    "F532 Median"    "F532 Mean"    "F532 SD"    "B532 Median"    "B532 Mean"    "B532 SD"    "% > B532+1SD"    "% > B532+2SD"    "F532 % Sat."    "Ratio of Medians"    "Ratio of Means"    "Median of Ratios"    "Mean of Ratios"    "Ratios SD"    "Rgn Ratio"    "Rgn R²"    "F Pixels"    "B Pixels"    "Sum of Medians"    "Sum of Means"    "Log Ratio"    "F635 Median - B635"    "F532 Median - B532"    "F635 Mean - B635"    "F532 Mean - B532"    "Flags"
> 1    1    1    "demoA"    "demorep1"    1690    5730    110    183    181    42    59    62    25    100    98    0    276    270    48    64    65    13    100    100    0    0.585    0.592    0.570    0.576    1.357    0.591    0.782    80    621    336    328    -0.774    124    212    122    206    0
> 1    2    1    "demoB"    "demorep2 "    1910    5730    120    114    137    175    57    61    37    71    21    0    346    341    80    63    65    35    96    95    0    0.201    0.288    0.192    0.209    2.379    0.398    0.094    120    716    340    358    -2.312    57    283    80    278    0
> 1    3    1    "demoC"    "demorep3"    2110    5740    110    145    148    43    63    68    30    92    68    0    208    214    48    69    74    43    98    93    0    0.590    0.586    0.599    0.541    1.987    0.504    0.582    80    566    221    230    -0.761    82    139    85    145    0
> 1    4    1    "demoD"    "demorep4"    2300    5730    110    185    187    51    59    63    23    100    96    0    298    294    57    64    67    24    100    98    0    0.538    0.557    0.526    0.538    1.599    0.549    0.730    80    590    360    358    -0.893    126    234    128    230    0

The stack trace from the raw data import is:

> net.sf.basedb.core.BaseException: Item not found: Reporter mismatch: The feature has reporter 'demorep2' whereas you have given 'demorep2 ' on line 6: 1 2 1 "demoB" "de...
> at net.sf.basedb.plugins.AbstractFlatFileImporter.doImport(AbstractFlatFileImporter.java:592)
> at net.sf.basedb.plugins.AbstractFlatFileImporter.run(AbstractFlatFileImporter.java:442)
> at net.sf.basedb.core.PluginExecutionRequest.invoke(PluginExecutionRequest.java:88)
> at net.sf.basedb.core.InternalJobQueue$JobRunner.run(InternalJobQueue.java:420)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: net.sf.basedb.core.ItemNotFoundException: Item not found: Reporter mismatch: The feature has reporter 'demorep2' whereas you have given 'demorep2 '
> at net.sf.basedb.core.RawDataBatcher.doInsert(RawDataBatcher.java:390)
> at net.sf.basedb.core.RawDataBatcher.insert(RawDataBatcher.java:343)
> at net.sf.basedb.plugins.RawDataFlatFileImporter.handleData(RawDataFlatFileImporter.java:544)
> at net.sf.basedb.plugins.AbstractFlatFileImporter.doImport(AbstractFlatFileImporter.java:570)
> ... 4 more

I think BASE1 was more tolerant.

Leading and trailing blanks are trimmed from more or less all values before they are inserted into the database, which explains why you get "demorep2" instead of "demorep2 ". I guess we never thought of doing the same when checking whether a reporter (or something else with a unique value) already exists in the database. I think several other places are affected by the same thing. I'll add this as a bug in our trac database. In the meantime, you can try using a splitter regexp that also removes whitespace. Try something like \s*\t\s* instead of just \t. I have not tested this, but it might be enough to make it work.
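
To make the mismatch and the suggested workaround concrete, here is a minimal, self-contained Java sketch. It is not BASE code: the class TrimLookupSketch, the in-memory STORED set (standing in for the already-trimmed reporter IDs in the database), and the helper methods are all hypothetical. It compares the plain \t splitter with \s*\t\s* and an untrimmed lookup with a trimmed one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class TrimLookupSketch {
    // Hypothetical stand-in for reporter IDs already stored (trimmed on insert).
    static final Set<String> STORED =
        new HashSet<>(Arrays.asList("demorep1", "demorep2", "demorep3", "demorep4"));

    // Plain tab splitter versus the splitter suggested in the ticket.
    static final Pattern TAB = Pattern.compile("\\t");
    static final Pattern TAB_AND_SPACE = Pattern.compile("\\s*\\t\\s*");

    // Split one data line; limit -1 keeps trailing empty columns.
    static String[] split(Pattern splitter, String line) {
        return splitter.split(line, -1);
    }

    // Lookup with the value exactly as read from the file.
    static boolean existsUntrimmed(String id) {
        return STORED.contains(id);
    }

    // Lookup after applying the same trimming that is applied on insert.
    static boolean existsTrimmed(String id) {
        return STORED.contains(id.trim());
    }

    public static void main(String[] args) {
        // Shortened version of the second data row, without quotes.
        String unquoted = "1\t2\t1\tdemoB\tdemorep2 \t1910";
        // Same row with the quotes used in the .gpr file.
        String quoted = "1\t2\t1\t\"demoB\"\t\"demorep2 \"\t1910";

        String raw = split(TAB, unquoted)[4];
        System.out.println("[" + raw + "] untrimmed lookup: " + existsUntrimmed(raw));
        System.out.println("[" + raw + "] trimmed lookup:   " + existsTrimmed(raw));

        // The wider splitter removes the space only when it touches the tab.
        System.out.println("[" + split(TAB_AND_SPACE, unquoted)[4] + "]");
        // Inside quotes the space sits before the closing quote and survives.
        System.out.println("[" + split(TAB_AND_SPACE, quoted)[4] + "]");
    }
}
```

One caveat the sketch makes visible: in the sample file the trailing space is inside the quotes ("demorep2 "), so it is not adjacent to the tab, and a splitter like \s*\t\s* on its own would leave it in place. Whether the workaround helps therefore depends on when the importer strips the quotes, which is not confirmed here. Trimming the value before the uniqueness check covers both cases.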
