Opened 18 years ago
Last modified 17 years ago
#573 closed defect
Trim whitespace when checking for unique values — at Initial Version
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | BASE 2.4 |
Component: | core | Version: | |
Keywords: | Cc: |
Description
From the mailing list by Bob MacCallum:
http://sourceforge.net/mailarchive/forum.php?thread_name=17964.59444.651455.264772%40bio-iisrv1.bio.ic.ac.uk&forum_name=basedb-users
- I think there's some inconsistent handling of trailing spaces in the "reporter ID" column of a genepix .gpr file. For example I can import reporters, and create an array design from the file pasted below, but I can't then import the raw data!
(the following is just 8 lines long - if the long lines get mangled, I'll send a copy by mail on request)
> ATF 1.0
> 27 43
> Type=GenePix Results 1.4
> "Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia." "F635 Median" "F635 Mean" "F635 SD" "B635 Median" "B635 Mean" "B635 SD" "% > B635+1SD" "% > B635+2SD" "F635 % Sat." "F532 Median" "F532 Mean" "F532 SD" "B532 Median" "B532 Mean" "B532 SD" "% > B532+1SD" "% > B532+2SD" "F532 % Sat." "Ratio of Medians" "Ratio of Means" "Median of Ratios" "Mean of Ratios" "Ratios SD" "Rgn Ratio" "Rgn R²" "F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio" "F635 Median - B635" "F532 Median - B532" "F635 Mean - B635" "F532 Mean - B532" "Flags"
> 1 1 1 "demoA" "demorep1" 1690 5730 110 183 181 42 59 62 25 100 98 0 276 270 48 64 65 13 100 100 0 0.585 0.592 0.570 0.576 1.357 0.591 0.782 80 621 336 328 -0.774 124 212 122 206 0
> 1 2 1 "demoB" "demorep2 " 1910 5730 120 114 137 175 57 61 37 71 21 0 346 341 80 63 65 35 96 95 0 0.201 0.288 0.192 0.209 2.379 0.398 0.094 120 716 340 358 -2.312 57 283 80 278 0
> 1 3 1 "demoC" "demorep3" 2110 5740 110 145 148 43 63 68 30 92 68 0 208 214 48 69 74 43 98 93 0 0.590 0.586 0.599 0.541 1.987 0.504 0.582 80 566 221 230 -0.761 82 139 85 145 0
> 1 4 1 "demoD" "demorep4" 2300 5730 110 185 187 51 59 63 23 100 96 0 298 294 57 64 67 24 100 98 0 0.538 0.557 0.526 0.538 1.599 0.549 0.730 80 590 360 358 -0.893 126 234 128 230 0
the stacktrace from the raw data import is:
> net.sf.basedb.core.BaseException: Item not found: Reporter mismatch: The feature has reporter 'demorep2' whereas you have given 'demorep2 ' on line 6: 1 2 1 "demoB" "de...
> at net.sf.basedb.plugins.AbstractFlatFileImporter.doImport(AbstractFlatFileImporter.java:592)
> at net.sf.basedb.plugins.AbstractFlatFileImporter.run(AbstractFlatFileImporter.java:442)
> at net.sf.basedb.core.PluginExecutionRequest.invoke(PluginExecutionRequest.java:88)
> at net.sf.basedb.core.InternalJobQueue$JobRunner.run(InternalJobQueue.java:420)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: net.sf.basedb.core.ItemNotFoundException: Item not found: Reporter mismatch: The feature has reporter 'demorep2' whereas you have given 'demorep2 '
> at net.sf.basedb.core.RawDataBatcher.doInsert(RawDataBatcher.java:390)
> at net.sf.basedb.core.RawDataBatcher.insert(RawDataBatcher.java:343)
> at net.sf.basedb.plugins.RawDataFlatFileImporter.handleData(RawDataFlatFileImporter.java:544)
> at net.sf.basedb.plugins.AbstractFlatFileImporter.doImport(AbstractFlatFileImporter.java:570)
> ... 4 more
I think BASE1 was more tolerant.
Leading and trailing blanks are trimmed from more or less all values before they are inserted into the database, which explains why you get "demorep2" instead of "demorep2 ". I guess we never thought of doing the same when checking whether a reporter (or anything else with a unique value) already exists in the database. I think several other places are affected by the same thing. I'll add this as a bug in our trac database.
In the meantime you can try using a splitter regexp that also removes whitespace. Try something like \s*\t\s* instead of just \t. I have not tested this, but it might be enough to make it work.
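The two ideas above (trimming the value before the uniqueness check, and the splitter workaround) can be illustrated with a small standalone Java sketch. It is not BASE code: the class `TrimSketch`, its methods, and the in-memory `STORED` set standing in for the database are all hypothetical, and the sample line is an unquoted, shortened version of the "demoB" row.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in BASE.
class TrimSketch {
    // Stand-in for the database: IDs as stored after being trimmed on insert.
    static final Set<String> STORED =
        new HashSet<>(Arrays.asList("demorep1", "demorep2", "demorep3", "demorep4"));

    // Split a data line with the given splitter regexp.
    // The limit -1 keeps trailing empty columns.
    static String[] split(String line, String splitterRegexp) {
        return Pattern.compile(splitterRegexp).split(line, -1);
    }

    // Lookup that uses the value exactly as read from the file.
    static boolean existsExact(String id) {
        return STORED.contains(id);
    }

    // Lookup that trims the value first, mirroring the trimming done on insert.
    static boolean existsTrimmed(String id) {
        return id != null && STORED.contains(id.trim());
    }

    public static void main(String[] args) {
        // Unquoted, shortened sample row with a trailing space after the ID.
        String line = "1\t2\t1\tdemoB\tdemorep2 \t1910";

        String plain = split(line, "\t")[4];                 // "demorep2 "
        String viaSplitter = split(line, "\\s*\\t\\s*")[4];  // "demorep2"

        System.out.println("[" + plain + "] exact=" + existsExact(plain)
            + " trimmed=" + existsTrimmed(plain));
        System.out.println("[" + viaSplitter + "] exact=" + existsExact(viaSplitter));
    }
}
```

Two caveats apply to the splitter workaround in this sketch. Because \s also matches a tab, \s*\t\s* treats consecutive tabs as a single separator, so empty columns would be swallowed. And in the sample file above the trailing space sits inside the quotes ("demorep2 "), so a splitter alone cannot remove it unless the quotes are stripped before splitting. Trimming at lookup time avoids both issues.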