Opened 18 years ago
Closed 17 years ago
#573 closed defect (fixed)
Trim whitespace when checking for unique values
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | BASE 2.4 |
Component: | core | Version: | |
Keywords: | | Cc: | |
Description (last modified by )
Relates to ticket:574 and ticket:469
From the mailing list by Bob MacCallum:
http://sourceforge.net/mailarchive/forum.php?thread_name=17964.59444.651455.264772%40bio-iisrv1.bio.ic.ac.uk&forum_name=basedb-users
- I think there's some inconsistent handling of trailing spaces in the "reporter ID" column of a genepix .gpr file. For example I can import reporters, and create an array design from the file pasted below, but I can't then import the raw data!
(the following is just 8 lines long - if the long lines get mangled, I'll send a copy by mail on request)
> ATF 1.0
> 27 43
> Type=GenePix Results 1.4
> "Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia." "F635 Median" "F635 Mean" "F635 SD" "B635 Median" "B635 Mean" "B635 SD" "% > B635+1SD" "% > B635+2SD" "F635 % Sat." "F532 Median" "F532 Mean" "F532 SD" "B532 Median" "B532 Mean" "B532 SD" "% > B532+1SD" "% > B532+2SD" "F532 % Sat." "Ratio of Medians" "Ratio of Means" "Median of Ratios" "Mean of Ratios" "Ratios SD" "Rgn Ratio" "Rgn R²" "F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio" "F635 Median - B635" "F532 Median - B532" "F635 Mean - B635" "F532 Mean - B532" "Flags"
> 1 1 1 "demoA" "demorep1" 1690 5730 110 183 181 42 59 62 25 100 98 0 276 270 48 64 65 13 100 100 0 0.585 0.592 0.570 0.576 1.357 0.591 0.782 80 621 336 328 -0.774 124 212 122 206 0
> 1 2 1 "demoB" "demorep2 " 1910 5730 120 114 137 175 57 61 37 71 21 0 346 341 80 63 65 35 96 95 0 0.201 0.288 0.192 0.209 2.379 0.398 0.094 120 716 340 358 -2.312 57 283 80 278 0
> 1 3 1 "demoC" "demorep3" 2110 5740 110 145 148 43 63 68 30 92 68 0 208 214 48 69 74 43 98 93 0 0.590 0.586 0.599 0.541 1.987 0.504 0.582 80 566 221 230 -0.761 82 139 85 145 0
> 1 4 1 "demoD" "demorep4" 2300 5730 110 185 187 51 59 63 23 100 96 0 298 294 57 64 67 24 100 98 0 0.538 0.557 0.526 0.538 1.599 0.549 0.730 80 590 360 358 -0.893 126 234 128 230 0
The stack trace from the raw data import is:
> net.sf.basedb.core.BaseException: Item not found: Reporter mismatch: The feature has reporter 'demorep2' whereas you have given 'demorep2 ' on line 6: 1 2 1 "demoB" "de...
> at net.sf.basedb.plugins.AbstractFlatFileImporter.doImport(AbstractFlatFileImporter.java:592)
> at net.sf.basedb.plugins.AbstractFlatFileImporter.run(AbstractFlatFileImporter.java:442)
> at net.sf.basedb.core.PluginExecutionRequest.invoke(PluginExecutionRequest.java:88)
> at net.sf.basedb.core.InternalJobQueue$JobRunner.run(InternalJobQueue.java:420)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: net.sf.basedb.core.ItemNotFoundException: Item not found: Reporter mismatch: The feature has reporter 'demorep2' whereas you have given 'demorep2 '
> at net.sf.basedb.core.RawDataBatcher.doInsert(RawDataBatcher.java:390)
> at net.sf.basedb.core.RawDataBatcher.insert(RawDataBatcher.java:343)
> at net.sf.basedb.plugins.RawDataFlatFileImporter.handleData(RawDataFlatFileImporter.java:544)
> at net.sf.basedb.plugins.AbstractFlatFileImporter.doImport(AbstractFlatFileImporter.java:570)
> ... 4 more
I think BASE1 was more tolerant.
Leading and trailing blanks are trimmed from nearly all values before they are inserted into the database, which explains why you get "demorep2" instead of "demorep2 ". I guess we never thought of doing the same when checking whether a reporter (or any other item with a unique value) already exists in the database. There are probably several other places affected by the same problem. I'll add this as a bug in our Trac database. In the meantime, you can try a splitter regexp that also removes whitespace: something like \s*\t\s* instead of just \t. I have not tested this, but it might be enough to make it work.
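A minimal Java sketch of the mismatch and of the suggested splitter workaround follows. The in-memory map and the methods `insertReporter`, `existsRaw` and `existsTrimmed` are hypothetical stand-ins, not BASE's actual API. The splitter part assumes an unquoted, tab-separated line. In the .gpr file above, the trailing space sits inside the quotes ("demorep2 "), so whether the regexp alone helps may depend on when the importer strips quotes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class TrimLookupDemo {
    // Hypothetical stand-in for the reporter table. IDs are trimmed on insert,
    // mirroring how BASE 2 trims leading/trailing blanks before storing values.
    static final Map<String, String> reporters = new HashMap<>();

    static void insertReporter(String externalId) {
        String key = externalId.trim();
        reporters.putIfAbsent(key, key);
    }

    // The behaviour reported in this ticket: the raw value from the file is
    // used as-is, so "demorep2 " never matches the stored "demorep2".
    static boolean existsRaw(String externalId) {
        return reporters.containsKey(externalId);
    }

    // Sketch of the fix: normalize the lookup key exactly as on insert.
    static boolean existsTrimmed(String externalId) {
        return reporters.containsKey(externalId.trim());
    }

    // Workaround from the mailing list: let the splitter regexp also swallow
    // any whitespace around each tab (assumes unquoted values).
    static final Pattern PLAIN_TAB = Pattern.compile("\t");
    static final Pattern TAB_AND_SPACE = Pattern.compile("\\s*\\t\\s*");

    public static void main(String[] args) {
        insertReporter("demorep2 ");
        System.out.println(existsRaw("demorep2 "));     // false
        System.out.println(existsTrimmed("demorep2 ")); // true

        String line = "1\t2\t1\tdemoB\tdemorep2 \t1910";
        System.out.println("[" + PLAIN_TAB.split(line)[4] + "]");     // [demorep2 ]
        System.out.println("[" + TAB_AND_SPACE.split(line)[4] + "]"); // [demorep2]
    }
}
```

Trimming on both insert and lookup keeps a single normalization rule. The splitter only hides the symptom for one file format.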
Change History (8)
comment:1 by , 17 years ago
Milestone: | BASE 2.4 → BASE 2.3 |
---|
comment:2 by , 17 years ago
Description: | modified (diff) |
---|
comment:3 by , 17 years ago
Description: | modified (diff) |
---|---|
Priority: | minor → major |
Status: | new → assigned |
comment:4 by , 17 years ago
comment:5 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:6 by , 17 years ago
comment:7 by , 17 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
As it turns out, BASE 1 only removed trailing whitespace, not leading whitespace. The migration from the demo server fails because there are reporters with the IDs '25' and ' 25'.
What should we do about this, since only one of them can be migrated to BASE 2? Can we safely assume that it is a mistake and that both reporters are actually the same? In BASE 2 they would be imported as the same reporter.
- Ignore the second one and map all references to the first?
- Rename the second one? How do we make sure that the new name is unique? What if there are '  25' (two leading spaces) or '   25' (three leading spaces)?
- Other ideas...
Since BASE 2 would map both entries to the same reporter when exporting from BASE 1 and then importing into BASE 2, I think the migration should work the same way. I am going for option 1 unless someone objects in the next hour or so...
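Option 1 could look roughly like the sketch below. It assumes BASE 2's normalization behaves like Java's `String.trim()` and BASE 1's like stripping trailing whitespace only. The `Base1Reporter` record, `planMigration` and the integer database ids are hypothetical; the actual migration code is not shown in this ticket.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReporterMigrationSketch {
    // Hypothetical BASE 1 reporter row: database id plus external reporter ID.
    record Base1Reporter(int dbId, String externalId) {}

    // Per this ticket, BASE 1 stripped trailing whitespace only...
    static String base1Normalize(String id) {
        return id.replaceAll("\\s+$", "");
    }

    // ...while BASE 2 is assumed to strip both leading and trailing whitespace.
    static String base2Normalize(String id) {
        return id.trim();
    }

    // Option 1: the first reporter seen for each BASE 2-normalized ID is migrated;
    // later reporters with the same normalized ID are not created, and all
    // references to them are redirected to that first reporter.
    // Returns: old BASE 1 dbId -> dbId of the reporter that is actually migrated.
    static Map<Integer, Integer> planMigration(List<Base1Reporter> reporters) {
        Map<String, Integer> firstByKey = new HashMap<>();
        Map<Integer, Integer> redirect = new LinkedHashMap<>();
        for (Base1Reporter r : reporters) {
            String key = base2Normalize(r.externalId());
            // putIfAbsent returns null when this key has not been seen before.
            Integer first = firstByKey.putIfAbsent(key, r.dbId());
            redirect.put(r.dbId(), first == null ? r.dbId() : first);
        }
        return redirect;
    }

    public static void main(String[] args) {
        List<Base1Reporter> reporters = List.of(
                new Base1Reporter(1, "25"),
                new Base1Reporter(2, " 25"),
                new Base1Reporter(3, "26"));

        // Distinct under BASE 1 rules, identical once BASE 2 trims them.
        System.out.println(base1Normalize("25").equals(base1Normalize(" 25"))); // false
        System.out.println(base2Normalize("25").equals(base2Normalize(" 25"))); // true

        System.out.println(planMigration(reporters)); // {1=1, 2=1, 3=3}
    }
}
```

Keying on the BASE 2-normalized ID is what collapses '25' and ' 25' (and any variant with more leading spaces). That matches the outcome of exporting from BASE 1 and re-importing into BASE 2, and it avoids having to invent unique new names as option 2 would require.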
comment:8 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Milestone BASE 2.4 deleted