Opened 18 years ago
Last modified 17 years ago
#574 closed enhancement
Handle database differences with case sensitivity — at Initial Version
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | BASE 2.4 |
Component: | core | Version: | |
Keywords: | Cc: |
Description
From the maling list by Bob MacCallum:
- case sensitivity in the reporter ID (external id) column
I get "Error: Duplicate entry 'demoBLANK' for key 2" if I import reporters from this file:
> ATF 1.0 > 27 43 Type=GenePix Results 1.4 > "Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia." "F635 Median" "F635 Mean" "F635 SD" "B635 Median" "B635 Mean" "B635 SD" "% > B635+1SD" "% > B635+2SD" "F635 % Sat." "F532 Median" "F532 Mean" "F532 SD" "B532 Median" "B532 Mean" "B532 SD" "% > B532+1SD" "% > B532+2SD" "F532 % Sat." "Ratio of Medians" "Ratio of Means" "Median of Ratios" "Mean of Ratios" "Ratios SD" "Rgn Ratio" "Rgn R²" "F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio" "F635 Median - B635" "F532 Median - B532" "F635 Mean - B635" "F532 Mean - B532" "Flags" > 1 1 1 "demoA" "demorep1" 1690 5730 110 183 181 42 59 62 25 100 98 0 276 270 48 64 65 13 100 100 0 0.585 0.592 0.570 0.576 1.357 0.591 0.782 80 621 336 328 -0.774 124 212 122 206 0 > 1 2 1 "demoB" "demorep2" 1910 5730 120 114 137 175 57 61 37 71 21 0 346 341 80 63 65 35 96 95 0 0.201 0.288 0.192 0.209 2.379 0.398 0.094 120 716 340 358 -2.312 57 283 80 278 0 > 1 3 1 "demoblank" "demoblank" 2110 5740 110 145 148 43 63 68 30 92 68 0 208 214 48 69 74 43 98 93 0 0.590 0.586 0.599 0.541 1.987 0.504 0.582 80 566 221 230 -0.761 82 139 85 145 0 > 1 4 1 "demoBLANK" "demoBLANK" 2300 5730 110 185 187 51 59 63 23 100 96 0 298 294 57 64 67 24 100 98 0 0.538 0.557 0.526 0.538 1.599 0.549 0.730 80 590 360 358 -0.893 126 234 128 230 0
However in BASE1 it was possible to import files with problems like this. For example, see
​http://base.vectorbase.org/raw_edit.phtml?i_r=102 (just click ok to log in)
you can compare the imported .gpr file (scroll down to 2 17 20) with the table of data (position 857)
you see that "BLANK" was imported as "Blank" because "Blank" was already in the table.
This problem is affected how the database handles strings. MySQL is case-insensitive. Postgres on the other hand is case-sensitive and the same problem would never have appeared. The important question is if the "demoblank" and "demoBLANK" should be treated as the same reporters or not?
In Postgres they are already treated as different and it would be rather hard to change that. The only way is to convert all ID:s to the same case before storing them in the database.
In MySQL they are treated as the same and it is equally hard to change that. The problem appears here because the two reporters are in the same file. If there had been two different raw data files, both "demoblank" and "demoBLANK" would have mapped to the same reporter. The bug in our code is that when the lines are in the same file we do case-sensitive comparison to check what has already been inserted. I'll add a ticket for that as well.