Opened 18 years ago

Last modified 17 years ago

#574 closed enhancement

Handle database differences with case sensitivity — at Initial Version

Reported by: Nicklas Nordborg Owned by: Nicklas Nordborg
Priority: major Milestone: BASE 2.4
Component: core Version:
Keywords: Cc:

Description

From the maling list by Bob MacCallum:

​http://sourceforge.net/mailarchive/forum.php?thread_name=17964.59444.651455.264772%40bio-iisrv1.bio.ic.ac.uk&forum_name=basedb-users

  1. case sensitivity in the reporter ID (external id) column

I get "Error: Duplicate entry 'demoBLANK' for key 2" if I import reporters from this file:

> ATF    1.0
> 27    43    Type=GenePix Results 1.4
> "Block"    "Column"    "Row"    "Name"    "ID"    "X"    "Y"    "Dia."    "F635 Median"    "F635 Mean"    "F635 SD"    "B635 Median"    "B635 Mean"    "B635 SD"    "% > B635+1SD"    "% > B635+2SD"    "F635 % Sat."    "F532 Median"    "F532 Mean"    "F532 SD"    "B532 Median"    "B532 Mean"    "B532 SD"    "% > B532+1SD"    "% > B532+2SD"    "F532 % Sat."    "Ratio of Medians"    "Ratio of Means"    "Median of Ratios"    "Mean of Ratios"    "Ratios SD"    "Rgn Ratio"    "Rgn R²"    "F Pixels"    "B Pixels"    "Sum of Medians"    "Sum of Means"    "Log Ratio"    "F635 Median - B635"    "F532 Median - B532"    "F635 Mean - B635"    "F532 Mean - B532"    "Flags"
> 1    1    1    "demoA"    "demorep1"    1690    5730    110    183    181    42    59    62    25    100    98    0    276    270    48    64    65    13    100    100    0    0.585    0.592    0.570    0.576    1.357    0.591    0.782    80    621    336    328    -0.774    124    212    122    206    0
> 1    2    1    "demoB"    "demorep2"    1910    5730    120    114    137    175    57    61    37    71    21    0    346    341    80    63    65    35    96    95    0    0.201    0.288    0.192    0.209    2.379    0.398    0.094    120    716    340    358    -2.312    57    283    80    278    0
> 1    3    1    "demoblank"    "demoblank"    2110    5740    110    145    148    43    63    68    30    92    68    0    208    214    48    69    74    43    98    93    0    0.590    0.586    0.599    0.541    1.987    0.504    0.582    80    566    221    230    -0.761    82    139    85    145    0
> 1    4    1    "demoBLANK"    "demoBLANK"    2300    5730    110    185    187    51    59    63    23    100    96    0    298    294    57    64    67    24    100    98    0    0.538    0.557    0.526    0.538    1.599    0.549    0.730    80    590    360    358    -0.893    126    234    128    230    0

However in BASE1 it was possible to import files with problems like this. For example, see

​http://base.vectorbase.org/raw_edit.phtml?i_r=102 (just click ok to log in)

you can compare the imported .gpr file (scroll down to 2 17 20) with the table of data (position 857)

you see that "BLANK" was imported as "Blank" because "Blank" was already in the table.

This problem is affected how the database handles strings. MySQL is case-insensitive. Postgres on the other hand is case-sensitive and the same problem would never have appeared. The important question is if the "demoblank" and "demoBLANK" should be treated as the same reporters or not?

In Postgres they are already treated as different and it would be rather hard to change that. The only way is to convert all ID:s to the same case before storing them in the database.

In MySQL they are treated as the same and it is equally hard to change that. The problem appears here because the two reporters are in the same file. If there had been two different raw data files, both "demoblank" and "demoBLANK" would have mapped to the same reporter. The bug in our code is that when the lines are in the same file we do case-sensitive comparison to check what has already been inserted. I'll add a ticket for that as well.

Change History (0)

Note: See TracTickets for help on using tickets.