Opened 9 years ago

Closed 9 years ago

#1717 closed defect (fixed)

Annotation importer create annotation set even if there are no annotations for an item

Reported by: Nicklas Nordborg Owned by: everyone
Priority: major Milestone: BASE 3.2
Component: coreplugins Version: 3.1
Keywords: Cc:

Description

Don't know how long this has been around. Discovered it while testing the BASE 3.2 release and have verified that the issue also exists in 3.1.

How to reproduce:

  • Create 2 samples with different names and one annotation type. It doesn't matter what kind of annotation.
  • Create a text file with two columns: sample name and annotation value. Add one line for each sample and enter the names in the name column. For one of the samples, enter a value in the annotation column, but leave the other one empty.
  • Import the annotation with the annotation importer. IT should report that 1 sample has been annotated with 1 value.
  • Check the BioMaterials table in the database. Both samples have a value in the annotationset_id column, but only one of them actually have a real entry in the Annotations table.

The annotation importer should not create an annotation set if there is no annotation value.

Change History (7)

comment:1 Changed 9 years ago by Nicklas Nordborg

Hmmm.... this could potentially be a big performance issue when handling annotations... I checked the production server:

mysql> select count(a.id), count(distinct a.annotationset_id) 
from `Annotations` a;
+-------------+------------------------------------+
| count(a.id) | count(distinct a.annotationset_id) |
+-------------+------------------------------------+
|      210135 |                              50255 |
+-------------+------------------------------------+
1 row in set (0.31 sec)

It has 210K annotations divided among 50K annotation sets, eg. about 4 annotations per item. But...

mysql> select count(at.id) 
from `AnnotationSets` at;
+--------------+
| count(at.id) |
+--------------+
|       682985 |
+--------------+
1 row in set (0.36 sec)

Total number of annotation sets is 683K...

mysql> select count(at.id) 
from `AnnotationSets` at 
left join `Annotations` a on a.annotationset_id= at.id
where a.id is null;
+--------------+
| count(at.id) |
+--------------+
|       632730 |
+--------------+
1 row in set (3.18 sec)

...but 632K of them have no annotation values at all! So it seems like we are wasting a lot of rows in the AnnotationSets table. I think the database update should remove all extra entries.

comment:2 Changed 9 years ago by Nicklas Nordborg

Just a note about inherited annotations. Some of the 632K annotation sets that doesn't have any primary annotations have inherited annotations from other items. So before we can delete any annotation sets, we must also check the InheritedAnnotations and InheritedAnnotationSets tables. There are about 20K such annotation sets on the production server.

comment:3 Changed 9 years ago by Nicklas Nordborg

mysql> select count(at.id) 
from `AnnotationSets` at 
left join `Annotations` a on a.annotationset_id=at.id 
left join `InheritedAnnotations` ia on ia.annotationset_id=at.id 
left join `InheritedAnnotationSets` ias on ias.annotationset_id=at.id 
where a.id is null and ia.annotation_id is null and ias.inherited_id is null;
+--------------+
| count(at.id) |
+--------------+
|       623186 |
+--------------+
1 row in set (6.42 sec)

comment:4 Changed 9 years ago by Nicklas Nordborg

(In [6112]) References #1717: Annotation importer create annotation set even if there are no annotations for an item

This should fix the annotation import so that it doesn't create annotation sets when there are not annotation values.

comment:5 Changed 9 years ago by Nicklas Nordborg

Made another count to see which items that have all those bogus annotation sets:

mysql> select count(*), at.item_type, count(a.id) 
from `AnnotationSets` at 
left join `Annotations` a on a.annotationset_id=at.id 
group by item_type;
+----------+-----------+-------------+
| count(*) | item_type | count(a.id) |
+----------+-----------+-------------+
|     3981 |       201 |        3976 |
|   141145 |       202 |      141022 |
|    27194 |       203 |       26771 |
|   655604 |       226 |       32980 |
|      289 |       242 |         289 |
|        1 |       243 |           0 |
|    13305 |       264 |        3869 |
|     1296 |       267 |        1228 |
|       50 |       268 |           0 |
+----------+-----------+-------------+
9 rows in set (3.48 sec)

item_type: 201=BIOSOURCE, 202=SAMPLE, 203=EXTRACT, 226=WELL, 242=ARRAYBATCH, 243=ARRAYSLIDE, 264=RAWBIOASSAY, 267=PHYSICALBIOASSAY, 268=DERIVEDBIOASSAY

First column is the number of annotation sets for a given item_type (second column). The third column is the number of actual annotations.

The big difference is for item_type=226 which are WELL items. This is a bit strange since the annotation importer doesn't work with WELL items. This means that the empty annotation sets have been created by something else. I suspect that it is the BASE 1->BASE 2 migration, so this has been around for a very long time, and the fix should probably have a separate ticket.

comment:6 Changed 9 years ago by Nicklas Nordborg

See #1718.

comment:7 Changed 9 years ago by Nicklas Nordborg

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.