Opened 8 months ago
Closed 7 months ago
#2321 closed task (fixed)
Implement a tool for migrating raw bioassays to derived bioassays
Reported by: | Nicklas Nordborg | Owned by: | everyone |
---|---|---|---|
Priority: | major | Milestone: | BASE 3.19.11 |
Component: | core | Version: | |
Keywords: | Cc: |
Description
Future development of BASE may remove features that are no longer used much, for example raw bioassays with raw data imported into the database, experiments, array LIMS, etc.
We have a lot of data at the raw bioassay level, but it is also a bit problematic since raw bioassays are a dead end: it is not possible to create child items from them for further analysis. Until now this has been worked around by adding more files and/or annotations to an existing raw bioassay.
It would be a lot more flexible if we could move existing raw bioassays to the derived bioassay level instead. In theory it could be done with batch importers, but in practice it would be better to implement a tool for that.
The general idea is to create a derived bioassay copy of each raw bioassay. The platform/variant/rawdatatype are used to map to a subtype. Existing annotations, files, any-to-any links, etc. are re-linked to the new derived bioassays (they will no longer be available on the raw bioassay).
Below is a more detailed description (not yet complete):
Database columns
RawBioAssays | DerivedBioAssays | Comment |
id | id | A new ID is generated |
version | version | Copy |
diskusage_id | - | Not used |
annotationset_id | annotationset_id | Copy and clear |
fileset_id | fileset_id | Copy and clear |
entry_date | entry_date | Copy |
platform_id, variant_id, rawdatatype | subtype_id | Platform, variant and raw data type are mapped to a subtype |
job_id | job_id | Copy |
protocol_id | protocol_id | Copy |
software_id | software_id | Copy |
arraydesign_id | - | Create an AnyToAny-link |
bioassay_id | - | Link via ParentDerivedBioassays and ParentPhysicalBioAssays |
extract_id | extract_id | Copy |
name | name | Copy |
description | description | Copy |
removed_by | removed_by | Copy |
itemkey_id | itemkey_id | Copy |
projectkey_id | projectkey_id | Copy |
owner | owner | Copy |
has_data | - | Not used |
spots | - | Not used |
file_spots | - | Not used |
bytes | - | Not used |
- | is_root | false |
- | kit_id | null |
- | hardware_id | null |
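The column mapping above can be sketched as a small function. This is only an illustration on plain dictionaries, not the actual tool: the function and constant names are hypothetical, and "Copy and clear" is interpreted here as copying the value to the derived bioassay and setting it to null on the raw bioassay. The links for arraydesign_id and bioassay_id are not handled in this sketch.

```python
# Columns copied unchanged from RawBioAssays to DerivedBioAssays.
COPIED_COLUMNS = [
    "version", "entry_date", "job_id", "protocol_id", "software_id",
    "extract_id", "name", "description", "removed_by", "itemkey_id",
    "projectkey_id", "owner",
]
# Columns that are copied and then cleared on the raw bioassay (assumed meaning).
MOVED_COLUMNS = ["annotationset_id", "fileset_id"]


def copy_raw_to_derived(raw, new_id, subtype_id):
    """Build a derived bioassay row (dict) from a raw bioassay row (dict).

    new_id and subtype_id are supplied by the caller; how they are
    generated or mapped is outside the scope of this sketch.
    """
    derived = {
        "id": new_id,
        "subtype_id": subtype_id,
        "is_root": False,
        "kit_id": None,
        "hardware_id": None,
    }
    for column in COPIED_COLUMNS:
        derived[column] = raw[column]
    for column in MOVED_COLUMNS:
        derived[column] = raw[column]
        raw[column] = None  # no longer available on the raw bioassay
    return derived
```

Columns marked "Not used" (diskusage_id, has_data, spots, file_spots, bytes) are simply never read, so they do not appear on the derived row.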
Annotations
Annotations can be moved to the new derived bioassay by updating the AnnotationSets table with the new id.
AnnotationSets | Comment |
id | Keep |
version | +1 |
item_type | Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY) |
item_id | Change to new id |
NOTE! Cached annotations (in the static.cache/snapshots-v5 directory) must be deleted. ~The simplest thing is to delete the entire directory and everything in it.~ It is easy to delete them from the code since we already have the SnapshotManager.removeSnapshots() method.
NOTE! Annotation types that are enabled for raw bioassays but not derived bioassays need to be updated. We can implement this in the tool as well. We would need to insert an entry into the AnnotationTypeItems table for each annotation type:
AnnotationTypeItems | Comment |
annotationtype_id | Id of annotation type |
item_type | 268 |
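As a rough sketch of the second note, the entries that need to be inserted into AnnotationTypeItems can be found by comparing the annotation types enabled for raw bioassays with those already enabled for derived bioassays. The function name and the representation of the table as (annotationtype_id, item_type) pairs are assumptions made for this example.

```python
RAWBIOASSAY = 264
DERIVEDBIOASSAY = 268


def missing_derived_entries(annotation_type_items):
    """Return the (annotationtype_id, item_type) rows that need to be inserted.

    annotation_type_items is an iterable of (annotationtype_id, item_type)
    pairs representing the current content of AnnotationTypeItems.
    """
    items = list(annotation_type_items)
    enabled_for_raw = {t for t, item_type in items if item_type == RAWBIOASSAY}
    enabled_for_derived = {t for t, item_type in items if item_type == DERIVEDBIOASSAY}
    # Only types enabled for raw bioassays but not yet for derived bioassays.
    return [(t, DERIVEDBIOASSAY) for t in sorted(enabled_for_raw - enabled_for_derived)]
```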
Files
Files can be moved to the new derived bioassay by updating the FileSets table with the new id.
FileSets | Comment |
id | Keep |
version | +1 |
item_type | Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY) |
But, we also need to address the fact that the file types of the members in the file set must match the new item type. We can either change the file types to new types or we can change the item type on the existing file types. In the first case we update the FileSetMembers table:
FileSetMembers | Comment |
id | Keep |
version | +1 |
fileset_id | Keep |
datafiletype_id | Update to new type |
other columns | Keep |
If any file types for raw bioassays remain in use by raw bioassays that have been migrated, we need to change them to derived bioassay file types instead. This is done in the DataFileTypes table:
DataFileTypes | Comment |
id | Keep |
version | +1 |
item_type | Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY) |
other columns | Keep |
Note that we don't do this for all file types, but only for the types that are used by the migrated raw bioassays.
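The restriction to file types that are actually used by migrated raw bioassays could look roughly like this. It is a sketch of the second option (changing the item type on existing file types) using plain dictionaries; the function name and the way rows are passed in are hypothetical.

```python
RAWBIOASSAY = 264
DERIVEDBIOASSAY = 268


def retarget_file_types(data_file_types, file_set_members, migrated_fileset_ids):
    """Switch raw bioassay file types to derived bioassay file types.

    Only file types referenced by members of migrated file sets are
    touched. Returns the ids of the file types that were updated.
    """
    used_type_ids = {
        member["datafiletype_id"]
        for member in file_set_members
        if member["fileset_id"] in migrated_fileset_ids
    }
    updated = []
    for file_type in data_file_types:
        if file_type["id"] in used_type_ids and file_type["item_type"] == RAWBIOASSAY:
            file_type["item_type"] = DERIVEDBIOASSAY
            file_type["version"] += 1
            updated.append(file_type["id"])
    return updated
```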
Any-to-any links
Links are moved to the new derived bioassay by updating the AnyToAny table. We need to check both the source and target of the links.
AnyToAny | Comment |
id | Keep |
version | +1 |
name | Keep |
description | Keep |
from_id | Keep or change to new id |
from_type | Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY) |
to_id | Keep or change to new id |
to_type | Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY) |
uses_to | Keep |
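Checking both ends of a link might be sketched as follows. The id_map argument (old raw bioassay id to new derived bioassay id) and the function name are assumptions for illustration; links to raw bioassays that are not in the map are left untouched.

```python
RAWBIOASSAY = 264
DERIVEDBIOASSAY = 268


def relink(link, id_map):
    """Re-point one AnyToAny row (dict) at the new derived bioassays.

    Returns True if the row was changed. The version is increased
    once per row, even if both ends were updated.
    """
    changed = False
    for side in ("from", "to"):
        id_key, type_key = f"{side}_id", f"{side}_type"
        if link[type_key] == RAWBIOASSAY and link[id_key] in id_map:
            link[id_key] = id_map[link[id_key]]
            link[type_key] = DERIVEDBIOASSAY
            changed = True
    if changed:
        link["version"] += 1
    return changed
```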
Change history
The change history is moved to the new derived bioassay by updating the ChangeHistoryDetails table. This will leave the old raw bioassay without a change history. I think we should insert a new entry representing the migration. We should also insert a new entry for the derived bioassay.
ChangeHistoryDetails | Comment |
id | Keep |
version | +1 |
history_id | Keep |
change_type | Keep |
item_id | Change to new id |
item_type | Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY) |
change_info | Keep |
old_value | Keep |
new_value | Keep |
A new entry in the ChangeHistory table is created that represents the migration:
ChangeHistory | Comment |
id | Generated |
version | 0 |
time | Current timestamp |
user_id | Id of root user |
session_id | Id of current session |
client_id | Id of a new client (net.sf.basedb.clients.rba2dba-migration) |
project_id | Null |
plugin_id | Null |
job_id | Null |
name | Migrate raw bioassays to derived bioassays |
New entries for the raw bioassay and the new derived bioassay in the ChangeHistoryDetails table:
ChangeHistoryDetails | Comment |
id | Generated |
version | 0 |
history_id | Id of current history |
change_type | 2 (=UPDATE) |
item_id | Id of raw bioassay or derived bioassay |
item_type | 264 (=RAWBIOASSAY) or 268 (=DERIVEDBIOASSAY) |
change_info | Migrated <name-of-rba> from raw bioassay to derived bioassay |
old_value | Null |
new_value | Null |
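The new change history entries described above could be assembled like this. The sketch only builds the rows as dictionaries; the function names are hypothetical, the "Generated" ids are left to the database, and the caller is assumed to supply the ids of the root user, the current session and the migration client.

```python
from datetime import datetime, timezone

RAWBIOASSAY = 264
DERIVEDBIOASSAY = 268
UPDATE = 2  # change_type value for updates


def migration_history(user_id, session_id, client_id):
    """Build the single ChangeHistory row that represents the migration."""
    return {
        "version": 0,
        "time": datetime.now(timezone.utc),
        "user_id": user_id,
        "session_id": session_id,
        "client_id": client_id,
        "project_id": None,
        "plugin_id": None,
        "job_id": None,
        "name": "Migrate raw bioassays to derived bioassays",
    }


def migration_details(history_id, raw_id, derived_id, raw_name):
    """Build the two ChangeHistoryDetails rows for one migrated bioassay."""
    info = f"Migrated {raw_name} from raw bioassay to derived bioassay"
    return [
        {
            "version": 0,
            "history_id": history_id,
            "change_type": UPDATE,
            "item_id": item_id,
            "item_type": item_type,
            "change_info": info,
            "old_value": None,
            "new_value": None,
        }
        for item_id, item_type in ((raw_id, RAWBIOASSAY), (derived_id, DERIVEDBIOASSAY))
    ]
```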
Job parameters
Some jobs have items as parameters and the items can be raw bioassays. This information is stored in the ItemValues table.
ItemValues | Comment |
id | Keep |
data_class | Change net.sf.basedb.core.data.RawBioAssayData to net.sf.basedb.core.data.DerivedBioAssayData |
data_class_id | Change to new id |
Item lists
All item lists with raw bioassays as members are converted to item lists with derived bioassays. We need to change entries in the ItemListMembers table:
ItemListMembers | Comment |
item_id | Change to new id |
list_id | Keep |
And in the ItemLists table:
ItemLists | Comment |
version | +1 |
member_type | Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY) |
subtype_id | Rawdata type is mapped to a subtype |
rawdatatype | Null |
size | Updated to match count |
All other columns | Keep |
NOTE! Synchronization filters are not updated since it is not possible to automatically make a working filter in all cases. It may not be enough to just change the type from raw bioassay to derived bioassay.
All item lists that have at least one synchronization filter that is used on the raw bioassay level are marked with {:} to make them easy to spot in the web interface. The filters need to be manually updated and when they have been fixed the marking can be removed.
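The item list conversion can be sketched as below. It covers only the member and list columns from the tables above; synchronization filters and the marking of lists are left out. The function name and the id_map argument are hypothetical, and the sketch assumes that members without an entry in id_map are dropped, which is why the size is recounted.

```python
DERIVEDBIOASSAY = 268


def convert_item_list(item_list, member_ids, id_map, subtype_id):
    """Convert one raw bioassay item list to a derived bioassay item list.

    item_list is the ItemLists row (dict), member_ids the item_id values
    from ItemListMembers, and id_map maps old raw bioassay ids to new
    derived bioassay ids. Returns the new member ids.
    """
    new_members = sorted({id_map[m] for m in member_ids if m in id_map})
    item_list["version"] += 1
    item_list["member_type"] = DERIVEDBIOASSAY
    item_list["subtype_id"] = subtype_id
    item_list["rawdatatype"] = None
    item_list["size"] = len(new_members)  # updated to match count
    return new_members
```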
Change History (26)
comment:26 by , 7 months ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
In 8200: