Opened 6 months ago

Last modified 5 months ago

#2321 closed task

Implement a tool for migrating raw bioassays to derived bioassays — at Version 1

Reported by: Nicklas Nordborg
Owned by: everyone
Priority: major
Milestone: BASE 3.19.11
Component: core
Version:
Keywords:
Cc:

Description (last modified by Nicklas Nordborg)

Future development of BASE may remove features that are rarely used anymore, for example raw bioassays with raw data imported into the database, experiments, and array LIMS.

We have a lot of data at the raw bioassay level, but raw bioassays are somewhat problematic because they are a dead end: it is not possible to create child items from them for further analysis. Until now this has been worked around by adding more files and/or annotations to an existing raw bioassay.

It would be a lot more flexible if we could move existing raw bioassays to the derived bioassay level instead. In theory it could be done with batch importers, but in practice it would be better to implement a tool for that.

The general idea is to create a derived bioassay copy of each raw bioassay. The platform/variant/rawdatatype are used to map to a subtype. Existing annotations, files, any-to-any links, etc. are re-linked to the new derived bioassays (they will no longer be available on the raw bioassay).

Below is a more detailed description (not yet complete):

Database columns

RawBioAssays                          DerivedBioAssays  Comment
id                                    id                A new ID is generated
version                               version           Copy
diskusage_id                          -                 Not used
annotationset_id                      annotationset_id  Copy and clear
fileset_id                            fileset_id        Copy and clear
entry_date                            entry_date        Copy
platform_id, variant_id, rawdatatype  subtype_id        Platform and raw data type are mapped to a subtype
job_id                                job_id            Copy
protocol_id                           protocol_id       Copy
software_id                           software_id       Copy
arraydesign_id                        -                 Create an AnyToAny link
bioassay_id                           -                 Link via ParentDerivedBioassays and ParentPhysicalBioAssays
extract_id                            extract_id        Copy
name                                  name              Copy
description                           description       Copy
removed_by                            removed_by        Copy
itemkey_id                            itemkey_id        Copy
projectkey_id                         projectkey_id     Copy
owner                                 owner             Copy
has_data                              -                 Not used
spots                                 -                 Not used
file_spots                            -                 Not used
bytes                                 -                 Not used
-                                     is_root           false
-                                     kit_id            null
-                                     hardware_id       null
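
To make the column mapping above concrete, here is a minimal Python sketch. It is not the tool itself, only an illustration: rows are plain dictionaries, the function name to_derived_row and the subtype_map lookup are hypothetical, and the links needed for arraydesign_id and bioassay_id are left out.

```python
# Hypothetical sketch: build a DerivedBioAssays row from a RawBioAssays row.
# Column names are taken from the table above; the function name and the
# subtype_map lookup are assumptions made for illustration only.

# Columns marked "Copy" in the table.
COPIED_COLUMNS = [
    "version", "entry_date", "job_id", "protocol_id", "software_id",
    "extract_id", "name", "description", "removed_by",
    "itemkey_id", "projectkey_id", "owner",
]
# Columns marked "Copy and clear": copied, then set to null on the raw bioassay.
MOVED_COLUMNS = ["annotationset_id", "fileset_id"]


def to_derived_row(raw, new_id, subtype_map):
    """Return (derived_row, cleared_raw_row) for one raw bioassay.

    subtype_map is keyed by (platform_id, variant_id, rawdatatype)
    and gives the subtype id to use for the derived bioassay.
    """
    key = (raw["platform_id"], raw["variant_id"], raw["rawdatatype"])
    derived = {"id": new_id, "subtype_id": subtype_map[key]}
    for col in COPIED_COLUMNS + MOVED_COLUMNS:
        derived[col] = raw[col]
    # Columns that exist only on the derived bioassay.
    derived.update({"is_root": False, "kit_id": None, "hardware_id": None})
    # The raw bioassay keeps its row but loses the moved references.
    cleared = dict(raw)
    for col in MOVED_COLUMNS:
        cleared[col] = None
    return derived, cleared
```

A missing key in subtype_map raises KeyError, which may be a reasonable way to stop before migrating a raw bioassay whose platform has no configured subtype.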

Annotations

Annotations can be moved to the new derived bioassay by updating the AnnotationSets table with the new id.

AnnotationSets      Comment
id                  Keep
version             Keep
item_type           Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY)
item_id             Change to new id
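
As a rough sketch of that update, the following Python function runs it through the standard sqlite3 module. SQLite is only a stand-in for the real BASE database, and the function name move_annotation_set is hypothetical; the table name, column names and type codes are the ones listed above.

```python
import sqlite3

RAWBIOASSAY = 264
DERIVEDBIOASSAY = 268


def move_annotation_set(conn, raw_id, derived_id):
    """Re-point the annotation set of one raw bioassay to its derived copy.

    Returns the number of AnnotationSets rows that were changed.
    """
    cur = conn.execute(
        "UPDATE AnnotationSets SET item_type = ?, item_id = ? "
        "WHERE item_type = ? AND item_id = ?",
        (DERIVEDBIOASSAY, derived_id, RAWBIOASSAY, raw_id),
    )
    return cur.rowcount
```

Filtering on both item_type and item_id matters because the same numeric id may belong to an item of another type.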

NOTE! Cached annotations (in the static.cache/snapshots-v5 directory) must be deleted. The simplest thing is to delete the entire directory and everything in it.
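
Deleting the cache could look something like the sketch below. Only the static.cache/snapshots-v5 path comes from this ticket; the parent directory argument (userfiles_dir) and the function name are assumptions, since the location depends on the installation.

```python
import shutil
from pathlib import Path


def clear_snapshot_cache(userfiles_dir):
    """Delete static.cache/snapshots-v5 below userfiles_dir.

    userfiles_dir is a hypothetical parameter; where the static cache
    lives depends on how the server is configured.
    Returns True if the directory existed before the call.
    """
    cache = Path(userfiles_dir) / "static.cache" / "snapshots-v5"
    existed = cache.is_dir()
    # Remove the directory and everything in it; a missing directory is fine.
    shutil.rmtree(cache, ignore_errors=True)
    return existed
```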

NOTE! Annotation types that are enabled for raw bioassays but not derived bioassays need to be updated. We can either let the admin handle that manually or it might be possible to implement this in the tool as well. We would need to insert an entry into the AnnotationTypeItems table for each annotation type:

AnnotationTypeItems  Comment
annotationtype_id    Id of annotation type
item_type            268
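
If the tool handles this, one possible approach is sketched below in Python with sqlite3 as a stand-in database. The function name enable_for_derived is hypothetical, and the sketch assumes that AnnotationTypeItems holds one (annotationtype_id, item_type) row per enabled item type, as the table above suggests.

```python
import sqlite3

RAWBIOASSAY = 264
DERIVEDBIOASSAY = 268


def enable_for_derived(conn):
    """Enable derived bioassays for annotation types that only allow raw ones.

    Returns the ids of the annotation types that were updated.
    """
    # Annotation types enabled for raw bioassays but not for derived bioassays.
    missing = [row[0] for row in conn.execute(
        "SELECT annotationtype_id FROM AnnotationTypeItems "
        "WHERE item_type = ? AND annotationtype_id NOT IN ("
        "  SELECT annotationtype_id FROM AnnotationTypeItems "
        "  WHERE item_type = ?) "
        "ORDER BY annotationtype_id",
        (RAWBIOASSAY, DERIVEDBIOASSAY),
    )]
    conn.executemany(
        "INSERT INTO AnnotationTypeItems (annotationtype_id, item_type) "
        "VALUES (?, ?)",
        [(at_id, DERIVEDBIOASSAY) for at_id in missing],
    )
    return missing
```

Running it twice is harmless: the second call finds nothing missing and inserts nothing.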

Files

Files can be moved to the new derived bioassay by updating the FileSets table with the new id.

FileSets            Comment
id                  Keep
version             Keep
item_type           Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY)

Any-to-any links

Links are moved to the new derived bioassay by updating the AnyToAny table. We need to check both the source and target of the links.

AnyToAny            Comment
id                  Keep
version             Keep
name                Keep
description         Keep
from_id             Keep or change to new id
from_type           Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY)
to_id               Keep or change to new id
to_type             Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY)
uses_to             Keep
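
Because a raw bioassay can be either end of a link, both sides need their own update. The Python sketch below shows one way to do that, again with sqlite3 as a stand-in database; the function name relink_any_to_any and the id_map argument (raw bioassay id to new derived bioassay id) are hypothetical.

```python
import sqlite3

RAWBIOASSAY = 264
DERIVEDBIOASSAY = 268


def relink_any_to_any(conn, id_map):
    """Re-point AnyToAny links from raw bioassays to their derived copies.

    id_map maps each migrated raw bioassay id to its new derived id.
    """
    for raw_id, derived_id in id_map.items():
        # Links where the raw bioassay is the source.
        conn.execute(
            "UPDATE AnyToAny SET from_id = ?, from_type = ? "
            "WHERE from_type = ? AND from_id = ?",
            (derived_id, DERIVEDBIOASSAY, RAWBIOASSAY, raw_id),
        )
        # Links where the raw bioassay is the target.
        conn.execute(
            "UPDATE AnyToAny SET to_id = ?, to_type = ? "
            "WHERE to_type = ? AND to_id = ?",
            (derived_id, DERIVEDBIOASSAY, RAWBIOASSAY, raw_id),
        )
```

Since each update also changes the type to 268, a link that has already been re-pointed cannot be matched again, even if a new derived id happens to equal another raw bioassay id. Links to raw bioassays that are not in id_map are left untouched.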

Change history

The change history is moved to the new derived bioassay by updating the ChangeHistoryDetails table. This will leave the old raw bioassay without a change history, so I think we should insert a new entry for it that represents the migration. We should also insert a corresponding entry for the derived bioassay.

ChangeHistoryDetails  Comment
id                    Keep
version               Keep
history_id            Keep
change_type           Keep
item_id               Change to new id
item_type             Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY)
change_info           Keep
old_value             Keep
new_value             Keep

A new entry in the ChangeHistory table is created that represents the migration:

ChangeHistory       Comment
id                  Generated
version             0
time                Current timestamp
user_id             Id of root user
session_id          Id of current session
client_id           Id of a new client (net.sf.basedb.clients.rba2dba-migration)
project_id          Null
plugin_id           Null
job_id              Null
name                Migrate raw bioassays to derived bioassays

New entries for the raw bioassay and the new derived bioassay in the ChangeHistoryDetails table:

ChangeHistoryDetails  Comment
id                    Generated
version               0
history_id            Id of current history
change_type           8 (A new type is defined)
item_id               Id of raw bioassay or derived bioassay
item_type             264 (=RAWBIOASSAY) or 268 (=DERIVEDBIOASSAY)
change_info           Migrated <name-of-rba> from raw bioassay to derived bioassay
old_value             Null
new_value             Null
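
The change history steps could be combined roughly as in the Python sketch below, with sqlite3 as a stand-in database. The function name record_migration, its parameters, and the use of epoch seconds for the time column are assumptions; the table names, column names and values are the ones listed in this ticket.

```python
import sqlite3
import time

RAWBIOASSAY = 264
DERIVEDBIOASSAY = 268
MIGRATION_CHANGE_TYPE = 8  # the new change type proposed above


def record_migration(conn, migrated, root_user_id, session_id, client_id):
    """Move existing history entries and log the migration itself.

    migrated is a list of (raw_id, derived_id, name) tuples.
    Returns the id of the new ChangeHistory entry.
    """
    cur = conn.execute(
        'INSERT INTO ChangeHistory (version, "time", user_id, session_id, '
        "client_id, project_id, plugin_id, job_id, name) "
        "VALUES (0, ?, ?, ?, ?, NULL, NULL, NULL, ?)",
        (int(time.time()), root_user_id, session_id, client_id,
         "Migrate raw bioassays to derived bioassays"),
    )
    history_id = cur.lastrowid
    for raw_id, derived_id, name in migrated:
        # Move the existing change history to the derived bioassay first ...
        conn.execute(
            "UPDATE ChangeHistoryDetails SET item_id = ?, item_type = ? "
            "WHERE item_type = ? AND item_id = ?",
            (derived_id, DERIVEDBIOASSAY, RAWBIOASSAY, raw_id),
        )
        # ... then add one migration entry for each of the two items.
        info = f"Migrated {name} from raw bioassay to derived bioassay"
        conn.executemany(
            "INSERT INTO ChangeHistoryDetails (version, history_id, "
            "change_type, item_id, item_type, change_info, old_value, "
            "new_value) VALUES (0, ?, ?, ?, ?, ?, NULL, NULL)",
            [
                (history_id, MIGRATION_CHANGE_TYPE, raw_id, RAWBIOASSAY, info),
                (history_id, MIGRATION_CHANGE_TYPE, derived_id,
                 DERIVEDBIOASSAY, info),
            ],
        )
    return history_id
```

The order matters: if the migration entry for the raw bioassay were inserted before the update, it would be moved to the derived bioassay along with the old history.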

Change History (1)

comment:1 by Nicklas Nordborg, 6 months ago

Description: modified (diff)