Opened 7 months ago

Last modified 6 months ago

#2321 closed task

Implement a tool for migrating raw bioassays to derived bioassays — at Version 12

Reported by: Nicklas Nordborg
Owned by: everyone
Priority: major
Milestone: BASE 3.19.11
Component: core

Description (last modified by Nicklas Nordborg)

Future development of BASE may remove features that are rarely used anymore, for example raw bioassays with raw data imported into the database, experiments, array LIMS, etc.

We have a lot of data at the raw bioassay level, but raw bioassays are also a bit problematic since they are a dead end: it is not possible to create more child items from them for further analysis. Until now, this has been solved by adding more files and/or annotations to an existing raw bioassay.

It would be a lot more flexible if we could move existing raw bioassays to the derived bioassay level instead. In theory it could be done with batch importers, but in practice it would be better to implement a tool for that.

The general idea is to create a derived bioassay copy of each raw bioassay. The platform, variant and raw data type are used to map each raw bioassay to a subtype. Existing annotations, files, any-to-any links, etc. are re-linked to the new derived bioassays (they will no longer be available on the raw bioassays).

Below is a more detailed description (not yet complete):

Database columns

RawBioAssays DerivedBioAssays Comment
id id A new ID is generated
version version Copy
diskusage_id - Not used
annotationset_id annotationset_id Copy and clear
fileset_id fileset_id Copy and clear
entry_date entry_date Copy
platform_id/variant_id/rawdatatype subtype_id Platform/variant and raw data type are mapped to a subtype
job_id job_id Copy
protocol_id protocol_id Copy
software_id software_id Copy
arraydesign_id - Create an AnyToAny-link
bioassay_id - Link via ParentDerivedBioassays and ParentPhysicalBioAssays
extract_id extract_id Copy
name name Copy
description description Copy
removed_by removed_by Copy
itemkey_id itemkey_id Copy
projectkey_id projectkey_id Copy
owner owner Copy
has_data - Not used
spots - Not used
file_spots - Not used
bytes - Not used
- is_root false
- kit_id null
- hardware_id null
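As a rough illustration of the column mapping in the table above, here is a minimal Python sketch (not the actual OneTimeFix code) that turns one RawBioAssays row into a DerivedBioAssays row. It assumes rows are plain dicts keyed by column name; `to_derived_row`, `clear_moved_columns` and the `subtype_for` callback (standing in for the platform/variant/raw data type to subtype lookup) are hypothetical names.

```python
from typing import Callable, Optional

# Columns copied unchanged from RawBioAssays to DerivedBioAssays.
COPIED_COLUMNS = (
    "version", "annotationset_id", "fileset_id", "entry_date", "job_id",
    "protocol_id", "software_id", "extract_id", "name", "description",
    "removed_by", "itemkey_id", "projectkey_id", "owner",
)

# Hypothetical lookup: (platform_id, variant_id, rawdatatype) -> subtype_id.
SubtypeLookup = Callable[[Optional[int], Optional[int], str], Optional[int]]


def to_derived_row(raw: dict, new_id: int, subtype_for: SubtypeLookup) -> dict:
    """Build a DerivedBioAssays row from a RawBioAssays row."""
    derived = {col: raw[col] for col in COPIED_COLUMNS}
    derived["id"] = new_id  # a new ID is generated
    derived["subtype_id"] = subtype_for(
        raw["platform_id"], raw["variant_id"], raw["rawdatatype"]
    )
    # Columns that only exist on DerivedBioAssays.
    derived["is_root"] = False
    derived["kit_id"] = None
    derived["hardware_id"] = None
    # diskusage_id, has_data, spots, file_spots and bytes are dropped;
    # arraydesign_id and bioassay_id are turned into links elsewhere.
    return derived


def clear_moved_columns(raw: dict) -> dict:
    """Return a copy of the raw row with the moved sets cleared ("Copy and clear")."""
    cleared = dict(raw)
    cleared["annotationset_id"] = None
    cleared["fileset_id"] = None
    return cleared
```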

Annotations

Annotations can be moved to the new derived bioassay by updating the AnnotationSets table with the new id.

AnnotationSets Comment
id Keep
version +1
item_type Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY)
item_id Change to new id
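The AnnotationSets update can be sketched against an in-memory SQLite database with a simplified table. This is an illustration under assumptions, not the SQL the tool runs: `relink_annotation_sets` and the `id_map` argument (raw bioassay id to new derived bioassay id) are hypothetical. Only rows with item_type 264 are touched, so an existing derived bioassay that happens to share an id with a raw bioassay is left alone.

```python
import sqlite3

RAWBIOASSAY = 264
DERIVEDBIOASSAY = 268


def relink_annotation_sets(conn: sqlite3.Connection, id_map: dict) -> int:
    """Move annotation sets from raw bioassays to their new derived bioassays.

    id_map maps raw bioassay id -> new derived bioassay id.
    Returns the number of updated AnnotationSets rows.
    """
    updated = 0
    for raw_id, derived_id in id_map.items():
        cur = conn.execute(
            "UPDATE AnnotationSets"
            " SET item_type = ?, item_id = ?, version = version + 1"
            " WHERE item_type = ? AND item_id = ?",
            (DERIVEDBIOASSAY, derived_id, RAWBIOASSAY, raw_id),
        )
        updated += cur.rowcount
    return updated
```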

NOTE! Cached annotations (in the static.cache/snapshots-v5 directory) must be deleted. This is easy to do from the code since we already have the SnapshotManager.removeSnapshots() method.

NOTE! Annotation types that are enabled for raw bioassays but not for derived bioassays need to be updated. We can implement this in the tool as well. We would need to insert an entry into the AnnotationTypeItems table for each such annotation type:

AnnotationTypeItems Comment
annotationtype_id Id of annotation type
item_type 268
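Enabling these annotation types for derived bioassays can be done with a single INSERT ... SELECT that skips types already enabled for item type 268, which also makes the step safe to run twice. A minimal SQLite sketch with a simplified AnnotationTypeItems table; the function name `enable_for_derived_bioassays` is hypothetical.

```python
import sqlite3

RAWBIOASSAY = 264
DERIVEDBIOASSAY = 268


def enable_for_derived_bioassays(conn: sqlite3.Connection) -> int:
    """Enable every annotation type that is enabled for raw bioassays
    also for derived bioassays. Returns the number of inserted rows."""
    cur = conn.execute(
        "INSERT INTO AnnotationTypeItems (annotationtype_id, item_type)"
        " SELECT ati.annotationtype_id, ?"
        " FROM AnnotationTypeItems ati"
        " WHERE ati.item_type = ?"
        # Skip annotation types that are already enabled for derived bioassays.
        " AND NOT EXISTS ("
        "   SELECT 1 FROM AnnotationTypeItems x"
        "   WHERE x.annotationtype_id = ati.annotationtype_id"
        "   AND x.item_type = ?)",
        (DERIVEDBIOASSAY, RAWBIOASSAY, DERIVEDBIOASSAY),
    )
    return cur.rowcount
```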

Files

Files can be moved to the new derived bioassay by updating the FileSets table with the new id.

FileSets Comment
id Keep
version +1
item_type Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY)

Any-to-any links

Links are moved to the new derived bioassay by updating the AnyToAny table. We need to check both the source and target of the links.

AnyToAny Comment
id Keep
version +1
name Keep
description Keep
from_id Keep or change to new id
from_type Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY)
to_id Keep or change to new id
to_type Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY)
uses_to Keep
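Since a link can have a migrated raw bioassay at either end, or at both ends, one way to handle source and target together is a single UPDATE driven by a temporary raw-to-derived id mapping table. This is a hedged SQLite sketch: the temporary table `rba_map` and the function names are hypothetical, and the AnyToAny table is reduced to the columns that change. Because both ends are rewritten in one statement, a link between two migrated raw bioassays gets its version increased only once.

```python
import sqlite3

RAWBIOASSAY = 264
DERIVEDBIOASSAY = 268


def _mapped_id(end: str) -> str:
    """SQL expression giving the new id for one end of a link, or NULL.

    `end` is the fixed string "from" or "to", never user input.
    """
    return (
        "(SELECT m.derived_id FROM rba_map m"
        f" WHERE AnyToAny.{end}_type = {RAWBIOASSAY}"
        f" AND m.raw_id = AnyToAny.{end}_id)"
    )


def relink_any_to_any(conn: sqlite3.Connection, id_map: dict) -> int:
    """Re-point both ends of AnyToAny links from raw to derived bioassays."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS rba_map"
        " (raw_id INTEGER PRIMARY KEY, derived_id INTEGER NOT NULL)"
    )
    conn.execute("DELETE FROM rba_map")
    conn.executemany(
        "INSERT INTO rba_map (raw_id, derived_id) VALUES (?, ?)", id_map.items()
    )
    frm, to = _mapped_id("from"), _mapped_id("to")
    # All SET expressions see the old row values, so each end is checked
    # against its original type and id.
    cur = conn.execute(
        "UPDATE AnyToAny SET version = version + 1,"
        f" from_type = CASE WHEN {frm} IS NULL THEN from_type ELSE {DERIVEDBIOASSAY} END,"
        f" from_id = COALESCE({frm}, from_id),"
        f" to_type = CASE WHEN {to} IS NULL THEN to_type ELSE {DERIVEDBIOASSAY} END,"
        f" to_id = COALESCE({to}, to_id)"
        f" WHERE {frm} IS NOT NULL OR {to} IS NOT NULL"
    )
    return cur.rowcount
```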

Change history

The change history is moved to the new derived bioassay by updating the ChangeHistoryDetails table. This will leave the old raw bioassay without a change history, so I think we should insert a new entry for it representing the migration. We should also insert a new entry for the derived bioassay.

ChangeHistoryDetails Comment
id Keep
version +1
history_id Keep
change_type Keep
item_id Change to new id
item_type Change 264 (=RAWBIOASSAY) to 268 (=DERIVEDBIOASSAY)
change_info Keep
old_value Keep
new_value Keep

A new entry in the ChangeHistory table is created that represents the migration:

ChangeHistory Comment
id Generated
version 0
time Current timestamp
user_id Id of root user
session_id Id of current session
client_id Id of a new client (net.sf.basedb.clients.rba2dba-migration)
project_id Null
plugin_id Null
job_id Null
name Migrate raw bioassays to derived bioassays

New entries for the raw bioassay and the new derived bioassay in the ChangeHistoryDetails table:

ChangeHistoryDetails Comment
id Generated
version 0
history_id Id of current history
change_type 2 (=UPDATE)
item_id Id of raw bioassay or derived bioassay
item_type 264 (=RAWBIOASSAY) or 268 (=DERIVEDBIOASSAY)
change_info Migrated <name-of-rba> from raw bioassay to derived bioassay
old_value Null
new_value Null
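The change history steps can be sketched as two functions: one creating the single ChangeHistory entry for the migration, and one run per raw bioassay. The order inside the per-item function matters: existing detail rows are moved first and the two migration entries are inserted afterwards, otherwise the new raw bioassay entry would be moved to the derived bioassay as well. This is a simplified SQLite sketch with hypothetical function names; the timestamp and the user, session and client ids are plain arguments here.

```python
import sqlite3

RAWBIOASSAY = 264
DERIVEDBIOASSAY = 268
CHANGE_TYPE_UPDATE = 2
MIGRATION_NAME = "Migrate raw bioassays to derived bioassays"


def create_migration_history(conn: sqlite3.Connection, now: str,
                             root_user_id: int, session_id: int,
                             client_id: int) -> int:
    """Insert the single ChangeHistory entry for the migration; return its id."""
    cur = conn.execute(
        "INSERT INTO ChangeHistory (version, time, user_id, session_id,"
        " client_id, project_id, plugin_id, job_id, name)"
        " VALUES (0, ?, ?, ?, ?, NULL, NULL, NULL, ?)",
        (now, root_user_id, session_id, client_id, MIGRATION_NAME),
    )
    return cur.lastrowid


def migrate_item_history(conn: sqlite3.Connection, history_id: int,
                         raw_id: int, derived_id: int, rba_name: str) -> None:
    """Move one raw bioassay's history, then log the migration for both items."""
    # 1. Move the existing details first...
    conn.execute(
        "UPDATE ChangeHistoryDetails"
        " SET item_type = ?, item_id = ?, version = version + 1"
        " WHERE item_type = ? AND item_id = ?",
        (DERIVEDBIOASSAY, derived_id, RAWBIOASSAY, raw_id),
    )
    # 2. ...and only then add the migration entries, so the raw bioassay
    # entry is not moved along with the old history.
    info = f"Migrated {rba_name} from raw bioassay to derived bioassay"
    conn.executemany(
        "INSERT INTO ChangeHistoryDetails (version, history_id, change_type,"
        " item_id, item_type, change_info, old_value, new_value)"
        " VALUES (0, ?, ?, ?, ?, ?, NULL, NULL)",
        [
            (history_id, CHANGE_TYPE_UPDATE, raw_id, RAWBIOASSAY, info),
            (history_id, CHANGE_TYPE_UPDATE, derived_id, DERIVEDBIOASSAY, info),
        ],
    )
```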

Job parameters

Some jobs have items as parameters, and these items can be raw bioassays. This information is stored in the ItemValues table.

ItemValues Comment
id Keep
data_class Change net.sf.basedb.core.data.RawBioAssayData to net.sf.basedb.core.data.DerivedBioAssayData
data_class_id Change to new id
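A sketch of the ItemValues update using SQLite with a simplified table and a hypothetical `relink_item_values` function. Only rows whose data_class is RawBioAssayData are rewritten, so an id that also belongs to an item of another class is not affected.

```python
import sqlite3

RAW_DATA_CLASS = "net.sf.basedb.core.data.RawBioAssayData"
DERIVED_DATA_CLASS = "net.sf.basedb.core.data.DerivedBioAssayData"


def relink_item_values(conn: sqlite3.Connection, id_map: dict) -> int:
    """Point job parameter values at the new derived bioassays.

    id_map maps raw bioassay id -> new derived bioassay id.
    Returns the total number of updated ItemValues rows.
    """
    cur = conn.executemany(
        "UPDATE ItemValues SET data_class = ?, data_class_id = ?"
        " WHERE data_class = ? AND data_class_id = ?",
        [(DERIVED_DATA_CLASS, d, RAW_DATA_CLASS, r) for r, d in id_map.items()],
    )
    # For executemany(), sqlite3 sums the modifications of all statements.
    return cur.rowcount
```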

Change History (12)

comment:1 by Nicklas Nordborg, 6 months ago

Description: modified (diff)

comment:2 by Nicklas Nordborg, 6 months ago

In 8200:

References #2321: Implement a tool for migrating raw bioassays to derived bioassays

Added an entry point to the OneTimeFix implementation. The migration can be started with onetimefix.sh migrate_rba2dba -u <root> -p <password> -c <config>

comment:3 by Nicklas Nordborg, 6 months ago

In 8201:

References #2321: Implement a tool for migrating raw bioassays to derived bioassays

Started to implement the migration utility. It currently counts all raw bioassays and loads the information about them that is needed. Nothing is yet created or moved.

comment:4 by Nicklas Nordborg, 6 months ago

In 8202:

References #2321: Implement a tool for migrating raw bioassays to derived bioassays

Implemented creation of derived bioassay.

comment:5 by Nicklas Nordborg, 6 months ago

In 8203:

References #2321: Implement a tool for migrating raw bioassays to derived bioassays

Implemented re-linking annotation sets to the new derived bioassay.

comment:6 by Nicklas Nordborg, 6 months ago

In 8204:

References #2321: Implement a tool for migrating raw bioassays to derived bioassays

Adding links from the new derived bioassay to the parent derived bioassay and physical bioassays.

comment:7 by Nicklas Nordborg, 6 months ago

In 8205:

References #2321: Implement a tool for migrating raw bioassays to derived bioassays

Link the file set to the new derived bioassay.

comment:8 by Nicklas Nordborg, 6 months ago

Description: modified (diff)

comment:9 by Nicklas Nordborg, 6 months ago

In 8206:

References #2321: Implement a tool for migrating raw bioassays to derived bioassays

Move any-to-any links to the new derived bioassays. Needed an extra index on the 'from_type' and 'from_id' columns to make the update query perform well.
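The effect of such an index can be illustrated with SQLite's EXPLAIN QUERY PLAN; SQLite is used here only to show the principle. Without an index on (from_type, from_id) the relinking UPDATE has to scan the whole AnyToAny table, while with it the planner can look up the matching rows directly. The index name `idx_anytoany_from` and the helper functions are hypothetical.

```python
import sqlite3

# A relinking statement of the kind described above (simplified).
RELINK_FROM = (
    "UPDATE AnyToAny SET from_type = 268, from_id = ?"
    " WHERE from_type = 264 AND from_id = ?"
)


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Return the detail column(s) of EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


def add_from_index(conn: sqlite3.Connection) -> None:
    # Hypothetical index name; the real BASE index may be named differently.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_anytoany_from"
        " ON AnyToAny (from_type, from_id)"
    )
```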

comment:10 by Nicklas Nordborg, 6 months ago

In 8207:

References #2321: Implement a tool for migrating raw bioassays to derived bioassays

Move the change history to the new derived bioassay. There was an extra complication with the two log entries that are added as part of the migration, since we do not want to move them to the derived bioassay. Solved by forcing a flush() before the log entries are added.

comment:11 by Nicklas Nordborg, 6 months ago

Description: modified (diff)

comment:12 by Nicklas Nordborg, 6 months ago

Description: modified (diff)