Summary of changes and additions for supporting sequencing
==========================================================

1. Remove [LabeledExtract] and [Label].

Existing labeled extracts are converted to [Extract] items with a subtype.
Existing labels are converted to [Tag] items with a subtype. We define two
subtypes for tags ('Label' and 'Barcode'), and two subtypes for extracts
('LabeledExtract' and 'Library'). Extracts can be tagged with a [Tag].

What about protocol types? What do we do with the current LABELING protocol
type? It would be useful to define more protocol types, for example,
'Library preparation'. But how do we know which protocol type is the correct
one? Can we add a link between [ItemSubtype] and [ProtocolType], meaning that
when an item of the given subtype is created, the protocol should be from the
linked protocol type?

Do we need some kind of 'mode' setting in the GUI so that it uses the correct
terminology as much as possible? The 'mode' setting could also control parts
of the 'Validate' functionality in the 'Item overview', which may need to work
differently. This applies particularly to the rules for 'number of channels',
which are only needed for microarray experiments.

2. Changes in [BioMaterialEvent]

A new entity class [BioMaterialParent] is introduced instead of the
"anonymous" link-table 'BioMaterialEventSources'. This should make it possible
to get rid of the [UsedQuantity] "ugly hack" that was used to support
multi-array slides. Existing information can easily be moved to the new
tables.

The parent and pooled properties are modified so that the parent may hold a
SINGLE biomaterial of the same type or of the parent type. The pooled flag
should only be used when there are two or more parents (of the same type).

The 'Hybridization' event type is changed to 'BioAssayCreation'. The link
between [BioMaterialEvent] and [Hybridization] is replaced with a link to
[PhysicalBioAssay].

3. New entity class [PhysicalBioAssay] that replaces [Hybridization]

Existing hybridizations are converted to [PhysicalBioAssay] items with a
subtype. We define two subtypes: 'Hybridization' and '???'. The
[PhysicalBioAssay] should implement [FileStoreEnabled] so that we can link
files to it.

New [ProtocolType]: 'Sequencing'
New [HardwareType]: 'Sequencing station'
New [Hardware]: 'HiSeq 2000'

See also the discussion above about protocol types and item subtypes.

4. Changes for [FileSet] and related classes

[FileSetMember] is made into an [Annotatable] item so that we can add
annotations on files.

[FileSet] is modified so that it becomes possible to add more than one file
for each [DataFileType]. This is controlled by a flag ('allowMultiple').
See ticket #1604 for more information about this.

5. New entity classes [BioAssayEvent], [DerivedBioAssaySet] and
   [DerivedBioAssay] that replace [Scan] and [Image]

The [BioAssayEvent] and [DerivedBioAssaySet] make up a loop that starts from a
[PhysicalBioAssay] and ends with a [RawBioAssay]. This loop is similar to the
loop with [Transformation] and [BioAssaySet] in the existing analysis section.

Existing [Scan] and [Image] data is moved into a single "iteration" of that
loop. The scan data is split between the bioassay event (protocol, scanner,
date) and the derived bioassay set (name, description, owner). One or more
derived bioassays are also created with links back to the biomaterial they are
related to. Image data is moved to the file set of the derived bioassay set
with the properties (jpeg, tiff, etc.) added as annotations.

The new classes can be used to represent the multiple steps that are required
before sequenced data can be boiled down to something that is similar to
expression data. A bioassay event can be linked with [Job], [Protocol],
[Hardware] and [Software].

It should be possible to create iterations both manually and with plug-ins.
The existing 'Analysis' [PluginType] is re-used. Since plug-ins are required
to implement context-checking, there shouldn't be any risk of mixing things
up. For the GUI we probably need a new plug-in similar to the
'Manual transform' plug-in that exists for the experiment analysis section.
Maybe we can provide configurations for some of the software mentioned below.

New [Software]: BCL Converter, Casava, Bowtie, Myrna, Tophat, Cufflinks,
and more???
New [DataFileType]: bcl, cif, fastq, qseq, bam, sam, and more??? Which ones do
we really want to store/reference through BASE?
New [FileType]: ????
New [Platform]/[PlatformVariant]: ???

Or should most of this be in an extensions package similar to the existing
Illumina package?
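
To make the loop in section 5 more concrete, here is a minimal sketch in Python. It is not BASE code and not a proposed implementation. The class names follow the proposal, but every field, the `create_set` and `history` helpers, and all item and file names are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DerivedBioAssaySet:
    """Result of one iteration of the loop (hypothetical fields)."""
    name: str
    files: List[str] = field(default_factory=list)
    created_by: Optional["BioAssayEvent"] = None


@dataclass
class BioAssayEvent:
    """One step in the loop: what was done and with which tools."""
    protocol: Optional[str] = None
    hardware: Optional[str] = None
    software: Optional[str] = None
    # Assumption: the first event starts from a physical bioassay,
    # every later event starts from a derived bioassay set.
    physical_source: Optional[str] = None
    derived_source: Optional[DerivedBioAssaySet] = None

    def create_set(self, name: str, files: List[str]) -> DerivedBioAssaySet:
        """Create the derived bioassay set produced by this event."""
        return DerivedBioAssaySet(name=name, files=list(files), created_by=self)


def history(dset: DerivedBioAssaySet) -> List[Tuple[str, Optional[str]]]:
    """Walk back from a derived set towards the physical bioassay.

    Returns (set name, software or hardware used) pairs, oldest first.
    """
    steps = []
    current: Optional[DerivedBioAssaySet] = dset
    while current is not None and current.created_by is not None:
        event = current.created_by
        steps.append((current.name, event.software or event.hardware))
        current = event.derived_source
    steps.reverse()
    return steps


# Three iterations: sequencing run, conversion, alignment.
run = BioAssayEvent(protocol="Sequencing", hardware="HiSeq 2000",
                    physical_source="Flow cell 1")
base_calls = run.create_set("Base calls", ["lane1.bcl"])

convert = BioAssayEvent(software="Casava", derived_source=base_calls)
reads = convert.create_set("Reads", ["lane1.fastq"])

align = BioAssayEvent(software="Bowtie", derived_source=reads)
aligned = align.create_set("Aligned reads", ["lane1.sam"])

for set_name, tool in history(aligned):
    print(set_name, "<-", tool)
```

Each event points back at exactly one source and each derived set remembers the event that created it, so the chain can be followed in either direction. This mirrors how the existing [Transformation]/[BioAssaySet] loop is described. The final step to a [RawBioAssay] is left out of the sketch.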