Summary of changes and additions for supporting sequencing
==========================================================

1. New type of [MeasuredBioMaterial]: [Library]

This is a new Java class, but it is saved to the same database table as the other biomaterial types (discriminator value = 5). The parent type is [Extract].

A new [ProtocolType] is created by the installation: 'Library preparation'.

A [Barcode] is needed when multiple biomaterials are mixed on the same lane. I see three options:

a) A new entity class: [Barcode]
b) Re-use [Label]
c) Use an annotation on [Library]

Which one should we choose? I guess it depends on what we need to store. Do we need more than what can currently be stored for a label (name and description)? An annotation can only store a single value.

If we re-use [Label], should we rename it to something more generic? For example, [Tag], which may have a [TagType] attribute (e.g. 'Label', 'Barcode', etc.).

Hmm... could we take this to the 'extreme' and rename [LabeledExtract] instead of creating [Library]? For example, a [TaggedExtract] has a [Tag], which has a [TagType] that tells us what it is. Combined with the other ticket (#1597: Subtypes of biomaterial items), this may be a more future-proof solution...

Do we need some kind of 'mode' setting in the GUI so that it uses the correct terminology as much as possible? The 'mode' setting could also control parts of the 'Validate' functionality in the 'Item overview', which may need to work differently. In particular, this applies to all rules for the 'number of channels', which are only needed for microarray experiments.

2. Changes in [BioMaterialEvent]

A new entity class, [BioMaterialEventParticipant], is introduced to replace the "anonymous" link table 'BioMaterialEventSources'. This should make it possible to get rid of the [UsedQuantity] "ugly hack" that was used to support multi-array slides. Existing information can easily be moved to the new tables.

We may need a new [BioMaterialEventType] (e.g. 'Sequencing').
Or is it better to change the name of the 'Hybridization' event to something more generic, for example 'BioAssayCreation'?

3. New entity class [PhysicalBioAssay] that replaces [Hybridization]

All hybridization data is moved to the new database table. The information it can hold is more or less the same as before, but it can be linked with any type of [MeasuredBioMaterial], not just [LabeledExtract].

A [PhysicalBioAssay] should be classified by a type ('Hybridization', 'Sequencing', etc.). This could be an entity class of its own or a property/enum, or we can derive it from the type of the 'creationEvent'.

The [PhysicalBioAssay] should implement [FileStoreEnabled] so that we can link files to it.

New [ProtocolType]: 'Sequencing'
New [HardwareType]: 'Sequencing station'
New [Hardware]: 'HiSeq 2000'

4. Changes for [FileSet] and related classes

[FileSetMember] is made into an [Annotatable] item so that we can add annotations to files.

[FileSet] is modified so that it becomes possible to add more than one file for each [DataFileType]. This is controlled by a flag ('allowMultiple').

5. New entity classes [BioAssayEvent], [DerivedBioAssaySet] and [DerivedBioAssay] that replace [Scan] and [Image]

The [BioAssayEvent] and [DerivedBioAssaySet] classes make up a loop that starts from a [PhysicalBioAssay] and ends with a [RawBioAssay]. This loop is similar to the loop with [Transformation] and [BioAssaySet] in the existing analysis section.

Existing [Scan] and [Image] data is moved into a single "iteration" of that loop. The scan data is split between the bioassay event (protocol, scanner, date) and the derived bioassay set (name, description, owner). One or more derived bioassays are also created, with links back to the biomaterials they are related to. Image data is moved to the file set of the derived bioassay set with

The new classes can be used to represent the multiple steps that are required before sequencing data can be boiled down to something similar to expression data.
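A minimal sketch of how the loop could look, assuming simplified Java classes named after the proposed entities. The fields, the start()/next()/lineage() methods and the BioAssayType enum are hypothetical (section 3 leaves open how a [PhysicalBioAssay] is classified), and none of this is the actual BASE API. The example chain uses hardware and software names from this proposal; the order of the steps is only illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the proposed loop; plain classes, not the BASE API.
// Section 3 leaves the classification open; an enum is assumed here.
enum BioAssayType { HYBRIDIZATION, SEQUENCING }

class PhysicalBioAssay {
    final String name;
    final BioAssayType type;

    PhysicalBioAssay(String name, BioAssayType type) {
        this.name = name;
        this.type = type;
    }
}

// One step of the loop. Protocol, hardware and software are plain strings here;
// in BASE they would be links to [Protocol], [Hardware] and [Software] items.
class BioAssayEvent {
    final String protocol;
    final String hardware;
    final String software;

    BioAssayEvent(String protocol, String hardware, String software) {
        this.protocol = protocol;
        this.hardware = hardware;
        this.software = software;
    }
}

// Output of one iteration. The first iteration has no parent set; every set
// keeps a reference to the physical bioassay that started the loop.
class DerivedBioAssaySet {
    final String name;
    final PhysicalBioAssay root;
    final DerivedBioAssaySet parent;
    final BioAssayEvent createdBy;

    private DerivedBioAssaySet(String name, PhysicalBioAssay root,
                               DerivedBioAssaySet parent, BioAssayEvent createdBy) {
        this.name = name;
        this.root = root;
        this.parent = parent;
        this.createdBy = createdBy;
    }

    // Starts the loop from a physical bioassay (first iteration).
    static DerivedBioAssaySet start(String name, PhysicalBioAssay pba, BioAssayEvent event) {
        return new DerivedBioAssaySet(name, pba, null, event);
    }

    // Runs one more iteration on top of this set.
    DerivedBioAssaySet next(String name, BioAssayEvent event) {
        return new DerivedBioAssaySet(name, root, this, event);
    }

    // Names of all sets from the first iteration down to this one.
    List<String> lineage() {
        List<String> names = (parent == null) ? new ArrayList<String>() : parent.lineage();
        names.add(name);
        return names;
    }
}

// The loop ends when a raw bioassay is created from the last derived set.
class RawBioAssay {
    final String name;
    final DerivedBioAssaySet source;

    RawBioAssay(String name, DerivedBioAssaySet source) {
        this.name = name;
        this.source = source;
    }
}

class BioAssayLoopDemo {
    public static void main(String[] args) {
        PhysicalBioAssay flowCell = new PhysicalBioAssay("Flow cell 1", BioAssayType.SEQUENCING);
        DerivedBioAssaySet bcl = DerivedBioAssaySet.start("BCL files",
                flowCell, new BioAssayEvent("Sequencing", "HiSeq 2000", null));
        DerivedBioAssaySet fastq = bcl.next("FASTQ files",
                new BioAssayEvent(null, null, "BCL Converter"));
        DerivedBioAssaySet aligned = fastq.next("Aligned reads",
                new BioAssayEvent(null, null, "Bowtie"));
        RawBioAssay raw = new RawBioAssay("Sample A", aligned);
        System.out.println(raw.source.lineage()); // prints [BCL files, FASTQ files, Aligned reads]
    }
}
```

In this model a migrated [Scan] is simply one call to start(), with the scanner and protocol on the event, followed directly by a [RawBioAssay].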
A bioassay event can be linked with [Job], [Protocol], [Hardware] and [Software]. It should be possible to create iterations both manually and with plug-ins. Do we need a new [PluginType] or can 'Analysis' be re-used? We already implement context checking, so there shouldn't be any risk of mixing things up.

New [Software]: BCL Converter, Casava, Bowtie, Myrna, Tophat, Cufflinks, and more?
New [DataFileType]: bcl, cif, fastq, qseq, bam, sam, and more? Which ones do we really want to store/reference through BASE?
New [FileType]: ???
New [Platform]/[PlatformVariant]: ???

Or should most of this be in an extensions package, similar to the existing Illumina package?
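The [FileSet] change in section 4 could work roughly as follows for some of the sequencing file types listed above. This is a sketch under stated assumptions: the proposal does not say where the 'allowMultiple' flag is stored, so here it is put on a simplified, hypothetical DataFileType class, and the annotations on [FileSetMember] are modelled as a plain map. Which types allow multiple files (fastq yes, bam no) is also just an assumption for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: where 'allowMultiple' lives is not specified in the
// proposal; here it is assumed to be a property of the data file type.
final class DataFileType {
    final String name;
    final boolean allowMultiple;

    DataFileType(String name, boolean allowMultiple) {
        this.name = name;
        this.allowMultiple = allowMultiple;
    }
}

// FileSetMember becomes annotatable; annotations are a simple key/value map here.
final class FileSetMember {
    final String path;
    final DataFileType type;
    final Map<String, String> annotations = new HashMap<>();

    FileSetMember(String path, DataFileType type) {
        this.path = path;
        this.type = type;
    }
}

final class FileSet {
    // Members grouped by file type; DataFileType uses identity equality in this sketch.
    private final Map<DataFileType, List<FileSetMember>> members = new LinkedHashMap<>();

    // Adds a file, rejecting a second file of a type that does not allow multiple files.
    FileSetMember add(String path, DataFileType type) {
        List<FileSetMember> list = members.computeIfAbsent(type, t -> new ArrayList<>());
        if (!type.allowMultiple && !list.isEmpty()) {
            throw new IllegalStateException(
                    "Only one file of type '" + type.name + "' is allowed");
        }
        FileSetMember member = new FileSetMember(path, type);
        list.add(member);
        return member;
    }

    // Returns a read-only view of the files of the given type (empty if none).
    List<FileSetMember> get(DataFileType type) {
        List<FileSetMember> list = members.get(type);
        return list == null
                ? Collections.<FileSetMember>emptyList()
                : Collections.unmodifiableList(list);
    }
}

class FileSetDemo {
    public static void main(String[] args) {
        DataFileType fastq = new DataFileType("fastq", true);
        DataFileType bam = new DataFileType("bam", false);
        FileSet files = new FileSet();
        files.add("lane1.fastq", fastq).annotations.put("lane", "1");
        files.add("lane2.fastq", fastq).annotations.put("lane", "2");
        files.add("sample.bam", bam);
        System.out.println(files.get(fastq).size()); // prints 2
        try {
            files.add("other.bam", bam);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints Only one file of type 'bam' is allowed
        }
    }
}
```

This sketch rejects a second file when 'allowMultiple' is false; silently replacing the existing member would be an equally reasonable design choice.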