Experiments and analysis

NOTE! This document is outdated and has been replaced with newer documentation. See The database schema and the Data Layer API

This document describes the experiment and analysis part of BASE, including the dynamic tables that are generated as part of each experiment.


Last updated: $Date: 2009-04-06 14:52:39 +0200 (Mon, 06 Apr 2009) $


Experiment

The ExperimentData [API] class is used to collect information about a single experiment. It links to any number of RawBioAssayData [API] items, which must all be of the same RawDataType [API].

Annotation types that are needed in the analysis must be connected to the experiment as experimental factors, and the annotations should be inherited by each raw bioassay.

The directory connected to the experiment is the default directory where plugins that generate files should store them.
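The rule that all raw bioassays must have the same raw data type can be pictured with a small sketch. This is a conceptual example only, with invented class and method names, and not the real ExperimentData API:

    import java.util.ArrayList;
    import java.util.List;

    // Conceptual sketch only; not the real ExperimentData class.
    class ExperimentSketch
    {
        private final String rawDataType;                        // e.g. "genepix"
        private final List<String> rawBioAssays = new ArrayList<>();

        ExperimentSketch(String rawDataType)
        {
            this.rawDataType = rawDataType;
        }

        // All raw bioassays added to the experiment must share its raw data type.
        void addRawBioAssay(String name, String type)
        {
            if (!rawDataType.equals(type))
            {
                throw new IllegalArgumentException("Wrong raw data type: " + type);
            }
            rawBioAssays.add(name);
        }
    }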

BioAssaySet, BioAssay and Transformation

Each line of analysis starts with the creation of a root BioAssaySetData [API], which holds the intensities calculated from the raw data. A bioassayset can hold one intensity for each channel, and the number of channels is defined by the raw data type. For each raw bioassay used, a BioAssayData [API] is created.

Information about the process that calculated the intensities is stored in a TransformationData [API] object. The root transformation links to the raw bioassays that are used in this line of analysis and to a Job [API], which holds the plugin and parameters used in the calculation. A typical plugin may calculate the intensities by subtracting the mean background from the mean foreground.
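A minimal sketch of that kind of calculation, assuming two-channel data with per-spot mean foreground and mean background values (the class and method names are invented for the example and do not belong to the BASE plugin API):

    // Conceptual sketch of a background-subtraction calculation; not a real BASE plugin.
    class IntensityCalculationSketch
    {
        // Returns one intensity per channel: mean foreground minus mean background.
        static float[] intensities(float[] meanForeground, float[] meanBackground)
        {
            float[] result = new float[meanForeground.length];
            for (int channel = 0; channel < meanForeground.length; channel++)
            {
                result[channel] = meanForeground[channel] - meanBackground[channel];
            }
            return result;
        }
    }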

Once the root bioassayset has been created, it is possible to apply another transformation to it. This time the TransformationData links to the source bioassayset instead of to the raw bioassays. As before, it still links to a job which defines the plugin that does the actual work. The transformation must make sure that new bioassays are created and linked to the bioassays in the source bioassayset.

The above process may be repeated as many times as needed.
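The resulting chain can be pictured as in the sketch below. The types and plugin names are invented placeholders; the point is only that every non-root bioassayset is created by a transformation that links back to its source bioassayset, while the root is created from the raw bioassays:

    // Conceptual sketch of a line of analysis; not the real BASE classes.
    class AnalysisChainSketch
    {
        record Transformation(String plugin, BioAssaySet source) {}
        record BioAssaySet(String name, Transformation createdBy) {}

        static void example()
        {
            // The root bioassayset is calculated from the raw bioassays (no source bioassayset).
            BioAssaySet root = new BioAssaySet("root",
                new Transformation("intensity calculation", null));
            // Each later bioassayset links, via its transformation, to the bioassayset it came from.
            BioAssaySet normalized = new BioAssaySet("normalized",
                new Transformation("normalization plugin", root));
            BioAssaySet filtered = new BioAssaySet("filtered",
                new Transformation("filter plugin", normalized));
        }
    }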

VirtualDb and DataCube

The above process requires a flexible storage solution for the data. The VirtualDbData [API] object represents a set of tables that are created in the dynamic part of the database. In MySQL the dynamic part is a separate database; in other databases it may be a set of tables with a certain prefix, or tables in another schema. One set of dynamic tables is needed for each experiment.

The DataCubeData [API] is the main item used in the analysis. A coordinate in the cube is given by a layer, a column and a position; the layer and column are represented by DataCubeLayerData [API] and DataCubeColumnData [API] objects, while the position has no separate object. At each coordinate we can store the intensities and extra values related to the intensities.

All data for a bioassayset is stored in a single layer, so we can say that a layer is equivalent to a bioassayset. It is, however, possible that two bioassaysets use the same layer. For example, a filter transformation that doesn't modify the data can reuse the same layer if the new bioassayset is linked to a DataCubeFilterData [API] item. The filter tells which coordinates should remain in the bioassayset.

All data for a bioassay is stored in a single column, so we can say that a column is equivalent to a bioassay. Two bioassays in different bioassaysets can only have the same column (but different layers) if one is the parent of the other.

The last coordinate, position, is tied to a single reporter. It is also possible to add extra data for a position.
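Put together, a value in the cube is addressed by (layer, column, position), i.e. roughly (bioassayset, bioassay, reporter). The following is a small in-memory sketch of that addressing scheme, with invented names; the real data is of course stored in the dynamic tables described below:

    import java.util.HashMap;
    import java.util.Map;

    // Conceptual sketch of data cube addressing; not how BASE actually stores the data.
    class DataCubeSketch
    {
        record Coordinate(int layer, int column, int position) {}

        // The intensities for each coordinate, one value per channel.
        private final Map<Coordinate, float[]> spots = new HashMap<>();

        void setIntensities(int layer, int column, int position, float[] perChannel)
        {
            spots.put(new Coordinate(layer, column, position), perChannel);
        }

        float[] getIntensities(int layer, int column, int position)
        {
            return spots.get(new Coordinate(layer, column, position));
        }
    }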

It may happen that the result of a transformation doesn't fit in the same cube, for example if we merge bioassays and/or reporters. In this case we must create a new cube and add mapping information to a DataCubeMappingData [API] item. The mapping is a many-to-many mapping between coordinates in the source and destination cubes, which allows us to track the source coordinates that were used to calculate the values for a specific destination coordinate.
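An in-memory sketch of such a mapping is shown below; the real mapping is stored in the Dynamic#SpotParents table described later, and the names here are invented for the example. The layer is left out since all layers in a cube share the same mapping:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Conceptual sketch of the many-to-many mapping between two cubes.
    class CubeMappingSketch
    {
        record Spot(int column, int position) {}

        // For each spot in the destination cube: the spots in the source cube
        // that its values were calculated from.
        private final Map<Spot, List<Spot>> sources = new HashMap<>();

        void addMapping(Spot destination, Spot source)
        {
            sources.computeIfAbsent(destination, d -> new ArrayList<>()).add(source);
        }

        List<Spot> sourcesOf(Spot destination)
        {
            return sources.getOrDefault(destination, List.of());
        }
    }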

The dynamic tables

The tables in a relational database are two-dimensional, while the data cube discussed above is four-dimensional, so we need a smart way to map the data cube to a set of tables. In each of the table names listed below, the # sign is replaced by the id of the VirtualDbData object at run time, thus creating a unique set of tables for each virtual database.

There are no classes in the data layer for these tables and they are not mapped with Hibernate. When we work with these tables we always use batcher classes and queries that work with integers, floats and strings.
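As an example, assuming the # sign is simply replaced by the id, a virtual database with id 123 would get a table named Dynamic123PerSpot. The JDBC-style sketch below shows the idea; the column names (layer, column, position, ch1, ch2) are assumptions made for the example and not the documented table layout:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Illustrative only: reads spot intensities for one layer from the per-spot table.
    class DynamicTableQuerySketch
    {
        static void dumpLayer(Connection con, int virtualDbId, int layer) throws SQLException
        {
            String table = "Dynamic" + virtualDbId + "PerSpot";   // '#' replaced by the VirtualDb id
            String sql = "SELECT `column`, `position`, ch1, ch2 FROM " + table + " WHERE layer = ?";
            try (PreparedStatement ps = con.prepareStatement(sql))
            {
                ps.setInt(1, layer);
                try (ResultSet rs = ps.executeQuery())
                {
                    while (rs.next())
                    {
                        System.out.println(rs.getInt(1) + ", " + rs.getInt(2) + ": "
                            + rs.getFloat(3) + " / " + rs.getFloat(4));
                    }
                }
            }
        }
    }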

Dynamic#PerSpot

This is the main table; each row holds the intensities for a single spot in the data cube. Extra values attached to a spot are kept in separate tables, one for each type of value (Dynamic#SpotExtraInt, Dynamic#SpotExtraFloat and Dynamic#SpotExtraString).

Dynamic#PerPosition

This table stores the reporter id for each position in a cube. Extra values attached to the position are kept in separate tables, one for each type of value (Dynamic#PositionExtraInt, Dynamic#PositionExtraFloat and Dynamic#PositionExtraString).

Dynamic#Filter

This table stores the coordinates for the spots that remain after filtering. Note that each filter is related to a bioassayset which gives the cube and layer values. Each row in the filter table then adds the column and position allowing us to find the spots in the Dynamic#PerSpot table.
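An illustrative query, using the same assumed column names as in the earlier sketch plus an assumed filter id column, could look like this. It selects the spots of a filtered bioassayset: the bioassayset supplies the layer, and the filter supplies the remaining column and position coordinates:

    // Illustrative only; the column names and the filter id column are assumptions.
    class FilteredSpotQuerySketch
    {
        static String sql(int virtualDbId)
        {
            String spotTable = "Dynamic" + virtualDbId + "PerSpot";
            String filterTable = "Dynamic" + virtualDbId + "Filter";
            return "SELECT s.`column`, s.`position`, s.ch1, s.ch2"
                + " FROM " + spotTable + " s"
                + " JOIN " + filterTable + " f ON f.`column` = s.`column` AND f.`position` = s.`position`"
                + " WHERE s.layer = ? AND f.filter = ?";
        }
    }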

Dynamic#RawParents

This table maps each spot to the raw data it was calculated from. We don't need the layer coordinate since all layers in a cube must have the same mapping to raw data.

Dynamic#SpotParents

This table holds mappings of spots between two cubes. Note that each mapping is related to a source and a destination cube. Each row in the table then adds the column and position. We don't need the layer coordinate since all layers in a cube must have the same mapping to the other cube.