This document describes the experiment and analysis part of BASE, including the dynamic tables that are generated as part of each experiment.
See also
The ExperimentData
[API]
class is used to collect information about a single experiment. It links
to any number of RawBioAssayData
[API]
items, which must all be of the same RawDataType
[API].
Annotation types that are needed in the analysis must connected as experimental factors to the experiment and the annotations should be inherited to each raw bioassay.
The directory connected to the experiment is the default directory where plugins that generate files should store them.
Each line of analysis starts with the creation of a root BioAssaySetData
[API],
which holds the intensities calculated from the raw data. A bioassayset can hold
one intensity for each channel. The number of channels is defined by the raw data type.
For each raw bioassay used a BioAssayData
[API]
is created.
Information about the process that calculated the intensities are stored in
a TransformationData
[API]
object. The root transformation links with the raw bioassays that are used in this
line of analysis and to a Job
[API]
which holds the plugin and parameters used in the calculation. A typical plugin
may calculate the intensities by subtracting the mean background from the
mean foreground.
Once the root bioassayset has been created it is possible to again apply a
transformation to it. This time the TransformationData
links to
the source bioassayset instead of the raw bioassays. As before, it still
links to a job which defines the plugin that does the actual work. The transformation
must make sure that new bioassays are created and linked to the bioassays in
the source bioassayset.
The above process may be repeated as many times as needed.
The above processes requires a flexible storage solution for the data.
The VirtualDbData
[API]
object represents a set of tables
that are created in the dynamic part of the database. In MySQL the dynamic
part is a separate database, in other databases it may be tables with a certain
prefix, or in another schema. One set of dynamic tables are needed for each experiment.
The DataCubeData
[API]
is the main item used in the analysis. A coordinate in the cube is given by
a layer, a column and a position, represented by DataCubeLayerData
[API]
and DataCubeColumnData
[API]
object. The position has no separate object. At each coordinate we can store the
intensities and extra values related to the intensities.
All data for a bioassayset is stored in a single layer, so we can say that
a layer is equivalent to a bioassayset. It is, however, possible that two
bioassaysets use the same layer. For example, a filter transformation that
doesn't modify the data can reuse the same layer if the new bioassayset is linked
to a DataCubeFilterData
[API]
item. The filter tells which coordinates should remain in the bioassayset.
All data for a bioassay is stored in a single column, so we can say that a column is equivalent to a bioassay. Two bioassays in different bioassaysets can only have the same column (but different layers) if one is the parent of the other.
The last coordinate, position, is tied to a single reporter. It is also possible to add extra data for a position.
It may happen that a transformation is such that the new bioassayset
doesn't fit in same cube. For example, if we merge bioassays and/or reporters.
In this case we must create a new cube and add mapping information to
a DataCubeMappingData
[API]
The mapping is a many-to-many mapping between coordinates in the source and destination
cubes. This allows us to track the source coordinates that were used to calculate
the values for a specific destination coordinate.
The tables in a relational database is two-dimensional, the data cube discussed
above is actually four-dimensional. So we need a smart way to map the data cube
to a set of tables. For each table shown in the diagram the # sign is replaced
by the id of the VirtualDbData
object at run time, thus creating
a set of unique tables for each root bioassayset.
There are no classes in the data layer for these tables and they are not mapped with Hibernate. When we work with these tables we are always using batcher classes and queries that works with integer, floats and strings.
This is the main table which keeps the intensities for a single spot in
the data cube. Extra values attached to the spot are kept in separate
tables, one for each type of value (Dynamic#SpotExtraInt
,
Dynamic#SpotExtraFloat
and Dynamic#SpotExtraString
).
This table stores the reporter id for each position in a cube.
Extra values attached to the position are kept in separate tables,
one for each type of value (Dynamic#PositionExtraInt
,
Dynamic#PositionExtraFloat
and
Dynamic#PositionExtraString
).
This table stores the coordinates for the spots that remain after
filtering. Note that each filter is related to a bioassayset which
gives the cube and layer values. Each row in the filter table then
adds the column and position allowing us to find the spots in the
Dynamic#PerSpot
table.
This table holds mappings for a spot to the raw data it is calculated from. We don't need the layer coordinate since all layers in a cube must have the same mapping to raw data.
This table holds mappings for spots between two cubes. Note that each mapping is related to a source and destination cube. Each row in the table then adds the column and position. We don't need the layer coordinate since all layers in a cube must have the same mapping to the other cube.