A hybridization is attached to a list
of labeled extracts.
The same labeled extract may be used several times.
The position
in the list does not mean anything to BASE, but may be used by
plugins subsequently used to create derived data from raw data.
A hybridization may be attached to an array slide.
The hybridization may be dissociated from the array slide and the
labeled extracts at any time.
A hybridization protocol may be picked.
A hybridization may be annotated, but annotations on its
labeled extracts are not transferred to it.
A scan (or image acquisition) represents the scanning of a slide.
A hybridization may have any number of scans.
A scan is associated with a scanner as well as with a scanning
protocol.
Images may be attached to a scan.
An image consists of a pointer to an uploaded file,
information about what channel(s) it has to do with,
what format it's in (TIFF or JPEG), whether it's a preview or the full
image, and whether it should be used for generating spot images.
A raw data set describes the result of applying some software
to a set of images (obtained from scanning a microarray) in order
to quantify the spots and identify them with features or reporters.
This includes the generated spot quantifications, which we refer
to as raw data.
A raw data set normally belongs to a scan, but it should also
be possible to create raw data sets with no connection to a
scan.
It should be possible to attach a scan-less raw data set to a
scan at a later stage.
A raw data set can be associated with a software item.
The file(s) generated by the software should be attached to
the raw data set. Typically this is one file, which we refer to
as a raw result file. [NOTE] As it is implemented, only
one file can be attached.
A raw data set may point to the array design it has to do
with, if any, but only if the array design has features.
The array design of a raw data set is typically that of its
hybridization's array slide, but it doesn't have to be. A raw
data set created without connection to a scan may still point to
an array design.
Because different software produces different sets of spot
measurements, it should be possible to define new types of raw
data.
There is a single table which is used for all raw data types,
in which information common to all types is stored.
The columns common to all types of raw data are at least:
id of the raw data set
position in the raw data set (typically N for the Nth spot
in a raw data file)
id of the reporter thought to occupy the spot
id of the feature which corresponds to this spot, if any.
This is only allowed if the raw data set has an array design,
and the feature must match the spot's coordinates and reporter.
physical coordinates of the spot (possibly in pixels)
grid coordinates of the spot, including meta coordinates.
user-provided flagging (see below)
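The common columns listed above could be modeled roughly as follows. This is a minimal sketch, not the actual BASE schema: every field name is hypothetical, as is the choice of which fields are optional, and `check_feature_allowed` only illustrates the rule that a feature id requires an array design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawDataRow:
    """One row of the shared raw data table (hypothetical field names)."""
    raw_data_set_id: int               # id of the raw data set
    position: int                      # N for the Nth spot in the raw data file
    reporter_id: int                   # reporter thought to occupy the spot
    feature_id: Optional[int] = None   # only allowed with an array design
    x: Optional[float] = None          # physical coordinates, possibly in pixels
    y: Optional[float] = None
    meta_row: Optional[int] = None     # grid coordinates, including meta coordinates
    meta_column: Optional[int] = None
    row: Optional[int] = None
    column: Optional[int] = None
    flag_id: Optional[int] = None      # user-provided flagging


def check_feature_allowed(row: RawDataRow, has_array_design: bool) -> None:
    """Reject a feature id when the raw data set has no array design."""
    if row.feature_id is not None and not has_array_design:
        raise ValueError("a feature id requires a raw data set with an array design")
```

The check that the feature matches the spot's coordinates and reporter is left out, since it depends on how features are stored.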
For each type of raw data, there is one table with type-specific
data. This table is described in detail in the database. For
each column, the following is recorded:
Column name and type
Whether the column holds an intensity, a standard deviation,
or neither
Whether the column holds a foreground value, a background value,
or neither
Whether the column holds a mean or median or neither
An optional label id, in the case that the raw data type
concerns itself with labels.
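As an illustration, the per-column description above might be captured like this. The class, enum, and column names are assumptions made for the sketch, not names taken from BASE.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Quantity(Enum):
    INTENSITY = "intensity"
    STANDARD_DEVIATION = "standard deviation"
    NEITHER = "neither"


class Region(Enum):
    FOREGROUND = "foreground"
    BACKGROUND = "background"
    NEITHER = "neither"


class Statistic(Enum):
    MEAN = "mean"
    MEDIAN = "median"
    NEITHER = "neither"


@dataclass(frozen=True)
class ColumnDescriptor:
    """Description of one type-specific raw data column."""
    name: str
    sql_type: str
    quantity: Quantity = Quantity.NEITHER
    region: Region = Region.NEITHER
    statistic: Statistic = Statistic.NEITHER
    label_id: Optional[int] = None  # only for raw data types that concern labels


# Hypothetical excerpt of a raw data type definition.
EXAMPLE_COLUMNS = [
    ColumnDescriptor("ch1_fg_mean", "float", Quantity.INTENSITY,
                     Region.FOREGROUND, Statistic.MEAN, label_id=1),
    ColumnDescriptor("flags", "int"),
]


def intensity_columns(columns, region):
    """Names of the intensity columns for the given region."""
    return [c.name for c in columns
            if c.quantity is Quantity.INTENSITY and c.region is region]
```

With such descriptors, a plugin could find, say, the foreground intensity columns without knowing which software produced the raw data.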
In the table with raw data type specific columns, the spots should
be identified by raw data set and position. In practice, we use the id of the raw data entry.
The raw data set should store not only what type of raw
data it contains, but also which of the type-specific and
non-type-specific columns it uses.
Raw spots may be flagged/commented by users. There should be a
table with possible comments (modifiable by some users only), and
each spot may point to such a comment. This is the only property of
a raw spot that may change after the raw data set is added. The raw
data set should know the datetime of the last change to one of its
spots.
By spot images we mean small images of the individual spots
of a raw data set, meant to convey information about the
morphology of spots. These images are meant to be shown to users,
often many at a time from an arbitrary set of spots.
It should be possible to generate spot images from a raw data
set whose spots have physical coordinates specified, if it is
connected to a scan which has sufficient images attached, and
if it has no more than three channels. If it has more than three
channels a user may select up to three images for the spot
image generation.
By sufficient images we mean one high-resolution TIFF image
per channel, possibly stored in a single file.
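The conditions for spot image generation can be summarized in a small check. This is only a sketch under stated assumptions: the function name and parameters are hypothetical, channels are numbered from 1, and `tiff_channels` stands for the set of channels that have a high-resolution TIFF image attached to the scan.

```python
def can_generate_spot_images(has_physical_coordinates, has_scan,
                             channels, tiff_channels,
                             selected_channels=None):
    """Return True if spot images could be generated for a raw data set."""
    if not (has_physical_coordinates and has_scan):
        return False
    # With more than three channels, the user must select up to three.
    if selected_channels is not None:
        wanted = list(selected_channels)
    else:
        wanted = list(range(1, channels + 1))
    if not 1 <= len(wanted) <= 3:
        return False
    # Every channel used needs a high-resolution TIFF image.
    return all(ch in tiff_channels for ch in wanted)
```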
The user may need to enter information about how to scale and
offset the physical spot coordinates to get the corresponding
image coordinates. This information might be extracted from the
raw result file (or from the images).
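A minimal sketch of the coordinate conversion, assuming a simple linear mapping (scale first, then offset) applied independently to each axis. The function names and the order of operations are assumptions; the text above does not fix them.

```python
def to_image_coordinates(x, y, x_scale, y_scale, x_offset, y_offset):
    """Map physical spot coordinates to pixel coordinates in the image."""
    return round(x * x_scale + x_offset), round(y * y_scale + y_offset)


def crop_box(cx, cy, spot_size):
    """Box (left, top, right, bottom) of a square cut-out centered near (cx, cy)."""
    half = spot_size // 2
    left, top = cx - half, cy - half
    return left, top, left + spot_size, top + spot_size
```

For example, a spot at physical position (1000, 2000) with a scale of 0.5 and offsets of 5 and -5 maps to pixel (505, 995), and a 16-pixel cut-out around it covers the box (497, 987, 513, 1003).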
The size of the area to cut out for each spot image needs to
be given by the user.
The input images cannot be visualized without modification, as
the dynamic range of the scanner far exceeds that of the user's
screen, and most spots would be completely black if rescaled to
8 bits per gun. Therefore, the colors of each spot should be
rescaled to use the full intensity range, with the same rescaling
done on all 1-3 channels. Gamma correction may also be applied
before going to 8 bpg.
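The rescaling could look roughly like the sketch below. It assumes that "the same rescaling" means one shared minimum and maximum taken over all channels of a spot, and that gamma correction is applied as an exponent of 1/gamma to the normalized value; both are interpretations, and the function name is hypothetical.

```python
def rescale_to_8bit(channels, gamma=1.0):
    """Rescale the pixels of one spot to 0..255.

    channels: list of 1 to 3 lists of raw intensities (e.g. 16-bit values),
    one list per channel.
    """
    if not 1 <= len(channels) <= 3:
        raise ValueError("expected 1 to 3 channels")
    # One shared range, so all channels get the same rescaling.
    lo = min(min(ch) for ch in channels)
    hi = max(max(ch) for ch in channels)
    span = hi - lo
    result = []
    for ch in channels:
        scaled = []
        for value in ch:
            frac = (value - lo) / span if span else 0.0
            scaled.append(round(255 * frac ** (1.0 / gamma)))
        result.append(scaled)
    return result
```

Using a shared range keeps the relative brightness of the channels intact, so the color balance of a spot is not distorted by the rescaling.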
The spot images should be saved as JPEG or some other format
with good compression. To avoid an excessive number of small
space-wasting files, they should be lumped together in reasonable
numbers before compression. With JPEG, this means that
each spot image should be a square whose side is divisible by 8
pixels (to avoid interference between spots).
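JPEG compresses an image in blocks of 8x8 pixels, so tiles whose side is a multiple of 8 never share a block with a neighboring spot. The sketch below shows one way to pad the requested spot size and to place tiles in a combined image. The function names and the row-by-row layout are assumptions; the `block` parameter is there because chroma subsampling can make the effective block larger than 8 pixels.

```python
def padded_side(spot_size, block=8):
    """Smallest multiple of `block` that is at least `spot_size`."""
    return -(-spot_size // block) * block


def tile_origin(index, side, tiles_per_row):
    """Top-left pixel of the tile for spot number `index` (0-based)."""
    row, col = divmod(index, tiles_per_row)
    return col * side, row * side
```

For a requested spot size of 13 pixels this gives 16-pixel tiles, and with 4 tiles per row the spot with index 5 starts at pixel (16, 16).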
The scales, offsets, spot size, gamma correction and JPEG quality
value should be stored in the database, along with the identities
of the images used to create spot images.
It should be possible to remove and re-generate spot images.
When no spot images exist for a raw data set, the parameters for
generating them may be altered, but when spot images exist they
may not.
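The rule that parameters are frozen while spot images exist could be enforced along these lines. This is a sketch only; the class and method names are hypothetical and the actual image generation and removal are reduced to a flag.

```python
class SpotImageSettings:
    """Spot image parameters that are locked while spot images exist."""

    def __init__(self, **params):
        self._params = dict(params)
        self.has_spot_images = False

    def get(self, name):
        return self._params[name]

    def set(self, name, value):
        if self.has_spot_images:
            raise RuntimeError(
                "remove the spot images before changing parameters")
        self._params[name] = value

    def generate(self):
        # Placeholder for the real generation step.
        self.has_spot_images = True

    def remove(self):
        # Placeholder for deleting the stored spot images.
        self.has_spot_images = False
```

Locking the parameters this way means the stored values always describe the spot images that currently exist.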