Preliminary list of features
This is the list of features we would like to see in the core
of BASE 2, as seen somewhat from a user's perspective. There will be a
separate document describing the more technical aspects of these
features, and yet another document detailing features that do not need
to go into the core.
- User system
- Item class
- Access levels: read, reference, use, alter, remove
The difference between 'reference' and 'use' is that
with the 'use' right the user may e.g. decrease the remaining quantity
of a biomaterial.
- Sharing of Items with multiple users/groups
A key is a set of users/groups with access levels.
An Item can be shared with an anonymous key, so that all one sees
is that it's shared with a specified set of users/groups, and
if this set is changed nothing happens to other Items.
Alternatively, it can be shared with a named key, e.g. "my friends",
consisting of a set of users/groups that can be changed. This way
one can share multiple items with collaborators and easily change
the collaborator list without having to alter actual groups.
- Recursive sharing/unsharing of items
- Almost everything is an Item
- All modifications are logged to the database
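The difference between anonymous and named keys described above can be
sketched as follows. This is only an illustration, not the BASE 2 design:
the names Key, Item, share_anonymous, share_named and can are hypothetical,
and the access levels are treated as an independent set per user/group,
which the list above does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Access levels as listed in this document.
LEVELS = ("read", "reference", "use", "alter", "remove")


@dataclass
class Key:
    """A set of users/groups with access levels (hypothetical structure)."""
    grants: Dict[str, Set[str]] = field(default_factory=dict)
    name: Optional[str] = None  # None marks an anonymous key


@dataclass
class Item:
    label: str
    key: Optional[Key] = None


def share_anonymous(item: Item, grants: Dict[str, Set[str]]) -> None:
    # Every anonymous share gets its own private copy of the grants,
    # so changing it later cannot affect any other Item.
    item.key = Key(grants={who: set(levels) for who, levels in grants.items()})


def share_named(item: Item, key: Key) -> None:
    # Named keys are shared by reference: editing the key changes
    # access to every Item that uses it.
    item.key = key


def can(item: Item, who: str, level: str) -> bool:
    return item.key is not None and level in item.key.grants.get(who, set())
```

With a named key such as "my friends", adding a collaborator to the key
once gives that collaborator access to every Item shared with it, while
two Items shared anonymously with the same set stay independent.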
- Annotation system
- Almost all Items can be annotated
- An Item may have one Annotation of each applicable AnnotationType.
- Each AnnotationType has a value type.
- Value types: Ontologies (GO, MGED, etc), numbers, text, sets, files, ...
- An AnnotationType is defined on one Item class
For instance: 'Sample age' is not the same thing
as 'Extract age'.
- Annotations on items are inherited between classes
E.g.: An extract has sample annotations, because it's
created from a sample. These annotations are not merely copied on
extract creation, but instead follow the sample's annotations at all
times.
- On pooling of Items, consensus Annotations are created
When several Items (such as extracts or wells) are pooled
into one, their annotations need to be combined in a meaningful way.
Ideally, a consensus annotation would be created in a type-dependent
way. A complication is that this should be done dynamically, so that
downstream annotations are kept current. This is a technical problem
which needs to be solved.
- Channels need to be considered
For Hybridizations and downstream Items, the
biomaterial annotations are associated with a channel. Also see the
dye-swap issues below (in the raw data section).
- Annotations are fully searchable
One could e.g. filter RawBioAssays on their channel 1
sample disease status.
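The inheritance rule in the annotation system (annotations follow the
parent rather than being copied) might look like the sketch below. The
class Annotatable and its methods are hypothetical. Keying annotations by
(item class, type name) is one way to keep 'Sample age' distinct from
'Extract age'. For pooled items the sketch simply collects all parent
values into a set; it does not attempt the type-dependent consensus
discussed above, which remains an open problem.

```python
class Annotatable:
    """An item whose parent annotations are looked up on demand."""

    def __init__(self, item_class, parents=()):
        self.item_class = item_class
        self.parents = list(parents)
        self.own = {}  # (item class, annotation type name) -> value

    def annotate(self, type_name, value):
        # One annotation per applicable type: a new value replaces the old.
        self.own[(self.item_class, type_name)] = value

    def annotations(self):
        """Return own plus inherited annotations as sets of values.

        Parents are queried at call time, so later changes to a parent
        are visible here without copying anything.
        """
        merged = {}
        for parent in self.parents:
            for key, values in parent.annotations().items():
                merged.setdefault(key, set()).update(values)
        for key, value in self.own.items():
            merged.setdefault(key, set()).add(value)
        return merged
```

Because the lookup is dynamic, re-annotating a sample immediately changes
what its extracts report, at the cost of walking the parent chain on
every query.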
- Different types of protocols
Protocol subtypes for plate events
Users may define new subtypes.
- SampleSource -> Sample -> Extract -> LabeledExtract
- Pooling (Samples -> Sample etc.)
E.g.: Five samples are pooled to make one sample. From
the samples and the pool, six extracts are created. Five extracts
are pooled, leaving the user with seven extracts...
- Resource tracking
Rather than just keeping track of the remaining
quantity, we should remember all withdrawals of material.
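A minimal sketch of the resource-tracking idea, assuming the remaining
quantity is derived from a stored list of withdrawals rather than kept
as a single number. The names BioMaterial, Withdrawal and used_for are
hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Withdrawal:
    amount: float
    used_for: str  # e.g. the name of the item created from the material
    when: datetime


@dataclass
class BioMaterial:
    name: str
    original_quantity: float
    withdrawals: List[Withdrawal] = field(default_factory=list)

    @property
    def remaining(self) -> float:
        # Derived from the full history instead of being stored.
        return self.original_quantity - sum(w.amount for w in self.withdrawals)

    def withdraw(self, amount: float, used_for: str) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.remaining:
            raise ValueError("not enough material left")
        self.withdrawals.append(
            Withdrawal(amount, used_for, datetime.now(timezone.utc))
        )
```

Keeping every withdrawal makes it possible to answer "what was this
material used for, and when" in addition to "how much is left".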
- Admin-defined columns
- Connection to external databases
Admin defines external databases, tables, columns and
how to pull data from them (e.g. what type of join).
- Store the type of the reporter ID.
Only allow well-defined types.
- Easy redefinition of web links
Even though these are somewhat interface-specific, the
data-dependent links corresponding to different columns should be part
of the core.
- Array LIMS: Plates
- Geometries (columns / rows)
Generalization of the 96-well / 384-well handling in BASE.
- Plate types with different geometries, event types and annotation
An event is date/protocol/comment (should this be a
general annotation value type?).
- Plates, each of a plate type and containing a number of wells
- Fully annotatable wells on plates
Do we allow pooling of wells? In either case,
annotations will be tricky to inherit. Consider a tree of wells. If
one is annotated as being contaminated, the rest may also be. There
will have to be different ways to search well annotations (annotations
on the same well, annotations on all related wells, etc.).
- Plate import from files with user-defined file formats
Creating plates with material from other plates in an
arbitrary (acyclic) way.
- Plate merging
E.g. from 96w to 384w, with definable well mapping.
Mappings should be defined for different geometries. Maybe maintain a
list of possible source and destination plate types for merging, as
well as for the simpler action of 1-to-1 plate creation.
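As an illustration of a definable well mapping for plate merging, the
sketch below merges four 96-well plates (8 rows x 12 columns) into one
384-well plate (16 x 24) using an interleaved layout. The interleaved
layout is only one possible mapping and is an assumption here, as are
the function names and the zero-based (row, column) coordinates.

```python
def merge_well(source_index, row, col):
    """Map a well on 96-well plate number source_index (0-3) to its
    position on the merged 384-well plate.

    Each source plate fills one position of every 2x2 block on the
    destination plate (interleaved layout, assumed for illustration).
    """
    if not (0 <= source_index < 4 and 0 <= row < 8 and 0 <= col < 12):
        raise ValueError("well outside a 4 x 96-well merge")
    row_offset, col_offset = divmod(source_index, 2)
    return 2 * row + row_offset, 2 * col + col_offset


def merge_plates(plates):
    """Merge up to four plates given as {(row, col): content} dicts."""
    merged = {}
    for index, plate in enumerate(plates):
        for (row, col), content in plate.items():
            merged[merge_well(index, row, col)] = content
    return merged
```

A different geometry pair would need its own mapping function, which is
why the list above suggests defining mappings per source and
destination plate type.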
- Array LIMS: Arrays
- ArrayDesign -> ArrayBatch -> ArraySlide
- Feature specified on ArrayDesign by block/column/row.
A Feature maps a position to a well or a reporter.
- Meta-row/meta-column should map to block number
- Association of Plates and ArrayDesign
It should be possible to dissociate, but only if
Features don't exist (yet).
- Features can be created by print maps or 'reporter maps'.
A 'reporter map' maps block/column/row to a reporter.
Maybe we should rename it to 'feature map'? A print map maps
plate-number/column-in-plate/row-in-plate to a position on the array
design, so that Features can be created given the list of
- Possible to destroy Features if unreferenced
Needed to redo plate association.
- Add new print map formats easily
- LabeledExtract(s) -> Hybridization -> Scan ->
ImageAcquisition -> Image
- Arbitrary number of channels
- Allow the removal of image files without losing image information
- Raw data
- (ImageAcquisition ->) RawBioAssay
The RawBioAssay may be the starting point for data
- User-definable formats for raw data
- Consistency with ArrayDesign
Raw data must be verified against ArrayDesign.
- Each spot has a position, a Reporter, and a Feature.
- Admin-defined RawBioAssayData table
Because of the sheer number of spots, this seems like a
reasonable way to handle the problem of platform differences.
- Spot image creation for 1 - 3 channels
Spot images are extracted from TIFFs belonging to the
ImageAcquisition and JPEG-compressed after rescaling and gamma
correction.
- Experiment and analysis
- The channel count is fixed within an Experiment
- An Experiment has BioAssaySets consisting of BioAssays
- BioAssays have parent BioAssays and RawBioAssays
Each may have multiple parents.
- In a BioAssay, each spot has a position, intensities, and a reporter.
- Flexible tracking of spot origin
Spots usually point back to spots with the same
position on the parent BioAssay(s), but may be mapped in an arbitrary
way. The same applies to parent RawBioAssays.
- Extra values may be attached to spots
The value type must be defined (at the very least
named), and values must be attached to all spots within the
- Values may be attached to positions within a BioAssaySet.
- Possibility to keep attached values through subsequent
- Possibility to retain positions / Reporters for which all
spots have been lost
- Allow raw channels to be mapped to channels in any way (dye-swap)
Ratios in the raw data need to be handled.
- Store and use info about experimental design
For experiments without a common reference sample, but
maybe also as a general way of handling dye-swaps.
- Advanced filtering without external applications
For instance, "keep spots whose reporters are up
twofold on at least 3 slides annotated as 'disease'" or "((Ch1 FG med
/ Ch1 BG med > 5) in Raw Data Set 1) OR ((Ch1 FG med / Ch1 BG med
> 5) in Raw Data Set 2)".
- Files and other data attached to transformations/jobs
For use by external tools.
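The first advanced-filtering example above ("keep spots whose reporters
are up twofold on at least 3 slides annotated as 'disease'") could be
expressed roughly as below. The flat (slide, reporter, ratio) tuples,
the single-label slide annotation dict, and reading "up twofold" as a
ratio of at least 2 are all assumptions made for this sketch. In the
core, this would presumably run inside the database rather than in
application code.

```python
def reporters_up(spots, slide_annotations, min_ratio=2.0, min_slides=3,
                 label="disease"):
    """Return reporters with ratio >= min_ratio on at least min_slides
    slides carrying the given annotation.

    spots: iterable of (slide, reporter, ratio) tuples
    slide_annotations: dict mapping slide -> annotation label
    """
    hits = {}  # reporter -> set of qualifying slides
    for slide, reporter, ratio in spots:
        if slide_annotations.get(slide) == label and ratio >= min_ratio:
            hits.setdefault(reporter, set()).add(slide)
    return {r for r, slides in hits.items() if len(slides) >= min_slides}


def keep_spots(spots, slide_annotations, **criteria):
    """Keep only the spots whose reporter passes the filter."""
    spots = list(spots)
    keep = reporters_up(spots, slide_annotations, **criteria)
    return [spot for spot in spots if spot[1] in keep]
```

Slides are counted as a set per reporter, so duplicate spots for the
same reporter on one slide count once.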
- General search interface
External programs / user interfaces need to know what can be
filtered on and what information can be returned.
- Specification of what can be searched on
- Specification of possible operators
- Specification of possible values
- Specification of what can be returned
- Specification of sortability
- What about grouping, discretization, histograms?
These are useful for generating plots, and leave a lot
of number crunching to the database.
- Filtering presets (core feature or not?)
- Presets transferable between search types?
- Retrieval of statistics on RawBioAssays and BioAssays
E.g., the percentage of flagged spots. This is useful
when combined with annotations and other upstream data (such as the
name of the experimenter).