BASE - Preliminary list of features


This is the list of features we would like to see in the core of BASE 2, described mostly from a user's perspective. A separate document will describe the more technical aspects of these features, and yet another will detail features that do not need to go into the core.

Created by: Carl
Contributions by: Nicklas, Johan
Last updated: $Date: 2009-04-06 14:52:39 +0200 (Mon, 06 Apr 2009) $

User system

User accounts
Groups
Roles for global access rights
A role is a set of global access rights. Assigning roles to users makes it easier to update the rights of a whole set of users at once.
Authentication system with multiple sessions per user
Anonymous guest accounts
Tracking of disk usage, with quotas

Miscellaneous

Logging system for easy troubleshooting
Powerful external API
For use by the web interface, external applications and analysis tools.
Queueing system for resource-intensive tasks

Item class

Access levels: read, reference, use, alter, remove
The difference between 'reference' and 'use' is that the 'use' right allows the user to, e.g., decrease the remaining quantity of a biomaterial.
Sharing of Items with multiple users/groups
A key is a set of users/groups, each with an access level. An Item can be shared with an anonymous key, in which case all one sees is that it is shared with a specified set of users/groups; changing that set affects no other Items. Alternatively, it can be shared with a named key, e.g. "my friends", whose set of users/groups can be changed. This way one can share multiple Items with collaborators and change the collaborator list in one place, without having to alter actual groups.
Recursive sharing/unsharing of items
Almost everything is an Item
All modifications are logged to the database
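The access levels and keys above could be modelled roughly as follows. This is a minimal sketch, assuming the five levels are cumulative (each implies the ones below it) and treating users and groups as plain strings; the names Access, Key, Item and can() are hypothetical, and group membership is not resolved.

```python
from enum import IntEnum


class Access(IntEnum):
    # Assumed to be cumulative: each level implies all levels below it.
    READ = 1
    REFERENCE = 2
    USE = 3
    ALTER = 4
    REMOVE = 5


class Key:
    """A set of users/groups (plain strings here) with access levels.

    name=None models an anonymous key private to one Item; a named key
    such as "my friends" may be shared by many Items."""

    def __init__(self, grants, name=None):
        self.grants = dict(grants)  # principal -> Access
        self.name = name


class Item:
    def __init__(self, title):
        self.title = title
        self.keys = []

    def share(self, key):
        self.keys.append(key)

    def can(self, principal, level):
        # Group membership is not resolved here; principals match directly.
        return any(key.grants.get(principal, 0) >= level for key in self.keys)


# A named key shared by two Items: one change to it updates both.
friends = Key({"alice": Access.USE}, name="my friends")
sample, extract = Item("sample 1"), Item("extract 1")
sample.share(friends)
extract.share(friends)
sample.share(Key({"carol": Access.ALTER}))  # anonymous key, sample only
friends.grants["bob"] = Access.READ
```

Because the named key is a shared object, adding "bob" to it grants him read access to both Items, while the anonymous key affects only the sample.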

Annotation system

Almost all Items can be annotated
An Item may have one Annotation of each applicable AnnotationType.
Each AnnotationType has a value type.
Value types: Ontologies (GO, MGED, etc), numbers, text, sets, files, ...
An AnnotationType is defined on one Item class
For instance: 'Sample age' is not the same thing as 'Extract age'.
Annotations on items are inherited between classes
E.g. an extract has its sample's annotations, because it is created from that sample. These annotations are not merely copied when the extract is created; instead they follow the sample's annotations at all times.
On pooling of Items, consensus Annotations are created
When several Items (such as extracts or wells) are pooled into one, their annotations need to be combined in a meaningful way. Ideally, a consensus annotation would be created in a type-dependent way. A complication is that this should be done dynamically, so that downstream annotations are kept current. This technical problem remains to be solved.
Channels need to be considered
For Hybridizations and downstream Items, the biomaterial annotations are associated with a channel. Also see the dye-swap issues below (in the raw data section).
Annotations are fully searchable
One could, e.g., filter RawBioAssays on the disease status of their channel 1 sample.
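A minimal sketch of live annotation inheritance with consensus on pooling follows. Inherited annotations are computed from the parents each time they are requested rather than copied, so a change on a sample shows up on its extracts immediately. Annotation types are keyed on (Item class, name), so 'Sample age' and 'Extract age' stay distinct. The class Annotatable and the consensus() rule (mean for numbers, otherwise the common value or the sorted list of distinct values) are hypothetical illustrations, and channels are not modelled.

```python
from statistics import mean


def consensus(values):
    """Hypothetical type-dependent rule for combining pooled annotations."""
    if len(values) == 1:
        return values[0]
    if all(isinstance(v, (int, float)) for v in values):
        return mean(values)
    distinct = set(values)
    return values[0] if len(distinct) == 1 else sorted(map(str, distinct))


class Annotatable:
    def __init__(self, kind, parents=()):
        self.kind = kind              # Item class, e.g. "Sample", "Extract"
        self.parents = list(parents)  # Items this one was created from
        self.own = {}                 # (kind, type name) -> value

    def annotate(self, type_name, value):
        # AnnotationTypes are defined per Item class, so key on (kind, name).
        self.own[(self.kind, type_name)] = value

    def annotations(self):
        """Own annotations plus inherited ones, resolved at call time."""
        groups = {}
        for parent in self.parents:
            for key, value in parent.annotations().items():
                groups.setdefault(key, []).append(value)
        inherited = {key: consensus(vals) for key, vals in groups.items()}
        return {**inherited, **self.own}  # own values take precedence


s1 = Annotatable("Sample")
s1.annotate("age", 30)
s2 = Annotatable("Sample")
s2.annotate("age", 40)
pool = Annotatable("Sample", parents=[s1, s2])
extract = Annotatable("Extract", parents=[pool])
extract.annotate("age", 1)  # 'Extract age' is a separate AnnotationType
```

Recomputing on every read keeps downstream annotations current at the cost of walking the parent graph; a real implementation might instead cache results and invalidate them when an upstream annotation changes.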

Protocols

Different types of protocols
Protocol subtypes for plate events
Users may define new subtypes.

Uploads

Personal file upload area
Open question: should the upload area be structured, e.g. with directories?

Projects

Grouping of some items into Projects
For instance: Biomaterials, Experiments, Protocols and Uploads.

Biomaterials

SampleSource -> Sample -> Extract -> LabeledExtract <- Label
Pooling (Samples -> Sample etc.)
E.g. five samples are pooled to make one sample. Six extracts are then created, one from each of the five samples and one from the pool. Five of these extracts are pooled, leaving the user with seven extracts...
Resource tracking
Rather than just keeping track of the remaining quantity, we should remember all withdrawals of material.
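Resource tracking based on a withdrawal history could look like the sketch below, where the remaining quantity is derived from the recorded withdrawals rather than stored as a separate counter. The Withdrawal and BioMaterial classes and their fields are hypothetical, and a single unit of quantity per biomaterial is assumed.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Withdrawal:
    when: date
    amount: float   # same unit as BioMaterial.original_quantity (assumed)
    purpose: str    # e.g. the extract the material was used for


@dataclass
class BioMaterial:
    name: str
    original_quantity: float
    withdrawals: list = field(default_factory=list)

    def remaining(self):
        # Derived from the full history instead of a stored counter.
        return self.original_quantity - sum(w.amount for w in self.withdrawals)

    def withdraw(self, amount, when, purpose):
        if amount > self.remaining():
            raise ValueError(f"only {self.remaining()} left of {self.name}")
        self.withdrawals.append(Withdrawal(when, amount, purpose))


sample = BioMaterial("sample 1", 10.0)
sample.withdraw(2.5, date(2009, 4, 6), "extract 1")
sample.withdraw(1.5, date(2009, 4, 7), "extract 2")
```

Keeping every withdrawal means questions such as "which extracts consumed this sample?" can be answered later, which a single remaining-quantity field cannot do.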

Reporters

Annotatable
Admin-defined columns
Connection to external databases
Admin defines external databases, tables, columns and how to pull data from them (e.g. what type of join).
Store the type of the reporter ID.
Only allow well-defined types.
Easy redefinition of web links
Even though these are somewhat interface-specific, the data-dependent links corresponding to different columns should be part of the core.
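A sketch of admin-defined, data-dependent web links that only accepts well-defined reporter ID types. The enum members, the LINK_TEMPLATES table, the example.org URL and link_for() are all hypothetical placeholders, not real identifiers or services.

```python
from enum import Enum
from urllib.parse import quote


class ReporterIdType(Enum):
    # Hypothetical closed set; the real list would be admin-defined.
    GENBANK = "genbank"
    UNIGENE = "unigene"


# Hypothetical admin-maintained templates: (column, ID type) -> URL pattern.
LINK_TEMPLATES = {
    ("accession", ReporterIdType.GENBANK): "https://example.org/genbank/{value}",
}


def link_for(column, id_type, value):
    """Return the web link for a reporter column value, or None if undefined."""
    if not isinstance(id_type, ReporterIdType):
        raise TypeError("only well-defined reporter ID types are allowed")
    template = LINK_TEMPLATES.get((column, id_type))
    if template is None:
        return None
    # Percent-encode the value so it is safe inside a URL path.
    return template.format(value=quote(str(value), safe=""))
```

Redefining a link then only means editing one template entry, independently of any web interface code.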

Array LIMS: Plates

Geometries (columns / rows)
Generalization of the 96 well / 384 well stuff in BASE 1.
Plate types with different geometries, event types and annotation types
An event consists of a date, a protocol and a comment (should this be a general annotation value type?).
Plates, each of a plate type and containing a number of wells
Fully annotatable wells on plates
Do we allow pooling of wells? In either case, annotations will be tricky to inherit. Consider a tree of wells. If one is annotated as being contaminated, the rest may also be. There will have to be different ways to search well annotations (annotations on the same well, annotations on all related wells, etc.).
Plate import from files with user-defined file formats
Hitpicking
Creating plates with material from other plates in an arbitrary (acyclic) way.
Plate merging
E.g. from 96-well to 384-well plates, with a definable well mapping. Mappings should be defined for different geometries. Maybe maintain a list of possible source and destination plate types for merging, as well as for the simpler action of 1-to-1 plate creation.
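As an example of a definable well mapping, the sketch below generates one common 96-to-384-well layout in which four source plates are interleaved, each filling every other row and column with its own offset. The layout is an assumption (other mappings are equally valid), indices are zero-based, and Geometry and quadrant_mapping() are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    rows: int
    columns: int


G96 = Geometry(8, 12)
G384 = Geometry(16, 24)


def quadrant_mapping(src=G96, dst=G384):
    """Map (source plate index, row, column) to (row, column) on dst.

    Plate q (0..3) fills every other well, offset by (q // 2, q % 2).
    All indices are zero-based."""
    if (dst.rows, dst.columns) != (2 * src.rows, 2 * src.columns):
        raise ValueError("destination must have twice the rows and columns")
    mapping = {}
    for q in range(4):
        for r in range(src.rows):
            for c in range(src.columns):
                mapping[(q, r, c)] = (2 * r + q // 2, 2 * c + q % 2)
    return mapping
```

Storing the mapping as an explicit table, rather than only as a formula, would let users define arbitrary mappings for other geometry pairs in the same form.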

Array LIMS: Arrays

ArrayDesign -> ArrayBatch -> ArraySlide
Feature specified on ArrayDesign by block/column/row.
A Feature maps a position to a well or a reporter.
Meta-row/meta-column should map to block number
Association of Plates and ArrayDesign
It should be possible to dissociate, but only if Features don't exist (yet).
Features can be created by print maps or 'reporter maps'.
A 'reporter map' maps block/column/row to a reporter. Maybe we should rename it to 'feature map'? A print map maps plate-number/column-in-plate/row-in-plate to a position on the array design, so that Features can be created given the list of plates.
Possible to destroy Features if unreferenced
Needed to redo plate association.
Add new print map formats easily
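The block numbering and the print-map step that creates Features could be sketched as follows, assuming row-major block numbering starting at 1 and plates represented as dictionaries from (row, column) to a well identifier. The function names and this data layout are hypothetical.

```python
def block_number(meta_row, meta_column, meta_columns):
    """1-based block number from 1-based meta-row/meta-column (row-major)."""
    return (meta_row - 1) * meta_columns + meta_column


def features_from_print_map(print_map, plates):
    """Create Features (array position -> well) given the list of plates.

    print_map: {(plate_number, row, column): (block, column, row)}
    plates:    list of plates in print order; plate_number is a 1-based
               index into it, and each plate maps (row, column) to a well id.
    """
    features = {}
    for (plate_no, row, col), position in print_map.items():
        well = plates[plate_no - 1].get((row, col))
        if well is None:
            raise KeyError(f"plate {plate_no} has no well at {(row, col)}")
        if position in features:
            raise ValueError(f"position {position} printed twice")
        features[position] = well
    return features
```

Because the print map refers to plates only by their position in the list, the same print map can be reused with a different set of plates, which fits the requirement that plate association can be redone once unreferenced Features are destroyed.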

Hybridizations

LabeledExtract(s) -> Hybridization -> Scan -> ImageAcquisition -> Image
Arbitrary number of channels
Allow the removal of image files without losing image information

Raw data

(ImageAcquisition ->) RawBioAssay
The RawBioAssay may be the starting point for data input.
User-definable formats for raw data
Consistency with ArrayDesign
Raw data must be verified against ArrayDesign.
Each spot has a position, a Reporter, and a Feature.
Admin-defined RawBioAssayData table
Because of the sheer number of spots, this seems like a reasonable way to handle the problem of platform differences.
Spot image creation for 1 - 3 channels
Spot images are extracted from TIFFs belonging to the ImageAcquisition and JPEG-compressed after rescaling and gamma correction.
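The rescaling and gamma-correction step for spot images might look like the sketch below, which works on flat lists of raw intensities and combines one to three channels into RGB pixels (a single channel is shown in grey). The default gamma of 0.5, the channel-to-colour order and the function names are assumptions; cropping from the TIFFs and JPEG encoding would be left to an imaging library and are not shown.

```python
def to_8bit(pixels, gamma=0.5, low=None, high=None):
    """Rescale raw intensities into 0..1, apply gamma, quantize to 0..255."""
    low = min(pixels) if low is None else low
    high = max(pixels) if high is None else high
    span = (high - low) or 1  # avoid division by zero on flat spots
    out = []
    for p in pixels:
        x = min(max((p - low) / span, 0.0), 1.0)  # rescale and clip
        out.append(round(255 * x ** gamma))
    return out


def to_rgb(channels, gamma=0.5):
    """Combine 1-3 equally sized channels into a list of (R, G, B) pixels."""
    if not 1 <= len(channels) <= 3:
        raise ValueError("1-3 channels supported")
    scaled = [to_8bit(ch, gamma) for ch in channels]
    if len(scaled) == 1:
        return [(v, v, v) for v in scaled[0]]  # grey for a single channel
    while len(scaled) < 3:
        scaled.append([0] * len(scaled[0]))    # unused colour stays black
    return list(zip(*scaled))
```

A gamma below 1 brightens weak signal relative to strong signal, which is usually what makes faint spots visible after compression to 8 bits.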

Experiment and analysis

The channel count is fixed within an Experiment
An Experiment has BioAssaySets consisting of BioAssays
BioAssays have parent BioAssays and RawBioAssays
Each may have multiple parents.
Within a BioAssay, each spot has a position, intensities and a reporter.
Flexible tracking of spot origin
Spots usually point back to spots with the same position on the parent BioAssay(s), but may be mapped in an arbitrary way. The same applies to parent RawBioAssays.
Extra values may be attached to spots
The value type must be defined (at the very least named), and values must be attached to all spots within the BioAssaySet.
Values may be attached to positions within a BioAssaySet.
Possibility to keep attached values through subsequent analysis steps
Possibility to retain positions / Reporters for which all spots have been lost
Allow raw channels to be mapped to channels in any way (dye-swap)
Ratios in the raw data need to be handled.
Store and use info about experimental design
For experiments without a common reference sample, but maybe also as a general way of handling dye-swaps.
Advanced filtering without external applications
For instance, "keep spots whose reporters are up twofold on at least 3 slides annotated as 'disease'" or "((Ch1 FG med / Ch1 BG med > 5) in Raw Data Set 1) OR ((Ch1 FG med / Ch1 BG med > 5) in Raw Data Set 2)".
Files and other data attached to transformations/jobs
For use by external tools.
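The first filtering example above ("keep spots whose reporters are up twofold on at least 3 slides annotated as 'disease'") could be evaluated along these lines. The sketch assumes each BioAssay is given as a dictionary with its annotations and one ratio per reporter (replicate spots already merged), and reads "up twofold" as a ratio of at least 2; the function names and data layout are hypothetical.

```python
from collections import Counter


def up_on_enough_slides(bioassays, fold=2.0, min_slides=3,
                        annotation=("status", "disease")):
    """Reporters with ratio >= fold on at least min_slides annotated slides."""
    key, wanted = annotation
    counts = Counter()
    for assay in bioassays:
        if assay["annotations"].get(key) != wanted:
            continue  # only slides carrying the required annotation count
        for reporter, ratio in assay["ratios"].items():
            if ratio >= fold:
                counts[reporter] += 1
    return {reporter for reporter, n in counts.items() if n >= min_slides}


def keep_spots(spots, reporters):
    """Keep only spots whose reporter passed the cross-slide test."""
    return [spot for spot in spots if spot["reporter"] in reporters]
```

The filter is two-pass: a condition is first aggregated across BioAssays per reporter, and only then applied to individual spots, which is what distinguishes it from a simple per-spot filter.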

General search interface

External programs / user interfaces need to know what can be filtered on and what information can be retrieved.

Specification of what can be searched on
Specification of possible operators
Specification of possible values
Specification of what can be returned
Specification of sortability
What about grouping, discretization, histograms?
These are useful for generating plots, and leave a lot of number crunching to the database.
Filtering presets (core feature or not?)
Should presets be transferable between search types?
Retrieval of statistics on RawBioAssays and BioAssays
E.g., the percentage of flagged spots. This is useful when combined with annotations and other upstream data (such as the name of the experimenter).
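A search specification that an external program could query might be expressed as a table of property descriptors, against which filters are validated before being run. This is a minimal sketch; PropertySpec, check_filter(), sortable() and the example RawBioAssay properties are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySpec:
    name: str
    value_type: type
    operators: frozenset
    filterable: bool = True
    returnable: bool = True
    sortable: bool = False
    allowed_values: tuple = ()  # empty means any value of value_type


def check_filter(specs, prop, op, value):
    """Validate a (property, operator, value) filter against the specs."""
    spec = specs.get(prop)
    if spec is None or not spec.filterable:
        raise ValueError(f"cannot filter on {prop!r}")
    if op not in spec.operators:
        raise ValueError(f"operator {op!r} not allowed on {prop!r}")
    if not isinstance(value, spec.value_type):
        raise TypeError(f"{prop!r} expects {spec.value_type.__name__}")
    if spec.allowed_values and value not in spec.allowed_values:
        raise ValueError(f"{value!r} is not a permitted value of {prop!r}")


def sortable(specs):
    """Names of the properties a client may sort on."""
    return sorted(name for name, spec in specs.items() if spec.sortable)


RAW_BIOASSAY_SPECS = {
    "name": PropertySpec("name", str, frozenset({"=", "like"}), sortable=True),
    "flagged_fraction": PropertySpec(
        "flagged_fraction", float, frozenset({"<", ">"}), sortable=True),
    "ch1_disease": PropertySpec(
        "ch1_disease", str, frozenset({"="}),
        allowed_values=("disease", "healthy")),
}
```

Publishing the same table that the server uses for validation means a client never has to hard-code which properties, operators and values are valid.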