This document explains how to use BASE for storing data in original files instead of in the database. This API solves the following problems:
Given a FileXxxAble object (for example, a RawBioAssay or an ArrayDesign), we use the getPlatform() method to load the associated platform. This is a required property. After executing the query returned by Platform.getFileXxxTypes(), we have a list of FileXxxType objects. Each one describes a specific type of file that can be used on the given platform. For example, the Affymetrix platform defines the CEL and CDF file types for RAW_DATA and FEATURE_DATA, respectively.
If we have a RawBioAssay, we filter the query to return only raw data file types. Now we can ask the user for a CEL file. In fact, we can get the list of FileXxxType objects for any type of item using the simple code below:
    DbControl dc = ...
    FileXxxAble item = ....
    Platform p = item.getPlatform();
    List<FileXxxType> fileType = p.getFileXxxTypes(item.getItemType()).list(dc);
    // Now, ask the user to select one file for each type
When the user has selected the file(s), we must store the links to them in the database. This is done via a FileSet. A file set contains zero, one, or more files. The only limitation is that it can contain only one file of each FileXxxType. Call FileSet.addMember to store a file in the file set. If a file already exists for the given file type, it is replaced; otherwise a new entry is created.
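The replace-or-create rule can be illustrated with a minimal, self-contained sketch. FileSetSketch is a hypothetical stand-in that keys files by file type ID. It is not the BASE FileSet class, and the real addMember signature may well differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a file set (not the BASE FileSet class).
class FileSetSketch
{
    // Maps a file type ID (for example "affymetrix.cel") to the selected file path.
    // Using the type ID as the key enforces "at most one file of each type".
    private final Map<String, String> members = new LinkedHashMap<>();

    // Stores a file for the given type. Returns the path that was replaced,
    // or null if a new entry was created.
    String addMember(String fileTypeId, String path)
    {
        return members.put(fileTypeId, path);
    }

    // Returns the file stored for the given type, or null if there is none.
    String getMember(String fileTypeId)
    {
        return members.get(fileTypeId);
    }

    // Number of files in the set; never larger than the number of file types.
    int size()
    {
        return members.size();
    }
}
```

Adding a second file with the same type ID leaves the size unchanged and returns the path that was replaced, which mirrors the behaviour described for addMember.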
Validation and metadata extraction are important if we want data in files to be equivalent to data in the database. Both are normally performed when a file is added to a file set.
Each FileXxxType may store the class name of a FileValidator and a MetadataReader. If so, they are used when a file is added to the file set. An important detail is that if the same class is used for both validation and metadata reading, only one instance is created.
    FileXxxAble item = ...
    FileXxxType type = ...
    File file = ...
    FileValidator validator = type.getValidator();
    MetadataReader reader = type.getMetadataReader();
    validator.setFile(file);
    validator.setItem(item);
    // Repeat for 'reader' if not same as 'validator'
    validator.validate();
    reader.writeMetadata();
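How a single instance might serve both roles can be sketched as follows. This is only an illustration under stated assumptions: HandlerSketch, ValidatorSketch, ReaderSketch, CelHandlerSketch and HandlerCacheSketch are hypothetical names, and caching instances by class name is one possible way to get the "only one instance" behaviour, not necessarily how BASE does it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; these are not the BASE interfaces.
interface HandlerSketch
{
    void setFile(String path);
}

interface ValidatorSketch extends HandlerSketch
{
    void validate();
}

interface ReaderSketch extends HandlerSketch
{
    void writeMetadata();
}

// One class that fills both roles for a file type.
class CelHandlerSketch implements ValidatorSketch, ReaderSketch
{
    String path;

    @Override
    public void setFile(String path) { this.path = path; }

    @Override
    public void validate() { /* check the file here */ }

    @Override
    public void writeMetadata() { /* copy metadata to the item here */ }
}

// Creates at most one instance per class name and hands it out in either role.
class HandlerCacheSketch
{
    private final Map<String, Object> instances = new HashMap<>();

    <T> T getHandler(String className, Class<T> role)
    {
        Object handler = instances.get(className);
        if (handler == null)
        {
            try
            {
                handler = Class.forName(className).getDeclaredConstructor().newInstance();
            }
            catch (ReflectiveOperationException ex)
            {
                throw new IllegalStateException("Cannot create handler: " + className, ex);
            }
            instances.put(className, handler);
        }
        // Throws ClassCastException if the class does not implement the requested role.
        return role.cast(handler);
    }
}
```

Because the validator and the reader are the same object, the file only has to be set once, which matches the "repeat for 'reader' if not same as 'validator'" comment in the code above.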
All validators and metadata readers should extend AbstractFileHandler. The reason is that we may have to add more methods to the FileHandler interface in the future. AbstractFileHandler will then provide default implementations, so that existing subclasses do not have to change.
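The benefit of an abstract base class can be shown with a small sketch. All names here are hypothetical: FileHandlerDemo and AbstractFileHandlerDemo only stand in for FileHandler and AbstractFileHandler, setOptions is an invented "method added later", and the file-extension check is just a placeholder.

```java
// Hypothetical stand-in for the handler interface (not the BASE FileHandler).
interface FileHandlerDemo
{
    void setFile(String path);
    void setItem(Object item);

    // Imagine that this method is added in a later release.
    void setOptions(String options);
}

// Hypothetical stand-in for the abstract base class.
abstract class AbstractFileHandlerDemo implements FileHandlerDemo
{
    protected String path;
    protected Object item;

    @Override
    public void setFile(String path) { this.path = path; }

    @Override
    public void setItem(Object item) { this.item = item; }

    // Default no-op implementation: subclasses written before the method
    // was added keep compiling without any change.
    @Override
    public void setOptions(String options) { }
}

// A validator written against the old interface; it never mentions setOptions.
class CelValidatorDemo extends AbstractFileHandlerDemo
{
    // Placeholder check, only here to make the example do something.
    boolean isValid()
    {
        return path != null && path.endsWith(".cel");
    }
}
```

A class that implemented the interface directly would stop compiling as soon as setOptions was added, whereas CelValidatorDemo inherits the default implementation.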
TODO....
...but I think this is done by the already existing plug-ins in more or less the same manner as before. They may benefit from already selected file(s), so it would probably be a good idea to make them aware of the FileSet so that they can offer good default values.
    // Get a file to use as the default value
    File defaultFile = null;
    RawBioAssay rba = ...
    FileSet fileSet = rba.getFileSet();
    if (fileSet != null)
    {
        List<FileSetMember> list = fileSet.getMembers(DataType.RAW_DATA);
        if (list.size() > 0)
        {
            defaultFile = list.get(0).getFile();
        }
    }
The auto detect option of the web interface should also be made aware of this.
BASE ships with a number of platforms pre-installed. It is important that the external IDs of these platforms and their file types are not changed.
Platform name | Platform ID | Data type | File type name | File type ID |
---|---|---|---|---|
Generic | generic | RAW_DATA | Raw data file | generic.raw |
Generic | generic | FEATURE_DATA | Print map | generic.printmap |
Generic | generic | FEATURE_DATA | Reporter map | generic.reportermap |
Affymetrix | affymetrix | RAW_DATA | Affymetrix CEL file | affymetrix.cel |
Affymetrix | affymetrix | FEATURE_DATA | Affymetrix CDF file | affymetrix.cdf |
Existing items on servers that are upgraded from previous releases are assigned the Generic platform, except for array designs that are Affymetrix chips and raw bioassays that have the Affymetrix raw data type.
How does RawDataType fit into this? There seems to be an overlap with Platform. Should we have a link from a platform to a raw data type? Can we mix any platform with any raw data type? Can we mix platforms in an experiment? Should we allow mixing of raw data types in an experiment? Can we still maintain backwards compatibility?
ANSWER: There are two types of platforms: platforms that can't store data in the database (file-only platforms) and platforms that may store data in the database. File-only platforms auto-generate a raw data type, so there is no need to define it in raw-data-types.xml. Database platforms may be locked to a specific raw data type, but don't have to be.
Do we need more than one file of each FileXxxType in a FileSet? Do we need a 'multiplicity' option for the FileXxxType?
ANSWER: Only one file of each type is allowed.
Validation and metadata extraction are done on one file at a time (FileSetMember). Is this too limiting? Maybe we need validation and metadata extraction based on the entire FileSet. This could happen if data is split into multiple files (for example, Imagene has one file for cy3 data and one for cy5 data). Do we need a FileSetValidator / FileSetMetadataReader that can be assigned to Platform?
ANSWER: This is solved by having the same validator class for more than one file type. Only one instance is created, and it is given access to all files of the specified types in the file set before validate() is called.
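A rough sketch of what such a shared validator could look like for the two-file case is shown here. Everything in it is an assumption made for illustration: the class name TwoChannelValidatorSketch, the setFile(fileTypeId, path) signature, and the file type IDs "imagene.cy3" and "imagene.cy5" are all hypothetical and not taken from BASE.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical validator that is registered for two file types at once.
class TwoChannelValidatorSketch
{
    // Hypothetical file type IDs, used only in this example.
    static final String CY3 = "imagene.cy3";
    static final String CY5 = "imagene.cy5";

    // Files handed to this instance, keyed by file type ID.
    private final Map<String, String> filesByType = new HashMap<>();

    // Called once for every file in the file set whose type uses this validator.
    void setFile(String fileTypeId, String path)
    {
        filesByType.put(fileTypeId, path);
    }

    // Called after all files have been handed over, so the check can
    // look at the combination of files instead of a single file.
    boolean validate()
    {
        return filesByType.containsKey(CY3) && filesByType.containsKey(CY5);
    }
}
```

With only the cy3 file set, validate() returns false; once both files have been handed over, it returns true. A real validator would of course inspect the file contents as well.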
Is one FileXxxType per File enough? Probably in most cases, but maybe for analysed data there is overlap/compatibility between file formats. Can we do this by saying that file type X is compatible with file type Y, and if someone asks for Y we give them X? Can we do this by a directional many-to-many relation between two FileXxxType objects? Is this a real problem whose solution would improve the user experience, or is it only theoretical?
ANSWER: We don't see a need for it right now.