BASE has support for storing data in files instead of importing it into the database.
Files can be attached to any item that implements the FileStoreEnabled
interface, for example RawBioAssay, ArrayDesign and a few other
classes. The ability to store data in files is not a replacement for storing data in the
database. For some platforms and raw data types it is possible to have data in
files and in the database at the same time. There are three cases:
Data in files only
Data in the database only
Data in both files and in the database
Not all three cases are supported for all types of data. This is controlled
by the Platform class, which may disallow storing data in the database.
To check this, call Platform.isFileOnly() and/or Platform.getRawDataType().
If isFileOnly() returns true, the platform can't store data in
the database. If it returns false, more information
can be obtained by calling getRawDataType(), which may return:
null: The platform can store data with any raw data type in the database.
A RawDataType that has isStoredInDb() == true: The platform can store data in the database, but only data with the specified raw data type.
A RawDataType that has isStoredInDb() == false: The platform can't store data in the database.
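The rules above can be condensed into a single helper. The following is a minimal sketch, not BASE code: the Platform and RawDataType classes are simplified stand-ins that model only the three methods discussed here, and the DbStorage enum and the dbStorageFor() helper are hypothetical names introduced for illustration.

```java
class DbStorageCheck {
    /** Hypothetical stand-in for the BASE RawDataType class. */
    static class RawDataType {
        private final boolean storedInDb;
        RawDataType(boolean storedInDb) { this.storedInDb = storedInDb; }
        boolean isStoredInDb() { return storedInDb; }
    }

    /** Hypothetical stand-in for the BASE Platform class. */
    static class Platform {
        private final boolean fileOnly;
        private final RawDataType rawDataType; // may be null
        Platform(boolean fileOnly, RawDataType rawDataType) {
            this.fileOnly = fileOnly;
            this.rawDataType = rawDataType;
        }
        boolean isFileOnly() { return fileOnly; }
        RawDataType getRawDataType() { return rawDataType; }
    }

    /** Hypothetical summary of what a platform allows in the database. */
    enum DbStorage { NOT_ALLOWED, ANY_RAW_DATA_TYPE, SPECIFIED_RAW_DATA_TYPE_ONLY }

    static DbStorage dbStorageFor(Platform platform) {
        // A file-only platform can never store data in the database
        if (platform.isFileOnly()) return DbStorage.NOT_ALLOWED;
        RawDataType rdt = platform.getRawDataType();
        // null means that any raw data type is accepted
        if (rdt == null) return DbStorage.ANY_RAW_DATA_TYPE;
        // Otherwise the raw data type itself decides
        return rdt.isStoredInDb()
            ? DbStorage.SPECIFIED_RAW_DATA_TYPE_ONLY
            : DbStorage.NOT_ALLOWED;
    }

    public static void main(String[] args) {
        Platform anyType = new Platform(false, null);
        System.out.println(dbStorageFor(anyType)); // prints ANY_RAW_DATA_TYPE
    }
}
```

Real code would call the same three methods on the actual BASE objects; the enum only makes the three outcomes explicit.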
Some FileStoreEnabled items don't have a platform (for example, DerivedBioAssay).
In this case, the file storage ability is controlled by the subtype of the item.
See the getDataFileTypes() method in the ItemSubtype class.
For backwards compatibility reasons, each Platform
that can only store data in files will create "virtual" raw data type
objects internally. These raw data types all return false
from the RawDataType.isStoredInDb() method. They also have a back-link
to the platform/variant that created them: RawDataType.getPlatform()
and RawDataType.getVariant(). These two methods
will always return null when called on a raw data type
that can be stored in the database.
This is a rather large set of classes and methods. The ultimate goal
is to be able to create links between a FileStoreEnabled item and File
items and to provide some metadata about the files.
The FileStoreUtil class is one of the most
important ones. It is intended to make it easy for plug-in (and other)
developers to access the files without having to deal with platform
or file type objects. The API is best described
by a set of use-case examples.
A client application must know what types of files it makes sense to ask the user for. In some cases, data may be split into more than one file so we need a generic way to select files.
Given that we have a FileStoreEnabled item, we want to find out which
DataFileType items can be used for that item. The
Base.getDataFileTypes() method can be used for this.
You'll need to supply information about the platform,
variant and subtype of the item. The method will create a query that returns
a list of DataFileType items, each one representing a
specific file type that we should ask the user about. Examples:
The Affymetrix platform defines CEL as a raw data file and CDF as an array design (reporter map) file. If we have a RawBioAssay, the query will only return the CEL file type and the client can ask the user for a CEL file.
The Generic platform defines PRINT_MAP and REPORTER_MAP for array designs. If we have an ArrayDesign, the query will return those two items.
The Scan subtype defines MICROARRAY_IMAGE for derived bioassays.
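The examples above amount to filtering the file types registered for a platform (or subtype) by the kind of item at hand. A rough sketch of that idea follows. It does not use the BASE query API: the DataFileType class is a simplified stand-in, and the item type strings and the typesFor() helper are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DataFileTypeLookup {
    /** Hypothetical stand-in: a file type id plus the kind of item it applies to. */
    static class DataFileType {
        final String id;
        final String itemType;
        DataFileType(String id, String itemType) {
            this.id = id;
            this.itemType = itemType;
        }
    }

    /** Returns the ids of the registered file types that apply to the given item type. */
    static List<String> typesFor(List<DataFileType> registered, String itemType) {
        List<String> ids = new ArrayList<>();
        for (DataFileType type : registered) {
            if (type.itemType.equals(itemType)) ids.add(type.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // File types of the Affymetrix platform, as described in the text
        List<DataFileType> affymetrix = Arrays.asList(
            new DataFileType("CEL", "RawBioAssay"),
            new DataFileType("CDF", "ArrayDesign"));
        System.out.println(typesFor(affymetrix, "RawBioAssay")); // prints [CEL]
    }
}
```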
It might also be interesting to know the currently selected file
for each file type, whether the file is required and whether multiple
files are allowed.
Here is a simple code example that may be useful to start from:
    DbControl dc = ...
    FileStoreEnabled item = ...
    Platform platform = item.getPlatform();
    PlatformVariant variant = item.getVariant();
    ItemSubtype subtype = item instanceof Subtypable ?
        ((Subtypable)item).getItemSubtype() : null;

    // Get list of DataFileTypes used by the platform
    ItemQuery<DataFileType> query =
        Base.getDataFileTypes(item.getType(), item, platform, variant, subtype);
    List<DataFileType> types = query.list(dc);

    // Always check hasFileSet() method first to avoid
    // creating the file set if it doesn't exist
    FileSet fileSet = item.hasFileSet() ? item.getFileSet() : null;

    for (DataFileType type : types)
    {
        // Get the current file, if any
        FileSetMember member = fileSet == null || !fileSet.hasMember(type) ?
            null : fileSet.getMember(type);
        File current = member == null ? null : member.getFile();

        // Check if a file is required by the platform/subtype
        PlatformFileType pft = platform == null ?
            null : platform.getFileType(type, variant, false);
        ItemSubtypeFileType ift = subtype == null ?
            null : subtype.getAssociatedDataFileType(type, false);
        boolean isRequired = pft == null ? false : pft.isRequired();
        isRequired |= ift == null ? false : ift.isRequired();

        // Now we can do something with this information to
        // let the user select a file ...
    }
Also remember to catch PermissionDeniedException | |
---|---|
The above code may look complicated, but this is mostly because
of all the checks for null values. |
When the user has selected the file(s), we must store the links
to them in the database. This is done with a FileSet
object. A file set can contain any number of files.
Call FileSet.setMember() or FileSet.addMember()
to store a file in the file set. If a file already exists for the given file type,
it is replaced when the setMember() method is called.
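The difference between the two calls can be pictured with a small in-memory model. This is only a sketch under stated assumptions, not the BASE implementation: FileSetSketch is a hypothetical class, file types and files are represented as plain strings, and the behaviour shown for addMember() (appending another file for the same type) is inferred from the method name and from the fact that multiple files may be allowed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FileSetSketch {
    // File type id -> files currently stored for that type
    private final Map<String, List<String>> members = new LinkedHashMap<>();

    /** Replaces whatever is stored for the file type with the given file. */
    void setMember(String fileType, String file) {
        List<String> files = new ArrayList<>();
        files.add(file);
        members.put(fileType, files);
    }

    /** Assumed behaviour: keeps existing files and adds one more for the file type. */
    void addMember(String fileType, String file) {
        members.computeIfAbsent(fileType, key -> new ArrayList<>()).add(file);
    }

    /** Returns the files stored for the file type (empty list if none). */
    List<String> getMembers(String fileType) {
        return members.getOrDefault(fileType, new ArrayList<>());
    }

    public static void main(String[] args) {
        FileSetSketch fileSet = new FileSetSketch();
        fileSet.setMember("CEL", "first.cel");
        fileSet.setMember("CEL", "second.cel"); // replaces first.cel
        System.out.println(fileSet.getMembers("CEL")); // prints [second.cel]
    }
}
```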
The following program example assumes that we have a map that relates
File items to DataFileType items. When all files
have been added, we call FileSet.validate()
to validate the files and extract metadata.
    DbControl dc = ...
    FileStoreEnabled item = ...
    Map<DataFileType, File> files = ...

    // Store the selected files in the fileset
    FileSet fileSet = item.getFileSet();
    for (Map.Entry<DataFileType, File> entry : files.entrySet())
    {
        DataFileType type = entry.getKey();
        File file = entry.getValue();
        fileSet.setMember(type, file);
    }

    // Validate the files and extract metadata
    fileSet.validate(dc);
Validation and extraction of metadata are important since we want
data in files to be equivalent to data in the database. Both are
initiated by the core when FileSet.validate() is called.
They are handled by extensions, so the actual outcome depends on
what has been installed on the BASE server.
Note | |
---|---|
The |
Here is the general outline of what is going on in the core:
The core calls the main ExtensionsManager and initiates the action factory for all file set validator extensions.
After inspecting the current item and file set, the factories create one or more ValidationAction objects.
For each file in the file set, the ValidationAction.acceptFile() method is called on each action, which is supposed to either accept or deny validation of the file.
If the file is accepted, the ValidationAction.validateAndExtractMetadata() method is called.
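The outline above can be sketched as a simple loop. This is an illustration under stated assumptions rather than the core implementation: the ValidationAction interface below is a hypothetical stand-in with simplified parameter lists (files are plain strings), the CelValidator class and its file-name check are invented for the example, and the extension manager and factory step is skipped by passing in ready-made actions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ValidationFlowSketch {
    /** Hypothetical stand-in for the validator action; signatures are simplified. */
    interface ValidationAction {
        boolean acceptFile(String fileName);
        void validateAndExtractMetadata(String fileName);
    }

    /** Example action that only accepts files ending with ".cel". */
    static class CelValidator implements ValidationAction {
        final List<String> validated = new ArrayList<>();
        public boolean acceptFile(String fileName) {
            return fileName.endsWith(".cel");
        }
        public void validateAndExtractMetadata(String fileName) {
            validated.add(fileName); // real validators would parse the file here
        }
    }

    /** Offers every file to every action; returns the number of validations run. */
    static int validate(List<String> files, List<ValidationAction> actions) {
        int validations = 0;
        for (String file : files) {
            for (ValidationAction action : actions) {
                if (action.acceptFile(file)) {
                    action.validateAndExtractMetadata(file);
                    validations++;
                }
            }
        }
        return validations;
    }

    public static void main(String[] args) {
        CelValidator cel = new CelValidator();
        List<ValidationAction> actions = new ArrayList<>();
        actions.add(cel);
        int count = validate(Arrays.asList("sample.cel", "design.cdf"), actions);
        System.out.println(count + " " + cel.validated); // prints 1 [sample.cel]
    }
}
```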
Only one instance of each validator class is created | |
---|---|
The validation is not done until all files have been
added to the file set. If the same validator is
used for more than one file, the same instance is reused. E.g.
the |
This should be done by existing plug-ins in the same way as before.
A slight modification is needed, since importers should be made
aware of files already selected in the FileSet
in order to provide good default values. The FileStoreUtil
class is very useful in cases like this:
    RawBioAssay rba = ...
    DbControl dc = ...

    // Get the current raw data file, if any
    List<File> rawDataFiles =
        FileStoreUtil.getGenericDataFiles(dc, rba, FileType.RAW_DATA);
    File defaultFile = rawDataFiles.size() > 0 ? rawDataFiles.get(0) : null;

    // Create parameter asking for input file - use current as default
    PluginParameter<File> fileParameter = new PluginParameter<File>(
        "file",
        "Raw data file",
        "The file that contains the raw data that you want to import",
        new FileParameterType(defaultFile, true, 1)
    );
An import plug-in should also save the file that was used to the file set:
    RawBioAssay rba = ...
    DbControl dc = ...

    // The file the user selected to import from
    File rawDataFile = (File)job.getValue("file");

    // Save the file to the fileset. The method will check which file
    // type the platform uses as the raw data type. As a fallback the
    // GENERIC_RAW_DATA type is used
    FileStoreUtil.setGenericDataFile(dc, rba, FileType.RAW_DATA,
        DataFileType.GENERIC_RAW_DATA, rawDataFile);
Just as before, an experiment is still locked to a single
RawDataType. This is a design issue that
would break too many things if changed. If data is stored in files,
the experiment is also locked to a single Platform.
This has been designed to have as little impact on existing
plug-ins as possible. In most cases, the plug-ins will continue
to work as before.
A plug-in that uses data from the database and needs to check if it can be used within an experiment can still do:
    Experiment e = ...
    RawDataType rdt = e.getRawDataType();
    if (rdt.isStoredInDb())
    {
        // Check number of channels, etc...
        // ... run plug-in code ...
    }
A newer plug-in which uses data from files should do:
    Experiment e = ...
    DbControl dc = ...
    RawDataType rdt = e.getRawDataType();
    if (!rdt.isStoredInDb())
    {
        // Check that platform/variant is supported
        Platform p = rdt.getPlatform(dc);
        PlatformVariant v = rdt.getVariant(dc);
        // ...

        // Get data files
        File aFile = FileStoreUtil.getDataFile(dc, ...);

        // ... run plug-in code ...
    }