BASE has support for storing data in files instead of importing it into the database. Files can be attached to any item that implements the FileStoreEnabled interface, for example RawBioAssay and ArrayDesign. There are three cases to consider:

- Data in files only
- Data in the database only
- Data in both files and in the database
Not all three cases are supported for all types of data. This is controlled by the Platform, through Platform.isFileOnly() and/or Platform.getRawDataType(). If the isFileOnly() method returns true, the platform can't store data in the database. If the value is false, more information can be obtained by calling getRawDataType(), which may return:

- null: The platform can store data with any raw data type in the database.
- A RawDataType with isStoredInDb() == true: The platform can store data in the database, but only data with the specified raw data type.
- A RawDataType with isStoredInDb() == false: The platform can't store data in the database.
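The decision rules above can be sketched as a small, self-contained Java example. The interfaces `PlatformSketch` and `RawDataTypeSketch`, the `DbStorage` enum and the `platform()` factory are hypothetical stand-ins invented for illustration; only `isFileOnly()`, `getRawDataType()` and `isStoredInDb()` correspond to methods named in the text, and the real BASE classes are not used here.

```java
// Hypothetical stand-ins for the BASE classes discussed above. Only the
// methods named in the text are modelled; everything else is illustrative.
interface RawDataTypeSketch {
    boolean isStoredInDb();
}

interface PlatformSketch {
    boolean isFileOnly();
    RawDataTypeSketch getRawDataType(); // may be null
}

final class DbStorageSketch {
    // The possible answers to "can this platform store data in the database?"
    enum DbStorage { NOT_ALLOWED, ANY_RAW_DATA_TYPE, SPECIFIED_RAW_DATA_TYPE_ONLY }

    static DbStorage check(PlatformSketch platform) {
        // A file-only platform can never store data in the database
        if (platform.isFileOnly()) {
            return DbStorage.NOT_ALLOWED;
        }
        RawDataTypeSketch rdt = platform.getRawDataType();
        // null means that any raw data type is accepted
        if (rdt == null) {
            return DbStorage.ANY_RAW_DATA_TYPE;
        }
        return rdt.isStoredInDb()
            ? DbStorage.SPECIFIED_RAW_DATA_TYPE_ONLY
            : DbStorage.NOT_ALLOWED;
    }

    // Small factory so the sketch can be tried without a BASE server
    static PlatformSketch platform(final boolean fileOnly, final RawDataTypeSketch rdt) {
        return new PlatformSketch() {
            @Override public boolean isFileOnly() { return fileOnly; }
            @Override public RawDataTypeSketch getRawDataType() { return rdt; }
        };
    }

    public static void main(String[] args) {
        System.out.println(check(platform(true, null)));         // NOT_ALLOWED
        System.out.println(check(platform(false, null)));        // ANY_RAW_DATA_TYPE
        System.out.println(check(platform(false, () -> true)));  // SPECIFIED_RAW_DATA_TYPE_ONLY
        System.out.println(check(platform(false, () -> false))); // NOT_ALLOWED
    }
}
```

In real client code the same checks would be made directly on a Platform item; the enum is only a convenient way to name the outcomes.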
Some FileStoreEnabled items, for example DerivedBioAssay, get their file types from the getDataFileTypes() method in the ItemSubtype class.
For backwards compatibility reasons, each Platform that can't store data in the database creates raw data types that return false from the RawDataType.isStoredInDb() method. They also have a back-link to the platform/variant that created them: RawDataType.getPlatform() and RawDataType.getVariant(). These two methods will always return null when called on a raw data type that can be stored in the database.
This is a rather large set of classes and methods. The ultimate goal is to be able to create links between a FileStoreEnabled item and File items. The FileStoreUtil class contains utility methods that help with this.
A client application must know what types of files it makes sense to ask the user for. In some cases, data may be split into more than one file so we need a generic way to select files.
Given that we have a FileStoreEnabled item, we want to find out which DataFileType items can be used for it. Base.getDataFileTypes() can be used for this. You'll need to supply information about the platform, variant and subtype of the item. The method will create a query that returns a list of DataFileType items, each one representing a specific file type that we should ask the user about. Examples:
- The Affymetrix platform defines CEL as a raw data file and CDF as an array design (reporter map) file. If we have a RawBioAssay, the query returns only the CEL file type.
- The Generic platform defines PRINT_MAP and REPORTER_MAP for array designs. If we have an ArrayDesign, the query returns both file types.
- The Scan subtype defines MICROARRAY_IMAGE for derived bioassays.
It might also be interesting to know the currently selected file for each file type, whether the file is required, and whether multiple files are allowed. Here is a simple code example that may be useful to start from:
    DbControl dc = ...
    FileStoreEnabled item = ...
    Platform platform = item.getPlatform();
    PlatformVariant variant = item.getVariant();
    ItemSubtype subtype = item instanceof Subtypable ?
        ((Subtypable)item).getItemSubtype() : null;

    // Get list of DataFileTypes used by the platform
    ItemQuery<DataFileType> query =
        Base.getDataFileTypes(item.getType(), item, platform, variant, subtype);
    List<DataFileType> types = query.list(dc);

    // Always check the hasFileSet() method first to avoid
    // creating the file set if it doesn't exist
    FileSet fileSet = item.hasFileSet() ? item.getFileSet() : null;

    for (DataFileType type : types)
    {
        // Get the current file, if any
        FileSetMember member = fileSet == null || !fileSet.hasMember(type) ?
            null : fileSet.getMember(type);
        File current = member == null ? null : member.getFile();

        // Check if a file is required by the platform/subtype
        PlatformFileType pft = platform == null ?
            null : platform.getFileType(type, variant, false);
        ItemSubtypeFileType ift = subtype == null ?
            null : subtype.getAssociatedDataFileType(type, false);
        boolean isRequired = pft == null ? false : pft.isRequired();
        isRequired |= ift == null ? false : ift.isRequired();

        // Now we can do something with this information to
        // let the user select a file ...
    }
Note: Also remember to catch PermissionDeniedException. The above code may look complicated, but this is mostly because of all checks for null values.
When the user has selected the file(s) we must store the links to them in the database. This is done with a FileSet object. Call FileSet.setMember() or FileSet.addMember() to store a file in the file set. If a file already exists for the given file type, it is replaced when the setMember method is called.

The following program example assumes that we have a map that relates each DataFileType to the selected File. After the files have been stored, we call FileSet.validate() to validate the files and extract metadata.
    DbControl dc = ...
    FileStoreEnabled item = ...
    Map<DataFileType, File> files = ...

    // Store the selected files in the fileset
    FileSet fileSet = item.getFileSet();
    for (Map.Entry<DataFileType, File> entry : files.entrySet())
    {
        DataFileType type = entry.getKey();
        File file = entry.getValue();
        fileSet.setMember(type, file);
    }

    // Validate the files and extract metadata
    fileSet.validate(dc);
Validation and extraction of metadata is important since we want data in files to be equivalent to data in the database. The validation and metadata extraction is initiated by the core when FileSet.validate() is called. The actual work is handled by extensions, so the outcome depends on what has been installed on the BASE server.
Here is the general outline of what is going on in the core:

1. The core calls the main ExtensionsManager.
2. After inspecting the current item and file set, the factories create one or more ValidationAction objects.
3. For each file in the file set, the ValidationAction.acceptFile() method is called on each action, which is supposed to either accept or deny validation of the file.
4. If the file is accepted, the ValidationAction.validateAndExtractMetadata() method is called.
Note: Only one instance of each validator class is created. The validation is not done until all files have been added to the fileset. If the same validator is used for more than one file, the same instance is reused.
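The accept-then-validate flow outlined above can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: `ValidationActionSketch` is a simplified stand-in for the real action interface and uses plain file names, `SuffixAction` is an invented action that accepts files by extension, and the extension manager and factories are left out entirely. It mainly shows that a single action instance can end up handling several files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a validation action. The method
// names follow the text; the parameters are assumptions for this sketch.
interface ValidationActionSketch {
    boolean acceptFile(String fileName);
    void validateAndExtractMetadata(String fileName);
}

// Invented example action: accepts files by extension and remembers
// which files it was asked to validate.
final class SuffixAction implements ValidationActionSketch {
    private final String suffix;
    final List<String> validated = new ArrayList<>();

    SuffixAction(String suffix) {
        this.suffix = suffix;
    }

    @Override
    public boolean acceptFile(String fileName) {
        return fileName.endsWith(suffix);
    }

    @Override
    public void validateAndExtractMetadata(String fileName) {
        validated.add(fileName);
    }
}

final class ValidationFlowSketch {
    // Offer every file to every action; validate only when accepted
    static void validate(List<String> files,
                         List<? extends ValidationActionSketch> actions) {
        for (String file : files) {
            for (ValidationActionSketch action : actions) {
                if (action.acceptFile(file)) {
                    action.validateAndExtractMetadata(file);
                }
            }
        }
    }

    public static void main(String[] args) {
        SuffixAction cel = new SuffixAction(".cel");
        SuffixAction cdf = new SuffixAction(".cdf");
        validate(Arrays.asList("a.cel", "b.cel", "chip.cdf"),
                 Arrays.asList(cel, cdf));
        System.out.println(cel.validated); // [a.cel, b.cel]
        System.out.println(cdf.validated); // [chip.cdf]
    }
}
```

Because the same `SuffixAction` instance sees both `.cel` files, any state it keeps is shared between them, which is worth keeping in mind when writing a validator.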
Importing data into the database should be done by existing plug-ins in the same way as before. A slight modification is needed since it is good if the importers are made aware of already selected files in the FileSet. The FileStoreUtil class is useful for this:
    RawBioAssay rba = ...
    DbControl dc = ...

    // Get the current raw data file, if any
    List<File> rawDataFiles =
        FileStoreUtil.getGenericDataFiles(dc, rba, FileType.RAW_DATA);
    File defaultFile = rawDataFiles.size() > 0 ? rawDataFiles.get(0) : null;

    // Create parameter asking for input file - use current as default
    PluginParameter<File> fileParameter = new PluginParameter<File>(
        "file",
        "Raw data file",
        "The file that contains the raw data that you want to import",
        new FileParameterType(defaultFile, true, 1)
    );
An import plug-in should also save the file that was used to the file set:
    RawBioAssay rba = ...
    DbControl dc = ...

    // The file the user selected to import from
    File rawDataFile = (File)job.getValue("file");

    // Save the file to the fileset. The method will check which file
    // type the platform uses as the raw data type. As a fallback the
    // GENERIC_RAW_DATA type is used
    FileStoreUtil.setGenericDataFile(dc, rba, FileType.RAW_DATA,
        DataFileType.GENERIC_RAW_DATA, rawDataFile);
Just as before, an experiment is still locked to a single RawDataType and, for data stored in files, to a single Platform.

A plug-in (using data from the database) that needs to check if it can be used within an experiment can still do:
    Experiment e = ...
    RawDataType rdt = e.getRawDataType();
    if (rdt.isStoredInDb())
    {
        // Check number of channels, etc...
        // ... run plug-in code ...
    }
A newer plug-in which uses data from files should do:
    Experiment e = ...
    DbControl dc = ...
    RawDataType rdt = e.getRawDataType();
    if (!rdt.isStoredInDb())
    {
        // Check that platform/variant is supported
        Platform p = rdt.getPlatform(dc);
        PlatformVariant v = rdt.getVariant(dc);
        // ...

        // Get data files
        File aFile = FileStoreUtil.getDataFile(dc, ...);

        // ... run plug-in code ...
    }