29.3.9. Using files to store data

BASE has support for storing data in files instead of importing it into the database. Files can be attached to any item that implements the FileStoreEnabled interface, for example RawBioAssay, ArrayDesign and a few other classes. The ability to store data in files is not a replacement for storing data in the database. It is possible (for some platforms/raw data types) to have data in files and in the database at the same time. There are three cases: data in files only, data in the database only, and data in both files and the database.

Not all three cases are supported for all types of data. This is controlled by the Platform class, which may disallow storing data in the database. To check this, call Platform.isFileOnly() and/or Platform.getRawDataType(). If isFileOnly() returns true, the platform can't store data in the database. If it returns false, more information can be obtained by calling getRawDataType(), which may return:

Some FileStoreEnabled items don't have a platform (for example, DerivedBioAssay). In this case, the file storage ability is controlled by the subtype of the item. See the getDataFileTypes() method in the ItemSubtype class.

For backwards compatibility reasons, each Platform that can only store data in files will create "virtual" raw data type objects internally. These raw data types all return false from the RawDataType.isStoredInDb() method. They also have a back-link to the platform/variant that created them: RawDataType.getPlatform() and RawDataType.getVariant(). These two methods will always return null when called on a raw data type that can be stored in the database.

See also

Diagram of classes and methods

Figure 29.20. Store data in files


This is a rather large set of classes and methods. The ultimate goal is to be able to create links between a FileStoreEnabled item and File items and to provide some metadata about the files. The FileStoreUtil class is one of the most important ones. It is intended to make it easy for plug-in (and other) developers to access the files without having to deal with platform or file type objects. The API is best described by a set of use-case examples.

Use case: Asking the user for files for a given item

A client application must know what types of files it makes sense to ask the user for. In some cases, data may be split into more than one file so we need a generic way to select files.

Given that we have a FileStoreEnabled item, we want to find out which DataFileType items can be used for that item. The Base.getDataFileTypes() method can be used for this. You'll need to supply information about the platform, variant and subtype of the item. The method will create a query that returns a list of DataFileType items, each one representing a specific file type that we should ask the user about. Examples:

  1. The Affymetrix platform defines CEL as a raw data file and CDF as an array design (reporter map) file. If we have a RawBioAssay the query will only return the CEL file type and the client can ask the user for a CEL file.

  2. The Generic platform defines PRINT_MAP and REPORTER_MAP for array designs. If we have an ArrayDesign the query will return those two items.

  3. The Scan subtype defines MICROARRAY_IMAGE for derived bioassays.

It may also be useful to know the currently selected file for each file type, whether a file is required, and whether multiple files are allowed. Here is a simple code example that may be useful to start from:

DbControl dc = ...
FileStoreEnabled item = ...
Platform platform = item.getPlatform();
PlatformVariant variant = item.getVariant();
ItemSubtype subtype = item instanceof Subtypable ?
   ((Subtypable)item).getItemSubtype() : null;

// Get list of DataFileTypes used by the platform
ItemQuery<DataFileType> query =
   Base.getDataFileTypes(item.getType(), item, platform, variant, subtype);
List<DataFileType> types = query.list(dc);

// Always check hasFileSet() method first to avoid
// creating the file set if it doesn't exist
FileSet fileSet = item.hasFileSet() ?
   item.getFileSet() : null;
   
for (DataFileType type : types)
{
   // Get the current file, if any
   FileSetMember member = fileSet == null || !fileSet.hasMember(type) ?
      null : fileSet.getMember(type);
   File current = member == null ? 
      null : member.getFile();
   
   // Check if a file is required by the platform/subtype
   PlatformFileType pft = platform == null ? 
      null : platform.getFileType(type, variant, false);
   ItemSubtypeFileType ift = subtype == null ?
      null : subtype.getAssociatedDataFileType(type, false);
   boolean isRequired = pft == null ? 
      false : pft.isRequired();
   isRequired |= ift == null ?
      false : ift.isRequired();
      
   // Now we can do something with this information to
   // let the user select a file ...
}
[Note] Also remember to catch PermissionDeniedException

The above code may look complicated, but this is mostly because of all the checks for null values. Remember that many things are optional and may return null. Another thing to look out for is PermissionDeniedException:s. The logged-in user may not have access to all items. The above example doesn't include any code for this since it would have made it too complex.
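
The null checks around the "is required" flag can be collected in a small helper. The following is only a minimal sketch: RequiredLink is a hypothetical stand-in for PlatformFileType and ItemSubtypeFileType (it is not part of the BASE API), and it assumes, as in the example above, that a missing link simply means "not required".

```java
import java.util.Objects;
import java.util.stream.Stream;

class RequiredCheckSketch {
    /** Hypothetical stand-in for PlatformFileType and ItemSubtypeFileType. */
    interface RequiredLink {
        boolean isRequired();
    }

    /** A file is required if any non-null link says so; null links are skipped. */
    static boolean isRequired(RequiredLink platformLink, RequiredLink subtypeLink) {
        return Stream.of(platformLink, subtypeLink)
            .filter(Objects::nonNull)
            .anyMatch(RequiredLink::isRequired);
    }

    public static void main(String[] args) {
        RequiredLink required = () -> true;
        RequiredLink optional = () -> false;
        System.out.println(isRequired(null, null));         // false
        System.out.println(isRequired(optional, required)); // true
        System.out.println(isRequired(optional, null));     // false
    }
}
```

In real code the two arguments would be the pft and ift values from the loop above.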

Use case: Link, validate and extract metadata from the selected files

When the user has selected the file(s) we must store the links to them in the database. This is done with a FileSet object. A file set can contain any number of files. Call FileSet.setMember() or FileSet.addMember() to store a file in the file set. If setMember() is called and a file already exists for the given file type, that file is replaced. The following example assumes that we have a map that relates each DataFileType to a File. When all files have been added we call FileSet.validate() to validate the files and extract metadata.

DbControl dc = ...
FileStoreEnabled item = ...
Map<DataFileType, File> files = ...

// Store the selected files in the fileset
FileSet fileSet = item.getFileSet();
for (Map.Entry<DataFileType, File> entry : files.entrySet())
{
   DataFileType type = entry.getKey();
   File file = entry.getValue();
   fileSet.setMember(type, file);
}

// Validate the files and extract metadata
fileSet.validate(dc);

Validation and extraction of metadata is important since we want data in files to be equivalent to data in the database. The validation and metadata extraction is initiated by the core when FileSet.validate() is called. It is handled by extensions, so the actual outcome depends on what has been installed on the BASE server.

[Note] Note

The FileSet.validate() method doesn't throw any exceptions. Instead, all validation errors are returned as a list of Throwable:s. The validation result is also stored for each file and can be accessed with FileSetMember.isValid() and FileSetMember.getErrorMessage().
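
Since the errors come back as a list rather than as a thrown exception, the caller has to inspect the list itself. The sketch below shows one way to turn such a list into a message for the user. It uses only java.util types; the summarize() helper and its message format are hypothetical, and whether the "no errors" case is reported as an empty list or as null is an assumption, so both are handled.

```java
import java.util.List;
import java.util.stream.Collectors;

class ValidationSummarySketch {
    /** Builds a readable summary from a list of validation errors. */
    static String summarize(List<Throwable> errors) {
        // Assumption: "no errors" may be reported as null or as an empty list
        if (errors == null || errors.isEmpty()) {
            return "All files are valid";
        }
        String header = errors.size() + " validation error(s):\n";
        return errors.stream()
            .map(t -> t.getClass().getSimpleName() + ": " + t.getMessage())
            .collect(Collectors.joining("\n", header, ""));
    }

    public static void main(String[] args) {
        List<Throwable> errors = List.of(new IllegalArgumentException("bad header"));
        System.out.println(summarize(errors));
    }
}
```

In real code the list would be the return value of fileSet.validate(dc).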

Here is the general outline of what is going on in the core:

  1. The core calls the main ExtensionsManager and initiates the action factory for all file set validator extensions.

  2. After inspecting the current item and file set, the factories create one or more ValidationAction:s.

  3. For each file in the file set, the ValidationAction.acceptFile() method is called on each action, which is supposed to either accept or deny validation of the file.

  4. If the file is accepted the ValidationAction.validateAndExtractMetadata() method is called.

[Note] Only one instance of each validator class is created

The validation is not done until all files have been added to the fileset. If the same validator is used for more than one file, the same instance is reused. For example, acceptFile() is called once for each file. Depending on the return value, validateAndExtractMetadata() may be called either immediately or not until all files have been processed.
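
The calling sequence described above can be illustrated with a small simulation. This is a sketch only and does not use the BASE API: the Validator interface, the Accept enum and its three values are hypothetical simplifications, and files are represented by their names.

```java
import java.util.ArrayList;
import java.util.List;

class ValidationFlowSketch {
    /** Hypothetical return value of acceptFile(); not the BASE API. */
    enum Accept { REJECT, VALIDATE_NOW, VALIDATE_LATER }

    /** Hypothetical, simplified stand-in for a validation action. */
    interface Validator {
        Accept acceptFile(String fileName);
        void validateAndExtractMetadata(String fileName);
    }

    /** Runs ONE shared validator instance over all files. */
    static void run(List<String> files, Validator validator) {
        List<String> deferred = new ArrayList<>();
        for (String file : files) {
            Accept answer = validator.acceptFile(file);
            if (answer == Accept.VALIDATE_NOW) {
                validator.validateAndExtractMetadata(file);
            } else if (answer == Accept.VALIDATE_LATER) {
                deferred.add(file);
            }
        }
        // Deferred files are validated after every file has been offered
        for (String file : deferred) {
            validator.validateAndExtractMetadata(file);
        }
    }

    /** Records the order of calls made on a sample validator. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Validator validator = new Validator() {
            public Accept acceptFile(String f) {
                log.add("accept " + f);
                if (f.endsWith(".cel")) return Accept.VALIDATE_NOW;
                if (f.endsWith(".cdf")) return Accept.VALIDATE_LATER;
                return Accept.REJECT;
            }
            public void validateAndExtractMetadata(String f) {
                log.add("validate " + f);
            }
        };
        run(List.of("a.cdf", "b.cel", "c.txt"), validator);
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Because the same instance sees every file, a validator can keep state between calls, for example to compare a deferred file against one that was validated earlier.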

Use case: Import data into the database

This should be done by existing plug-ins in the same way as before. A slight modification is needed since it is good if the importers are made aware of already selected files in the FileSet to provide good default values. The FileStoreUtil class is very useful in cases like this:

RawBioAssay rba = ...
DbControl dc = ...

// Get the current raw data file, if any
List<File> rawDataFiles = 
   FileStoreUtil.getGenericDataFiles(dc, rba, FileType.RAW_DATA);
File defaultFile = rawDataFiles.size() > 0 ?
   rawDataFiles.get(0) : null;
   
// Create parameter asking for input file - use current as default
PluginParameter<File> fileParameter = new PluginParameter<File>(
   "file",
   "Raw data file",
   "The file that contains the raw data that you want to import",
   new FileParameterType(defaultFile, true, 1)
);

An import plug-in should also save the file that was used to the file set:

DbControl dc = ...
RawBioAssay rba = ...
// The file the user selected to import from
File rawDataFile = (File)job.getValue("file");

// Save the file to the fileset. The method will check which file 
// type the platform uses as the raw data type. As a fallback the
// GENERIC_RAW_DATA type is used
FileStoreUtil.setGenericDataFile(dc, rba, FileType.RAW_DATA, 
   DataFileType.GENERIC_RAW_DATA, rawDataFile);
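
The fallback behaviour described in the comment above can be pictured as a lookup with a default. The sketch below is not how FileStoreUtil is implemented; it only illustrates the idea. The map, the resolve() helper and the string identifiers are hypothetical and used purely for illustration.

```java
import java.util.Map;

class RawDataFileTypeSketch {
    // Illustrative identifier for the generic fallback type
    static final String GENERIC_RAW_DATA = "generic.rawdata";

    /**
     * Returns the raw data file type registered for the platform,
     * or the generic fallback if the platform is unknown or null.
     */
    static String resolve(Map<String, String> rawDataTypeByPlatform, String platform) {
        if (platform == null) {
            return GENERIC_RAW_DATA;
        }
        return rawDataTypeByPlatform.getOrDefault(platform, GENERIC_RAW_DATA);
    }

    public static void main(String[] args) {
        // Hypothetical registry: platform id -> raw data file type id
        Map<String, String> registry = Map.of("affymetrix", "affymetrix.cel");
        System.out.println(resolve(registry, "affymetrix")); // affymetrix.cel
        System.out.println(resolve(registry, "other"));      // generic.rawdata
        System.out.println(resolve(registry, null));         // generic.rawdata
    }
}
```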

Use case: Using raw data from files in an experiment

Just as before, an experiment is still locked to a single RawDataType. This is a design issue that would break too many things if changed. If data is stored in files the experiment is also locked to a single Platform. This has been designed to have as little impact on existing plug-ins as possible. In most cases, the plug-ins will continue to work as before.

A plug-in (using data from the database) that needs to check if it can be used within an experiment can still do:

Experiment e = ...
RawDataType rdt = e.getRawDataType();
if (rdt.isStoredInDb())
{
   // Check number of channels, etc...
   // ... run plug-in code ...
}

A newer plug-in which uses data from files should do:

Experiment e = ...
DbControl dc = ...
RawDataType rdt = e.getRawDataType();
if (!rdt.isStoredInDb())
{
   // Check that platform/variant is supported
   Platform p = rdt.getPlatform(dc);
   PlatformVariant v = rdt.getVariant(dc);
   // ...

   // Get data files
   File aFile = FileStoreUtil.getDataFile(dc, ...);
   
   // ... run plug-in code ...
}