This document contains information specific to writing import plugins.
Notably, it has some information about the AbstractFlatFileImporter
class which is useful if you are importing things from text files.
A plugin becomes an import plugin simply by returning Plugin.MainType.IMPORT
from the Plugin.getMainType() method.
The AbstractFlatFileImporter is a very useful abstract class to inherit from
if your plugin uses regular text files that can be parsed by an instance of the
net.sf.basedb.util.FlatFileParser class. This class parses a file
by checking each line against a few regular expressions. Depending on which regular
expression matches the line, it is classified as a header line, a section line, a comment,
a data line, a footer line or unknown. Header lines are inspected as a group, but data
lines are inspected individually. The parser therefore consumes very little memory, since
only a few lines at a time need to be loaded.
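The classification step described above can be illustrated with a small, self-contained sketch. This is not the FlatFileParser implementation: the class name, the patterns and the matching order are all assumptions made for illustration, using only java.util.regex. It only shows the idea of testing each line against a handful of regular expressions while streaming through the input one line at a time.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.EnumMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Simplified stand-in for the classification idea; this is NOT the real FlatFileParser.
final class LineClassifierSketch {

    enum LineType { HEADER, SECTION, COMMENT, DATA, FOOTER, UNKNOWN }

    // Hypothetical patterns for a small tab-separated format.
    private static final Pattern COMMENT = Pattern.compile("#.*");
    private static final Pattern SECTION = Pattern.compile("\\[(.+)\\]");
    private static final Pattern HEADER = Pattern.compile("(\\w+)=(.*)");
    private static final Pattern FOOTER = Pattern.compile("END");
    private static final Pattern SPLITTER = Pattern.compile("\\t");
    private static final int MIN_DATA_COLUMNS = 2;

    // Tests the line against each pattern in turn; the first match decides the type.
    static LineType classify(String line) {
        if (COMMENT.matcher(line).matches()) return LineType.COMMENT;
        if (SECTION.matcher(line).matches()) return LineType.SECTION;
        if (HEADER.matcher(line).matches()) return LineType.HEADER;
        if (FOOTER.matcher(line).matches()) return LineType.FOOTER;
        // A line with enough columns after splitting counts as data.
        if (SPLITTER.split(line, -1).length >= MIN_DATA_COLUMNS) return LineType.DATA;
        return LineType.UNKNOWN;
    }

    // Consumes the lines lazily, so only the current line is held in memory.
    static Map<LineType, Integer> countTypes(Stream<String> lines) {
        Map<LineType, Integer> counts = new EnumMap<>(LineType.class);
        lines.forEach(line -> counts.merge(classify(line), 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        String text = "# exported file\nversion=1.2\n[spots]\nA1\t0.5\nA2\t0.7\nEND\n";
        BufferedReader reader = new BufferedReader(new StringReader(text));
        System.out.println(countTypes(reader.lines()));
    }
}
```

Because the lines arrive as a lazy stream, the same method works on a file of any size. In a real plugin the regular expressions come from the plugin configuration rather than from constants.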
The AbstractFlatFileImporter defines PluginParameter objects
for each of the regular expressions and other parameters used by the parser. It also
implements the Plugin.run() method and does most of the groundwork
for instantiating a FlatFileParser and parsing the file. What you have to
do in your plugin is to put together the RequestInformation objects
for configuring the plugin and creating a job, and implement the
InteractivePlugin.configure() method for validating and storing the
parameters. You should also implement some abstract methods like handleHeader()
and handleData(), but more on that later.
Here is what you need to do:
- Implement the getAbout() and getMainType() methods.
- Implement the InteractivePlugin methods. AbstractFlatFileImporter has
  already defined many parameters for the regular expressions used by the parser.
  You only need to pick them and put them in your RequestInformation object.
    // Parameter that maps the item's name from a column
    private PluginParameter<String> nameColumnMapping;

    // Parameter that maps the item's description from a column
    private PluginParameter<String> descriptionColumnMapping;

    private RequestInformation getConfigurePluginParameters(GuiContext context) {
        if (configurePlugin == null) {
            // RequestInformation object for CONFIGURE_PLUGIN
            List<PluginParameter<?>> parameters = new ArrayList<PluginParameter<?>>();

            // Parser regular expressions - from AbstractFlatFileImporter
            parameters.add(parserSection);
            parameters.add(headerRegexpParameter);
            parameters.add(dataHeaderRegexpParameter);
            parameters.add(dataSplitterRegexpParameter);
            parameters.add(ignoreRegexpParameter);
            parameters.add(dataFooterRegexpParameter);
            parameters.add(minDataColumnsParameter);
            parameters.add(maxDataColumnsParameter);

            // Column mappings
            nameColumnMapping = new PluginParameter<String>(
                "nameColumnMapping",
                "Name",
                "Mapping that picks the item's name from the data columns",
                new StringParameterType(255, null, true)
            );
            descriptionColumnMapping = new PluginParameter<String>(
                "descriptionColumnMapping",
                "Description",
                "Mapping that picks the item's description from the data columns",
                new StringParameterType(255, null, false)
            );
            parameters.add(mappingSection);
            parameters.add(nameColumnMapping);
            parameters.add(descriptionColumnMapping);

            configurePlugin = new RequestInformation(
                Request.COMMAND_CONFIGURE_PLUGIN,
                "File parser settings",
                "TODO - description",
                parameters
            );
        }
        return configurePlugin;
    }
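- Implement or override the methods defined by AbstractFlatFileImporter.
  The most important ones are described below.

Override begin() to set up state before parsing, for example to open a DbControl
object, read parameters from the job and configuration and
put them into more useful variables. The default implementation
does nothing, but we recommend that super.begin() is always called.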
    // Snippets from the RawDataFlatFileImporter class
    private DbControl dc;
    private RawDataBatcher batcher;
    private RawBioAssay rawBioAssay;
    private Map<String, String> columnMappings;
    private int numInserted;

    @Override
    protected void begin() throws BaseException {
        super.begin();

        // Get DbControl
        dc = sc.newDbControl();
        rawBioAssay = (RawBioAssay)job.getValue(rawBioAssayParameter.getName());

        // Reload raw bioassay using current DbControl
        rawBioAssay = RawBioAssay.getById(dc, rawBioAssay.getId());

        // Create a batcher for inserting spots
        batcher = rawBioAssay.getRawDataBatcher();

        // Cache column mappings in a map
        columnMappings = new HashMap<String, String>();
        for (PluginParameter<?> pp : getAllColumnMappings(rawBioAssay.getRawDataType())) {
            columnMappings.put(pp.getName(), (String)configuration.getValue(pp.getName()));
        }

        // For progress reporting
        numInserted = 0;
    }
The handleHeader() method is called once for every header line that is found in
the file. The line parameter contains information
about the header. The default implementation does nothing.
    @Override
    protected void handleHeader(Line line) throws BaseException {
        super.handleHeader(line);
        if (line.name() != null && line.value() != null) {
            rawBioAssay.setHeader(line.name(), line.value());
        }
    }
The handleSection() method is called once for each section that is found in the file.
The line parameter contains information
about the section. The default implementation does nothing.
Currently, no plugins use this feature, so there is no example code to show.
The handleData() method is abstract and must be implemented by all subclasses. It follows the same pattern as the other methods and is called once for every data line in the file.
    // Snippets from the RawDataFlatFileImporter class
    @Override
    protected void handleData(Data data) throws BaseException {
        // Create new RawData object
        RawData raw = batcher.newRawData();

        // External ID for the reporter
        String externalId = data.map(columnMappings.get("reporterIdColumnMapping"));

        // Block, row and column numbers
        String block = data.map(columnMappings.get(blockColumnMapping.getName()));
        String column = data.map(columnMappings.get(columnColumnMapping.getName()));
        String row = data.map(columnMappings.get(rowColumnMapping.getName()));
        // ... more: metaGrid coordinate, X-Y coordinate

        if (block != null) raw.setBlock(Integer.valueOf(block));
        if (column != null) raw.setColumn(Integer.valueOf(column));
        if (row != null) raw.setRow(Integer.valueOf(row));
        // ... more: metaGrid coordinate, X-Y coordinate

        // Other properties
        for (RawDataProperty rdp : rawBioAssay.getRawDataType().getProperties()) {
            String extendedData = data.map(
                columnMappings.get("propertyMapping." + rdp.getName()));
            raw.setExtended(rdp.getName(), rdp.parseString(extendedData));
        }

        // Insert raw data to the database
        batcher.insert(raw, externalId);
        numInserted++;
    }
The end() method is called when the parsing has ended, either because the end of
the file was reached or because an error has occurred. The subclass
should close any open resources, for example the DbControl
object. The success parameter is true
if the parsing was successful, false otherwise.
The default implementation does nothing.
    @Override
    protected void end(boolean success) throws BaseException {
        try {
            // Commit if the parsing was successful
            if (success) {
                batcher.close();
                dc.commit();
            }
        }
        catch (BaseException ex) {
            // Well, now we got an exception
            success = false;
            throw ex;
        }
        finally {
            // Always close... and call super.end()
            if (dc != null) dc.close();
            super.end(success);
        }
    }
The getSuccessMessage() method is the last method that is called, and it is only called if everything went successfully. This method allows a subclass to generate a short message that is sent back to the database as a final progress report. The default implementation returns null, which means that no message will be generated.
    @Override
    protected String getSuccessMessage() {
        return numInserted + (numInserted == 1 ? " spot inserted" : " spots inserted");
    }
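To make the order of the callbacks easier to see, here is a self-contained sketch of the calling sequence. It is a deliberately simplified stand-in, not the AbstractFlatFileImporter code: the class names, the String-based method signatures and the rule used to tell headers from data are all assumptions made for illustration. What it shows is only the contract described above: begin() first, one callback per line, end(success) in every case, and getSuccessMessage() only after a successful run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for the callback sequence; not the BASE class.
abstract class ImporterLifecycleSketch {

    protected void begin() {}
    protected void handleHeader(String name, String value) {}
    protected abstract void handleData(String[] columns);
    protected void end(boolean success) {}
    protected String getSuccessMessage() { return null; }

    // Drives the callbacks; end() is always called, the message only on success.
    final String run(List<String> lines) {
        boolean success = false;
        try {
            begin();
            for (String line : lines) {
                int eq = line.indexOf('=');
                // Assumed rule: "name=value" without tabs is a header, anything else is data.
                if (eq > 0 && line.indexOf('\t') < 0) {
                    handleHeader(line.substring(0, eq), line.substring(eq + 1));
                } else {
                    handleData(line.split("\t", -1));
                }
            }
            success = true;
        } finally {
            end(success);
        }
        return getSuccessMessage();
    }
}

// Example subclass that records the callbacks it receives.
final class CountingImporterSketch extends ImporterLifecycleSketch {
    final List<String> events = new ArrayList<>();
    private int numInserted;

    @Override protected void begin() { events.add("begin"); numInserted = 0; }
    @Override protected void handleHeader(String name, String value) { events.add("header:" + name); }
    @Override protected void handleData(String[] columns) {
        if (columns.length < 2) throw new IllegalArgumentException("Too few columns");
        numInserted++;
        events.add("data");
    }
    @Override protected void end(boolean success) { events.add("end:" + success); }
    @Override protected String getSuccessMessage() {
        return numInserted + (numInserted == 1 ? " spot inserted" : " spots inserted");
    }

    public static void main(String[] args) {
        CountingImporterSketch importer = new CountingImporterSketch();
        System.out.println(importer.run(Arrays.asList("version=1", "A\t1", "B\t2")));
        System.out.println(importer.events);
    }
}
```

If handleData() throws, the finally block still calls end(false) and the exception propagates, so no success message is produced.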
Base has built-in functionality for autodetecting file formats. If your import plugin
wants to participate in that feature it must implement the AutoDetectingImporter
interface. This interface has two methods:
isImportable: Check whether the input stream seems to contain data that can be imported by the plugin. Usually this means scanning a few lines for a header matching a predefined string or a regexp.
The AbstractFlatFileImporter implements this method
by reading the headers from the input stream and checking whether
it stopped at an unknown type of line or not:
    public final boolean isImportable(InputStream in) throws BaseException {
        FlatFileParser ffp = getInitializedFlatFileParser();
        ffp.setInputStream(in);
        try {
            FlatFileParser.LineType result = ffp.parseHeaders();
            return result != FlatFileParser.LineType.UNKNOWN;
        }
        catch (IOException ex) {
            throw new BaseException(ex);
        }
    }
Note that the input stream doesn't have to be a text file. It could be any type of file, for example a binary or XML file. In the case of an XML file you would need to validate the entire input stream in order to be 100% sure that it is a valid XML file, but we recommend that you only check the first few XML tags, for example the <!DOCTYPE > declaration and/or the root element tag.
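A check of only the first XML tags might look like the following sketch. It uses the standard StAX API (javax.xml.stream) to read up to the first start element and compares its name with an expected root element. The class name, the method name and the root element name "experiment" are hypothetical. A real plugin would put this logic in its own isImportable implementation and report errors according to the plugin API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: checks only the root element instead of validating the whole file.
final class XmlSniffSketch {

    static boolean hasRootElement(InputStream in, String expectedRoot) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities while sniffing.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                // Stop at the first start element; the rest of the stream is never read.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return expectedRoot.equals(reader.getLocalName());
                }
            }
            return false;
        } catch (XMLStreamException ex) {
            // Not XML (or broken before the root element): not importable.
            return false;
        } finally {
            if (reader != null) {
                try { reader.close(); } catch (XMLStreamException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><experiment><sample/></experiment>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(hasRootElement(in, "experiment"));
    }
}
```

The trade-off is the one described above: a file that is malformed further down still passes this check, so the import step itself must be prepared to fail.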
doImport: Parse the input stream and import all data that is found. This method is of
course only called if isImportable has returned true. Note,
however, that the input stream is reopened at the start of the file. It may even
be the case that the isImportable method is called on one instance
of the plugin and the doImport method is called on another.
Thus, doImport can't rely on any state set by the isImportable
method.
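The consequence of this rule is sketched below with a self-contained example. It does not use the AutoDetectingImporter interface itself: the class, the simplified method signatures and the "#SPOTS" marker line are assumptions made for illustration. The point is that doImport re-reads everything it needs from the reopened stream, so detection and import can safely run on different instances.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical importer with simplified signatures; keeps no state between the two calls.
final class StatelessImporterSketch {
    private static final String MAGIC = "#SPOTS";   // assumed marker on the first line

    boolean isImportable(InputStream in) {
        try {
            BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            String first = r.readLine();
            return first != null && first.startsWith(MAGIC);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Returns the number of data lines; re-checks the first line instead of trusting earlier state.
    int doImport(InputStream in) {
        try {
            BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            String line = r.readLine();
            if (line == null || !line.startsWith(MAGIC)) {
                throw new IllegalStateException("Not a spots file");
            }
            int count = 0;
            while ((line = r.readLine()) != null) {
                if (!line.isEmpty()) count++;
            }
            return count;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Mimics the caller: a fresh stream and a fresh instance for each step.
    static int autoDetectAndImport(Supplier<InputStream> file) {
        if (!new StatelessImporterSketch().isImportable(file.get())) return -1;
        return new StatelessImporterSketch().doImport(file.get());
    }

    public static void main(String[] args) {
        byte[] bytes = "#SPOTS v1\nA\t1\nB\t2\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(autoDetectAndImport(() -> new ByteArrayInputStream(bytes)));
    }
}
```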