A plug-in becomes an import plugin simply by returning
Plugin.MainType.IMPORT
from the Plugin.getMainType()
method.
BASE has built-in functionality for autodetecting file formats.
Your plug-in can be part of that feature if it reads its data
from a single file. It must also implement the
AutoDetectingImporter
interface.
public boolean isImportable(InputStream in)
throws BaseException;
Check whether the input stream seems to contain data that can be imported by the plug-in. Usually this means scanning a few lines for a header that matches a predefined string or regular expression.
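As an illustration of this kind of check, here is a minimal, self-contained sketch that scans the first few lines of a stream for a header line. The class HeaderSniffer, the header format and the ten-line limit are assumptions made for this example; they are not part of BASE.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of BASE: sniffs the first lines of a stream. */
class HeaderSniffer
{
    // Assumed header format for this sketch, for example "FormatName=TAM"
    private static final Pattern HEADER = Pattern.compile("FormatName=.*(TAM|MwBr).*");

    // Only look at the first few lines so that detection stays cheap
    private static final int MAX_LINES = 10;

    static boolean looksImportable(InputStream in)
    {
        try
        {
            BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
            String line;
            int count = 0;
            while (count < MAX_LINES && (line = reader.readLine()) != null)
            {
                // The whole line must match the header pattern
                if (HEADER.matcher(line).matches()) return true;
                count++;
            }
            return false;
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
    }
}
```

A real plug-in would do the same work inside isImportable(InputStream) and would typically wrap an IOException in a BaseException instead.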
The AbstractFlatFileImporter
can be used for text-based files and implements this method
by reading the headers from the input stream and checking if
it stopped at an unknown type of line or not:
    public final boolean isImportable(InputStream in)
        throws BaseException
    {
        FlatFileParser ffp = getInitializedFlatFileParser();
        ffp.setInputStream(in);
        try
        {
            ffp.nextSection();
            FlatFileParser.LineType result = ffp.parseHeaders();
            if (result == FlatFileParser.LineType.UNKNOWN)
            {
                return false;
            }
            else
            {
                return isImportable(ffp);
            }
        }
        catch (IOException ex)
        {
            throw new BaseException(ex);
        }
    }
The AbstractFlatFileImporter
also has functions
for setting the character set and automatic unwrapping of compressed
files. See the javadoc for more information.
Note that the input stream doesn't have to be a text file (but you can't use the
AbstractFlatFileImporter
then).
It can be any type of file, for example a binary or an XML file.
In the case of an XML file you would need to validate the entire
input stream in order to be 100% sure that it is a valid
XML file, but we recommend that you only check the first few XML tags,
for example, the <!DOCTYPE > declaration and/or the root element
tag.
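A minimal sketch of such a lightweight XML check is shown below. It uses the standard Java StAX API to read only as far as the root element. The class name XmlRootSniffer and the idea of comparing against an expected root element name are assumptions made for this example.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper, not part of BASE: checks only the root element of an XML stream. */
class XmlRootSniffer
{
    static boolean hasRootElement(InputStream in, String expectedRoot)
    {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTD processing and external entities while sniffing
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try
        {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            while (reader.hasNext())
            {
                // Stop at the first start tag, which is the root element
                if (reader.next() == XMLStreamConstants.START_ELEMENT)
                {
                    return expectedRoot.equals(reader.getLocalName());
                }
            }
            return false;
        }
        catch (XMLStreamException ex)
        {
            // Not well-formed up to the root element: treat as not importable
            return false;
        }
    }
}
```

Because the reader stops at the first start tag, the rest of the file is never parsed, which keeps the check cheap even for large files.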
public void doImport(InputStream in,
ProgressReporter progress)
throws BaseException;
Parse the input stream and import all data that is found.
This method is of course only called if the
isImportable()
method has returned true. Note
however that the input stream is reopened at the start of the
file. It may even be the case that the isImportable()
method is called on one instance of the plugin and the
doImport()
method is called on another.
Thus, the doImport()
method can't rely on any state set
by the isImportable()
method.
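The following sketch illustrates the point about state. The importer keeps no fields shared between the two methods, and doImport() reads the header again from the start of its own stream. The class StatelessImporterSketch and the tab-separated file format are hypothetical and only serve this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical importer sketch; the class and the file format are illustrations only. */
class StatelessImporterSketch
{
    // Assumed format: a header line "ID<tab>Name" followed by tab-separated data lines
    private static final String EXPECTED_HEADER = "ID\tName";

    // Note: no instance fields pass information from isImportable() to doImport()

    boolean isImportable(InputStream in)
    {
        try
        {
            return EXPECTED_HEADER.equals(newReader(in).readLine());
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
    }

    List<String> doImport(InputStream in)
    {
        try
        {
            BufferedReader reader = newReader(in);
            // The header is read again from the start of the reopened stream
            String header = reader.readLine();
            if (!EXPECTED_HEADER.equals(header))
            {
                throw new IllegalStateException("Unexpected header: " + header);
            }
            List<String> names = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null)
            {
                String[] columns = line.split("\t", -1);
                if (columns.length >= 2) names.add(columns[1]);
            }
            return names;
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
    }

    private static BufferedReader newReader(InputStream in)
    {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}
```

With this design it does not matter whether detection and import happen on the same instance or on two different ones.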
The call sequence for autodetection resembles the call sequence for checking if the plug-in can be used in a given context.
A new instance of the plug-in class is created. The plug-in must have a public no-argument constructor.
The Plugin.init()
method is called.
The job
parameter is null.
The configuration
parameter is null
if the plug-in does not have any configuration parameters.
If the plug-in is interactive, the InteractivePlugin.isInContext()
method is called. If the context is a list context, the item
parameter is null, otherwise the current item is passed. The plug-in
should return null
if it can be used under the
current circumstances, or a message explaining why not.
If the plug-in can be used, the AutoDetectingImporter.isImportable()
method is called to check if the selected file is importable or not.
After this, Plugin.done()
is called and
the plug-in instance is discarded. If there are
several configurations for a plug-in, this procedure is repeated
for each configuration. If the plug-in can be used without
a configuration the procedure is also repeated without
configuration parameters.
If a single plug-in was found, the user is taken to the regular job configuration wizard. A new plug-in instance is created for this. If more than one plug-in was found, the user is presented with a list of the plug-ins. After selecting one of them, the regular job configuration wizard is used with a new plug-in instance.
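The detection sequence described above can be sketched roughly as follows. Everything in this example is a simplified stand-in: the DetectablePlugin interface, the AutoDetectSketch class and the use of a plain string as the configuration are hypothetical and do not mirror the real BASE interfaces. Calling done() even when a check fails is also an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for the plug-in interfaces involved in autodetection. */
interface DetectablePlugin
{
    void init(String configuration);          // the job is always null here, so it is left out
    String isInContext(Object item);          // null means "can be used"
    boolean isImportable(byte[] fileContent);
    void done();
}

class AutoDetectSketch
{
    /** Returns the configurations for which the plug-in claims the file. */
    static List<String> detect(Supplier<DetectablePlugin> factory,
        List<String> configurations, Object item, byte[] fileContent)
    {
        List<String> matches = new ArrayList<>();
        for (String configuration : configurations)
        {
            DetectablePlugin plugin = factory.get();   // fresh instance for every configuration
            plugin.init(configuration);
            try
            {
                if (plugin.isInContext(item) == null && plugin.isImportable(fileContent))
                {
                    matches.add(configuration);
                }
            }
            finally
            {
                plugin.done();                         // the instance is discarded afterwards
            }
        }
        return matches;
    }
}

/** Example plug-in: the configuration is simply the column separator to look for. */
class SeparatorPlugin implements DetectablePlugin
{
    static final List<String> CALLS = new ArrayList<>();
    private String separator;

    public void init(String configuration)
    {
        separator = configuration;
        CALLS.add("init(" + configuration + ")");
    }

    public String isInContext(Object item)
    {
        CALLS.add("isInContext");
        return null;
    }

    public boolean isImportable(byte[] fileContent)
    {
        CALLS.add("isImportable");
        return new String(fileContent, StandardCharsets.UTF_8).contains(separator);
    }

    public void done()
    {
        CALLS.add("done");
    }
}
```

Calling AutoDetectSketch.detect(SeparatorPlugin::new, ...) with two configurations creates two plug-in instances, and each one goes through init, isInContext, isImportable and done in that order.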
The AbstractFlatFileImporter
is a very useful abstract
class to use as a superclass for your own import plug-ins. It can be used
if your plug-in uses regular text files that can be parsed by an instance of the
net.sf.basedb.util.FlatFileParser
class. This class parses a file
by checking each line against a few regular expressions. Depending on which regular
expression matches the line, it is classified as a header line, a section line,
a comment, a data line, a footer line or unknown. Header lines are inspected as a group,
but data lines are handled individually, so the parser consumes very little memory since only
a few lines at a time need to be loaded.
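To make the idea of regular-expression based classification concrete, here is a small sketch. It is not the FlatFileParser implementation: the class LineClassifier, the example patterns and the order in which the patterns are tried are all assumptions chosen for this illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based line classification; not the real FlatFileParser. */
class LineClassifier
{
    enum LineType { SECTION, HEADER, COMMENT, DATA, FOOTER, UNKNOWN }

    // Example patterns; a real importer would take these from its parameters
    private final Pattern comment = Pattern.compile("#.*");
    private final Pattern section = Pattern.compile("\\[(.+)\\]");
    private final Pattern header = Pattern.compile("(.+)=(.*)");
    private final Pattern footer = Pattern.compile("END");
    private final Pattern splitter = Pattern.compile("\\t");
    private final int minDataColumns = 2;

    LineType classify(String line)
    {
        // The order of the checks is an assumption of this sketch
        if (comment.matcher(line).matches()) return LineType.COMMENT;
        if (section.matcher(line).matches()) return LineType.SECTION;
        if (header.matcher(line).matches()) return LineType.HEADER;
        if (footer.matcher(line).matches()) return LineType.FOOTER;
        // A data line must split into at least the minimum number of columns
        if (splitter.split(line, -1).length >= minDataColumns) return LineType.DATA;
        return LineType.UNKNOWN;
    }
}
```

Since each line is classified on its own, a caller can process the file as a stream without keeping earlier data lines in memory.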
The AbstractFlatFileImporter
defines
PluginParameter
objects
for each of the regular expressions and other parameters used by the parser. It also
implements the Plugin.run()
method and does most of
the ground work for instantiating a FlatFileParser
and
parsing the file. What you have to do in your plugin is to put together the
RequestInformation
objects
for configuring the plugin and creating a job and implement the
InteractivePlugin.configure()
method for validating and
storing the parameters. You should also implement or override some methods
defined by AbstractFlatFileImporter.
Here is what you need to do:
Implement the InteractivePlugin
methods.
See the section called “The net.sf.basedb.core.plugin.InteractivePlugin interface” for more information. Note that the
AbstractFlatFileImporter
already defines many parameters for the regular expressions used by the parser.
You only need to pick the ones you want and put them in your RequestInformation
object.
    // Parameter that maps the item's name from a column
    private PluginParameter<String> nameColumnMapping;

    // Parameter that maps the item's description from a column
    private PluginParameter<String> descriptionColumnMapping;

    private RequestInformation getConfigurePluginParameters(GuiContext context)
    {
        if (configurePlugin == null)
        {
            // To store parameters for CONFIGURE_PLUGIN
            List<PluginParameter<?>> parameters =
                new ArrayList<PluginParameter<?>>();

            // Parser regular expressions - from AbstractFlatFileImporter
            parameters.add(parserSection);
            parameters.add(headerRegexpParameter);
            parameters.add(dataHeaderRegexpParameter);
            parameters.add(dataSplitterRegexpParameter);
            parameters.add(ignoreRegexpParameter);
            parameters.add(dataFooterRegexpParameter);
            parameters.add(minDataColumnsParameter);
            parameters.add(maxDataColumnsParameter);

            // Column mappings
            nameColumnMapping = new PluginParameter<String>(
                "nameColumnMapping",
                "Name",
                "Mapping that picks the item's name from the data columns",
                new StringParameterType(255, null, true)
            );
            descriptionColumnMapping = new PluginParameter<String>(
                "descriptionColumnMapping",
                "Description",
                "Mapping that picks the item's description from the data columns",
                new StringParameterType(255, null, false)
            );

            parameters.add(mappingSection);
            parameters.add(nameColumnMapping);
            parameters.add(descriptionColumnMapping);

            configurePlugin = new RequestInformation
            (
                Request.COMMAND_CONFIGURE_PLUGIN,
                "File parser settings",
                "",
                parameters
            );
        }
        return configurePlugin;
    }
Implement/override some of the methods defined by
AbstractFlatFileImporter. The most important
methods are listed below.
protected FlatFileParser getInitializedFlatFileParser()
throws BaseException;
The method is called to create a FlatFileParser
and set the regular expressions that should be used for parsing the file.
The default implementation assumes that your plug-in has used the built-in
PluginParameter
objects and has stored the values
at the configuration level. You should override this method if you need to
initialise the parser in a different way. See for example the
code for the PrintMapFlatFileImporter
plug-in which
has a fixed format and doesn't use configurations.
    @Override
    protected FlatFileParser getInitializedFlatFileParser()
        throws BaseException
    {
        FlatFileParser ffp = new FlatFileParser();
        ffp.setSectionRegexp(Pattern.compile("\\[(.+)\\]"));
        ffp.setHeaderRegexp(Pattern.compile("(.+)=,(.*)"));
        ffp.setDataSplitterRegexp(Pattern.compile(","));
        ffp.setDataFooterRegexp(Pattern.compile(""));
        ffp.setMinDataColumns(12);
        return ffp;
    }
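To see what the regular expressions in this example accept, the following stand-alone sketch applies them to a few sample lines using only java.util.regex. The class PrintMapRegexDemo and the sample lines are made up for illustration, and the sketch assumes that a pattern has to match the whole line; how FlatFileParser applies the patterns internally is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo class; it only exercises the regular expressions from the example. */
class PrintMapRegexDemo
{
    static final Pattern SECTION = Pattern.compile("\\[(.+)\\]");
    static final Pattern HEADER = Pattern.compile("(.+)=,(.*)");
    static final Pattern SPLITTER = Pattern.compile(",");
    static final Pattern FOOTER = Pattern.compile("");

    /** Returns the section name, or null if the line is not a section line. */
    static String sectionName(String line)
    {
        Matcher m = SECTION.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    /** Returns {name, value}, or null if the line is not a header line. */
    static String[] header(String line)
    {
        Matcher m = HEADER.matcher(line);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    /** Splits a data line into columns, keeping empty trailing columns. */
    static String[] columns(String line)
    {
        return SPLITTER.split(line, -1);
    }

    /** With whole-line matching, the empty footer pattern accepts only an empty line. */
    static boolean isFooter(String line)
    {
        return FOOTER.matcher(line).matches();
    }
}
```

For example, a line such as [Header] yields the section name Header, and FormatName=,TAM yields the header name FormatName with the value TAM.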
protected boolean isImportable(FlatFileParser ffp)
throws IOException;
This method is called from the isImportable(InputStream)
method, AFTER FlatFileParser.nextSection()
and
FlatFileParser.parseHeaders()
have been called
a single time and if the parseHeaders
method didn't
stop on an unknown line. The default implementation of this method always returns
TRUE, since obviously some data has been found. A subclass may override this method
if it wants to do more checks, for example, to make sure that a certain header is present
with a certain value. It may also continue parsing the file. Here is a code example from
the PrintMapFlatFileImporter
which checks if a
FormatName
header is present and contains either
TAM
or MwBr.
    /**
        Check that the file is a TAM or MwBr file.
        @return TRUE if a FormatName header is present and contains "TAM" or "MwBr",
            FALSE otherwise
    */
    @Override
    protected boolean isImportable(FlatFileParser ffp)
    {
        String formatName = ffp.getHeader("FormatName");
        return formatName != null &&
            (formatName.contains("TAM") || formatName.contains("MwBr"));
    }
protected void begin(FlatFileParser ffp)
throws BaseException;
This method is called just before the parsing of the file
begins. Override this method if you need to initialise some
internal state. This is, for example, a good place to open
a DbControl
object, read parameters from the
job and configuration and put them into more useful variables. The default
implementation does nothing, but we recommend that
super.begin()
is always called.
    // Snippets from the RawDataFlatFileImporter class
    private DbControl dc;
    private RawDataBatcher batcher;
    private RawBioAssay rawBioAssay;
    private Map<String, String> columnMappings;
    private int numInserted;

    @Override
    protected void begin(FlatFileParser ffp)
        throws BaseException
    {
        super.begin(ffp);

        // Get DbControl
        dc = sc.newDbControl();
        rawBioAssay = (RawBioAssay)job.getValue(rawBioAssayParameter.getName());

        // Reload raw bioassay using current DbControl
        rawBioAssay = RawBioAssay.getById(dc, rawBioAssay.getId());

        // Create a batcher for inserting spots
        batcher = rawBioAssay.getRawDataBatcher();

        // For progress reporting
        numInserted = 0;
    }
protected void handleHeader(FlatFileParser.Line line)
throws BaseException;
This method is called once for every header line that is found in
the file. The line
parameter contains information
about the header. The default implementation of this method does
nothing.
    @Override
    protected void handleHeader(Line line)
        throws BaseException
    {
        super.handleHeader(line);
        if (line.name() != null && line.value() != null)
        {
            rawBioAssay.setHeader(line.name(), line.value());
        }
    }
protected void handleSection(FlatFileParser.Line line)
throws BaseException;
This method is called once for each section that is found in the file.
The line
parameter contains information
about the section. The default implementation of this method does
nothing.
protected abstract void beginData()
throws BaseException;
This method is called after the headers have been parsed, but before the first line of data. This is a good place to add code that depends on information in the headers, for example, to put together column mappings.
    private Mapper reporterMapper;
    private Mapper blockMapper;
    private Mapper columnMapper;
    private Mapper rowMapper;
    // ... more mappers

    @Override
    protected void beginData()
    {
        boolean cropStrings = ("crop".equals(job.getValue("stringTooLongError")));

        // Mapper that always returns null; used if no mapping expression has been entered
        Mapper nullMapper = new ConstantMapper((String)null);

        // Column mappers
        reporterMapper = getMapper(ffp, (String)configuration.getValue("reporterIdColumnMapping"),
            cropStrings ? ReporterData.MAX_EXTERNAL_ID_LENGTH : null, nullMapper);
        blockMapper = getMapper(ffp, (String)configuration.getValue("blockColumnMapping"),
            null, nullMapper);
        columnMapper = getMapper(ffp, (String)configuration.getValue("columnColumnMapping"),
            null, nullMapper);
        rowMapper = getMapper(ffp, (String)configuration.getValue("rowColumnMapping"),
            null, nullMapper);
        // ... more mappers: metaGrid coordinate, X-Y coordinate, extended properties
        // ...
    }
protected abstract void handleData(FlatFileParser.Data data)
throws BaseException;
This method is abstract and must be implemented by all subclasses. It is called once for every data line in the file.
    // Snippets from the RawDataFlatFileImporter class
    @Override
    protected void handleData(Data data)
        throws BaseException
    {
        // Create new RawData object
        RawData raw = batcher.newRawData();

        // External ID for the reporter
        String externalId = reporterMapper.getValue(data);

        // Block, row and column numbers
        raw.setBlock(blockMapper.getInt(data));
        raw.setColumn(columnMapper.getInt(data));
        raw.setRow(rowMapper.getInt(data));
        // ... more: metaGrid coordinate, X-Y coordinate, extended properties

        // Insert raw data to the database
        batcher.insert(raw, externalId);
        numInserted++;
    }
protected void end(boolean success);
Called when the parsing has ended, either because the end of
file was reached or because an error has occurred. The subclass
should close any open resources, for example the DbControl
object. The success
parameter is true
if the parsing was successful, false
otherwise.
The default implementation does nothing.
    @Override
    protected void end(boolean success)
        throws BaseException
    {
        try
        {
            // Commit if the parsing was successful
            if (success)
            {
                batcher.close();
                dc.commit();
            }
        }
        catch (BaseException ex)
        {
            // Well, now we got an exception
            success = false;
            throw ex;
        }
        finally
        {
            // Always close... and call super.end()
            if (dc != null) dc.close();
            super.end(success);
        }
    }
protected String getSuccessMessage();
This is the last method that is called, and it is only called if everything went successfully. This method allows a subclass to generate a short message that is sent back to the database as a final progress report. The default implementation returns null, which means that no message will be generated.
    @Override
    protected String getSuccessMessage()
    {
        return numInserted + " spots inserted";
    }
The AbstractFlatFileImporter
has a lot of
other methods that you may use and/or override in your own plug-in.
Check the javadoc for more information.
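To summarise the order in which the callbacks described above are invoked, here is a simplified, self-contained sketch of the pattern. The classes LifecycleSketch and CountingImporter are hypothetical stand-ins that work on a list of strings; section handling is left out, and the real AbstractFlatFileImporter does considerably more (parameters, progress reporting, error handling options).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the callback sequence; not the real AbstractFlatFileImporter. */
abstract class LifecycleSketch
{
    final String run(List<String> lines)
    {
        boolean success = false;
        try
        {
            begin();
            int i = 0;
            // Assumption for this sketch: header lines look like "name=value"
            for (; i < lines.size() && lines.get(i).contains("="); i++)
            {
                handleHeader(lines.get(i));
            }
            beginData();
            for (; i < lines.size(); i++)
            {
                handleData(lines.get(i).split("\t", -1));
            }
            success = true;
        }
        finally
        {
            // Always called, with success == false if an exception was thrown
            end(success);
        }
        // Only reached when no exception was thrown
        return getSuccessMessage();
    }

    protected void begin() {}
    protected void handleHeader(String line) {}
    protected void beginData() {}
    protected abstract void handleData(String[] columns);
    protected void end(boolean success) {}
    protected String getSuccessMessage() { return null; }
}

/** Example subclass that records the calls and counts the data lines. */
class CountingImporter extends LifecycleSketch
{
    final List<String> calls = new ArrayList<>();
    private int numInserted;

    @Override protected void begin() { calls.add("begin"); numInserted = 0; }
    @Override protected void handleHeader(String line) { calls.add("handleHeader"); }
    @Override protected void beginData() { calls.add("beginData"); }

    @Override
    protected void handleData(String[] columns)
    {
        if (columns.length < 2) throw new IllegalArgumentException("Too few columns");
        calls.add("handleData");
        numInserted++;
    }

    @Override protected void end(boolean success) { calls.add("end(" + success + ")"); }
    @Override protected String getSuccessMessage() { return numInserted + " spots inserted"; }
}
```

The try/finally block mirrors the contract described for end(): it runs whether or not parsing succeeded, while the success message is produced only after a successful run.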
The ConfigureByExample
is a tagging interface that can be used by plug-ins using the
FlatFileParser
class for parsing.
The web client detects if a plug-in implements this interface and if the list of
parameters includes a section parameter with the name parserSection,
a button is activated. This button will take the
user to a form which allows the user to enter values for the parameters defined in
the AbstractFlatFileImporter
class. Parameters for column mappings
must have the string "Mapping" in their names.
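The two naming rules above can be expressed as a small check. This is only a sketch of the rules as stated in the text: the class ConfigureByExampleCheck is hypothetical, the matching is assumed to be case-sensitive, and the actual logic in the web client may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that restates the naming rules; not part of the web client. */
class ConfigureByExampleCheck
{
    /** The button is shown only for tagged plug-ins that have a parserSection parameter. */
    static boolean showsButton(boolean implementsConfigureByExample, List<String> parameterNames)
    {
        return implementsConfigureByExample && parameterNames.contains("parserSection");
    }

    /** Column mapping parameters are recognised by "Mapping" in their names. */
    static List<String> columnMappingParameters(List<String> parameterNames)
    {
        List<String> result = new ArrayList<>();
        for (String name : parameterNames)
        {
            // Assumption: the match is case-sensitive
            if (name.contains("Mapping")) result.add(name);
        }
        return result;
    }
}
```

With the parameter names from the earlier example, nameColumnMapping and descriptionColumnMapping would be treated as column mappings.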