26.3. Import plug-ins

26.3.1. Autodetect file formats
The net.sf.basedb.core.plugin.AutoDetectingImporter interface
Call sequence during autodetection
26.3.2. The AbstractFlatFileImporter superclass
Configure by example

A plug-in becomes an import plug-in simply by returning Plugin.MainType.IMPORT from the Plugin.getMainType() method.

26.3.1. Autodetect file formats

BASE has built-in functionality for autodetecting file formats. Your plug-in can take part in this feature if it reads its data from a single file. It must also implement the AutoDetectingImporter interface.

The net.sf.basedb.core.plugin.AutoDetectingImporter interface

public boolean isImportable(InputStream in)
    throws BaseException;

Checks the input stream to see if it seems to contain data that can be imported by the plug-in. Usually this means scanning a few lines for a header that matches a predefined string or regular expression.
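For a format that is not handled by AbstractFlatFileImporter, a header check could look roughly like the sketch below. It is a stand-alone class, not a BASE API: the FormatName= marker, the 20-line scan limit and the UTF-8 character set are assumptions for illustration, and it wraps IOException in UncheckedIOException where a real plug-in would throw BaseException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical detector; the "FormatName=" marker, the line limit and the
// charset are assumptions for illustration, not BASE defaults.
class HeaderSniffer
{
   private static final Pattern FORMAT_HEADER = Pattern.compile("FormatName=(.*)");
   private static final int MAX_LINES = 20;

   static boolean isImportable(InputStream in)
   {
      // The caller owns the stream, so the reader is deliberately not closed here
      BufferedReader reader = new BufferedReader(
         new InputStreamReader(in, StandardCharsets.UTF_8));
      try
      {
         String line;
         int count = 0;
         // Only scan the first few lines instead of the whole file
         while (count < MAX_LINES && (line = reader.readLine()) != null)
         {
            if (FORMAT_HEADER.matcher(line).matches()) return true;
            count++;
         }
         return false;
      }
      catch (IOException ex)
      {
         throw new UncheckedIOException(ex);
      }
   }
}
```

Limiting the scan keeps autodetection cheap, which matters because every candidate plug-in gets to inspect the same file.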

The AbstractFlatFileImporter can be used for text-based files. It implements this method by reading the headers from the input stream and checking whether the parser stopped at an unknown type of line:

public final boolean isImportable(InputStream in)
   throws BaseException
{
   FlatFileParser ffp = getInitializedFlatFileParser();
   ffp.setInputStream(in);
   try
   {
      ffp.nextSection();
      FlatFileParser.LineType result = ffp.parseHeaders();
      if (result == FlatFileParser.LineType.UNKNOWN)
      {
         return false;
      }
      else
      {
         return isImportable(ffp);
      }
   }
   catch (IOException ex)
   {
      throw new BaseException(ex);
   }
}

The AbstractFlatFileImporter also has functions for setting the character set and automatic unwrapping of compressed files. See the javadoc for more information.

Note that the input stream doesn't have to be a text file (but then you can't use the AbstractFlatFileImporter). It can be any type of file, for example a binary or an XML file. For an XML file you would need to validate the entire input stream to be 100% sure that it is a valid XML file, but we recommend that you only check the first few XML tags, for example, the <!DOCTYPE > declaration and/or the root element tag.
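One way to check only the root element is to use the streaming StAX parser from the JDK and stop at the first start tag, as in the sketch below. The class and method names are hypothetical, and disabling DTD support is a choice made here so that no external DTD is fetched during detection.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: peeks at the root element without reading the whole file
class XmlRootSniffer
{
   static boolean hasRootElement(InputStream in, String expectedRoot)
   {
      XMLInputFactory factory = XMLInputFactory.newInstance();
      // Don't process DTDs; we only want to look at the first tags
      factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
      try
      {
         XMLStreamReader reader = factory.createXMLStreamReader(in);
         try
         {
            while (reader.hasNext())
            {
               // Stop at the first start tag, which is the root element
               if (reader.next() == XMLStreamConstants.START_ELEMENT)
               {
                  return expectedRoot.equals(reader.getLocalName());
               }
            }
            return false;
         }
         finally
         {
            // Closes the StAX reader only, not the underlying input stream
            reader.close();
         }
      }
      catch (XMLStreamException ex)
      {
         // Not well-formed XML at the start: treat it as not importable
         return false;
      }
   }
}
```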

[Tip] Try casting to ImportInputStream

In many cases (but not all) the autodetect functionality passes an ImportInputStream as the in parameter. This class contains metadata about the file that the input stream originates from. The most useful feature is the ability to get the character set used in the file, which makes it possible to open text files with the correct character set.

String charset = Config.getCharset(); // Default value
if (in instanceof ImportInputStream)
{
   ImportInputStream iim = (ImportInputStream)in;
   if (iim.getCharacterSet() != null) charset = iim.getCharacterSet();
}
Reader reader = new InputStreamReader(in, Charset.forName(charset));

public void doImport(InputStream in,
                     ProgressReporter progress)
    throws BaseException;

Parses the input stream and imports all data that is found. This method is only called if isImportable() has returned true. Note, however, that the input stream is reopened at the start of the file, and isImportable() and doImport() may even be called on two different instances of the plug-in. Thus, doImport() can't rely on any state set by isImportable().
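The sketch below illustrates the consequence: doImport() validates the file itself instead of trusting a field set by isImportable(). The class only mirrors the method names of AutoDetectingImporter and does not implement the real interface; the #MYFORMAT marker line is an assumption for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical importer: the "#MYFORMAT" marker is an assumption, and the
// class does not implement the real BASE AutoDetectingImporter interface.
class StatelessImporter
{
   private static final String MAGIC = "#MYFORMAT";

   boolean isImportable(InputStream in)
   {
      // Stores nothing in fields: the answer is only returned
      List<String> first = readLines(in, 1);
      return !first.isEmpty() && MAGIC.equals(first.get(0));
   }

   List<String> doImport(InputStream in)
   {
      // Re-validates the marker instead of relying on isImportable()
      List<String> lines = readLines(in, Integer.MAX_VALUE);
      if (lines.isEmpty() || !MAGIC.equals(lines.get(0)))
      {
         throw new IllegalStateException("Not a MYFORMAT file");
      }
      return lines.subList(1, lines.size());
   }

   private static List<String> readLines(InputStream in, int max)
   {
      List<String> lines = new ArrayList<>();
      BufferedReader reader = new BufferedReader(
         new InputStreamReader(in, StandardCharsets.UTF_8));
      try
      {
         String line;
         while (lines.size() < max && (line = reader.readLine()) != null) lines.add(line);
      }
      catch (IOException ex)
      {
         throw new UncheckedIOException(ex);
      }
      return lines;
   }
}
```

To mimic the framework, the caller can keep a Supplier<InputStream> that opens the file again from the start, and call isImportable() and doImport() on two separate instances.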

Call sequence during autodetection

The call sequence for autodetection resembles the call sequence for checking if the plug-in can be used in a given context.

  1. A new instance of the plug-in class is created. The plug-in must have a public no-argument constructor.

  2. The Plugin.init() method is called. The job parameter is null. The configuration parameter is null if the plug-in does not have any configuration parameters.

  3. If the plug-in is interactive, the InteractivePlugin.isInContext() method is called. If the context is a list context, the item parameter is null; otherwise the current item is passed. The plug-in should return null if it can be used under the current circumstances, or a message explaining why not.

  4. If the plug-in can be used the AutoDetectingImporter.isImportable() method is called to check if the selected file is importable or not.

  5. After this, Plugin.done() is called and the plug-in instance is discarded. If there are several configurations for a plug-in, this procedure is repeated for each configuration. If the plug-in can be used without a configuration the procedure is also repeated without configuration parameters.

  6. If a single plug-in was found the user is taken to the regular job configuration wizard. A new plug-in instance is created for this. If more than one plug-in was found the user is presented with a list of the plug-ins. After selecting one of them the regular job configuration wizard is used with a new plug-in instance.
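The call sequence above can be summarized as a loop. The following is a simplified simulation, not BASE code: SimplePlugin, PluginEntry, AutoDetector and PrefixPlugin are hypothetical stand-ins, configurations are plain strings, and isInContext() is called for every plug-in instead of only interactive ones.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for the BASE plug-in interfaces
interface SimplePlugin
{
   void init(String configuration);     // null means "no configuration"
   String isInContext();                // null means "usable here"
   boolean isImportable(byte[] file);
   void done();
}

// A plug-in class together with its stored configurations
class PluginEntry
{
   final String name;
   final Supplier<SimplePlugin> factory;
   final List<String> configurations;
   final boolean usableWithoutConfiguration;

   PluginEntry(String name, Supplier<SimplePlugin> factory,
      List<String> configurations, boolean usableWithoutConfiguration)
   {
      this.name = name;
      this.factory = factory;
      this.configurations = configurations;
      this.usableWithoutConfiguration = usableWithoutConfiguration;
   }
}

class AutoDetector
{
   static List<String> detect(List<PluginEntry> entries, byte[] file)
   {
      List<String> matches = new ArrayList<>();
      for (PluginEntry entry : entries)
      {
         List<String> configs = new ArrayList<>(entry.configurations);
         if (entry.usableWithoutConfiguration) configs.add(null);
         for (String config : configs)
         {
            SimplePlugin plugin = entry.factory.get();   // 1. fresh instance
            try
            {
               plugin.init(config);                      // 2. no job, maybe no config
               if (plugin.isInContext() == null          // 3. context check
                  && plugin.isImportable(file))          // 4. autodetection
               {
                  matches.add(config == null ? entry.name : entry.name + " [" + config + "]");
               }
            }
            finally
            {
               plugin.done();                            // 5. instance is discarded
            }
         }
      }
      // 6. One match: open the wizard; several: let the user choose
      return matches;
   }
}

// Example plug-in that accepts files starting with a fixed prefix
class PrefixPlugin implements SimplePlugin
{
   private final String prefix;
   PrefixPlugin(String prefix) { this.prefix = prefix; }
   public void init(String configuration) {}
   public String isInContext() { return null; }
   public boolean isImportable(byte[] file)
   {
      return new String(file, StandardCharsets.UTF_8).startsWith(prefix);
   }
   public void done() {}
}
```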

26.3.2. The AbstractFlatFileImporter superclass

The AbstractFlatFileImporter is a very useful abstract class to use as a superclass for your own import plug-ins. It can be used if your plug-in uses regular text files that can be parsed by an instance of the net.sf.basedb.util.FlatFileParser class. This class parses a file by checking each line against a few regular expressions. Depending on which regular expression matches the line, it is classified as a header line, a section line, a comment, a data line, a footer line or unknown. Header lines are inspected as a group, but data lines are handled one at a time, so the parser consumes very little memory since only a few lines need to be loaded at a time.
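The classification idea can be sketched as follows. This is a simplified illustration, not the real FlatFileParser: the patterns, the tab splitter and the two-column minimum are assumptions, and the only state kept is whether the first data line has been seen.

```java
import java.util.regex.Pattern;

// Simplified sketch of regex-based line classification; the patterns and
// the column minimum are assumptions, not FlatFileParser defaults.
class LineClassifier
{
   enum LineType { SECTION, HEADER, COMMENT, DATA, FOOTER, UNKNOWN }

   private final Pattern section = Pattern.compile("\\[(.+)\\]");
   private final Pattern header = Pattern.compile("(.+)=(.*)");
   private final Pattern comment = Pattern.compile("#.*");
   private final Pattern footer = Pattern.compile("END");
   private final Pattern dataSplitter = Pattern.compile("\t");
   private final int minDataColumns = 2;

   // Becomes true at the first data line; after that only data,
   // comment and footer lines are expected
   private boolean inData = false;

   // Classifies one line at a time, so memory use does not grow with file size
   LineType classify(String line)
   {
      if (comment.matcher(line).matches()) return LineType.COMMENT;
      if (inData)
      {
         if (footer.matcher(line).matches()) return LineType.FOOTER;
         return isData(line) ? LineType.DATA : LineType.UNKNOWN;
      }
      if (section.matcher(line).matches()) return LineType.SECTION;
      if (header.matcher(line).matches()) return LineType.HEADER;
      if (isData(line))
      {
         inData = true;
         return LineType.DATA;
      }
      return LineType.UNKNOWN;
   }

   private boolean isData(String line)
   {
      return dataSplitter.split(line, -1).length >= minDataColumns;
   }
}
```

Checking the data-line state first means a data value that happens to contain "=" is not mistaken for a header.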

The AbstractFlatFileImporter defines PluginParameter objects for each of the regular expressions and other parameters used by the parser. It also implements the Plugin.run() method and does most of the groundwork for instantiating a FlatFileParser and parsing the file. What you have to do in your plug-in is to put together the RequestInformation objects for configuring the plug-in and creating a job, and to implement the InteractivePlugin.configure() method for validating and storing the parameters. You should also implement or override some methods defined by AbstractFlatFileImporter.

Here is what you need to do:

  • Implement the InteractivePlugin methods. See the section called “The net.sf.basedb.core.plugin.InteractivePlugin interface” for more information. Note that the AbstractFlatFileImporter already defines many parameters for the regular expressions used by the parser. You only need to pick them and put them in your RequestInformation object.

    // Parameter that maps the items name from a column
    private PluginParameter<String> nameColumnMapping;
    
    // Parameter that maps the items description from a column
    private PluginParameter<String> descriptionColumnMapping;
    
    private RequestInformation getConfigurePluginParameters(GuiContext context)
    {
       if (configurePlugin == null)
       {
          // To store parameters for CONFIGURE_PLUGIN
          List<PluginParameter<?>> parameters = 
             new ArrayList<PluginParameter<?>>();
    
          // Parser regular expressions - from AbstractFlatFileImporter
          parameters.add(parserSection);
          parameters.add(headerRegexpParameter);
          parameters.add(dataHeaderRegexpParameter);
          parameters.add(dataSplitterRegexpParameter);
          parameters.add(ignoreRegexpParameter);
          parameters.add(dataFooterRegexpParameter);
          parameters.add(minDataColumnsParameter);
          parameters.add(maxDataColumnsParameter);
    
          // Column mappings
          nameColumnMapping = new PluginParameter<String>(
             "nameColumnMapping",
             "Name",
             "Mapping that picks the items name from the data columns",
             new StringParameterType(255, null, true)
          );
    		
          descriptionColumnMapping = new PluginParameter<String>(
            "descriptionColumnMapping",
            "Description",
            "Mapping that picks the items description from the data columns",
            new StringParameterType(255, null, false)
          );
    
          parameters.add(mappingSection);
          parameters.add(nameColumnMapping);
          parameters.add(descriptionColumnMapping);
    			
          configurePlugin = new RequestInformation
          (
             Request.COMMAND_CONFIGURE_PLUGIN,
             "File parser settings",
             "",
             parameters
          );
    
       }
       return configurePlugin;
    }
    
  • Implement/override some of the methods defined by AbstractFlatFileImporter. The most important methods are listed below.

protected FlatFileParser getInitializedFlatFileParser()
    throws BaseException;

The method is called to create a FlatFileParser and set the regular expressions that should be used for parsing the file. The default implementation assumes that your plug-in has used the built-in PluginParameter objects and has stored the values at the configuration level. You should override this method if you need to initialise the parser in a different way. See for example the code for the PrintMapFlatFileImporter plug-in which has a fixed format and doesn't use configurations.

@Override
protected FlatFileParser getInitializedFlatFileParser()
   throws BaseException
{
   FlatFileParser ffp = new FlatFileParser();
   ffp.setSectionRegexp(Pattern.compile("\\[(.+)\\]"));
   ffp.setHeaderRegexp(Pattern.compile("(.+)=,(.*)"));
   ffp.setDataSplitterRegexp(Pattern.compile(","));
   ffp.setDataFooterRegexp(Pattern.compile(""));
   ffp.setMinDataColumns(12);
   return ffp;
}
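To see what the fixed patterns in the example above accept, they can be tried on sample lines with plain java.util.regex. The helper class and the sample lines are hypothetical, and the sketch assumes each pattern must match the whole line; check the FlatFileParser javadoc for its actual matching rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that applies the same patterns as the
// getInitializedFlatFileParser() example, assuming whole-line matching
class PrintMapPatterns
{
   static final Pattern SECTION = Pattern.compile("\\[(.+)\\]");
   static final Pattern HEADER = Pattern.compile("(.+)=,(.*)");
   static final Pattern SPLITTER = Pattern.compile(",");
   static final Pattern FOOTER = Pattern.compile("");
   static final int MIN_DATA_COLUMNS = 12;

   // Section name, or null if the line is not a section line
   static String sectionName(String line)
   {
      Matcher m = SECTION.matcher(line);
      return m.matches() ? m.group(1) : null;
   }

   // {name, value}, or null if the line is not a header line
   static String[] header(String line)
   {
      Matcher m = HEADER.matcher(line);
      return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
   }

   // A data line must split into at least MIN_DATA_COLUMNS values
   static boolean isDataLine(String line)
   {
      return SPLITTER.split(line, -1).length >= MIN_DATA_COLUMNS;
   }

   // With whole-line matching, the empty pattern only matches an empty line
   static boolean isFooter(String line)
   {
      return FOOTER.matcher(line).matches();
   }
}
```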

protected boolean isImportable(FlatFileParser ffp)
    throws IOException;

This method is called from the isImportable(InputStream) method, AFTER FlatFileParser.nextSection() and FlatFileParser.parseHeaders() have been called once, and only if parseHeaders() didn't stop on an unknown line. The default implementation always returns TRUE, since some data has obviously been found. A subclass may override this method to do more checks, for example, to make sure that a certain header is present with a certain value. It may also continue parsing the file. Here is a code example from the PrintMapFlatFileImporter, which checks that a FormatName header is present and contains either TAM or MwBr.

/**
   Check that the file is a TAM or MwBr file.
   @return TRUE if a FormatName header is present and contains "TAM" or "MwBr", FALSE
      otherwise
*/
@Override
protected boolean isImportable(FlatFileParser ffp)
{
   String formatName = ffp.getHeader("FormatName");
   return formatName != null && 
      (formatName.contains("TAM") || formatName.contains("MwBr"));
}

protected void begin(FlatFileParser ffp)
    throws BaseException;

This method is called just before the parsing of the file begins. Override this method if you need to initialise some internal state. This is, for example, a good place to open a DbControl object, read parameters from the job and configuration and put them into more useful variables. The default implementation does nothing, but we recommend that super.begin() is always called.

// Snippets from the RawDataFlatFileImporter class
private DbControl dc;
private RawDataBatcher batcher;
private RawBioAssay rawBioAssay;
private Map<String, String> columnMappings;
private int numInserted;

@Override
protected void begin(FlatFileParser ffp)
   throws BaseException
{
   super.begin(ffp);

   // Get DbControl
   dc = sc.newDbControl();
   rawBioAssay = (RawBioAssay)job.getValue(rawBioAssayParameter.getName());

   // Reload raw bioassay using current DbControl
   rawBioAssay = RawBioAssay.getById(dc, rawBioAssay.getId());
   
   // Create a batcher for inserting spots
   batcher = rawBioAssay.getRawDataBatcher();

   // For progress reporting
   numInserted = 0;
}

protected void handleHeader(FlatFileParser.Line line)
    throws BaseException;

This method is called once for every header line that is found in the file. The line parameter contains information about the header. The default implementation of this method does nothing.

@Override
protected void handleHeader(Line line) 
   throws BaseException
{
   super.handleHeader(line);
   if (line.name() != null && line.value() != null)
   {
      rawBioAssay.setHeader(line.name(), line.value());
   }
}

protected void handleSection(FlatFileParser.Line line)
    throws BaseException;

This method is called once for each section that is found in the file. The line parameter contains information about the section. The default implementation of this method does nothing.

protected abstract void beginData()
    throws BaseException;

This method is called after the headers have been parsed, but before the first line of data. This is a good place to add code that depends on information in the headers, for example, to set up column mappings.

private Mapper reporterMapper;
private Mapper blockMapper;
private Mapper columnMapper;
private Mapper rowMapper;
// ... more mappers

@Override
protected void beginData()
{
   boolean cropStrings = ("crop".equals(job.getValue("stringTooLongError")));

   // Mapper that always returns null; used if no mapping expression has been entered
   Mapper nullMapper = new ConstantMapper((String)null);
   
   // Column mappers
   reporterMapper = getMapper(ffp, (String)configuration.getValue("reporterIdColumnMapping"), 
      cropStrings ? ReporterData.MAX_EXTERNAL_ID_LENGTH : null, nullMapper);
   blockMapper = getMapper(ffp, (String)configuration.getValue("blockColumnMapping"), 
      null, nullMapper);
   columnMapper = getMapper(ffp, (String)configuration.getValue("columnColumnMapping"), 
      null, nullMapper);
   rowMapper = getMapper(ffp, (String)configuration.getValue("rowColumnMapping"), 
      null, nullMapper);
   // ... more mappers: metaGrid coordinate, X-Y coordinate, extended properties
   // ...
}
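The getMapper() calls above turn a mapping expression into an object that extracts a value from each data line. The sketch below shows the idea in plain Java; it is not the BASE Mapper API. SimpleMapper and MapperFactory are hypothetical, and the expression syntax (a column name or a 0-based column index enclosed in backslashes, such as \Reporter ID\ or \2\) is an assumption made for this illustration.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified mapper; not the BASE Mapper/getMapper() API
interface SimpleMapper
{
   String getValue(List<String> data);
}

class MapperFactory
{
   // Assumed syntax: \Column name\ or \index\ (0-based in this sketch)
   private static final Pattern COLUMN_REF = Pattern.compile("\\\\([^\\\\]+)\\\\");

   static SimpleMapper getMapper(List<String> headers, String expression,
      Integer maxLength, SimpleMapper defaultMapper)
   {
      // No expression entered: fall back to the default (e.g. a null mapper)
      if (expression == null || expression.isEmpty()) return defaultMapper;

      Matcher m = COLUMN_REF.matcher(expression);
      if (!m.matches()) throw new IllegalArgumentException("Unsupported mapping: " + expression);
      String ref = m.group(1);
      int index = ref.matches("\\d+") ? Integer.parseInt(ref) : headers.indexOf(ref);
      if (index < 0) throw new IllegalArgumentException("No column named: " + ref);

      // The column index is resolved once, in beginData(), not for every line
      return data ->
      {
         String value = data.get(index);
         // Crop over-long strings when a maximum length was requested
         if (value == null || maxLength == null || value.length() <= maxLength) return value;
         return value.substring(0, maxLength);
      };
   }
}
```

Resolving column names to indexes once, after the headers are known, is what makes beginData() the natural place for this work.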

protected abstract void handleData(FlatFileParser.Data data)
    throws BaseException;

This method is abstract and must be implemented by all subclasses. It is called once for every data line in the file.

// Snippets from the RawDataFlatFileImporter class
@Override
protected void handleData(Data data)
   throws BaseException
{
   // Create new RawData object
   RawData raw = batcher.newRawData();

   // External ID for the reporter
   String externalId = reporterMapper.getValue(data);
   
   // Block, row and column numbers
   raw.setBlock(blockMapper.getInt(data));
   raw.setColumn(columnMapper.getInt(data));
   raw.setRow(rowMapper.getInt(data));
   // ... more: metaGrid coordinate, X-Y coordinate, extended properties
   
   // Insert raw data to the database
   batcher.insert(raw, externalId);
   numInserted++;
}

protected void end(boolean success);

Called when parsing has ended, either because the end of the file was reached or because an error occurred. The subclass should close any open resources, e.g. the DbControl object. The success parameter is true if parsing was successful, false otherwise. The default implementation does nothing.

@Override
protected void end(boolean success)
   throws BaseException
{
   try
   {
      // Commit if the parsing was successful
      if (success)
      {
         batcher.close();
         dc.commit();
      }
   }
   catch (BaseException ex)
   {
      // Well, now we got an exception
      success = false;
      throw ex;
   }
   finally
   {
      // Always close... and call super.end()
      if (dc != null) dc.close();
      super.end(success);
   }
}

protected String getSuccessMessage();

This is the last method that is called, and it is only called if everything went successfully. It allows a subclass to generate a short message that is sent back to the database as a final progress report. The default implementation returns null, which means that no message is generated.

@Override
protected String getSuccessMessage()
{
   return numInserted + " spots inserted";
}
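The order in which the methods above are called can be illustrated with a small template-method sketch. MiniFlatFileImporter and SpotCounter are hypothetical classes, not the real AbstractFlatFileImporter; the sketch assumes "name=value" header lines followed by tab-separated data lines and leaves out sections, footers and progress reporting.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified template-method sketch of the call order; not the real
// AbstractFlatFileImporter. Assumes "name=value" headers, then tab-separated data.
abstract class MiniFlatFileImporter
{
   protected void begin() {}
   protected void handleHeader(String name, String value) {}
   protected void beginData() {}
   protected abstract void handleData(String[] columns);
   protected void end(boolean success) {}
   protected String getSuccessMessage() { return null; }

   final String run(List<String> lines)
   {
      begin();
      boolean success = false;
      try
      {
         boolean inData = false;
         for (String line : lines)
         {
            int eq = line.indexOf('=');
            if (!inData && eq > 0)
            {
               handleHeader(line.substring(0, eq), line.substring(eq + 1));
               continue;
            }
            if (!inData)
            {
               beginData();   // once, between the headers and the first data line
               inData = true;
            }
            handleData(line.split("\t", -1));
         }
         success = true;
      }
      finally
      {
         end(success);      // called both on success and on failure
      }
      // Only reached if no exception was thrown
      return getSuccessMessage();
   }
}

// Example subclass that records the calls it receives
class SpotCounter extends MiniFlatFileImporter
{
   final List<String> calls = new ArrayList<>();
   private int numInserted;

   @Override protected void begin() { calls.add("begin"); }
   @Override protected void handleHeader(String name, String value) { calls.add("header:" + name); }
   @Override protected void beginData() { calls.add("beginData"); }
   @Override protected void handleData(String[] columns) { numInserted++; }
   @Override protected void end(boolean success) { calls.add("end:" + success); }
   @Override protected String getSuccessMessage() { return numInserted + " spots inserted"; }
}
```

Because end() runs in a finally block, resources such as a DbControl are released even when handleData() throws, while getSuccessMessage() is skipped in that case.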

The AbstractFlatFileImporter has a lot of other methods that you may use and/or override in your own plug-in. Check the javadoc for more information.

Configure by example

The ConfigureByExample is a tagging interface that can be used by plug-ins that use the FlatFileParser class for parsing. If a plug-in implements this interface and its list of parameters includes a section parameter named parserSection, the web client activates a Test with file button. This button takes the user to a form where they can enter values for the parameters defined in the AbstractFlatFileImporter class. Parameters for column mappings must have the string "Mapping" in their names.
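The two naming rules can be expressed as simple checks on parameter names. The sketch below is hypothetical and works on plain strings rather than PluginParameter objects; it is not the web client's actual code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the naming rules described above
class ConfigureByExampleCheck
{
   // The "Test with file" button requires the tagging interface and a parserSection parameter
   static boolean showTestWithFileButton(boolean implementsConfigureByExample,
      List<String> parameterNames)
   {
      return implementsConfigureByExample && parameterNames.contains("parserSection");
   }

   // Parameters treated as column mappings: names that contain "Mapping"
   static List<String> columnMappingParameters(List<String> parameterNames)
   {
      return parameterNames.stream()
         .filter(name -> name.contains("Mapping"))
         .collect(Collectors.toList());
   }
}
```

With the parameters from the getConfigurePluginParameters() example earlier in this section, nameColumnMapping and descriptionColumnMapping would be picked up as column mappings.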