Plugins for importing data

NOTE! This document is outdated and has been replaced with newer documentation. See Plug-in developer: Import plug-ins

This document contains information specific to writing import plugins. In particular, it describes the AbstractFlatFileImporter class, which is useful if you are importing data from text files.

Contents

  1. Import plugins
  2. AbstractFlatFileImporter
  3. Autodetecting file formats

Last updated: $Date: 2008-09-11 22:01:44 +0200 (to, 11 sep 2008) $
Copyright © 2006 The respective authors. All rights reserved.

1. Import plugins

A plugin becomes an import plugin simply by returning Plugin.MainType.IMPORT from the Plugin.getMainType() method.
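As a minimal sketch of the rule above, the following uses a stand-in interface so that it compiles without the BASE libraries. PluginStandIn and ExampleImporter are hypothetical names, and the stand-in enum lists only the values needed here. In a real plugin you would implement the actual Plugin interface instead.

```java
// Stand-in for the real Plugin interface, reduced to the single method
// discussed above. This is an assumption for illustration only; the real
// MainType enum is not limited to the values shown here.
interface PluginStandIn
{
   enum MainType { IMPORT, OTHER }

   MainType getMainType();
}

// Hypothetical importer: returning IMPORT is what marks it as an
// import plugin.
class ExampleImporter implements PluginStandIn
{
   @Override
   public MainType getMainType()
   {
      return MainType.IMPORT;
   }
}
```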

2. AbstractFlatFileImporter

The AbstractFlatFileImporter is a useful abstract class to inherit from if your plugin reads regular text files that can be parsed by an instance of the net.sf.basedb.util.FlatFileParser class. The parser checks each line against a few regular expressions. Depending on which regular expression matches, the line is classified as a header line, a section line, a comment, a data line, a footer line or unknown. Header lines are inspected as a group, but data lines are handled individually, so the parser consumes very little memory: only a few lines need to be loaded at a time.
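To make the classification idea concrete, here is a small self-contained sketch. It is not the FlatFileParser implementation: the class name LineClassifierSketch, the regular expressions, the order in which they are tried and the minimum column count are all assumptions chosen for a simple tab-separated format.

```java
import java.util.regex.Pattern;

// Illustration only, NOT the real net.sf.basedb.util.FlatFileParser.
// All patterns below are hypothetical examples.
class LineClassifierSketch
{
   enum LineType { HEADER, SECTION, COMMENT, DATA, FOOTER, UNKNOWN }

   private static final Pattern COMMENT = Pattern.compile("^#.*$");
   private static final Pattern SECTION = Pattern.compile("^\\[(.+)\\]$");
   private static final Pattern FOOTER = Pattern.compile("^End of data$");
   private static final Pattern HEADER = Pattern.compile("^([^=\\t]+)=(.*)$");
   private static final Pattern DATA_SPLITTER = Pattern.compile("\\t");
   private static final int MIN_DATA_COLUMNS = 3;

   // Classify a single line by testing it against each pattern in turn.
   // Only the current line is needed, so memory use stays small.
   static LineType classify(String line)
   {
      if (COMMENT.matcher(line).matches()) return LineType.COMMENT;
      if (SECTION.matcher(line).matches()) return LineType.SECTION;
      if (FOOTER.matcher(line).matches()) return LineType.FOOTER;
      if (HEADER.matcher(line).matches()) return LineType.HEADER;
      // A data line is one that splits into enough columns
      if (DATA_SPLITTER.split(line, -1).length >= MIN_DATA_COLUMNS)
      {
         return LineType.DATA;
      }
      return LineType.UNKNOWN;
   }
}
```

With these example patterns, a line such as Scanner=GenePix is classified as a header and a line with three tab-separated values as data. In the real importer the patterns are not hard-coded but supplied through the plugin parameters described below.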

The AbstractFlatFileImporter defines PluginParameter objects for each of the regular expressions and other parameters used by the parser. It also implements the Plugin.run() method and does most of the groundwork for instantiating a FlatFileParser and parsing the file. What you have to do in your plugin is put together the RequestInformation objects for configuring the plugin and creating a job, and implement the InteractivePlugin.configure() method for validating and storing the parameters. You should also implement some abstract methods like handleHeader() and handleData(), but more on that below.

Here is what you need to do:

Implement getAbout() and getMainType()
See The Plugin interface for more information
Implement the InteractivePlugin methods
See The InteractivePlugin interface for more information. Note that the AbstractFlatFileImporter already defines many parameters for the regular expressions used by the parser. Just pick the ones you need and add them to your RequestInformation object.
// Parameter that maps the item's name from a column
private PluginParameter<String> nameColumnMapping;

// Parameter that maps the item's description from a column
private PluginParameter<String> descriptionColumnMapping;

private RequestInformation getConfigurePluginParameters(GuiContext context)
{
   if (configurePlugin == null)
   {
      // RequestInformation object for CONFIGURE_PLUGIN
      List<PluginParameter<?>> parameters = new ArrayList<PluginParameter<?>>();

      // Parser regular expressions - from AbstractFlatFileImporter
      parameters.add(parserSection);
      parameters.add(headerRegexpParameter);
      parameters.add(dataHeaderRegexpParameter);
      parameters.add(dataSplitterRegexpParameter);
      parameters.add(ignoreRegexpParameter);
      parameters.add(dataFooterRegexpParameter);
      parameters.add(minDataColumnsParameter);
      parameters.add(maxDataColumnsParameter);

      // Column mappings
      nameColumnMapping = new PluginParameter<String>(
         "nameColumnMapping",
         "Name",
         "Mapping that picks the items name from the data columns",
         new StringParameterType(255, null, true)
      );
		
      descriptionColumnMapping = new PluginParameter<String>(
         "descriptionColumnMapping",
         "Description",
         "Mapping that picks the items description from the data columns",
         new StringParameterType(255, null, false)
      );

      parameters.add(mappingSection);
      parameters.add(nameColumnMapping);
      parameters.add(descriptionColumnMapping);
			
      configurePlugin = new RequestInformation
      (
         Request.COMMAND_CONFIGURE_PLUGIN,
         "File parser settings",
         "TODO - description",
         parameters
      );

   }
   return configurePlugin;
}
Implement/override some of the methods defined by AbstractFlatFileImporter
protected void begin()
This method is called just before the parsing of the file begins. Override this method if you need to initialise some internal state. This is, for example, a good place to open a DbControl object, read parameters from the job and configuration and store them in more convenient variables. The default implementation does nothing, but we recommend that you always call super.begin().
// Snippets from the RawDataFlatFileImporter class
private DbControl dc;
private RawDataBatcher batcher;
private RawBioAssay rawBioAssay;
private Map<String, String> columnMappings;
private int numInserted;

@Override
protected void begin()
   throws BaseException
{
   super.begin();

   // Get DbControl
   dc = sc.newDbControl();
   rawBioAssay = (RawBioAssay)job.getValue(rawBioAssayParameter.getName());

   // Reload raw bioassay using current DbControl
   rawBioAssay = RawBioAssay.getById(dc, rawBioAssay.getId());
   
   // Create a batcher for inserting spots
   batcher = rawBioAssay.getRawDataBatcher();

   // Cache columns mappings in map
   columnMappings = new HashMap<String, String>();
   for (PluginParameter<?> pp : getAllColumnMappings(rawBioAssay.getRawDataType()))
   {
      columnMappings.put(pp.getName(), 
         (String)configuration.getValue(pp.getName()));
   }
   
   // For progress reporting
   numInserted = 0;
}
protected void handleHeader(FlatFileParser.Line line)

This method is called once for every header line that is found in the file. The line parameter contains information about the header. The default implementation of this method does nothing.

@Override
protected void handleHeader(Line line) 
   throws BaseException
{
   super.handleHeader(line);
   if (line.name() != null && line.value() != null)
   {
      rawBioAssay.setHeader(line.name(), line.value());
   }
}
protected void handleSection(FlatFileParser.Line line)

This method is called once for each section that is found in the file. The line parameter contains information about the section. The default implementation of this method does nothing. Currently, we have no plugins using this feature and can't show any example code.

protected abstract void handleData(FlatFileParser.Data data) throws BaseException;

This method is abstract and must be implemented by all subclasses. It follows the same pattern as the other methods, and is called once for every data line in the file.

// Snippets from the RawDataFlatFileImporter class
@Override
protected void handleData(Data data)
   throws BaseException
{
   // Create new RawData object
   RawData raw = batcher.newRawData();

   // External ID for the reporter
   String externalId = data.map(columnMappings.get("reporterIdColumnMapping"));
   
   // Block, row and column numbers
   String block = data.map(columnMappings.get(blockColumnMapping.getName()));
   String column = data.map(columnMappings.get(columnColumnMapping.getName()));
   String row = data.map(columnMappings.get(rowColumnMapping.getName()));
   // ... more: metaGrid coordinate, X-Y coordinate

   if (block != null) raw.setBlock(Integer.valueOf(block));
   if (column != null) raw.setColumn(Integer.valueOf(column));
   if (row != null) raw.setRow(Integer.valueOf(row));
   // ... more: metaGrid coordinate, X-Y coordinate

   // Other properties 
   for (RawDataProperty rdp : rawBioAssay.getRawDataType().getProperties())
   {
      String extendedData = data.map(
         columnMappings.get("propertyMapping."+rdp.getName()));
      raw.setExtended(rdp.getName(), rdp.parseString(extendedData));
   }
   
   // Insert raw data to the database
   batcher.insert(raw, externalId);
   numInserted++;
}
protected void end(boolean success) throws BaseException

Called when the parsing has ended, either because the end of the file was reached or because an error occurred. The subclass should close any open resources, such as the DbControl object. The success parameter is true if the parsing was successful, false otherwise. The default implementation does nothing.

@Override
protected void end(boolean success)
   throws BaseException
{
   try
   {
      // Commit if the parsing was successful
      if (success)
      {
         batcher.close();
         dc.commit();
      }
   }
   catch (BaseException ex)
   {
      // Well, now we got an exception
      success = false;
      throw ex;
   }
   finally
   {
      // Always close... and call super.end()
      if (dc != null) dc.close();
      super.end(success);
   }
}			
protected String getSuccessMessage()

This is the last method that is called, and it is only called if everything went successfully. This method allows a subclass to generate a short message that is sent back to the database as a final progress report. The default implementation returns null, which means that no message will be generated.

@Override
protected String getSuccessMessage()
{
   return numInserted + (numInserted == 1 ? " spot inserted" : " spots inserted");
}

3. Autodetecting file formats

Base has built-in functionality for autodetecting file formats. If your import plugin wants to participate in that feature it must implement the AutoDetectingImporter interface. This interface has two methods:

public boolean isImportable(InputStream in) throws BaseException;

Check whether the input stream seems to contain data that can be imported by the plugin. Usually this means scanning a few lines for a header matching a predefined string or a regexp.

The AbstractFlatFileImporter implements this method by reading the headers from the input stream and checking whether or not the parser stopped at an unknown type of line:

public final boolean isImportable(InputStream in)
   throws BaseException
{
   FlatFileParser ffp = getInitializedFlatFileParser();
   ffp.setInputStream(in);
   try
   {
      FlatFileParser.LineType result = ffp.parseHeaders();
      return result != FlatFileParser.LineType.UNKNOWN;
   }
   catch (IOException ex)
   {
      throw new BaseException(ex);
   }
}

Note that the input stream doesn't have to be a text file. It could be any type of file, for example a binary or xml file. In the case of an xml file you would need to validate the entire input stream in order to be 100% sure that it is a valid xml file, but we recommend that you only check the first few xml tags, i.e. the <!DOCTYPE > declaration and/or the root element tag.
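A cheap check of this kind could look roughly like the sketch below. It is self-contained and does not use the BASE API: the class name XmlSniffSketch, the root tag <experiment, the ten-line limit and the UTF-8 encoding are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustration only: looks for a hypothetical root element near the
// start of the stream instead of validating the whole document.
class XmlSniffSketch
{
   // Assumed root element; a prefix match is good enough for a sketch
   private static final String ROOT_TAG = "<experiment";
   // Assumed limit on how far into the file we are willing to look
   private static final int MAX_LINES = 10;

   static boolean looksImportable(InputStream in)
   {
      BufferedReader reader = new BufferedReader(
         new InputStreamReader(in, StandardCharsets.UTF_8));
      try
      {
         String line;
         for (int i = 0; i < MAX_LINES && (line = reader.readLine()) != null; i++)
         {
            if (line.contains(ROOT_TAG)) return true;
         }
         // Root tag not seen in the first few lines
         return false;
      }
      catch (IOException ex)
      {
         throw new UncheckedIOException(ex);
      }
   }
}
```

The check can give false positives, since the rest of the file may still be invalid, but it is fast and reads only the beginning of the stream.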

public void doImport(InputStream in, ProgressReporter progress) throws BaseException;

Parse the input stream and import all data that is found. This method is of course only called if isImportable has returned true. Note, however, that the input stream is reopened at the start of the file. It may even be the case that the isImportable method is called on one instance of the plugin and the doImport method is called on another. Thus, doImport can't rely on any state set by the isImportable method.
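One way to respect this rule is to keep no instance state at all and let doImport repeat whatever header check it needs. The sketch below illustrates the idea with a stand-in class. StatelessImporterSketch, its simplified method signatures (no ProgressReporter, an int return value) and the #format=demo marker are assumptions, not part of the AutoDetectingImporter interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Stand-in importer, NOT the real AutoDetectingImporter interface:
// signatures are simplified and the marker line is made up.
class StatelessImporterSketch
{
   private static final String MARKER = "#format=demo";

   // No instance fields: nothing can leak from isImportable()
   // into doImport().

   public boolean isImportable(InputStream in)
   {
      return hasMarker(newReader(in));
   }

   // Returns the number of non-empty lines found after the marker line.
   public int doImport(InputStream in)
   {
      BufferedReader reader = newReader(in);
      // Check the header again: the stream starts from the beginning and
      // this may be a different instance than the one that detected it.
      if (!hasMarker(reader))
      {
         throw new IllegalStateException("Unexpected file format");
      }
      try
      {
         int numImported = 0;
         String line;
         while ((line = reader.readLine()) != null)
         {
            if (!line.isEmpty()) numImported++;
         }
         return numImported;
      }
      catch (IOException ex)
      {
         throw new UncheckedIOException(ex);
      }
   }

   private static boolean hasMarker(BufferedReader reader)
   {
      try
      {
         String first = reader.readLine();
         return first != null && first.startsWith(MARKER);
      }
      catch (IOException ex)
      {
         throw new UncheckedIOException(ex);
      }
   }

   private static BufferedReader newReader(InputStream in)
   {
      return new BufferedReader(
         new InputStreamReader(in, StandardCharsets.UTF_8));
   }
}
```

Calling isImportable on one instance and doImport on a second instance, each with a freshly opened stream, works because both methods derive everything they need from the stream itself.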