Class AbstractFlatFileImporter

java.lang.Object
net.sf.basedb.core.plugin.AbstractPlugin
net.sf.basedb.plugins.AbstractFlatFileImporter
All Implemented Interfaces:
AutoDetectingImporter, Plugin, SignalTarget
Direct Known Subclasses:
AbstractItemImporter, AnnotationFlatFileImporter, AnyToAnyImporter, IlluminaRawDataImporter, PlateFlatFileImporter, PlateMappingImporter, PrintMapFlatFileImporter, RawDataFlatFileImporter, ReporterFlatFileImporter, ReporterMapFlatFileImporter

public abstract class AbstractFlatFileImporter
extends AbstractPlugin
implements AutoDetectingImporter, SignalTarget
An abstract base class for all importers that import data from one or more flat files. The implementation in this class uses a FlatFileParser to parse the files and a callback method, handleData(FlatFileParser.Data), that lets the subclass do whatever it needs to insert a single line of data into the database.

The subclass must also generate the RequestInformation objects for both plugin and job configuration. However, this implementation expects to find the regular expressions needed by the FlatFileParser in the plugin configuration and the File to import in the job configuration.

All PluginParameter objects needed to ask for these parameters are declared as protected variables in this class.

// ReporterFlatFileImporter.java
// RequestInformation object for command = CONFIGURE_PLUGIN
List<PluginParameter<?>> parameters = 
   new ArrayList<PluginParameter<?>>();

// Parser regular expressions
parameters.add(headerRegexpParameter);
parameters.add(dataHeaderRegexpParameter);
parameters.add(dataSplitterRegexpParameter);
parameters.add(ignoreRegexpParameter);
parameters.add(dataFooterRegexpParameter);

// Column mappings
parameters.addAll(allColumnMappings);

// Reporter type
parameters.add(reporterTypeParameter);

configurePlugin = new RequestInformation
(
   Request.COMMAND_CONFIGURE_PLUGIN,
   "Parser settings",
   "Please enter all settings needed by the flat file parser",
   parameters
);
This class implements the run method but only for the Request.COMMAND_EXECUTE command. The subclass must override the run method and provide implementations for the other commands (Request.COMMAND_CONFIGURE_PLUGIN and Request.COMMAND_CONFIGURE_JOB). A typical implementation validates the request parameters and then stores them:
// ReporterFlatFileImporter.java
if (command.equals(Request.COMMAND_CONFIGURE_JOB))
{
   List<Throwable> errors = validateRequestParameters(jobParameters, request);
   if (errors != null)
   {
      response.setError(errors.size()+" invalid parameter(s) were found in the request", errors);
      return;
   }
   storeValue(job, request, fileParameter);
   response.setDone(null);
}

For multi-file support (added in BASE 2.9) the subclass needs to override the getFileIterator() and getTotalFileSize() methods.

Version:
2.0
Author:
Enell, Nicklas
Last modified:
$Date: 2020-10-13 14:31:34 +0200 (Tue, 13 Oct 2020) $
  • Field Details

  • Constructor Details

    • AbstractFlatFileImporter

      public AbstractFlatFileImporter()
  • Method Details

    • getMainType

      public Plugin.MainType getMainType()
      Specified by:
      getMainType in interface Plugin
      Returns:
      One of the defined types
    • requiresConfiguration

      public boolean requiresConfiguration()
      Returns TRUE, since the implementation requires it for finding the regular expressions used by the FlatFileParser. If this method is overridden and returns FALSE, the subclass must also override the getInitializedFlatFileParser() method and provide a parser with all regular expressions and other options set.
      Specified by:
      requiresConfiguration in interface Plugin
      Overrides:
      requiresConfiguration in class AbstractPlugin
      Returns:
      TRUE or FALSE
    • run

      public void run​(Request request, Response response, ProgressReporter progress)
      Implements the Request.COMMAND_EXECUTE command. Subclasses must override this to implement other commands. Subclasses should not call this method for commands other than Request.COMMAND_EXECUTE, since this method will then set an error response status.
      Specified by:
      run in interface Plugin
      Parameters:
      request - Request object with the command and parameters
      response - Response object for the plugin to respond through
      progress - A ProgressReporter where the plugin can report its progress; can be null
    • isImportable

      public final boolean isImportable​(InputStream in) throws BaseException
      Description copied from interface: AutoDetectingImporter
      Check if the given InputStream can be imported by this plugin.
      Specified by:
      isImportable in interface AutoDetectingImporter
      Parameters:
      in - The input stream to check
      Returns:
      TRUE if the stream can be imported, FALSE otherwise
      Throws:
      BaseException - If something goes wrong
    • doImport

      public void doImport​(InputStream in, ProgressReporter progress) throws BaseException
      Description copied from interface: AutoDetectingImporter
      Import the data from the given InputStream.
      Specified by:
      doImport in interface AutoDetectingImporter
      Parameters:
      in - The input stream to read from
      progress - Progress reporter used by the caller to keep track of the progress. Null is allowed
      Throws:
      BaseException - If something goes wrong
    • getSignalHandler

      public SignalHandler getSignalHandler()
      Create a new ThreadSignalHandler that supports the Signal.ABORT signal. Subclasses may override this to provide another signal handler, or return null if they don't support signals.
      Specified by:
      getSignalHandler in interface SignalTarget
      Returns:
      A SignalHandler object, or null if the current instance doesn't support signals
      Since:
      2.6
    • start

      protected void start()
      Called once before starting the import. The default implementation does nothing.
      Since:
      2.9
    • getFileIterator

      protected Iterator<File> getFileIterator()
      Get an iterator that returns the files to be imported. The default implementation returns the single file found in the job's "file" parameter. Subclasses that need multi-file/item import should override this method to provide their own iterator. They should also override the getTotalFileSize() method to return the sum of all file sizes, for example by adding up File.getSize() for each file.
      Since:
      2.9
    • getTotalFileSize

      protected long getTotalFileSize()
      Get the total file size of all files that are going to be imported. A subclass that is going to import from multiple files needs to override this method. The default implementation returns the size of the file in the job's "file" parameter.
      Returns:
      The sum of the file sizes, or -1 if not known
      Since:
      2.9
    • getProgress

      protected int getProgress​(FlatFileParser ffp)
      Get the progress of the import as a percentage value. The default implementation sums the file sizes of the completed files and the value of getNumBytes(FlatFileParser). This sum is divided by getTotalFileSize().
      Parameters:
      ffp - The file parser that is used to parse the file
      Returns:
      A value between 0 and 100
      Since:
      2.6
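The calculation described for getProgress(FlatFileParser) can be sketched as below. This is an illustration only: the ProgressSketch class and its parameter names are hypothetical, and returning 0 when the total size is unknown (-1) is an assumption, since the source does not say how that case is handled.

```java
// Hypothetical helper, not part of the BASE API.
final class ProgressSketch {
    /**
     * @param completedBytes total size of the files that are already imported
     * @param currentBytes   bytes read so far from the file being parsed
     * @param totalBytes     sum of all file sizes, or -1 if not known
     * @return a value between 0 and 100
     */
    static int percent(long completedBytes, long currentBytes, long totalBytes) {
        if (totalBytes <= 0) {
            return 0; // assumption: unknown or empty total reports no progress
        }
        long pct = 100L * (completedBytes + currentBytes) / totalBytes;
        // Clamp, since the byte count may overshoot (for example with wrapped streams).
        return (int) Math.max(0L, Math.min(100L, pct));
    }
}
```

With two 50-byte files, the first finished and the second half read, percent(50, 25, 100) yields 75.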
    • getNumBytes

      protected long getNumBytes​(FlatFileParser ffp)
      Get the number of bytes read from the file. The value should indicate how far into the file parsing has proceeded. The default implementation calls FlatFileParser.getParsedBytes(). If a subclass has wrapped the input stream, the number of parsed bytes may not correspond to the number of bytes read from the file. For example, if the file is compressed, the number of parsed bytes will be higher than the number of bytes read from the file.
      Parameters:
      ffp - The file parser that is used to parse the file
      Returns:
      The number of bytes read from the original file
      Since:
      2.6
      See Also:
      wrapInputStream(InputStream), getProgress(FlatFileParser)
    • isImportable

      protected boolean isImportable​(FlatFileParser ffp) throws IOException
      This method is called by the isImportable(InputStream) method after FlatFileParser.nextSection() and FlatFileParser.parseHeaders() have been called, and only if data has been found. Thus, the default implementation of this method always returns TRUE. Subclasses may override this method to do more checks, for example to make sure certain headers are present or to parse more data from the file.
      Parameters:
      ffp - The FlatFileParser object used to parse the file
      Returns:
      Always TRUE
      Throws:
      IOException
      Since:
      2.4
    • wrapInputStream

      protected InputStream wrapInputStream​(InputStream in) throws IOException
      This method is called before the parser starts reading from the input stream. A subclass may override this method to wrap the input stream with a filtering stream, for example, a gzip input stream. The default implementation of this method returns the original stream unmodified. If a subclass overrides this method it may also need to override one of the getNumBytes(FlatFileParser) or getProgress(FlatFileParser) methods.
      Parameters:
      in - The input stream to wrap
      Returns:
      The same or a different input stream
      Throws:
      IOException
      Since:
      2.6
      See Also:
      getNumBytes(FlatFileParser)
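The following sketch shows why a wrapped stream makes the two byte counts differ. It uses only java.io and java.util.zip. The GzipWrapSketch and CountingInputStream names are hypothetical and not part of BASE. A real subclass would do the wrapping in wrapInputStream(InputStream) and report the counter's value from getNumBytes(FlatFileParser).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration, not part of the BASE API.
final class GzipWrapSketch {
    /** Counts the bytes read from the underlying (compressed) stream. */
    static final class CountingInputStream extends FilterInputStream {
        private long count;
        CountingInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }
        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
        long getCount() { return count; }
    }

    /** Compresses the given bytes so the sketch has something to read. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Returns { bytes read from the compressed data, bytes seen after decompression }. */
    static long[] readAll(byte[] compressed) {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(compressed));
        long parsed = 0;
        try (InputStream in = new GZIPInputStream(counter)) {
            byte[] buf = new byte[512];
            int n;
            while ((n = in.read(buf)) != -1) {
                parsed += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] { counter.getCount(), parsed };
    }
}
```

For compressible input the second number is larger than the first, so a progress value based on parsed bytes and the compressed file size would overshoot.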
    • begin

      protected void begin​(FlatFileParser ffp) throws BaseException
      Called just before parsing of the file begins. A subclass may override this method if it needs to initialise some resources before the parsing starts. Note that this method is called once for each file returned by getFileIterator().
      Throws:
      BaseException
      See Also:
      end(boolean)
    • handleHeader

      protected void handleHeader​(FlatFileParser.Line line) throws BaseException
      Called by the parser for every line in the file that is a header line.
      Throws:
      BaseException
    • handleSection

      protected void handleSection​(FlatFileParser.Line line) throws BaseException
      Called by the parser for every line in the file that is a section line.
      Throws:
      BaseException
    • beginData

      protected void beginData() throws BaseException
      Called by the parser after the headers have been parsed but before the first data line is parsed.
      Throws:
      BaseException
    • handleData

      protected abstract void handleData​(FlatFileParser.Data data) throws BaseException
      Called by the parser for every line in the file that is a data line.
      Throws:
      BaseException
    • end

      protected void end​(boolean success) throws BaseException
      Called just after the last line of the file has been parsed, or immediately after an error has occurred. A subclass should clean up any resources acquired in the begin(FlatFileParser) method here. Note that this method is called once for every file returned by the getFileIterator() iterator.
      Parameters:
      success - TRUE if the file was parsed successfully, FALSE otherwise
      Throws:
      BaseException
      See Also:
      begin(FlatFileParser)
    • getSuccessMessage

      protected String getSuccessMessage​(int skippedLines)
      Called if the parsing was successful to let the subclass generate a simple message that is sent back to the core and user interface. An example message might be: 178 reporters imported successfully. The default implementation always returns null. Note that this method is called once for every file returned by getFileIterator().
      Parameters:
      skippedLines - The number of data lines that were skipped due to errors
    • finish

      protected String finish​(Throwable t)
      Called once when all files have been imported or when exiting due to an error. This method may return a message that is sent back to the core and user interface. If this method returns null the last message returned by getSuccessMessage(int) is used instead.
      Parameters:
      t - Null if no error has happened
      Returns:
      A message or null
      Since:
      2.9
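The callback order described for start(), begin(FlatFileParser), handleData(FlatFileParser.Data), end(boolean), getSuccessMessage(int) and finish(Throwable) can be sketched with a small stand-alone template class. Everything here is hypothetical: plain strings stand in for files and data lines, begin takes a file index instead of a parser, and calling getSuccessMessage after end is an assumption, since the source does not state their relative order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the importer lifecycle, not part of the BASE API.
abstract class ImporterLifecycleSketch {
    final List<String> calls = new ArrayList<>();

    void start() { calls.add("start"); }                         // once, before any file
    void begin(int fileIndex) { calls.add("begin:" + fileIndex); } // once per file
    abstract void handleData(String line);                       // once per data line
    void end(boolean success) { calls.add("end:" + success); }   // once per file
    String getSuccessMessage(int skippedLines) { return null; }  // once per successful file
    String finish(Throwable t) { calls.add("finish"); return null; } // once, at the very end

    final String importAll(List<List<String>> files) {
        String lastMessage = null;
        Throwable error = null;
        start();
        // Stop at the first error, like the default continueWithNextFileAfterError.
        for (int i = 0; i < files.size() && error == null; i++) {
            boolean success = false;
            begin(i);
            try {
                for (String line : files.get(i)) {
                    handleData(line);
                }
                success = true;
            } catch (RuntimeException e) {
                error = e;
            } finally {
                end(success);
            }
            if (success) {
                lastMessage = getSuccessMessage(0);
            }
        }
        String message = finish(error);
        // A null from finish falls back to the last success message.
        return message != null ? message : lastMessage;
    }
}

final class CountingImporter extends ImporterLifecycleSketch {
    int imported;
    @Override void handleData(String line) {
        imported++;
        calls.add("data:" + line);
    }
    @Override String getSuccessMessage(int skippedLines) {
        return imported + " lines imported";
    }
}
```

Because finish returns null in this sketch, the message reported for the whole import is the one from the last successfully parsed file.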
    • getInitializedFlatFileParser

      protected FlatFileParser getInitializedFlatFileParser() throws BaseException
      Get an initialized parser that is configured by job or configuration values.
      Throws:
      BaseException
    • getInitializedFlatFileParser

      protected FlatFileParser getInitializedFlatFileParser​(ParameterValues parameters) throws BaseException
      Create and initialise a flat file parser by setting all regular expressions and other options. This implementation gets all parameters from the AbstractPlugin.job or AbstractPlugin.configuration settings. If a subclass doesn't store the parameters there it must override this method and initialise the parser. Note that this method is called once for each file returned by getFileIterator() and that a new parser is needed for each file.
      Parameters:
      parameters - ParameterValues implementation to pick options from
      Returns:
      An initialised flat file parser
      Throws:
      BaseException
      Since:
      3.17
    • getCharset

      protected String getCharset()
      Get the character set the file uses. This method looks for a character set in this order:
      1. Job parameter with name Parameters.CHARSET_PARAMETER.
      2. Configuration parameter with name Parameters.CHARSET_PARAMETER.
      3. Character set specified by the file to import from, File.getCharacterSet().
      4. System default character set, Config.getCharset().
      Returns:
      The name of the charset to use
      See Also:
      getCharset(Request)
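The lookup order listed for getCharset() amounts to "first non-null value wins". A minimal sketch, assuming each source has already been read into a String that is null when nothing is set. The CharsetLookupSketch class and its parameter names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of the lookup order, not part of the BASE API.
final class CharsetLookupSketch {
    static String resolve(String jobValue, String configValue, String fileValue, String systemDefault) {
        // Check the sources in priority order: job, configuration, file.
        for (String candidate : Arrays.asList(jobValue, configValue, fileValue)) {
            if (candidate != null) {
                return candidate;
            }
        }
        // Nothing was set explicitly, so fall back to the system default.
        return systemDefault;
    }
}
```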
    • getCharset

      protected String getCharset​(Request request)
      Get the character set the file uses. This method first checks the request object, and then calls getCharset().
      Returns:
      The name of the charset to use
      Since:
      2.8
    • getDecimalSeparator

      protected String getDecimalSeparator()
      Get the decimal separator used by numbers in the file. This method first checks the job parameters for a value, then the configuration parameters. If not found null is returned.
      Returns:
      The decimal separator or null to use Float.valueOf() or Double.valueOf()
      Since:
      2.2
    • getNumberFormat

      protected NumberFormat getNumberFormat()
      Get a number formatter that is able to parse numbers with the specified decimal separator. Returns null if no decimal separator has been specified which causes numbers to be parsed with Float.valueOf() or Double.valueOf().
      Returns:
      The number format or null to use Float.valueOf() or Double.valueOf()
      Since:
      2.2
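As an illustration of what a number format with a custom decimal separator looks like, the sketch below uses java.text.DecimalFormat. It is not the BASE implementation: the DecimalSeparatorSketch class is hypothetical, and the choice of pattern and the disabled grouping are assumptions.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical illustration, not part of the BASE API.
final class DecimalSeparatorSketch {
    /** Returns a format for the given separator, or null if no separator is specified. */
    static NumberFormat forSeparator(String separator) {
        if (separator == null || separator.isEmpty()) {
            return null;
        }
        char decimal = separator.charAt(0);
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setDecimalSeparator(decimal);
        // Keep the grouping symbol distinct from the decimal symbol.
        symbols.setGroupingSeparator(decimal == ',' ? '.' : ',');
        DecimalFormat format = new DecimalFormat("0.#", symbols);
        format.setGroupingUsed(false);
        return format;
    }

    /** Parses with the custom format, or with Double.valueOf() when the format is null. */
    static double parse(String text, String separator) {
        NumberFormat format = forSeparator(separator);
        if (format == null) {
            return Double.valueOf(text);
        }
        try {
            return format.parse(text).doubleValue();
        } catch (ParseException e) {
            throw new NumberFormatException(e.getMessage());
        }
    }
}
```

With a comma as separator, the text 3,14 parses to the same value that Double.valueOf gives for 3.14.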
    • getDateFormatter

      protected Formatter<Date> getDateFormatter​(String defaultFormat)
    • getTimestampFormatter

      protected Formatter<Date> getTimestampFormatter​(String defaultFormat)
    • getPattern

      protected Pattern getPattern​(String name) throws BaseException
      Throws:
      BaseException
    • getPattern

      protected Pattern getPattern​(ParameterValues parameters, String name) throws BaseException
      Throws:
      BaseException
    • checkColumnMapping

      protected String checkColumnMapping​(String mapExpression, boolean allowComplex, String name) throws InvalidDataException
      Check if a column mapping is a valid mapping expression and, optionally, if it is a complex mapping.
      Parameters:
      mapExpression - The mapping expression
      allowComplex - If complex column mappings should be allowed
      name - The name of the column (used if an error message needs to be generated)
      Returns:
      The mapping string
      Throws:
      InvalidDataException - If the mapping isn't a valid column mapping or if the allowComplex parameter is false and the mapping is complex
      Since:
      2.4
    • checkColumnMapping

      protected String checkColumnMapping​(FlatFileParser ffp, String mapExpression, boolean allowComplex, String name) throws InvalidDataException
      Checks the syntax of the column mapping and verifies that the given file parser has found the columns that are used in the file.
      Parameters:
      ffp - An optional flat file parser that should have parsed the file headers with FlatFileParser.parseHeaders()
      mapExpression - The mapping expression
      allowComplex - If complex column mappings should be allowed
      name - The name of the column (used if an error message needs to be generated)
      Returns:
      The mapping string
      Throws:
      InvalidDataException - If the mapping isn't a valid column mapping or if the allowComplex parameter is false and the mapping is complex
      Since:
      2.9
    • getMapper

      protected Mapper getMapper​(FlatFileParser ffp, String mapExpression, Integer maxStringLength, Mapper defaultMapper)
      See Also:
      getMapper(FlatFileParser, String, Integer, Mapper, JepFunction...)
    • getMapper

      protected Mapper getMapper​(FlatFileParser ffp, String mapExpression, Integer maxStringLength, Mapper defaultMapper, JepFunction... functions)
      Get a mapper for the specified flat file parser. This method calls the FlatFileParser.getMapper(String) method to create a mapper, if the map expression isn't null. If a max string length has been specified the created mapper is wrapped by a CropStringMapper that crops strings returned by the Mapper.getValue(FlatFileParser.Data) method to the specified length. Use this method mainly for creating mappers for string values.
      Parameters:
      ffp - The flat file parser
      mapExpression - The map expression; a null value is allowed but no mapper is created
      maxStringLength - The maximum allowed string length or null to allow any length
      defaultMapper - The mapper to return if the map expression is null
      functions - Additional JEP functions that should be included in the parser in case the map expression is a JEP expression
      Returns:
      A mapper, or null if the map expression and default mapper are both null
      Since:
      3.1
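The wrapping step described for getMapper can be pictured as a decorator that crops the strings produced by another mapper. The sketch below uses java.util.function.Function in place of the BASE Mapper interface. The CropSketch class is hypothetical and does not reproduce CropStringMapper itself.

```java
import java.util.function.Function;

// Hypothetical illustration of the wrapping idea, not the BASE CropStringMapper.
final class CropSketch {
    /** Wraps the mapper so its result is cropped to maxLength; null means any length. */
    static <T> Function<T, String> crop(Function<T, String> mapper, Integer maxLength) {
        if (mapper == null || maxLength == null) {
            return mapper; // nothing to wrap, or no limit requested
        }
        return data -> {
            String value = mapper.apply(data);
            if (value != null && value.length() > maxLength) {
                return value.substring(0, maxLength);
            }
            return value;
        };
    }
}
```

Wrapping a mapper with a limit of 3 turns the value abcdef into abc, while a null limit returns the original mapper unchanged.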
    • getErrorOption

      protected String getErrorOption​(String parameterName)
      Get the value for an error handling parameter. If no value is found for the specified parameter the value for the 'defaultError' parameter is returned.
      Parameters:
      parameterName - The name of the error parameter
      Returns:
      The error option or fail if no option has been set
      Since:
      2.4
    • getErrorHandler

      protected ErrorHandler getErrorHandler​(String method, ErrorHandler defaultErrorHandler)
    • continueWithNextFileAfterError

      protected boolean continueWithNextFileAfterError​(Throwable t)
      Checks if the importer should continue with the next file after an error. The default implementation of this method returns FALSE. A subclass may override this method to let the import continue after an error. This method isn't called until an error has happened, and it is possible to control what should happen on a file-by-file basis. The importer exits if this method returns FALSE.
      Parameters:
      t - The error that happened
      Returns:
      TRUE to continue, FALSE to abort
      Since:
      2.9
    • setUpErrorHandling

      protected void setUpErrorHandling()
      Initialise the error handling system. This method is called just before the import of a file starts. A subclass may override this method to add specific error handlers. If super.setUpErrorHandling() isn't called, error handling in AbstractFlatFileImporter is disabled and the subclass must do all its error handling in its own code. The subclass may also add error handlers in the begin(FlatFileParser) method. Note that the error handling system is re-initialised for every file returned by getFileIterator().
    • addErrorHandler

      protected void addErrorHandler​(Class<? extends Throwable> t, ErrorHandler handler)
      Add an error handler for the specified class of error. The error handler also handles errors that are subclasses of the specified class.
      See Also:
      ClassMapErrorHandler.addErrorHandler(Class, ErrorHandler)
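One way a handler registered for a class can also cover its subclasses is to walk up the superclass chain of the thrown error until a registered class is found. The sketch below illustrates that idea. The ClassMapSketch class is hypothetical, and whether ClassMapErrorHandler resolves handlers in exactly this way is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of class-based handler lookup, not the BASE ClassMapErrorHandler.
final class ClassMapSketch<H> {
    private final Map<Class<? extends Throwable>, H> handlers = new HashMap<>();
    private final H defaultHandler;

    ClassMapSketch(H defaultHandler) {
        this.defaultHandler = defaultHandler;
    }

    void add(Class<? extends Throwable> type, H handler) {
        handlers.put(type, handler);
    }

    /** Finds the handler registered for the error's class or its nearest superclass. */
    H find(Throwable t) {
        Class<?> c = t.getClass();
        while (c != null) {
            H handler = handlers.get(c);
            if (handler != null) {
                return handler;
            }
            c = c.getSuperclass();
        }
        return defaultHandler;
    }
}
```

A handler added for RuntimeException is then also found for an IllegalArgumentException, and an error with no registered ancestor falls back to the default handler.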
    • log

      protected void log​(String message, FlatFileParser.Data data)
      Log a message about a data line to the log file created by AbstractPlugin.createLogFile(String). If no log file has been created, this method does nothing.
      Parameters:
      message - The message to log
      data - The data line the log message is related to
      Since:
      2.8
    • log

      protected void log​(String message, FlatFileParser.Data data, Throwable t)
      Log an error message about a data line to the log file created by AbstractPlugin.createLogFile(String). If no log file has been created, this method does nothing.
      Parameters:
      message - The message to log
      data - The data line the log message is related to
      t - The error
      Since:
      2.8
    • log

      protected void log​(String message, FlatFileParser.Line line)
      Log a message about a header line to the log file created by AbstractPlugin.createLogFile(String). If no log file has been created, this method does nothing.
      Parameters:
      message - The message to log
      line - The header line the log message is related to
      Since:
      2.8
    • log

      protected void log​(String message, FlatFileParser.Line line, Throwable t)
      Log an error message about a header line to the log file created by AbstractPlugin.createLogFile(String). If no log file has been created, this method does nothing.
      Parameters:
      message - The message to log
      line - The header line the log message is related to
      t - The error
      Since:
      2.8