Package net.sf.basedb.plugins
Class AbstractFlatFileImporter
java.lang.Object
net.sf.basedb.core.plugin.AbstractPlugin
net.sf.basedb.plugins.AbstractFlatFileImporter
- All Implemented Interfaces:
AutoDetectingImporter, Plugin, SignalTarget
- Direct Known Subclasses:
AbstractItemImporter, AnnotationFlatFileImporter, AnyToAnyImporter, IlluminaRawDataImporter, PlateFlatFileImporter, PlateMappingImporter, PrintMapFlatFileImporter, RawDataFlatFileImporter, ReporterFlatFileImporter, ReporterMapFlatFileImporter
public abstract class AbstractFlatFileImporter
extends AbstractPlugin
implements AutoDetectingImporter, SignalTarget
An abstract base class for all importers that import data from one or more flat files. The implementation in this class uses a FlatFileParser for parsing the files and a callback method, handleData(FlatFileParser.Data), that lets the subclass do whatever it needs to insert a single line of data into the database.
The subclass must also generate the RequestInformation object for both plugin and job configuration. However, this implementation expects the regular expressions needed by the FlatFileParser to be found in the plugin configuration, and the File to import to be found in the job configuration.
All the PluginParameter objects needed to ask for these parameters are declared as protected fields in this class.
// ReporterFlatFileImporter.java
// RequestInformation object for command = CONFIGURE_PLUGIN
List<PluginParameter<?>> parameters = new ArrayList<PluginParameter<?>>();
// Parser regular expressions
parameters.add(headerRegexpParameter);
parameters.add(dataHeaderRegexpParameter);
parameters.add(dataSplitterRegexpParameter);
parameters.add(ignoreRegexpParameter);
parameters.add(dataFooterRegexpParameter);
// Column mappings
parameters.addAll(allColumnMappings);
// Reporter type
parameters.add(reporterTypeParameter);
configurePlugin = new RequestInformation
(
Request.COMMAND_CONFIGURE_PLUGIN,
"Parser settings",
"Please enter all settings needed by the flat file parser",
parameters
);
This class implements the run method, but only for the Request.COMMAND_EXECUTE command. The subclass must override the run method and provide implementations for the other commands (Request.COMMAND_CONFIGURE_PLUGIN and Request.COMMAND_CONFIGURE_JOB). The normal implementation of these commands would be to validate and store the request parameters.
// ReporterFlatFileImporter.java
if (command.equals(Request.COMMAND_CONFIGURE_JOB))
{
List<Throwable> errors = validateRequestParameters(jobParameters, request);
if (errors != null)
{
response.setError(errors.size()+" invalid parameter(s) were found in the request", errors);
return;
}
storeValue(job, request, fileParameter);
response.setDone(null);
}
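The callback design described above is essentially the template-method pattern: the base class owns the parsing loop and the subclass only supplies handleData(). The following is a minimal, self-contained sketch of that idea using java.util.regex. It is not BASE's FlatFileParser: the class names SketchFlatFileImporter and CollectingImporter are hypothetical, and the three regular expressions are simplified stand-ins for the section, header and data splitter parameters (in this simplification, a data line containing '=' would be taken for a header).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified base class: owns the parsing loop (template method).
abstract class SketchFlatFileImporter {
    private final Pattern section;  // matches a section line and captures its name
    private final Pattern header;   // splits a header line into key and value
    private final Pattern splitter; // splits a data line into columns

    SketchFlatFileImporter(String sectionRegexp, String headerRegexp, String splitterRegexp) {
        this.section = Pattern.compile(sectionRegexp);
        this.header = Pattern.compile(headerRegexp);
        this.splitter = Pattern.compile(splitterRegexp);
    }

    // Classifies each line and dispatches it to the matching callback.
    final void parse(String content) {
        for (String line : content.split("\\r?\\n")) {
            Matcher m = section.matcher(line);
            if (m.matches()) {
                handleSection(m.group(1));
                continue;
            }
            m = header.matcher(line);
            if (m.matches()) {
                handleHeader(m.group(1), m.group(2));
            } else if (!line.isEmpty()) {
                handleData(splitter.split(line, -1)); // -1 keeps trailing empty columns
            }
        }
    }

    protected void handleSection(String name) {}            // optional hook, does nothing by default
    protected void handleHeader(String key, String value) {} // optional hook, does nothing by default
    protected abstract void handleData(String[] columns);     // the only required hook
}

// Hypothetical subclass that simply collects the data lines.
class CollectingImporter extends SketchFlatFileImporter {
    final List<List<String>> rows = new ArrayList<>();

    CollectingImporter() {
        super("\\[(.*)\\]", "(.+?)=(.*)", "\\t");
    }

    @Override
    protected void handleData(String[] columns) {
        rows.add(Arrays.asList(columns));
    }

    public static void main(String[] args) {
        CollectingImporter importer = new CollectingImporter();
        importer.parse("[Header]\nType=Reporters\n[Data]\nR1\tACGT\nR2\tTTGA\n");
        System.out.println(importer.rows); // [[R1, ACGT], [R2, TTGA]]
    }
}
```

The subclass never sees the loop; it only decides what to do with one line of columns, which is the same division of labour the base class documents.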
For multi-file support (added in BASE 2.9) the subclass needs to override the getFileIterator() and getTotalFileSize() methods.
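What overriding getFileIterator() and getTotalFileSize() amounts to can be sketched without BASE. Below, a hypothetical FileSource class stands in for BASE's File item, with size() playing the role of File.getSize(). The point the sketch illustrates is that both methods should describe the same set of files, so that the total used for progress reporting matches what is actually imported.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a BASE File item: a name plus its content.
final class FileSource {
    final String name;
    private final byte[] content;

    FileSource(String name, String content) {
        this.name = name;
        this.content = content.getBytes(StandardCharsets.UTF_8);
    }

    long size() {
        return content.length; // plays the role of File.getSize()
    }
}

// Hypothetical multi-file importer: both overrides describe the same list of files.
class MultiFileImporterSketch {
    private final List<FileSource> files;

    MultiFileImporterSketch(List<FileSource> files) {
        this.files = files;
    }

    Iterator<FileSource> getFileIterator() {
        return files.iterator();
    }

    long getTotalFileSize() {
        long total = 0;
        for (FileSource file : files) total += file.size();
        return total;
    }

    public static void main(String[] args) {
        MultiFileImporterSketch importer = new MultiFileImporterSketch(Arrays.asList(
            new FileSource("a.txt", "R1\tACGT\n"),
            new FileSource("b.txt", "R2\tTTGA\nR3\tCCAT\n")));
        for (Iterator<FileSource> it = importer.getFileIterator(); it.hasNext(); ) {
            System.out.println("importing " + it.next().name);
        }
        System.out.println("total bytes: " + importer.getTotalFileSize()); // total bytes: 24
    }
}
```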
- Version:
- 2.0
- Author:
- Enell, Nicklas
- Last modified
- $Date: 2020-10-13 14:31:34 +0200 (Tue, 13 Oct 2020) $
-
Nested Class Summary
Nested classes/interfaces inherited from interface net.sf.basedb.core.plugin.Plugin
Plugin.MainType
-
Field Summary
| Modifier and Type | Field | Description |
| --- | --- | --- |
| protected static final PluginParameter<String> | complexMappings | Parameter that asks if complex column mappings should be enabled or not. |
| private File | currentFile | |
| protected static final PluginParameter<String> | dataFooterRegexpParameter | Parameter definition that asks for the regular expression that matches the data footer. |
| protected static final PluginParameter<String> | dataHeaderRegexpParameter | Parameter definition that asks for the regular expression that matches the data header. |
| protected static final PluginParameter<String> | dataSplitterRegexpParameter | Parameter definition that asks for the regular expression that splits a data line into individual columns. |
| protected static final PluginParameter<String> | defaultErrorParameter | |
| private ClassMapErrorHandler | errorHandler | |
| protected static final PluginParameter<String> | errorSection | Section definition for grouping error handling options. |
| protected static final PluginParameter<String> | excelSheetParameter | Parameter definition for specifying the name of the sheet to use if the selected file is an Excel file. |
| protected static final PluginParameter<File> | fileParameter | Parameter definition that asks for the file that should be imported. |
| protected static final ParameterType<File> | fileType | |
| private long | finishedFileSize | |
| protected static final PluginParameter<String> | headerRegexpParameter | Parameter definition that asks for the regular expression that splits a header line into key/value pairs. |
| protected static final PluginParameter<String> | ignoreRegexpParameter | Parameter definition that asks for the regular expression that matches a line that should be ignored. |
| protected static final PluginParameter<String> | invalidUseOfNullErrorParameter | |
| protected static final PluginParameter<String> | mappingSection | Section definition for grouping all column mapping expressions. |
| protected static final PluginParameter<Integer> | maxDataColumnsParameter | Parameter definition that asks for the maximum number of columns produced by the data splitter regexp for a line to be a data line. |
| protected static final PluginParameter<Integer> | minDataColumnsParameter | Parameter definition that asks for the minimum number of columns produced by the data splitter regexp for a line to be a data line. |
| protected static final PluginParameter<String> | numberFormatErrorParameter | |
| protected static final PluginParameter<String> | numberOutOfRangeErrorParameter | |
| protected static final ParameterType<Integer> | numDataColumnsType | |
| protected static final ParameterType<String> | optionalRegexpType | |
| protected static final PluginParameter<String> | parserSection | Section definition for grouping all file parser settings (i.e. regular expressions). |
| protected static final ParameterType<String> | requiredRegexpType | |
| protected static final PluginParameter<String> | sectionRegexpParameter | Parameter definition that asks for the regular expression that matches a section line and extracts the section name. |
| private ThreadSignalHandler | signalHandler | |
| private int | skippedLines | |
| protected static final PluginParameter<String> | stringTooLongErrorParameter | |
| private long | totalFileSize | |
| protected static final PluginParameter<Boolean> | trimQuotesParameter | Parameter definition that asks if quotes around quoted values should be removed or not. |
Fields inherited from class net.sf.basedb.core.plugin.AbstractPlugin
annotationSection, configuration, COPY_ANNOTATIONS, job, OVERWRITE_ANNOTATIONS, sc
-
Constructor Summary
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected void | addErrorHandler(Class<? extends Throwable> t, ErrorHandler handler) | Add an error handler for the specified class of error. |
| protected void | begin(FlatFileParser ffp) | Called just before parsing of the file begins. |
| protected void | beginData | Called by the parser after the headers have been parsed but before the first data line is parsed. |
| protected String | checkColumnMapping(String mapExpression, boolean allowComplex, String name) | Check if a column mapping is a valid mapping expression and, optionally, whether it is a complex mapping. |
| protected String | checkColumnMapping(FlatFileParser ffp, String mapExpression, boolean allowComplex, String name) | Checks the syntax of a column mapping and verifies that the given file parser has found the columns that are used in the mapping. |
| protected boolean | continueWithNextFileAfterError | If the importer should continue with the next file after an error. |
| void | doImport(InputStream in, ProgressReporter progress) | Import the data from the given InputStream. |
| protected void | end(boolean success) | Called just after the last line of the file has been parsed, or immediately after an error has occurred. |
| protected String | finish | Called once when all files have been imported or when exiting due to an error. |
| protected String | getCharset() | Get the character set the file uses. |
| protected String | getCharset(Request request) | Get the character set the file uses. |
| | getDateFormatter(String defaultFormat) | |
| protected String | getDecimalSeparator | Get the decimal separator used by numbers in the file. |
| protected ErrorHandler | getErrorHandler(String method, ErrorHandler defaultErrorHandler) | |
| protected String | getErrorOption(String parameterName) | Get the value for an error handling parameter. |
| | getFileIterator() | Get an iterator that returns the files to be imported. |
| protected FlatFileParser | getInitializedFlatFileParser() | Get an initialized parser that is configured by job or configuration values. |
| protected FlatFileParser | getInitializedFlatFileParser(ParameterValues parameters) | Create and initialise a flat file parser by setting all regular expressions and other options. |
| Plugin.MainType | getMainType() | Return Plugin.MainType.IMPORT. |
| protected Mapper | getMapper(FlatFileParser ffp, String mapExpression, Integer maxStringLength, Mapper defaultMapper) | |
| protected Mapper | getMapper(FlatFileParser ffp, String mapExpression, Integer maxStringLength, Mapper defaultMapper, JepFunction... functions) | Get a mapper for the specified flat file parser. |
| protected NumberFormat | getNumberFormat | Get a number formatter that is able to parse numbers with the specified decimal separator. |
| protected long | getNumBytes(FlatFileParser ffp) | Get the number of bytes read from the file. |
| protected Pattern | getPattern(String name) | |
| protected Pattern | getPattern(ParameterValues parameters, String name) | |
| protected int | getProgress(FlatFileParser ffp) | Get the progress of the import as a percentage value. |
| SignalHandler | getSignalHandler() | Create a new ThreadSignalHandler that supports the Signal.ABORT signal. |
| protected String | getSuccessMessage(int skippedLines) | Called if the parsing was successful to let the subclass generate a simple message that is sent back to the core and user interface. |
| | getTimestampFormatter(String defaultFormat) | |
| protected long | getTotalFileSize() | Get the total file size of all files that are going to be imported. |
| protected abstract void | handleData(FlatFileParser.Data) | Called by the parser for every line in the file that is a data line. |
| protected void | handleHeader | Called by the parser for every line in the file that is a header line. |
| protected void | handleSection | Called by the parser for every line in the file that is a section line. |
| final boolean | isImportable(InputStream in) | Check if the given InputStream can be imported by this plugin. |
| protected boolean | isImportable(FlatFileParser ffp) | This method is called by the isImportable(InputStream) method after FlatFileParser.nextSection() and FlatFileParser.parseHeaders() have been called and if data has been found. |
| protected void | log(String message, FlatFileParser.Data data) | Log a message about a data line to the log file created by AbstractPlugin.createLogFile(String). |
| protected void | log(String message, FlatFileParser.Data data, Throwable t) | Log an error message about a data line to the log file created by AbstractPlugin.createLogFile(String). |
| protected void | log(String message, FlatFileParser.Line line) | Log a message about a header line to the log file created by AbstractPlugin.createLogFile(String). |
| protected void | log(String message, FlatFileParser.Line line, Throwable t) | Log an error message about a header line to the log file created by AbstractPlugin.createLogFile(String). |
| boolean | requiresConfiguration() | Return TRUE, since the implementation requires it for finding the regular expressions used by the FlatFileParser. |
| void | run(Request request, Response response, ProgressReporter progress) | Implements the Request.COMMAND_EXECUTE command. |
| protected void | setUpErrorHandling() | Initialise the error handling system. |
| protected void | start() | Called once before starting the import. |
| protected InputStream | wrapInputStream(InputStream in) | This method is called before the parser starts reading from the input stream. |
Methods inherited from class net.sf.basedb.core.plugin.AbstractPlugin
cloneParameterWithDefaultValue, closeLogFile, createLogFile, done, getCopyAnnotationsParmeter, getCurrentConfiguration, getCurrentJob, getJobOrConfigurationValue, getOverwriteAnnotationsParameters, getPermissions, init, isLogging, log, log, storeValue, storeValue, storeValues, supportsConfigurations, validateRequestParameters
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface net.sf.basedb.core.plugin.Plugin
done, getPermissions, init, supportsConfigurations
-
Field Details
-
requiredRegexpType
-
optionalRegexpType
-
numDataColumnsType
-
fileType
-
sectionRegexpParameter
Parameter definition that asks for the regular expression that matches a section line and extracts the section name. This is an optional parameter.
-
headerRegexpParameter
Parameter definition that asks for the regular expression that splits a header line into key/value pairs. This is an optional parameter.
-
dataHeaderRegexpParameter
Parameter definition that asks for the regular expression that matches the data header. This is a required parameter for the plugin configuration.
-
dataSplitterRegexpParameter
Parameter definition that asks for the regular expression that splits a data line into individual columns. This is a required parameter for the plugin configuration.
-
trimQuotesParameter
Parameter definition that asks if quotes around quoted values should be removed or not.
-
ignoreRegexpParameter
Parameter definition that asks for the regular expression that matches a line that should be ignored. This is an optional parameter for the plugin configuration.
-
minDataColumnsParameter
Parameter definition that asks for the minimum number of columns produced by the data splitter regexp for a line to be a data line. This is an optional parameter for the plugin configuration.
-
maxDataColumnsParameter
Parameter definition that asks for the maximum number of columns produced by the data splitter regexp for a line to be a data line. This is an optional parameter for the plugin configuration.
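Taken together, the data splitter and the two column-count parameters decide whether a line counts as a data line. A minimal sketch of that check, assuming inclusive bounds and that an unset (null) bound means no limit; DataLineFilter and isDataLine are hypothetical names, not part of this class.

```java
import java.util.regex.Pattern;

class DataLineFilter {
    // True if the splitter yields an acceptable number of columns.
    // Assumptions: both bounds are inclusive and null means "no limit".
    static boolean isDataLine(String line, Pattern splitter, Integer minColumns, Integer maxColumns) {
        int columns = splitter.split(line, -1).length; // -1 keeps trailing empty columns
        if (minColumns != null && columns < minColumns) return false;
        if (maxColumns != null && columns > maxColumns) return false;
        return true;
    }

    public static void main(String[] args) {
        Pattern tab = Pattern.compile("\\t");
        System.out.println(isDataLine("R1\tACGT\t0.5", tab, 3, 3));   // true
        System.out.println(isDataLine("Comment line", tab, 3, null)); // false: only 1 column
    }
}
```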
-
fileParameter
Parameter definition that asks for the file that should be imported. This is a required parameter for the job configuration.
-
excelSheetParameter
Parameter definition for specifying the name of the sheet to use if the selected file is an Excel file.
- Since:
- 3.15
-
parserSection
Section definition for grouping all file parser settings (i.e. regular expressions).
-
mappingSection
Section definition for grouping all column mapping expressions.
-
complexMappings
Parameter that asks if complex column mappings should be enabled or not. Use checkColumnMapping(String, boolean, String) to check a mapping.
- Since:
- 2.4
-
errorSection
Section definition for grouping error handling options.
-
defaultErrorParameter
-
stringTooLongErrorParameter
-
invalidUseOfNullErrorParameter
-
numberOutOfRangeErrorParameter
-
numberFormatErrorParameter
-
currentFile
-
totalFileSize
private long totalFileSize -
finishedFileSize
private long finishedFileSize -
skippedLines
private int skippedLines -
errorHandler
-
signalHandler
-
-
Constructor Details
-
AbstractFlatFileImporter
public AbstractFlatFileImporter()
-
-
Method Details
-
getMainType
Return Plugin.MainType.IMPORT.
- Specified by:
getMainType in interface Plugin
- Returns:
- One of the defined types
-
requiresConfiguration
public boolean requiresConfiguration()
Return TRUE, since the implementation requires it for finding the regular expressions used by the FlatFileParser. If this method is overridden and returns FALSE, the subclass must also override the getInitializedFlatFileParser() method and provide a parser with all regular expressions and other options set.
- Specified by:
requiresConfiguration in interface Plugin
- Overrides:
requiresConfiguration in class AbstractPlugin
- Returns:
- TRUE or FALSE
-
run
Implements the Request.COMMAND_EXECUTE command. Subclasses must override this method to implement the other commands. Subclasses should not call this method for commands other than Request.COMMAND_EXECUTE, since this method will set an error response status.
- Specified by:
run in interface Plugin
- Parameters:
request - Request object with the command and parameters
response - Response object for the plugin to respond through
progress - A ProgressReporter where the plugin can report its progress; can be null
-
isImportable
Description copied from interface: AutoDetectingImporter
Check if the given InputStream can be imported by this plugin.
- Specified by:
isImportable in interface AutoDetectingImporter
- Parameters:
in - The input stream to check
- Returns:
- TRUE if the stream can be imported, FALSE otherwise
- Throws:
BaseException
- If something goes wrong
-
doImport
Description copied from interface: AutoDetectingImporter
Import the data from the given InputStream.
- Specified by:
doImport in interface AutoDetectingImporter
- Parameters:
in - The input stream to read from
progress - Progress reporter used by the caller to keep track of the progress. Null is allowed.
- Throws:
BaseException
- If something goes wrong
-
getSignalHandler
Create a new ThreadSignalHandler that supports the Signal.ABORT signal. Subclasses may override this to provide another signal handler, or return null if they don't support signals.
- Specified by:
getSignalHandler in interface SignalTarget
- Returns:
- A SignalHandler object, or null if the current instance doesn't support signals
- Since:
- 2.6
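The signal handler itself is not shown on this page. As a hedged illustration of the general mechanism only, the sketch below assumes that an ABORT signal is delivered by interrupting the thread that runs the import, and that the parsing loop checks the interrupt flag between lines. AbortSketch and its methods are hypothetical names, not the ThreadSignalHandler API.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a thread-based signal handler: an ABORT request
// is delivered by interrupting the thread that runs the import.
class AbortSketch {
    private final Thread worker;

    AbortSketch(Thread worker) {
        this.worker = worker;
    }

    void abort() {
        worker.interrupt();
    }

    // Processes lines until done or until the current thread is interrupted;
    // returns the number of lines that were processed.
    static int processLines(List<String> lines) {
        int processed = 0;
        for (String line : lines) {
            if (Thread.currentThread().isInterrupted()) break; // abort requested
            processed++; // a real importer would parse 'line' here
        }
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> lines = Collections.nCopies(10_000_000, "R1\tACGT");
        Thread worker = new Thread(() ->
            System.out.println("processed " + processLines(lines) + " lines"));
        AbortSketch handler = new AbortSketch(worker);
        worker.start();
        handler.abort(); // depending on timing, the loop stops early or has already finished
        worker.join();
    }
}
```

Checking the flag once per line keeps the cost of supporting abort negligible while still letting a long import stop promptly.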
-
start
protected void start()
Called once before starting the import. The default implementation does nothing.
- Since:
- 2.9
-
getFileIterator
Get an iterator that returns the files to be imported. The default implementation returns the single file found in the job's "file" parameter. Subclasses that need multi-file/item import should override this method to provide their own iterator. They should also override the getTotalFileSize() method to return the sum of all file sizes, e.g. using File.getSize().
- Since:
- 2.9
-
getTotalFileSize
protected long getTotalFileSize()
Get the total file size of all files that are going to be imported. A subclass that is going to import from multiple files needs to override this method. The default implementation returns the size of the file in the job's "file" parameter.
- Returns:
- The sum of the file sizes, or -1 if not known
- Since:
- 2.9
-
getProgress
Get the progress of the import as a percentage value. The default implementation sums the file sizes of the completed files and the value returned by getNumBytes(FlatFileParser), and divides this value by getTotalFileSize().
- Parameters:
ffp - The file parser that is used to parse the file
- Returns:
- A value between 0 and 100
- Since:
- 2.6
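The percentage calculation described for getProgress() can be written out as a small helper. This is a sketch: the method signature is hypothetical, and returning 0 when the total size is unknown (getTotalFileSize() returns -1) is an assumption, since this page does not say what the default implementation does in that case.

```java
// Hypothetical helper mirroring the documented recipe for getProgress().
class ImportProgress {
    static int getProgress(long finishedFileSize, long numBytesInCurrentFile, long totalFileSize) {
        if (totalFileSize <= 0) return 0; // assumption: unknown (-1) or empty total reports 0%
        long done = finishedFileSize + numBytesInCurrentFile;
        long percent = done * 100 / totalFileSize;
        return (int) Math.max(0, Math.min(100, percent)); // keep the result within 0..100
    }

    public static void main(String[] args) {
        // Two files of 400 and 600 bytes: the first is done, 300 bytes parsed in the second.
        System.out.println(getProgress(400, 300, 1000)); // 70
    }
}
```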
-
getNumBytes
Get the number of bytes read from the file. The value should indicate how far into the file parsing has proceeded. The default implementation calls FlatFileParser.getParsedBytes(). If a subclass has wrapped the input stream, the number of parsed bytes may not correspond to the number of bytes read from the file. For example, if the file is compressed, the number of parsed bytes will be higher than the number of bytes read from the file.
- Parameters:
ffp - The file parser that is used to parse the file
- Returns:
- The number of bytes read from the original file
- Since:
- 2.6
-
isImportable
This method is called by the isImportable(InputStream) method after FlatFileParser.nextSection() and FlatFileParser.parseHeaders() have been called, and only if data has been found. Thus, the default implementation of this method always returns TRUE. Subclasses may override this method to do more checks, for example to make sure certain headers are present or to parse more data from the file.
- Parameters:
ffp - The FlatFileParser object used to parse the file
- Returns:
- Always TRUE
- Throws:
IOException
- Since:
- 2.4
-
wrapInputStream
This method is called before the parser starts reading from the input stream. A subclass may override this method to wrap the input stream with a filtering stream, for example a gzip input stream. The default implementation of this method returns the original stream unmodified. If a subclass overrides this method, it may also need to override the getNumBytes(FlatFileParser) or getProgress(FlatFileParser) method.
- Parameters:
in - The input stream to wrap
- Returns:
- The same or a different input stream
- Throws:
IOException
- Since:
- 2.6
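A sketch of the kind of wrapping wrapInputStream() allows, using only java.util.zip from the JDK. The hypothetical CountingInputStream sits between the raw file stream and the GZIPInputStream, so it sees the compressed bytes actually read from the file, while the reader of the wrapped stream sees the larger number of decompressed bytes. This mismatch is why a subclass that wraps the stream may want getNumBytes(FlatFileParser) to report a count like the one kept here rather than the parser's own byte count.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Counts the bytes read from the underlying (raw, possibly compressed) stream.
class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b != -1) count++;
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }

    @Override public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    long getCount() { return count; }
}

class GzipWrapSketch {
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Wraps the raw stream as a wrapInputStream() override might, reads it to the end,
    // and returns {decompressed bytes seen by the reader, raw bytes read from the file}.
    static long[] parsedVersusRead(byte[] compressed) {
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(compressed));
        try (InputStream wrapped = new GZIPInputStream(raw)) {
            byte[] buffer = new byte[256];
            long parsed = 0;
            int n;
            while ((n = wrapped.read(buffer)) != -1) parsed += n;
            return new long[] { parsed, raw.getCount() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 100; i++) text.append('R').append(i).append("\tACGT\n");
        long[] counts = parsedVersusRead(gzip(text.toString()));
        System.out.println("parsed bytes: " + counts[0] + ", bytes read from file: " + counts[1]);
    }
}
```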
-
begin
Called just before parsing of the file begins. A subclass may override this method if it needs to initialise some resources before the parsing starts. Note that this method is called once for each file returned by getFileIterator().
- Throws:
BaseException
-
handleHeader
Called by the parser for every line in the file that is a header line.
- Throws:
BaseException
-
handleSection
Called by the parser for every line in the file that is a section line.
- Throws:
BaseException
-
beginData
Called by the parser after the headers have been parsed but before the first data line is parsed.
- Throws:
BaseException
-
handleData
Called by the parser for every line in the file that is a data line.
- Throws:
BaseException
-
end
Called just after the last line of the file has been parsed, or immediately after an error has occurred. A subclass should clean up any resources acquired in the begin(FlatFileParser) method here. Note that this method is called once for every file returned by the getFileIterator() iterator.
- Parameters:
success - TRUE if the file was parsed successfully, FALSE otherwise
- Throws:
BaseException
-
getSuccessMessage
Called if the parsing was successful to let the subclass generate a simple message that is sent back to the core and user interface. An example message might be: "178 reporters imported successfully". The default implementation always returns null. Note that this method is called once for every file returned by getFileIterator().
- Parameters:
skippedLines - The number of data lines that were skipped due to errors
-
finish
Called once when all files have been imported or when exiting due to an error. This method may return a message that is sent back to the core and user interface. If this method returns null, the last message returned by getSuccessMessage(int) is used instead.
- Parameters:
t - Null if no error has happened
- Returns:
- A message or null
- Since:
- 2.9
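The per-file and once-only hooks documented above can be summarised in a small driver. This is a hedged reconstruction from the method descriptions on this page, not the actual body of run(): the class LifecycleSketch is hypothetical, and the exact placement of setUpErrorHandling() before begin() and of getSuccessMessage() after end() are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical driver that records the order in which the hooks are called.
class LifecycleSketch {
    final List<String> calls = new ArrayList<>();

    void start() { calls.add("start"); }
    void setUpErrorHandling(String file) { calls.add("setUpErrorHandling " + file); }
    void begin(String file) { calls.add("begin " + file); }
    void end(String file, boolean success) { calls.add("end " + file + " " + success); }

    void parse(String file) {
        calls.add("parse " + file);
        if (file.startsWith("bad")) throw new IllegalStateException("cannot parse " + file);
    }

    String getSuccessMessage(String file) {
        calls.add("getSuccessMessage " + file);
        return file + " imported successfully";
    }

    boolean continueWithNextFileAfterError(Throwable t) { return false; } // documented default

    String finish(Throwable t) {
        calls.add("finish");
        return t == null ? null : "Import failed: " + t.getMessage(); // hypothetical error message
    }

    // start() once; per file: error handling, begin, parse, end; finish() once.
    String run(List<String> files) {
        start();
        String lastSuccess = null;
        Throwable error = null;
        for (String file : files) {
            setUpErrorHandling(file);
            begin(file);
            try {
                parse(file);
                end(file, true);
                lastSuccess = getSuccessMessage(file);
            } catch (RuntimeException t) {
                end(file, false);
                if (!continueWithNextFileAfterError(t)) {
                    error = t;
                    break;
                }
            }
        }
        String message = finish(error);
        return message != null ? message : lastSuccess; // null from finish(): reuse last success message
    }

    public static void main(String[] args) {
        LifecycleSketch sketch = new LifecycleSketch();
        System.out.println(sketch.run(Arrays.asList("a.txt", "b.txt"))); // b.txt imported successfully
        System.out.println(sketch.calls);
    }
}
```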
-
getInitializedFlatFileParser
Get an initialized parser that is configured by job or configuration values.
- Throws:
BaseException
-
getInitializedFlatFileParser
protected FlatFileParser getInitializedFlatFileParser(ParameterValues parameters) throws BaseException
Create and initialise a flat file parser by setting all regular expressions and other options. This implementation gets all parameters from the AbstractPlugin.job or AbstractPlugin.configuration settings. If a subclass doesn't store the parameters there, it must override this method and initialise the parser. Note that this method is called once for each file returned by getFileIterator() and that a new parser is needed for each file.
- Parameters:
parameters - ParameterValues implementation to pick options from
- Returns:
- An initialised flat file parser
- Throws:
BaseException
- Since:
- 3.17
-
getCharset
Get the character set the file uses. This method looks for a character set in this order:
1. Job parameter with name Parameters.CHARSET_PARAMETER.
2. Configuration parameter with name Parameters.CHARSET_PARAMETER.
3. Character set specified by the file to import from, File.getCharacterSet().
4. System default character set, Config.getCharset().
If none of the first three is set, Config.getCharset() is returned.
- Returns:
- The name of the charset to use
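The lookup order above is a plain first-non-null fallback chain. A minimal sketch, in which the four arguments stand in for the job parameter, the configuration parameter, File.getCharacterSet() and Config.getCharset() respectively; the helper name resolveCharset and treating empty strings as unset are assumptions, not BASE code.

```java
import java.nio.charset.Charset;

class CharsetLookup {
    // Returns the first candidate that is set, in the documented order,
    // falling back to the system default as the last resort.
    static String resolveCharset(String jobValue, String configValue, String fileCharset, String systemDefault) {
        for (String candidate : new String[] { jobValue, configValue, fileCharset }) {
            if (candidate != null && !candidate.isEmpty()) return candidate;
        }
        return systemDefault;
    }

    public static void main(String[] args) {
        String systemDefault = Charset.defaultCharset().name(); // stand-in for Config.getCharset()
        System.out.println(resolveCharset(null, null, "ISO-8859-1", systemDefault));    // ISO-8859-1
        System.out.println(resolveCharset(null, "UTF-8", "ISO-8859-1", systemDefault)); // UTF-8
    }
}
```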
-
getCharset
Get the character set the file uses. This method first checks the request object, and then calls getCharset().
- Returns:
- The name of the charset to use
- Since:
- 2.8
-
getDecimalSeparator
Get the decimal separator used by numbers in the file. This method first checks the job parameters for a value, then the configuration parameters. If no value is found, null is returned.
- Returns:
- The decimal separator, or null to use Float.valueOf() or Double.valueOf()
- Since:
- 2.2
-
getNumberFormat
Get a number formatter that is able to parse numbers with the specified decimal separator. Returns null if no decimal separator has been specified, which causes numbers to be parsed with Float.valueOf() or Double.valueOf().
- Returns:
- The number format, or null to use Float.valueOf() or Double.valueOf()
- Since:
- 2.2
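A sketch of a decimal-separator-aware parser built with java.text.DecimalFormat. The helper names are hypothetical, and this page does not say how BASE constructs its NumberFormat; disabling grouping and rejecting values with trailing characters are choices made in this sketch. As in the documentation, a null separator means falling back to Double.valueOf().

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class DecimalSeparatorSketch {
    // Returns a format that uses the given decimal separator, or null for Double.valueOf().
    static NumberFormat numberFormatFor(String decimalSeparator) {
        if (decimalSeparator == null || decimalSeparator.isEmpty()) return null;
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setDecimalSeparator(decimalSeparator.charAt(0));
        DecimalFormat format = new DecimalFormat("0.#", symbols); // the pattern itself always uses '.'
        format.setGroupingUsed(false); // never interpret ',' or '.' as a thousands separator
        return format;
    }

    // Parses the whole string; trailing characters are an error rather than silently ignored.
    static double parse(String value, NumberFormat format) {
        String text = value.trim();
        if (format == null) return Double.valueOf(text);
        ParsePosition position = new ParsePosition(0);
        Number number = format.parse(text, position);
        if (number == null || position.getIndex() != text.length()) {
            throw new NumberFormatException("Not a number: " + value);
        }
        return number.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(parse("3,14", numberFormatFor(",")));  // 3.14
        System.out.println(parse("2.5", numberFormatFor(null))); // 2.5
    }
}
```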
-
getDateFormatter
-
getTimestampFormatter
-
getPattern
- Throws:
BaseException
-
getPattern
- Throws:
BaseException
-
checkColumnMapping
protected String checkColumnMapping(String mapExpression, boolean allowComplex, String name) throws InvalidDataException
Check if a column mapping is a valid mapping expression and, optionally, whether it is a complex mapping.
- Parameters:
mapExpression - The mapping expression
allowComplex - If complex column mappings should be allowed
name - The name of the column (used if an error message needs to be generated)
- Returns:
- The mapping string
- Throws:
InvalidDataException - If the mapping isn't a valid column mapping, or if the allowComplex parameter is false and the mapping is complex
- Since:
- 2.4
-
checkColumnMapping
protected String checkColumnMapping(FlatFileParser ffp, String mapExpression, boolean allowComplex, String name) throws InvalidDataException
Checks the syntax of a column mapping and verifies that the given file parser has found the columns that are used in the mapping.
- Parameters:
ffp - An optional flat file parser that should have parsed the file headers with FlatFileParser.parseHeaders()
mapExpression - The mapping expression
allowComplex - If complex column mappings should be allowed
name - The name of the column (used if an error message needs to be generated)
- Returns:
- The mapping string
- Throws:
InvalidDataException - If the mapping isn't a valid column mapping, or if the allowComplex parameter is false and the mapping is complex
- Since:
- 2.9
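This page does not define the mapping syntax, so the sketch below assumes one for illustration only: a simple mapping is a single column reference written between backslashes (\name\ or \index\), and anything else is treated as complex. The class and method names are hypothetical, and IllegalArgumentException stands in for InvalidDataException.

```java
import java.util.regex.Pattern;

class ColumnMappingSketch {
    // Assumed syntax: exactly one column reference enclosed in backslashes.
    private static final Pattern SIMPLE = Pattern.compile("\\\\[^\\\\]+\\\\");

    static boolean isComplex(String mapExpression) {
        return !SIMPLE.matcher(mapExpression).matches();
    }

    // Returns the mapping unchanged, or throws if it is malformed or is a
    // complex mapping while complex mappings are not allowed.
    static String checkColumnMapping(String mapExpression, boolean allowComplex, String name) {
        if (mapExpression == null || mapExpression.isEmpty()) return mapExpression;
        long backslashes = mapExpression.chars().filter(c -> c == '\\').count();
        if (backslashes % 2 != 0) {
            throw new IllegalArgumentException("Unbalanced column reference for '" + name + "': " + mapExpression);
        }
        if (!allowComplex && isComplex(mapExpression)) {
            throw new IllegalArgumentException("Complex mappings are not allowed for '" + name + "': " + mapExpression);
        }
        return mapExpression;
    }

    public static void main(String[] args) {
        System.out.println(checkColumnMapping("\\Reporter ID\\", false, "Reporter")); // \Reporter ID\
        System.out.println(isComplex("\\Ch1\\ / \\Ch2\\")); // true
    }
}
```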
-
getMapper
protected Mapper getMapper(FlatFileParser ffp, String mapExpression, Integer maxStringLength, Mapper defaultMapper) -
getMapper
protected Mapper getMapper(FlatFileParser ffp, String mapExpression, Integer maxStringLength, Mapper defaultMapper, JepFunction... functions)
Get a mapper for the specified flat file parser. If the map expression isn't null, this method calls the FlatFileParser.getMapper(String) method to create a mapper. If a maximum string length has been specified, the created mapper is wrapped by a CropStringMapper that crops strings returned by the Mapper.getValue(FlatFileParser.Data) method to the specified length. Use this method mainly for creating mappers for string values.
- Parameters:
ffp - The flat file parser
mapExpression - The map expression; a null value is allowed, but then no mapper is created
maxStringLength - The maximum allowed string length, or null to allow any length
defaultMapper - The mapper to return if the map expression is null
functions - Additional JEP functions that should be included in the parser in case the map expression is a JEP expression
- Returns:
- A mapper, or null if both the map expression and the default mapper are null
- Since:
- 3.1
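The wrapping that getMapper() performs is a decorator. The self-contained sketch below uses a simplified, hypothetical Mapper interface that maps a row of String columns to a value, and a column index in place of a map expression, so it shows the null-expression and cropping rules rather than the actual Mapper and FlatFileParser.Data types.

```java
class MapperSketch {
    // Simplified, hypothetical stand-in for a mapper: picks a value from a row of columns.
    interface Mapper {
        String getValue(String[] row);
    }

    // Decorator that crops whatever the wrapped mapper returns.
    static final class CropStringMapper implements Mapper {
        private final Mapper parent;
        private final int maxLength;

        CropStringMapper(Mapper parent, int maxLength) {
            this.parent = parent;
            this.maxLength = maxLength;
        }

        @Override
        public String getValue(String[] row) {
            String value = parent.getValue(row);
            return value != null && value.length() > maxLength ? value.substring(0, maxLength) : value;
        }
    }

    // Mirrors the documented contract, with a column index instead of a map expression.
    static Mapper getMapper(Integer columnIndex, Integer maxStringLength, Mapper defaultMapper) {
        if (columnIndex == null) return defaultMapper; // no expression: default, not wrapped
        int index = columnIndex;
        Mapper mapper = row -> row[index];
        return maxStringLength == null ? mapper : new CropStringMapper(mapper, maxStringLength);
    }

    public static void main(String[] args) {
        Mapper name = getMapper(1, 5, null);
        System.out.println(name.getValue(new String[] { "R1", "Reporter one" })); // Repor
    }
}
```

Keeping the cropping in a wrapper means the underlying mapper needs no knowledge of database column limits.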
-
getErrorOption
Get the value for an error handling parameter. If no value is found for the specified parameter, the value for the 'defaultError' parameter is returned.
- Parameters:
parameterName - The name of the error parameter
- Returns:
- The error option, or fail if no option has been set
- Since:
- 2.4
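The fallback described for getErrorOption() amounts to a two-step lookup. In this sketch, a plain Map stands in for the job parameters; only the 'defaultError' name and the final fail value come from this page, while the other parameter names and option values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ErrorOptionSketch {
    // Value of the named error parameter, else 'defaultError', else "fail".
    static String getErrorOption(Map<String, String> parameters, String parameterName) {
        String option = parameters.get(parameterName);
        if (option == null) option = parameters.get("defaultError");
        return option != null ? option : "fail";
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new HashMap<>();
        parameters.put("stringTooLongError", "crop"); // hypothetical name and value
        parameters.put("defaultError", "skip");       // hypothetical value
        System.out.println(getErrorOption(parameters, "stringTooLongError"));      // crop
        System.out.println(getErrorOption(parameters, "numberFormatError"));       // skip
        System.out.println(getErrorOption(new HashMap<>(), "numberFormatError"));  // fail
    }
}
```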
-
getErrorHandler
-
continueWithNextFileAfterError
If the importer should continue with the next file after an error. The default implementation of this method returns FALSE. A subclass may override this method to let the import continue after an error. This method isn't called until an error has happened, and it is possible to control what should happen on a file-by-file basis. The importer will of course exit if this method returns FALSE.
- Parameters:
t - The error that happened
- Returns:
- TRUE to continue, FALSE to abort
- Since:
- 2.9
-
setUpErrorHandling
protected void setUpErrorHandling()
Initialise the error handling system. This method is called just before the import of a file starts. A subclass may override this method to add specific error handlers. If super.setUpErrorHandling() isn't called, error handling in AbstractFlatFileImporter is disabled and the subclass must do all its error handling in its own code. The subclass may also add error handlers in the begin(FlatFileParser) method. Note that the error handling system is re-initialised for every file returned by getFileIterator().
-
addErrorHandler
Add an error handler for the specified class of error. The error handler also handles errors that are subclasses of the specified class.
-
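The rule in addErrorHandler() that a handler also covers subclasses of the registered error class can be implemented by walking up the class hierarchy until a registered handler is found. A minimal sketch; ClassMapSketch and its simplified ErrorHandler interface are hypothetical and not the ClassMapErrorHandler API. Searching from the most specific class upwards means a handler registered for a subclass takes precedence over one registered for its superclass.

```java
import java.util.HashMap;
import java.util.Map;

class ClassMapSketch {
    // Simplified, hypothetical error handler interface.
    interface ErrorHandler {
        void handleError(Throwable t);
    }

    private final Map<Class<?>, ErrorHandler> handlers = new HashMap<>();

    void addErrorHandler(Class<? extends Throwable> type, ErrorHandler handler) {
        handlers.put(type, handler);
    }

    // Returns the handler for the error's class or its nearest registered superclass, or null.
    ErrorHandler findHandler(Throwable t) {
        for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
            ErrorHandler handler = handlers.get(c);
            if (handler != null) return handler;
        }
        return null;
    }

    public static void main(String[] args) {
        ClassMapSketch map = new ClassMapSketch();
        map.addErrorHandler(IllegalArgumentException.class, t -> System.out.println("skip line: " + t.getMessage()));
        // NumberFormatException extends IllegalArgumentException, so the same handler applies.
        Throwable error = new NumberFormatException("not a number");
        ErrorHandler handler = map.findHandler(error);
        if (handler != null) handler.handleError(error); // prints "skip line: not a number"
    }
}
```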
log
Log a message about a data line to the log file created by AbstractPlugin.createLogFile(String). If no log file has been created, this method does nothing.
- Parameters:
message - The message to log
data - The data line the log message is related to
- Since:
- 2.8
-
log
Log an error message about a data line to the log file created by AbstractPlugin.createLogFile(String). If no log file has been created, this method does nothing.
- Parameters:
message - The message to log
data - The data line the log message is related to
t - The error
- Since:
- 2.8
-
log
Log a message about a header line to the log file created by AbstractPlugin.createLogFile(String). If no log file has been created, this method does nothing.
- Parameters:
message - The message to log
line - The header line the log message is related to
- Since:
- 2.8
-
log
Log an error message about a header line to the log file created by AbstractPlugin.createLogFile(String). If no log file has been created, this method does nothing.
- Parameters:
message - The message to log
line - The header line the log message is related to
t - The error
- Since:
- 2.8
-