Package net.sf.basedb.plugins
Class IlluminaRawDataImporter
java.lang.Object
net.sf.basedb.core.plugin.AbstractPlugin
net.sf.basedb.plugins.AbstractFlatFileImporter
net.sf.basedb.plugins.IlluminaRawDataImporter
- All Implemented Interfaces:
AutoDetectingImporter, InteractivePlugin, Plugin, SignalTarget
An importer plug-in for Illumina raw data. The plug-in will create one or more
raw bioassays from the data in the file. Optionally, it will add the raw bioassays
to an experiment.
Since the data files don't have any coordinate information, the importer will use one of the following methods:
-
If no array design has been selected or if the array design is using the
FeatureIdentificationMethod.COORDINATES
method for identifying features: The plug-in creates fake coordinates like this: block=1, column=1, row=line number in file. The line numbers start with 1 at the first data line, i.e. header lines are not counted. -
If the array design uses the
FeatureIdentificationMethod.POSITION
method for identifying features: The plug-in sets the position=line number in file. The line numbers start with 1 at the first data line, i.e. header lines are not counted. -
If the array design uses the
FeatureIdentificationMethod.FEATURE_ID
method for identifying features: The plug-in assumes that the feature ID is the same as the reporter ID.
NOTE! Since the methods do not conflict with each other, this plug-in does not actually check which method to use. We simply set all values as specified above and let the BASE core handle the identification.
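The rules above can be illustrated with a minimal sketch. It is not the plug-in's actual code: the class `FeatureKeys` and the method `forDataLine` are hypothetical names introduced here only to show that all three kinds of identification values can be derived from the data line number and the reporter ID at the same time, which is why no method check is needed.

```java
// Illustrative sketch only: FeatureKeys and forDataLine are hypothetical names,
// not part of the BASE API.
final class FeatureKeys {
    final int block;        // COORDINATES: always 1
    final int column;       // COORDINATES: always 1
    final int row;          // COORDINATES: data line number
    final int position;     // POSITION: data line number
    final String featureId; // FEATURE_ID: assumed equal to the reporter ID

    private FeatureKeys(int block, int column, int row, int position, String featureId) {
        this.block = block;
        this.column = column;
        this.row = row;
        this.position = position;
        this.featureId = featureId;
    }

    /**
     * Builds the values for all three identification methods at once.
     * dataLineNumber starts at 1 on the first data line; header lines are not counted.
     */
    static FeatureKeys forDataLine(int dataLineNumber, String reporterId) {
        if (dataLineNumber < 1) {
            throw new IllegalArgumentException("Data line numbers start at 1");
        }
        return new FeatureKeys(1, 1, dataLineNumber, dataLineNumber, reporterId);
    }

    @Override
    public String toString() {
        return "block=" + block + ", column=" + column + ", row=" + row
            + ", position=" + position + ", featureId=" + featureId;
    }

    public static void main(String[] args) {
        System.out.println(forDataLine(1, "ILMN_0001"));
        System.out.println(forDataLine(2, "ILMN_0002"));
    }
}
```

Because the three sets of values never contradict each other, a consumer can pick whichever one matches the array design's identification method.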
- Version:
- 2.4
- Author:
- nicklas
- Last modified
- $Date: 2019-03-21 12:50:52 +0100 (tors, 21 mars 2019) $
-
Nested Class Summary
Modifier and Type    Class    Description
private static class
Holds a RawBioAssay, the RawDataBatcher to use for import, and the Mapper for all its data columns.
Nested classes/interfaces inherited from interface net.sf.basedb.core.plugin.Plugin
Plugin.MainType
-
Field Summary
Modifier and Type    Field    Description
private static final PluginParameter<String>
Section definition for grouping associations to other items: scan, software, protocol and experiment
private List<DerivedBioAssay>
private RequestInformation
private DbControl
private ArrayDesign
private Experiment
private static final PluginParameter<String>
private static final PluginParameter<String>
private FlatFileParser
private FeatureIdentificationMethod
private static final Set<GuiContext>
private List<FlatFileParser.Line>
private RawDataType
private static final PluginParameter<String>
private static final PluginParameter<String>
private boolean
private boolean
private int
private int
private Protocol
private List<RawBioAssay>
private Mapper
private Software
private static final Pattern
A column header must be like: extendedpropertyname-arrayname
private boolean
private boolean
Fields inherited from class net.sf.basedb.plugins.AbstractFlatFileImporter
complexMappings, dataFooterRegexpParameter, dataHeaderRegexpParameter, dataSplitterRegexpParameter, defaultErrorParameter, errorSection, excelSheetParameter, fileParameter, fileType, headerRegexpParameter, ignoreRegexpParameter, invalidUseOfNullErrorParameter, mappingSection, maxDataColumnsParameter, minDataColumnsParameter, numberFormatErrorParameter, numberOutOfRangeErrorParameter, numDataColumnsType, optionalRegexpType, parserSection, requiredRegexpType, sectionRegexpParameter, stringTooLongErrorParameter, trimQuotesParameter
Fields inherited from class net.sf.basedb.core.plugin.AbstractPlugin
annotationSection, configuration, COPY_ANNOTATIONS, job, OVERWRITE_ANNOTATIONS, sc
-
Constructor Summary
-
Method Summary
Modifier and Type    Method    Description
protected void
begin
(FlatFileParser ffp) Called just before parsing of the file begins.
protected void
beginData
() Check column headers and map them to raw bioassays.
void
configure
(GuiContext context, Request request, Response response) Configure the plugin.
protected void
end
(boolean success) If successful: close batchers, associate with experiment if there is one, commit. If not successful: delete raw bioassays that have been created, rollback.
private List<RawBioAssay>
extractAndCreateRawBioAssays
(DbControl dc, List<String> headers, boolean verifyColumns, File rawDataFile) Extract array names and raw data property names from the column headers.
private DerivedBioAssay
findBioAssay
(List<DerivedBioAssay> bioAssays, String name)
private static String
getColumnName
(RawDataProperty rdp, RawBioAssay rba) Convert a raw data property to a column name.
private RequestInformation
getConfigureJobParameters
(GuiContext context)
protected String
getDecimalSeparator
Get the decimal separator used by numbers in the file.
getGuiContexts
Get a set containing all items that the plugin handles.
protected FlatFileParser
getInitializedFlatFileParser
Create a FlatFileParser that can parse Illumina data files: Data splitter: (,|\t) Header regexp: (.+)=(.*?),*
private <T extends BasicItem> List<T>
getItems
(DbControl dc, ItemQuery<T> query, Restriction... restrictions) Sort the items by name and add USE permission filter to the query.
RequestInformation
getRequestInformation
(GuiContext context, String command) This method will return the RequestInformation for a given command, i.e. the list of parameters and some nice help text.
protected String
getSuccessMessage
(int skippedLines) Called if the parsing was successful to let the subclass generate a simple message that is sent back to the core and user interface.
protected void
handleData
Called by the parser for every line in the file that is a data line.
protected void
handleHeader
Copy headers so that we can set them on the raw bioassays in beginData()
protected boolean
isImportable
(FlatFileParser ffp) Check that the first line contains the text "Illumina"
isInContext
(GuiContext context, Object item) If used from an experiment context, verify that the experiment is an 'illumina' experiment and that the logged in user has permission to write.
boolean
requiresConfiguration
() Return TRUE, since the implementation requires it for finding the regular expressions used by the FlatFileParser.
boolean
supportsConfigurations
() Returns TRUE, since that is how the plugins used to work before this method was introduced.
Methods inherited from class net.sf.basedb.plugins.AbstractFlatFileImporter
addErrorHandler, checkColumnMapping, checkColumnMapping, continueWithNextFileAfterError, doImport, finish, getCharset, getCharset, getDateFormatter, getErrorHandler, getErrorOption, getFileIterator, getInitializedFlatFileParser, getMainType, getMapper, getMapper, getNumberFormat, getNumBytes, getPattern, getPattern, getProgress, getSignalHandler, getTimestampFormatter, getTotalFileSize, handleSection, isImportable, log, log, log, log, run, setUpErrorHandling, start, wrapInputStream
Methods inherited from class net.sf.basedb.core.plugin.AbstractPlugin
cloneParameterWithDefaultValue, closeLogFile, createLogFile, done, getCopyAnnotationsParmeter, getCurrentConfiguration, getCurrentJob, getJobOrConfigurationValue, getOverwriteAnnotationsParameters, getPermissions, init, isLogging, log, log, storeValue, storeValue, storeValues, validateRequestParameters
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface net.sf.basedb.core.plugin.Plugin
done, getMainType, getPermissions, init, run
-
Field Details
-
guiContexts
-
featureIdentificationParameter
-
invalidColumnsErrorParameter
-
missingReporterErrorParameter
-
featureMismatchErrorParameter
-
associationsSection
Section definition for grouping associations to other items: scan, software, protocol and experiment -
configureJob
-
illumina
-
ffp
-
dc
-
experiment
-
bioAssays
-
design
-
software
-
protocol
-
fiMethod
-
rawBioAssays
-
holders
-
headerLines
-
reporterMapper
-
numInserted
private int numInserted -
numRawBioAssays
private int numRawBioAssays -
nullIfException
private boolean nullIfException -
verifyColumns
private boolean verifyColumns -
nullIfMissingReporter
private boolean nullIfMissingReporter -
useSmartFeatureMismatchHandling
private boolean useSmartFeatureMismatchHandling -
splitHeader
A column header must be like: extendedpropertyname-arrayname
-
-
Constructor Details
-
IlluminaRawDataImporter
public IlluminaRawDataImporter()
-
-
Method Details
-
supportsConfigurations
public boolean supportsConfigurations()
Description copied from class: AbstractPlugin
Returns TRUE, since that is how the plugins used to work before this method was introduced.
- Specified by:
supportsConfigurations in interface Plugin
- Overrides:
supportsConfigurations in class AbstractPlugin
- Returns:
- TRUE or FALSE
-
requiresConfiguration
public boolean requiresConfiguration()
Description copied from class: AbstractFlatFileImporter
Return TRUE, since the implementation requires it for finding the regular expressions used by the FlatFileParser. If this method is overridden and returns FALSE, the subclass must also override the AbstractFlatFileImporter.getInitializedFlatFileParser() method and provide a parser with all regular expressions and other options set.
- Specified by:
requiresConfiguration in interface Plugin
- Overrides:
requiresConfiguration in class AbstractFlatFileImporter
- Returns:
- TRUE or FALSE
-
getGuiContexts
Description copied from interface: InteractivePlugin
Get a set containing all items that the plugin handles. For example, if the plugin imports reporters, return a set containing Item.REPORTER. This information is used by client applications to put the plugin in the proper place in the user interface.
- Specified by:
getGuiContexts in interface InteractivePlugin
- Returns:
- A Set containing Item:s, or null if the plugin is not concerned about items
-
isInContext
If used from an experiment context, verify that the experiment is an 'illumina' experiment and that the logged in user has permission to write.
- Specified by:
isInContext in interface InteractivePlugin
- Parameters:
context - The current context of the client application; it is one of the values found in the set returned by InteractivePlugin.getGuiContexts()
item - The currently active item; its type should match the GuiContext.getItem() type, or null if the context is a list context
- Returns:
Null if the plugin can use that item, or a warning-level message explaining why the plugin can't be used
-
getRequestInformation
public RequestInformation getRequestInformation(GuiContext context, String command) throws BaseException
Description copied from interface: InteractivePlugin
This method will return the RequestInformation for a given command, i.e. the list of parameters and some nice help text.
- Specified by:
getRequestInformation in interface InteractivePlugin
- Parameters:
context - The current context of the client application; it is one of the values found in the set returned by InteractivePlugin.getGuiContexts()
command - The command
- Returns:
- The RequestInformation for the command
- Throws:
BaseException - if there is an error
-
configure
Description copied from interface: InteractivePlugin
Configure the plugin. Hopefully the client is supplying values for the parameters specified by InteractivePlugin.getRequestInformation(GuiContext, String).
- Specified by:
configure in interface InteractivePlugin
- Parameters:
context - The current context of the client application; it is one of the values found in the set returned by InteractivePlugin.getGuiContexts()
request - Request object with the command and parameters
response - Response object for the plugin to respond through
-
getInitializedFlatFileParser
Create a FlatFileParser that can parse Illumina data files:
- Data splitter: (,|\t)
- Header regexp: (.+)=(.*?),*
- Data header: TargetID(,|\t).*
The data splitter initially accepts both comma and tab. Later (see isImportable(FlatFileParser)), when we know which one is actually used, we change this in the parser. We need to do this since numbers may use comma as decimal separator.
- Overrides:
getInitializedFlatFileParser in class AbstractFlatFileImporter
- Throws:
BaseException
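To see how the three regular expressions listed above behave, here is a small sketch that uses plain `java.util.regex` rather than the BASE `FlatFileParser`. The class `IlluminaRegexSketch`, its method names and the sample lines are assumptions made for illustration. Whole-line matching with `matches()` is also an assumption and may differ from how the parser applies the expressions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: this class is hypothetical and does not use FlatFileParser.
final class IlluminaRegexSketch {
    // The three expressions quoted in the documentation above.
    static final Pattern DATA_SPLITTER = Pattern.compile("(,|\\t)");
    static final Pattern HEADER = Pattern.compile("(.+)=(.*?),*");
    static final Pattern DATA_HEADER = Pattern.compile("TargetID(,|\\t).*");

    /** Returns {name, value} for a "name=value" header line, or null if it does not match. */
    static String[] headerKeyValue(String line) {
        Matcher m = HEADER.matcher(line);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    /** True if the line looks like the line that introduces the data columns. */
    static boolean isDataHeader(String line) {
        return DATA_HEADER.matcher(line).matches();
    }

    /** Splits a data line with the given splitter, keeping trailing empty fields. */
    static String[] split(Pattern splitter, String line) {
        return splitter.split(line, -1);
    }

    public static void main(String[] args) {
        String[] kv = headerKeyValue("Normalization=none,,,");
        System.out.println(kv[0] + " -> " + kv[1]);
        System.out.println(isDataHeader("TargetID\tAVG_Signal-A1"));
        // With the combined splitter, a comma decimal separator is mistaken for a column break.
        System.out.println(split(DATA_SPLITTER, "ILMN_1\t12,5").length);
        // Narrowing the splitter to tab only keeps "12,5" together.
        System.out.println(split(Pattern.compile("\\t"), "ILMN_1\t12,5").length);
    }
}
```

The last two calls show why the splitter has to be narrowed once the real separator is known: with the combined expression, a value such as `12,5` would be split into two columns.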
-
getDecimalSeparator
Description copied from class: AbstractFlatFileImporter
Get the decimal separator used by numbers in the file. This method first checks the job parameters for a value, then the configuration parameters. If not found, null is returned.
- Overrides:
getDecimalSeparator in class AbstractFlatFileImporter
- Returns:
- The separator specified by the job parameter, or "dot" if none is specified
-
isImportable
Check that the first line contains the text "Illumina".
- Overrides:
isImportable in class AbstractFlatFileImporter
- Parameters:
ffp - The FlatFileParser object used to parse the file
- Returns:
- TRUE if the first line contains "Illumina", FALSE otherwise
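The check described above can be sketched without the BASE classes. The real method receives a `FlatFileParser`. The version below reads from a plain `java.io.Reader` instead, and the names `FirstLineCheckSketch` and `firstLineContainsIllumina` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustrative sketch only: works on a Reader, not on a FlatFileParser.
final class FirstLineCheckSketch {
    /** Returns true if the first line of the input contains the text "Illumina". */
    static boolean firstLineContainsIllumina(Reader in) {
        try {
            BufferedReader reader = new BufferedReader(in);
            String firstLine = reader.readLine(); // null for an empty input
            return firstLine != null && firstLine.contains("Illumina");
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        String sample = "Illumina Inc. export\nTargetID\tAVG_Signal-A1\n";
        System.out.println(firstLineContainsIllumina(new StringReader(sample)));
    }
}
```

Only the first line is inspected, so a file that mentions "Illumina" further down is not accepted by this check.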
-
begin
Description copied from class: AbstractFlatFileImporter
Called just before parsing of the file begins. A subclass may override this method if it needs to initialise some resources before the parsing starts. Note that this method is called once for each file returned by AbstractFlatFileImporter.getFileIterator().
- Overrides:
begin in class AbstractFlatFileImporter
- Throws:
BaseException
-
handleHeader
Copy headers so that we can set them on the raw bioassays in beginData().
- Overrides:
handleHeader in class AbstractFlatFileImporter
- Throws:
BaseException
-
beginData
protected void beginData()
Check column headers and map them to raw bioassays. Create raw bioassays. Initialise column Mapper:s.
- Overrides:
beginData in class AbstractFlatFileImporter
-
handleData
Description copied from class: AbstractFlatFileImporter
Called by the parser for every line in the file that is a data line.
- Specified by:
handleData in class AbstractFlatFileImporter
- Throws:
BaseException
-
end
If successful:
- Close batchers
- Associate with experiment if there is one
- Commit
If not successful:
- Delete raw bioassays that have been created
- Rollback
- Overrides:
end in class AbstractFlatFileImporter
- Parameters:
success - TRUE if the file was parsed successfully, FALSE otherwise
- Throws:
BaseException
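The two branches described for this method follow a common commit-or-rollback shape. The sketch below assumes a hypothetical `ImportSession` interface that stands in for the batchers, the experiment and the database transaction. None of these names come from the BASE API, and the real method works directly on the plug-in's own fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction of the resources held during an import; not BASE API.
interface ImportSession {
    void closeBatchers();
    boolean hasExperiment();
    void addRawBioAssaysToExperiment();
    void commit();
    void deleteCreatedRawBioAssays();
    void rollback();
}

final class EndSketch {
    /** Finishes an import: commit everything on success, undo everything on failure. */
    static void end(ImportSession session, boolean success) {
        if (success) {
            session.closeBatchers();
            if (session.hasExperiment()) {
                session.addRawBioAssaysToExperiment();
            }
            session.commit();
        } else {
            session.deleteCreatedRawBioAssays();
            session.rollback();
        }
    }
}

// Simple recording implementation used to show the order of the calls.
final class RecordingSession implements ImportSession {
    final List<String> calls = new ArrayList<>();
    private final boolean experiment;

    RecordingSession(boolean experiment) {
        this.experiment = experiment;
    }

    public void closeBatchers() { calls.add("closeBatchers"); }
    public boolean hasExperiment() { return experiment; }
    public void addRawBioAssaysToExperiment() { calls.add("addRawBioAssaysToExperiment"); }
    public void commit() { calls.add("commit"); }
    public void deleteCreatedRawBioAssays() { calls.add("deleteCreatedRawBioAssays"); }
    public void rollback() { calls.add("rollback"); }
}
```

Calling `EndSketch.end(new RecordingSession(true), true)` records the three success steps in order, while passing `false` records only the delete and rollback steps.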
-
getSuccessMessage
Description copied from class: AbstractFlatFileImporter
Called if the parsing was successful to let the subclass generate a simple message that is sent back to the core and user interface. An example message might be: 178 reporters imported successfully. The default implementation always returns null. Note that this method is called once for every file returned by AbstractFlatFileImporter.getFileIterator().
- Overrides:
getSuccessMessage in class AbstractFlatFileImporter
- Parameters:
skippedLines - The number of data lines that were skipped due to errors
-
extractAndCreateRawBioAssays
private List<RawBioAssay> extractAndCreateRawBioAssays(DbControl dc, List<String> headers, boolean verifyColumns, File rawDataFile)
Extract array names and raw data property names from the column headers. Create a raw bioassay for each array and set headers, scan, software and protocol. Optionally, verify that all arrays have the same data columns. Duplicate columns are always reported as an error.
-
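The header handling in extractAndCreateRawBioAssays can be sketched as follows. This is not the plug-in's implementation: the class `HeaderGroupingSketch` is hypothetical, it returns a plain map instead of creating RawBioAssay items, and it assumes that a column header is split at its first hyphen into property name and array name. The actual splitHeader pattern is not shown in this documentation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: groups "propertyname-arrayname" headers by array name.
final class HeaderGroupingSketch {
    /**
     * Returns array name -> property names, in file order.
     * Duplicate columns always fail; differing column sets fail only if verifyColumns is true.
     */
    static Map<String, List<String>> groupByArray(List<String> headers, boolean verifyColumns) {
        Map<String, List<String>> byArray = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        for (String header : headers) {
            int dash = header.indexOf('-'); // assumption: split at the first hyphen
            if (dash <= 0 || dash == header.length() - 1) {
                continue; // not a per-array data column, e.g. "TargetID"
            }
            if (!seen.add(header)) {
                throw new IllegalArgumentException("Duplicate column: " + header);
            }
            String property = header.substring(0, dash);
            String array = header.substring(dash + 1);
            byArray.computeIfAbsent(array, k -> new ArrayList<>()).add(property);
        }
        if (verifyColumns) {
            Set<String> expected = null;
            for (Map.Entry<String, List<String>> entry : byArray.entrySet()) {
                Set<String> properties = new HashSet<>(entry.getValue());
                if (expected == null) {
                    expected = properties; // the first array defines the expected columns
                } else if (!expected.equals(properties)) {
                    throw new IllegalArgumentException(
                        "Array " + entry.getKey() + " does not have the same data columns");
                }
            }
        }
        return byArray;
    }
}
```

For headers such as `TargetID`, `AVG_Signal-A1`, `Detection-A1`, `AVG_Signal-A2`, `Detection-A2`, the sketch yields two arrays, `A1` and `A2`, each with the properties `AVG_Signal` and `Detection`.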
findBioAssay
-
getConfigureJobParameters
-
getColumnName
Convert a raw data property to a column name. -
getItems
private <T extends BasicItem> List<T> getItems(DbControl dc, ItemQuery<T> query, Restriction... restrictions)
Sort the items by name and add USE permission filter to the query.
-