Class FlatFileParser
- Data must be organised into columns, with one record per line
- Each data column must be separated by some special character or character sequence not occurring in the data, for example a tab or a comma. Data in fixed-size columns cannot be parsed.
- Data may optionally be preceded by a data header, i.e. the names of the columns
- The data header may optionally be preceded by file headers. A file header is something that can be split into a name-value pair.
- The file may contain comments, which are ignored by the parser
- The file may contain sections, where each section can contain a header part and/or a data part
Example
# Example of a parsable file, not actual format of a GenePix file
section info
Type=GenePix Results 1.3
DateTime=2002/09/04 13:59:48
Scanner=GenePix 4000B [83306]
section data
Block	Column	Row	Name	ID
1	1	1	"Ly68_Lymphocyte antigen 68"	"M000205_01"
1	2	1	"Bag1_Bcl2-associated athanogene 1"	"M000209_01"
1	3	1	"Rps16_Ribosomal protein S16"	"M000213_01"
1	4	1	"Col4a1_Procollagen, type IV, alpha 1"	"M000229_01"
1	5	1	"Ace_Angiotensin converting enzyme"	"M000233_01"
1	6	1	"Cd5_CD5 antigen"	"M000237_01"
1	7	1	"Psme1_Protease (prosome, macropain) 28 s"
If the file is an Excel file the first sheet is automatically converted to a
tab-separated text file by default. To use a different sheet call the
setExcelSheet(String)
method with the name or index of the sheet.
Initial parsing and regular expression matching is always done against the
text representation of the selected sheet. Note that empty cells on the top
and left are usually cut away. When retrieving values via the FlatFileParser.Data
class, the parser will typically go directly to the mapped cell in the Excel sheet
and get the value, which means that numeric and date values don't have to
be converted to and from strings if not needed. For example, if a date
value is requested and the mapped cell is a date, it will be used as it is,
but if the mapped cell is a string, the same parsers that are used for CSV
files are used to convert the string to a date.
How to use
The parsing is controlled by regular expressions. Start by creating a new FlatFileParser object. Use the various set methods to provide the regular expressions used to match the data and headers.
Use the setInputStream(InputStream, String) method to specify a file to parse, and parseHeaders() to start the parsing. Note! Even if you know that the file doesn't contain any headers, you should always call this method since the parser must initialize itself. If there are sections in the file, call nextSection() first to control which section you are parsing from.
When the headers have been found, use the hasMoreData() and nextData() methods in a loop to read all data from the section.
Example
FlatFileParser ffp = new FlatFileParser();
ffp.setHeaderRegexp(Pattern.compile("(.*)=(.*)"));
ffp.setDataHeaderRegexp(Pattern.compile("Block\\tColumn\\tRow\\tName\\tID"));
ffp.setDataSplitterRegexp(Pattern.compile("\\t"));
ffp.setIgnoreRegexp(Pattern.compile("#.*"));
ffp.setMinDataColumns(5);
ffp.setMaxDataColumns(5);
ffp.setInputStream(FileUtil.getInputStream(path_to_file), Config.getCharset());
ffp.parseHeaders();
// List every line that was read while looking for the headers
for (int i = 0; i < ffp.getLineCount(); i++)
{
    FlatFileParser.Line line = ffp.getLine(i);
    System.out.println(i+":"+line.type()+":"+line.line());
}
// Read all data lines in the section
int i = 0;
while (ffp.hasMoreData())
{
    FlatFileParser.Data data = ffp.nextData();
    System.out.println(i+":"+data.columns()+":"+data.line());
    i++;
}
Mapping column values
With the FlatFileParser.Data object you can only access the data by column index (0-based). Another approach is to use Mappers. A mapper takes a string template and inserts the values of the data columns where you specify. Here are some examples:
\1\
\row\
Row: \row\, Col:\col\
=2 * col('Radius')
The result can be retrieved either as a string, as a numeric value or as a date. It is even possible to create expressions that do a calculation on the value before it is returned. See the getMapper(String) method for more information.
- Version:
- 2.0
- Author:
- Nicklas, Enell
-
Nested Class Summary
Modifier and Type, Class, Description
- static class FlatFileParser.Data: This class holds data about a line parsed by the hasMoreData() method.
- (package private) static class: Subclass that is used to return data when the source file is an Excel file.
- static class FlatFileParser.Line: This class holds data about a line parsed by the parseHeaders() method.
- static enum FlatFileParser.LineType: Represents the type of a line matched or unmatched by the parser.
-
Field Summary
Modifier and Type, Field, Description
- private Pattern bofMarker: The regular expression for matching the beginning-of-file marker.
- private String bofType: The value that was captured by the bofMarker pattern.
- columnHeaders: List of the column names found by splitting the data header using the data splitter regexp.
- private Pattern dataFooter: The regular expression for matching the data footer line.
- private Pattern dataHeader: The regular expression for matching the data header line.
- private Pattern dataSplitter: The regular expression for splitting a data line.
- dateFormat: The default date formatter to use when creating mappers.
- static final int DEFAULT_MAX_UNKNOWN_LINES: The default value for the number of unknown lines in a row that may be encountered by the parseHeaders method before it gives up.
- private boolean emptyIsNull: If null should be returned for empty columns (instead of an empty string).
- private int excelParsedLinesOffset: The value of the parsedLines when the current Excel sheet is parsed.
- private XlsxToCsvUtil.SheetInfo excelSheet: Excel sheet that is currently being parsed.
- private String excelSheetName: Name of the Excel sheet to parse if the file is an Excel file.
- private XlsxToCsvUtil excelWorkbook: Excel workbook that has been loaded.
- private static final Pattern findColumn: Pattern used to find column mappings in a string, i.e. abc \col\ def
- private Pattern header: The regular expression for matching a header line.
- headers: Map of header lines parsed by the parseHeaders() method.
- private Pattern ignore: The regular expression for matching a comment line.
- private int ignoredLines: Number of ignored lines in the nextData() method.
- private boolean ignoreNonExistingColumns: If non-existing columns should be ignored (true) or result in an exception (false).
- private boolean keepSkippedLines: If unknown or ignored lines should be kept.
- private List<FlatFileParser.Line> lines: List of lines parsed by the parseHeaders() method.
- private int maxDataColumns: The maximum number of allowed data columns for a line to be considered a data line.
- private int maxUnknownLines: The maximum number of unknown lines to parse before giving up.
- private int minDataColumns: The minimum number of allowed data columns for a line to be considered a data line.
- private FlatFileParser.Data nextData: The next available data line as parsed by the hasMoreData() method.
- private FlatFileParser.Line nextSection: The line that last matched the section.
- private boolean nullIsNull: If null should be returned for the string NULL (ignoring case) or not.
- private NumberFormat numberFormat: The default number formatter to use for creating mappers.
- private boolean parseAllExcelSheets: Flag to indicate if only a single (=false) or all (=true) Excel sheets should be parsed in one go.
- private long parsedCharacters: The total number of parsed characters so far.
- private int parsedDataLines: The number of data lines parsed in the current section so far.
- private int parsedLines: The total number of lines parsed so far.
- private int parsedSections: The number of sections parsed so far.
- private BufferedReader reader: Reads from the given input stream.
- private Pattern section: The regular expression for matching the first line of a section.
- private List<FlatFileParser.Line> skippedLines: List for keeping ignored and unknown lines in the nextData() method.
- timestampFormat: The default timestamp format to use when creating mappers.
- private InputStreamTracker tracker: For keeping track of the number of bytes parsed.
- private boolean trimQuotes: If quotes should be trimmed from data values or not.
- private boolean trimWhiteSpace: If white space should be trimmed from data values or not.
- private int unknownLines: Number of unknown lines in the nextData() method.
- private boolean useNullIfException: If null should be returned if a (numeric) value can't be parsed.
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
- private String convertToNull(String value)
- findColumnHeaderIndex(String regex): Find the index of a column header using a regular expression for pattern matching.
- getBofType(): Get the value captured by the BOF marker regular expression.
- getColumnHeaderIndex(String name): Get the index of a column header with a given name.
- getColumnHeaders(): Get all column headers that were found by splitting the line matching the setDataHeaderRegexp(Pattern) pattern using the setDataSplitterRegexp(Pattern) pattern.
- getCurrentExcelWorkbook(): If the input stream that is being parsed is an Excel document, this method returns information about it.
- getCurrentSheet(): If the input stream that is being parsed is an Excel document, this method returns information about the current worksheet.
- getDateMapper(String expression): Get a mapper using the default date format.
- getDefaultDateFormat(): Get the default date format.
- getDefaultNumberFormat(): Get the default number format.
- getDefaultTimestampFormat(): Get the default timestamp format.
- getExcelSheet(): Get the name of the Excel sheet that should be or is parsed.
- getHeader(String name): Get the value of the header with the specified name.
- getHeaderNames(): Get the names of all headers found by the parseHeaders() method.
- int: Get the number of lines that the last call to nextData() or hasMoreData() ignored because they matched the ignore regular expression.
- getLine(int index): Get the line with the specified number.
- int getLineCount(): Get the number of lines that the parseHeaders() method parsed.
- getLines(): Get the lines read by parseHeaders().
- getMapper: Get a mapper using the default number format.
- getMapper: Get a mapper using the default number format.
- getMapper(String expression, NumberFormat numberFormat): Get a mapper using a specific number format.
- getMapper(String expression, NumberFormat numberFormat, boolean nullIfException)
- getMapper(String expression, NumberFormat numberFormat, boolean nullIfException, JepFunction... functions)
- getMapper(String expression, NumberFormat numberFormat, Formatter<Date> dateFormat, boolean nullIfException, JepFunction... functions): Create a mapper object that maps an expression string to a value.
- getMapper: Get a mapper using the specified date format.
- getMapper(String expression, JepFunction... functions)
- int: Get the number of lines that the last call to nextData() or hasMoreData() ignored because they matched the ignore regular expression or couldn't be interpreted as data lines.
- boolean getParseAllExcelSheets()
- long: Get the number of parsed bytes so far.
- long getParsedCharacters(): Get the number of parsed characters so far.
- int getParsedDataLines(): Get the number of parsed data lines so far in the current section.
- int getParsedLines(): Get the number of parsed lines so far.
- int getParsedSections(): Get the number of found sections so far.
- Get lines that were skipped during the last call to nextData() or hasMoreData().
- getTimestampMapper(String expression): Get a mapper using the default timestamp format.
- int: Get the number of lines that the last call to nextData() or hasMoreData() ignored because they couldn't be interpreted as data lines.
- boolean getUseNullIfException()
- boolean hasMoreData(): Check if the input stream contains more data.
- boolean: Check if the input stream contains more sections.
- nextData(): Get the next available data.
- nextSection(): Get the next line that matches the section regular expression.
- parseHeaders(): Start parsing the input stream.
- boolean parseToBof(): Parse the file until the beginning-of-file marker is found.
- void setBofMarkerRegexp(Pattern regexp): Set a regular expression that matches a beginning-of-file marker.
- void setDataFooterRegexp(Pattern regexp): Set a regular expression that can be matched against a data footer.
- void setDataHeaderRegexp(Pattern regexp): Set a regular expression that can be matched against the data header.
- void setDataSplitterRegexp(Pattern regexp): Set a regular expression that is used to split a data line into columns.
- void setDefaultDateFormat(Formatter<Date> dateFormat): Set the default date format to use when creating mappers.
- void setDefaultNumberFormat(NumberFormat numberFormat): Set the default number format to use when creating mappers.
- void setDefaultTimestampFormat(Formatter<Date> timestampFormat): Set the default timestamp format to use when creating mappers.
- void setExcelSheet(String name): Set the name of the Excel worksheet to parse if the given file is an Excel file, otherwise this is ignored.
- void setHeaderRegexp(Pattern regexp): Set a regular expression that can be matched against a header.
- void setIgnoreNonExistingColumns(boolean ignoreNonExistingColumns): Specify if trying to create a mapper with one of the getMapper(String) methods for an expression which references a non-existing column should result in an exception or be ignored.
- void setIgnoreRegexp(Pattern regexp): Set a regular expression that is used to match a line that should be ignored.
- void setInputStream(InputStream in, String charsetOrSheetName): Set the input stream that will be parsed.
- void setKeepSkippedLines(boolean keep): If the nextData() and hasMoreData() methods should keep information about lines that were skipped because they matched the ignore pattern or could not be interpreted as data lines.
- void setMaxDataColumns(int columns): Set the maximum number of columns a data line can contain in order for it to be counted as a data line.
- void setMaxUnknownLines(int lines): The number of unknown lines in a row that can be parsed by the parseHeaders method before it gives up.
- void setMinDataColumns(int columns): Set the minimum number of columns a data line must contain in order for it to be counted as a data line.
- void setParseAllExcelSheets(boolean parseAllExcelSheets): If this flag is set and the source file is an Excel file, then all sheets will be parsed unless a named sheet is specified.
- void setSectionRegexp(Pattern regexp): Set a regular expression that can be matched against the section line.
- void setTrimQuotes(boolean trimQuotes): Set if quotes around each data value should be removed or not.
- void setTrimWhiteSpace(boolean trimWhiteSpace): Set a flag indicating if white-space should be trimmed from start and end of data values.
- void setUseNullIfEmpty(boolean emptyIsNull): Specify if null values should be returned instead of empty strings for columns that don't contain any value.
- void setUseNullIfException(boolean useNullIfException): Specify if null should be returned if a (numeric) value can't be parsed.
- void setUseNullIfNull(boolean nullIsNull): Specify if null values should be returned for strings having the value "NULL" (ignoring case).
- String[] trimQuotes(String[] columns): Remove enclosing quotes (" or ') around all columns.
-
Field Details
-
DEFAULT_MAX_UNKNOWN_LINES
public static final int DEFAULT_MAX_UNKNOWN_LINES
The default value for the number of unknown lines in a row that may be encountered by the parseHeaders method before it gives up.
-
findColumn
Pattern used to find column mappings in a string, i.e. abc \col\ def
-
reader
Reads from the given input stream
-
tracker
For keeping track of the number of bytes parsed.
-
excelSheetName
Name of the Excel sheet to parse if the file is an Excel file.
-
parseAllExcelSheets
private boolean parseAllExcelSheets
Flag to indicate if only a single (=false) or all (=true) Excel sheets should be parsed in one go.
-
excelWorkbook
Excel workbook that has been loaded.
-
excelSheet
Excel sheet that is currently being parsed.
-
excelParsedLinesOffset
private int excelParsedLinesOffset
The value of the parsedLines when the current Excel sheet is parsed. Used to find the correct row number in the current sheet (=parsedLines - excelParsedLinesOffset)
-
bofMarker
The regular expression for matching the beginning-of-file marker
-
header
The regular expression for matching a header line.
-
section
The regular expression for matching the first line of a section. The expression must have one capturing group.
-
dataHeader
The regular expression for matching the data header line. The expression must have two capturing groups.
-
dataSplitter
The regular expression for splitting a data line.
-
trimQuotes
private boolean trimQuotes
If quotes should be trimmed from data values or not. Default is true. Quotes are double or single quotes.
-
trimWhiteSpace
private boolean trimWhiteSpace
If white space should be trimmed from data values or not. Default is false.
- Since:
- 3.15.1
-
minDataColumns
private int minDataColumns
The minimum number of allowed data columns for a line to be considered a data line.
-
maxDataColumns
private int maxDataColumns
The maximum number of allowed data columns for a line to be considered a data line.
-
ignore
The regular expression for matching a comment line.
-
maxUnknownLines
private int maxUnknownLines
The maximum number of unknown lines to parse before giving up.
-
emptyIsNull
private boolean emptyIsNull
If null should be returned for empty columns (instead of an empty string).
-
useNullIfException
private boolean useNullIfException
If null should be returned if a (numeric) value can't be parsed.
-
ignoreNonExistingColumns
private boolean ignoreNonExistingColumns
If non-existing columns should be ignored (true) or result in an exception (false)
-
nullIsNull
private boolean nullIsNull
If null should be returned for the string NULL (ignoring case) or not.
-
numberFormat
The default number formatter to use for creating mappers.
-
dateFormat
The default date formatter to use when creating mappers.
-
timestampFormat
The default timestamp format to use when creating mappers.
-
bofType
The value that was captured by the bofMarker pattern.
-
lines
List of lines parsed by the parseHeaders() method.
-
parsedLines
private int parsedLines
The total number of lines parsed so far.
-
parsedSections
private int parsedSections
The number of sections parsed so far.
-
parsedCharacters
private long parsedCharacters
The total number of parsed characters so far.
-
parsedDataLines
private int parsedDataLines
The number of data lines parsed in the current section so far. This value is reset at each new section.
-
headers
Map of header lines parsed by the parseHeaders() method. The map contains name -> value pairs
-
columnHeaders
List of the column names found by splitting the data header using the data splitter regexp.
-
nextSection
The line that last matched the section.
-
nextData
The next available data line as parsed by the hasMoreData() method.
-
ignoredLines
private int ignoredLines
Number of ignored lines in the nextData() method.
-
unknownLines
private int unknownLines
Number of unknown lines in the nextData() method.
-
keepSkippedLines
private boolean keepSkippedLines
If unknown or ignored lines should be kept.
-
skippedLines
List for keeping ignored and unknown lines in the nextData() method.
-
-
Constructor Details
-
FlatFileParser
public FlatFileParser()Create a newFlatFileParser
object.
-
-
Method Details
-
setBofMarkerRegexp
Set a regular expression that matches a beginning-of-file marker. This property should be set before starting to parse the file (otherwise it is ignored). The first method call that causes the parsing to be started will invoke parseToBof() (which can also be invoked manually).
The regular expression may contain a single capturing group. The matched value is returned by getBofType().
- Parameters:
regexp - A regular expression
- Since:
- 2.15
-
setHeaderRegexp
Set a regular expression that can be matched against a header. The regular expression must contain two capturing groups, the first should capture the name and the second the value of the header. For example, the file contains headers like:
"Type=GenePix Results 1.3"
"DateTime=2002/09/04 13:59:48"
To match this we can use the following regular expression: "(.*)=(.*)".
- Parameters:
regexp - A regular expression
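As a minimal sketch of how the two capturing groups behave, the following uses only java.util.regex. The class and method names (HeaderRegexSketch, parseHeader) are hypothetical, and whether the parser applies the pattern to the whole line in the same way is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderRegexSketch {
    // Same pattern as in the text: group 1 is the name, group 2 the value.
    static final Pattern HEADER = Pattern.compile("(.*)=(.*)");

    /** Returns {name, value}, or null if the whole line does not match (assumed behaviour). */
    static String[] parseHeader(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = {
            "Type=GenePix Results 1.3",
            "DateTime=2002/09/04 13:59:48",
            "section data"              // no '=' so it is not a header
        };
        for (String line : lines) {
            String[] nameValue = parseHeader(line);
            if (nameValue != null) {
                headers.put(nameValue[0], nameValue[1]);
            }
        }
        System.out.println(headers);
    }
}
```

Because the first group is greedy, a line with more than one "=" is split at the last one. A pattern such as "(.*?)=(.*)" splits at the first "=" instead.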
-
setSectionRegexp
Set a regular expression that can be matched against the section line. For example, the file contains a section line like:
section FileInformation
To match this we can use the following regular expression: section (.*). This will match anything that starts with "section ". The section name will be in the capturing group.
- Parameters:
regexp - A regular expression
-
setDataHeaderRegexp
Set a regular expression that can be matched against the data header. For example, the file contains a data header like:
"Block"{tab}"Column"{tab}"Row"{tab}"Name"{tab}"ID" ...and so on
To match this we can use the following regular expression: "(.*?)"(\t"(.*?)"). This will match anything that has at least two columns. We could also be more specific and use: "Block"\t"Column"\t"Row"\t"Name"\t"ID"...
- Parameters:
regexp - A regular expression
-
setDataSplitterRegexp
Set a regular expression that is used to split a data line into columns. To split on tabs we use: \t. This regular expression is also used to split the data header line into column names, which can then be used in the getMapper(String) method.
- Parameters:
regexp - A regular expression
-
setTrimQuotes
public void setTrimQuotes(boolean trimQuotes)
Set if quotes around each data value should be removed or not. A quote is either a double quote (") or a single quote ('). The default setting of this option is true.
- Parameters:
trimQuotes - TRUE to remove quotes, FALSE to keep them
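The combination of splitting on tabs and trimming enclosing quotes can be sketched with the standard library alone. This is an illustration rather than the parser's own code: SplitAndTrimSketch and its methods are hypothetical names, and keeping trailing empty columns (split limit -1) is an assumption.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitAndTrimSketch {
    // Tab splitter, as in setDataSplitterRegexp(Pattern.compile("\\t")).
    static final Pattern SPLITTER = Pattern.compile("\\t");

    /** Removes one pair of enclosing double or single quotes, if present. */
    static String trimQuotes(String value) {
        if (value.length() >= 2) {
            char first = value.charAt(0);
            char last = value.charAt(value.length() - 1);
            if (first == last && (first == '"' || first == '\'')) {
                return value.substring(1, value.length() - 1);
            }
        }
        return value;
    }

    /** Splits a data line into columns and trims quotes from each column. */
    static String[] splitLine(String line) {
        // A limit of -1 keeps trailing empty columns (an assumption of this sketch).
        String[] columns = SPLITTER.split(line, -1);
        for (int i = 0; i < columns.length; i++) {
            columns[i] = trimQuotes(columns[i]);
        }
        return columns;
    }

    public static void main(String[] args) {
        String line = "1\t1\t1\t\"Ly68_Lymphocyte antigen 68\"\t\"M000205_01\"";
        System.out.println(Arrays.toString(splitLine(line)));
    }
}
```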
-
setTrimWhiteSpace
public void setTrimWhiteSpace(boolean trimWhiteSpace)
Set a flag indicating if white-space should be trimmed from the start and end of data values. The default setting is false.
- Parameters:
trimWhiteSpace - TRUE to remove white-space, FALSE to keep it
- Since:
- 3.15.1
-
setMinDataColumns
public void setMinDataColumns(int columns)
Set the minimum number of columns a data line must contain in order for it to be counted as a data line.
- Parameters:
columns - The minimum number of columns
-
setMaxDataColumns
public void setMaxDataColumns(int columns)
Set the maximum number of columns a data line can contain in order for it to be counted as a data line.
- Parameters:
columns - The maximum number of columns, or 0 for an unlimited number, or -1 to disable counting the number of columns
-
setIgnoreRegexp
Set a regular expression that is used to match a line that should be ignored. For example, the file may contain comments starting with a #: \#.*
- Parameters:
regexp - A regular expression
-
setMaxUnknownLines
public void setMaxUnknownLines(int lines)
The number of unknown lines in a row that can be parsed by the parseHeaders method before it gives up. The default value is specified by DEFAULT_MAX_UNKNOWN_LINES. This value is ignored while parsing data.
- Parameters:
lines - The number of lines
-
setUseNullIfEmpty
public void setUseNullIfEmpty(boolean emptyIsNull)
Specify if null values should be returned instead of empty strings for columns that don't contain any value.
- Parameters:
emptyIsNull - TRUE to return null, FALSE to return an empty string
-
setUseNullIfNull
public void setUseNullIfNull(boolean nullIsNull)
Specify if null values should be returned for strings having the value "NULL" (ignoring case).
- Parameters:
nullIsNull - TRUE to return null, FALSE to return the original string value
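The effect of the two null-related flags can be sketched as a small helper. The class name, the method signature and the order of the checks are assumptions made for illustration. The parser's private convertToNull(String) method is not documented here and may work differently.

```java
public class NullConversionSketch {
    /**
     * Hypothetical helper: applies the "empty is null" and "NULL is null"
     * rules described for setUseNullIfEmpty and setUseNullIfNull.
     */
    static String convertToNull(String value, boolean emptyIsNull, boolean nullIsNull) {
        if (value == null) {
            return null;
        }
        if (emptyIsNull && value.isEmpty()) {
            return null;                       // empty column becomes null
        }
        if (nullIsNull && value.equalsIgnoreCase("NULL")) {
            return null;                       // the text NULL, in any case, becomes null
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(convertToNull("", true, false));      // null
        System.out.println(convertToNull("", false, false));     // empty string
        System.out.println(convertToNull("null", false, true));  // null
        System.out.println(convertToNull("null", false, false)); // the string "null"
    }
}
```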
-
setKeepSkippedLines
public void setKeepSkippedLines(boolean keep)
If the nextData() and hasMoreData() methods should keep information about lines that were skipped because they matched the ignore pattern or could not be interpreted as data lines. The default is FALSE. The number of lines that were skipped is always available regardless of this setting.
- Parameters:
keep - TRUE to keep line information, FALSE to not
-
setParseAllExcelSheets
public void setParseAllExcelSheets(boolean parseAllExcelSheets)
If this flag is set and the source file is an Excel file, then all sheets will be parsed unless a named sheet is specified. Each sheet is handled like a section with the sheet name inside brackets ([name]). The regular expression for detecting a section is automatically updated to match this pattern.
- Since:
- 3.15
-
getParseAllExcelSheets
public boolean getParseAllExcelSheets()- Since:
- 3.15
- See Also:
-
setExcelSheet
Set the name of Excel worksheet to parse if the given file is an Excel file, otherwise this is ignored.- Since:
- 3.15
-
getExcelSheet
Get the name of the Excel sheet that should be or is parsed.- Since:
- 3.15
-
getCurrentExcelWorkbook
If the input stream that is being parsed is an Excel document, this method returns information about it.- Returns:
- An XlsxToCsvUtil object or null if the stream is not an Excel document
- Since:
- 3.15.1
-
getCurrentSheet
If the input stream that is being parsed is an Excel document, this method returns information about the current worksheet.- Returns:
- An XlsxToCsvUtil.SheetInfo object or null if the stream is not an Excel document
- Since:
- 3.15.1
-
setInputStream
Set the input stream that will be parsed. The stream can be either a text CSV-like stream or an Excel workbook (xlsx). If the stream is an Excel workbook the following apply:
- If no date format has been specified, yyyy-MM-dd is used
- If no timestamp format has been specified, yyyy-MM-dd HH:mm:ss is used
- If no number format has been specified, 'dot' is used
- The data splitter regular expression is changed to \\t
- The section regular expression is changed to [.*] (if the getParseAllExcelSheets() flag is set)
- Parameters:
in - The InputStream
charsetOrSheetName - If CSV, the name of the character set to use when parsing the file, or null to use the default charset specified by Config.getCharset(). If Excel, the name or index of the worksheet in the workbook; the default is to parse the first sheet (index=0), or the whole workbook if the getParseAllExcelSheets() flag is set
- Since:
- 2.1.1
-
parseToBof
Parse the file until the beginning-of-file marker is found. If no regular expression has been set with setBofMarkerRegexp(Pattern) or if the parsing of the file has already started, this method call is ignored.
- Returns:
- TRUE if this call resulted in parsing and the BOF marker was found, FALSE otherwise
- Throws:
IOException
- Since:
- 2.15
-
getBofType
Get the value captured by the BOF marker regular expression. If no capturing group was specified in the pattern this value is the string that matched the entire pattern.
- Returns:
- The matched value, or null if BOF matching has not been done
- Since:
- 2.15
-
parseHeaders
Start parsing the input stream. The parser will read a single line at a time. Each line is checked in the following order:
- Does it match the section regular expression?
- Does it match the header regular expression?
- Does it match the data header regular expression?
- Does it match the comment regular expression?
- Can it be split by the data regular expression into the appropriate number of columns?
If none of the checks is true, the line is given the type FlatFileParser.LineType.UNKNOWN and processing is continued with the next line. If too many unknown lines in a row have been found the method also returns. This should be considered as a failure to parse the specified file.
The method returns the type of the last line that was parsed as follows:
- FlatFileParser.LineType.SECTION: The last line was a section. Header, data header or data may follow this line.
- FlatFileParser.LineType.DATA_HEADER: The last line was the data header. It is expected that data should follow.
- FlatFileParser.LineType.DATA: The last line was a data line. More data may follow.
- FlatFileParser.LineType.UNKNOWN: The last line was of unknown format. The file could not be parsed.
- Returns:
- The FlatFileParser.LineType of the last parsed line
- Throws:
IOException - If reading the file fails.
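The order of the checks can be illustrated with a small stand-alone classifier that uses the regular expressions from the class-level example. This is a sketch under stated assumptions, not the parser's implementation: the LineClassifierSketch class, its local LineType enum and the use of whole-line matching are all hypothetical.

```java
import java.util.regex.Pattern;

public class LineClassifierSketch {
    // Local stand-in for the line types; not the library's FlatFileParser.LineType.
    enum LineType { SECTION, HEADER, DATA_HEADER, IGNORED, DATA, UNKNOWN }

    static final Pattern SECTION = Pattern.compile("section (.*)");
    static final Pattern HEADER = Pattern.compile("(.*)=(.*)");
    static final Pattern DATA_HEADER = Pattern.compile("Block\\tColumn\\tRow\\tName\\tID");
    static final Pattern IGNORE = Pattern.compile("#.*");
    static final Pattern SPLITTER = Pattern.compile("\\t");
    static final int MIN_COLUMNS = 5;
    static final int MAX_COLUMNS = 5;

    /** Applies the checks in the documented order and returns the first that matches. */
    static LineType classify(String line) {
        if (SECTION.matcher(line).matches()) return LineType.SECTION;
        if (HEADER.matcher(line).matches()) return LineType.HEADER;
        if (DATA_HEADER.matcher(line).matches()) return LineType.DATA_HEADER;
        if (IGNORE.matcher(line).matches()) return LineType.IGNORED;
        int columns = SPLITTER.split(line, -1).length;
        if (columns >= MIN_COLUMNS && columns <= MAX_COLUMNS) return LineType.DATA;
        return LineType.UNKNOWN;
    }

    public static void main(String[] args) {
        String[] lines = {
            "# Example of a parsable file",
            "section info",
            "Type=GenePix Results 1.3",
            "Block\tColumn\tRow\tName\tID",
            "1\t1\t1\t\"Ly68_Lymphocyte antigen 68\"\t\"M000205_01\"",
            "something else"
        };
        for (String line : lines) {
            System.out.println(classify(line) + ": " + line);
        }
    }
}
```

One consequence of this order in the sketch is that a comment line containing "=" is classified as a header, because the header check runs before the comment check. A more specific header expression avoids that.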
-
convertToNull
-
getHeaderNames
Get the names of all headers found by the parseHeaders() method. To get the value of a header, use the getHeader(String) method.
-
getHeader
Get the value of the header with the specified name. This method should only be used after parseHeaders() has been completed.
- Parameters:
name - The name of the header
- Returns:
- The value of the header, or null if it was not found
-
getLineCount
public int getLineCount()
Get the number of lines that the parseHeaders() method parsed.
- Returns:
- The number of lines parsed
-
getLine
Get the line with the specified number. This method should only be used after parseHeaders() has been completed.
- Parameters:
index - The line number, starting at 0
- Returns:
- A Line object
-
getLines
Get the lines read by parseHeaders().
- Returns:
- The lines in the order that they have been read.
-
getColumnHeaders
Get all column headers that were found by splitting the line matching the setDataHeaderRegexp(Pattern) pattern using the setDataSplitterRegexp(Pattern) pattern. This method should only be called after parseHeaders() has been called.
- Returns:
- A list containing the column headers, or null if no headers have been found
-
getColumnHeaderIndex
Get the index of a column header with a given name. This method should only be called after parseHeaders() has been called. If more than one header with the same name exists the index of the first is returned.
- Parameters:
name - The name of the column header
- Returns:
- The index, or null if no header with that name exists
-
findColumnHeaderIndex
Find the index of a column header using a regular expression for pattern matching. This method should only be called after parseHeaders() has been called. If more than one header matches the regular expression only the first one found is returned.
- Parameters:
regex - The regular expression used to match the header names
- Returns:
- The index, or null if no header matches the regular expression or if the string is not a valid regular expression
- Since:
- 2.5
- See Also:
-
setDefaultNumberFormat
Set the default number format to use when creating mappers.- Parameters:
numberFormat
- The number format to use, or null to parse numbers with Float.valueOf or Double.valueOf- Since:
- 2.2
- See Also:
-
getDefaultNumberFormat
Get the default number format.- Returns:
- The number format, or null if none has been specified
- Since:
- 2.2
-
setDefaultDateFormat
Set the default date format to use when creating mappers. If null, xxx is used.- Since:
- 3.15
-
getDefaultDateFormat
Get the default date format.- Since:
- 3.15
-
setDefaultTimestampFormat
Set the default timestamp format to use when creating mappers. If null, xxx is used.- Since:
- 3.15
-
getDefaultTimestampFormat
Get the default timestamp format.- Since:
- 3.15
-
setUseNullIfException
public void setUseNullIfException(boolean useNullIfException)
Specify if null should be returned if a (numeric) value can't be parsed. If this setting is set to TRUE all mappers created by one of the getMapper(String) methods are wrapped in a NullIfExceptionMapper. It is not possible to log or get information about the exception.
- Parameters:
useNullIfException - TRUE to return null, FALSE to throw an exception
- Since:
- 2.4
-
getUseNullIfException
public boolean getUseNullIfException()- Since:
- 3.15
-
setIgnoreNonExistingColumns
public void setIgnoreNonExistingColumns(boolean ignoreNonExistingColumns)
Specify if trying to create a mapper with one of the getMapper(String) methods for an expression which references a non-existing column should result in an exception or be ignored.
- Parameters:
ignoreNonExistingColumns - TRUE to ignore, or FALSE to throw an exception
- Since:
- 2.6
-
getMapper
Get a mapper using the default number format.- See Also:
-
getMapper
-
getMapper
Get a mapper using the default number format.- Since:
- 2.4
- See Also:
-
getMapper
Get a mapper using a specific number format.- Since:
- 2.2
- See Also:
-
getMapper
- Since:
- 2.4
- See Also:
-
getDateMapper
Get a mapper using the default date format.- Since:
- 3.15
-
getTimestampMapper
Get a mapper using the default timestamp format.- Since:
- 3.15
-
getMapper
Get a mapper using the specified date format.- Since:
- 3.15
-
getMapper
public Mapper getMapper(String expression, NumberFormat numberFormat, boolean nullIfException, JepFunction... functions) -
getMapper
public Mapper getMapper(String expression, NumberFormat numberFormat, Formatter<Date> dateFormat, boolean nullIfException, JepFunction... functions) Create a mapper object that maps an expression string to a value. An expression string is a regular string which contains placeholders where the data column values will be inserted. For example:
\1\
\row\
Row: \row\, Col:\col\
It is also possible to use expressions that are evaluated dynamically. For example:
=2 * col('Radius')
If no column matching the exact name is found, the placeholder is interpreted as a regular expression which is checked against each of the column headers. In all cases, the first matching column header is used if there are multiple matches.
If the expression is null, a mapper returning an empty string is returned, unless
setUseNullIfEmpty(boolean)
has been activated. In that case the mapper returns null.- Parameters:
expression
- The string containing the mapping expression
numberFormat
- The number format the mapper should use for parsing numbers, or null to use Float.valueOf or Double.valueOf
dateFormat
- The date format the mapper should use for parsing dates, or null to use Type.DATE.parseString()
nullIfException
- TRUE to return a null value instead of throwing an exception when a value can't be parsed.
functions
- Optional array with Jep functions that should be included in the parser- Returns:
- A mapper object
- Since:
- 3.15
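The placeholder lookup described above can be illustrated with a small, self-contained sketch. It is not the FlatFileParser implementation: the class `PlaceholderSketch` and its methods are hypothetical, numeric placeholders are treated as zero-based column indexes, and the regular-expression fallback uses a partial match (`find()`). All three are assumptions made for this example only.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSketch {
    // Matches placeholders such as \row\ or \1\ (simplified, hypothetical syntax).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\\\(.+?)\\\\");

    /** Resolve a placeholder: numeric index, then exact header name, then regex. */
    static int findColumn(String placeholder, List<String> headers) {
        if (placeholder.matches("\\d+")) {
            return Integer.parseInt(placeholder); // assumed to be zero-based
        }
        int exact = headers.indexOf(placeholder);
        if (exact >= 0) {
            return exact;
        }
        Pattern regex = Pattern.compile(placeholder);
        for (int i = 0; i < headers.size(); i++) {
            if (regex.matcher(headers.get(i)).find()) {
                return i; // the first matching header wins
            }
        }
        return -1;
    }

    /** Replace every placeholder in the expression with the value from the row. */
    static String map(String expression, List<String> headers, List<String> row) {
        if (expression == null) {
            return ""; // mirrors the "empty string for a null expression" default
        }
        Matcher m = PLACEHOLDER.matcher(expression);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            int index = findColumn(m.group(1), headers);
            String value = (index >= 0 && index < row.size()) ? row.get(index) : "";
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> headers = List.of("row", "col", "Name");
        List<String> data = List.of("3", "7", "Bag1");
        System.out.println(map("Row: \\row\\, Col:\\col\\", headers, data)); // Row: 3, Col:7
        System.out.println(map("\\Na.*\\", headers, data));                  // Bag1
    }
}
```

The second call shows the fallback: no header is named `Na.*`, so the placeholder is tried as a regular expression and the first header it matches (`Name`) is used.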
-
hasMoreData
Check if the input stream contains more data. If it is unknown whether there is more data, this method will start reading more lines from the stream. Each line is checked in the following order:
- Does it match the data footer regular expression?
- Does it match the section regular expression?
- Does it match the ignore regular expression?
- Can it be split by the data regular expression into the appropriate number of columns?
If the second check is true, the section may be retrieved with the nextSection method. If the third check is true, the line is ignored and the processing continues with the next line. If the fourth check is true, TRUE is returned and the data may be retrieved with the nextData
method.- Returns:
- TRUE if there is more data, FALSE otherwise
- Throws:
IOException
- If there is an error reading from the input stream- See Also:
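The order of the four checks can be sketched as a simple classifier. This is an illustration only, not the parser's code: the class `LineCheckSketch`, the `LineType` enum, and the four patterns are hypothetical, and "the appropriate number of columns" is simplified to a single expected count.

```java
import java.util.regex.Pattern;

public class LineCheckSketch {
    enum LineType { DATA_FOOTER, SECTION, IGNORED, DATA, UNKNOWN }

    // Hypothetical patterns chosen for this example only.
    static final Pattern FOOTER = Pattern.compile("^end$");
    static final Pattern SECTION = Pattern.compile("^section (.*)$");
    static final Pattern IGNORE = Pattern.compile("^#.*");
    static final Pattern SPLITTER = Pattern.compile("\\t");

    /** Apply the checks in the documented order; the first one that succeeds decides. */
    static LineType classify(String line, int expectedColumns) {
        if (FOOTER.matcher(line).matches()) {
            return LineType.DATA_FOOTER;
        }
        if (SECTION.matcher(line).matches()) {
            return LineType.SECTION;
        }
        if (IGNORE.matcher(line).matches()) {
            return LineType.IGNORED;
        }
        // A limit of -1 keeps trailing empty columns in the count.
        if (SPLITTER.split(line, -1).length == expectedColumns) {
            return LineType.DATA;
        }
        return LineType.UNKNOWN;
    }

    public static void main(String[] args) {
        String[] lines = { "# comment", "section data", "1\t1\t1", "garbage", "end" };
        for (String line : lines) {
            System.out.println(classify(line, 3) + " <- " + line.replace("\t", "\\t"));
        }
    }
}
```

Because the ignore check runs before the split check, a comment line that happens to contain the right number of separators is still ignored rather than returned as data.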
-
trimQuotes
Remove enclosing quotes (" or ') around all columns.- Parameters:
columns
- The columns- Returns:
- The trimmed columns
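As a rough illustration of what trimming enclosing quotes means, the sketch below removes one pair of matching quotes from each column. It is not the library's implementation: the class name is hypothetical, and both the requirement that the opening and closing quote are the same character and the choice to return a new array are assumptions.

```java
import java.util.Arrays;

public class TrimQuotesSketch {
    /** Return a copy of the columns with one pair of enclosing " or ' removed. */
    static String[] trimQuotes(String[] columns) {
        String[] result = new String[columns.length];
        for (int i = 0; i < columns.length; i++) {
            String c = columns[i];
            if (c != null && c.length() >= 2) {
                char first = c.charAt(0);
                char last = c.charAt(c.length() - 1);
                // Only strip when both ends carry the same quote character.
                if (first == last && (first == '"' || first == '\'')) {
                    c = c.substring(1, c.length() - 1);
                }
            }
            result[i] = c;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] row = { "1", "\"Cd5_CD5 antigen\"", "'M000237_01'", "\"unbalanced" };
        System.out.println(Arrays.toString(trimQuotes(row)));
    }
}
```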
-
getParsedLines
public int getParsedLines() Get the number of parsed lines so far. -
getParsedSections
public int getParsedSections() Get the number of found sections so far.- Since:
- 3.15
-
getParsedDataLines
public int getParsedDataLines() Get the number of parsed data lines so far in the current section. This value is reset for each new section. -
getParsedCharacters
public long getParsedCharacters() Get the number of parsed characters so far. This value may or may not correspond to the number of parsed bytes depending on the character set of the file.- See Also:
-
getParsedBytes
public long getParsedBytes() Get the number of parsed bytes so far. This value may or may not correspond to the number of parsed characters depending on the character set of the file.- Since:
- 2.5.1
- See Also:
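Why the character and byte counts can differ is easiest to see with a line that contains non-ASCII characters. The sketch below uses only the standard library; the class `CountSketch` and the sample line are made up for the example and say nothing about how the parser itself keeps its counters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CountSketch {
    /** Number of bytes the text occupies when encoded with the given character set. */
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "Größe<TAB>42", written with escapes so the source file stays ASCII.
        String line = "Gr\u00f6\u00dfe\t42";
        System.out.println("characters:       " + line.length());                               // 8
        System.out.println("UTF-8 bytes:      " + byteCount(line, StandardCharsets.UTF_8));      // 10
        System.out.println("ISO-8859-1 bytes: " + byteCount(line, StandardCharsets.ISO_8859_1)); // 8
    }
}
```

In a single-byte character set the two counts agree, while in UTF-8 each of the two accented letters takes two bytes, so progress reporting based on file size should use the byte count.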
-
nextData
Get the next available data.- Returns:
- A
Data
object, or null if there is no more data - Throws:
IOException
- If there is an error reading from the input stream.- See Also:
-
getIgnoredLines
public int getIgnoredLines() Get the number of lines that the last call to nextData()
or hasMoreData()
ignored because they matched the ignore regular expression.- Returns:
- The number of ignored lines
- See Also:
-
getUnknownLines
public int getUnknownLines() Get the number of lines that the last call to nextData()
or hasMoreData()
ignored because they couldn't be interpreted as data lines.- Returns:
- The number of unknown lines
- See Also:
-
getNumSkippedLines
public int getNumSkippedLines() Get the number of lines that the last call to nextData()
or hasMoreData()
ignored because they matched the ignore regular expression or couldn't be interpreted as data lines.- Returns:
- The number of ignored or unknown lines
- See Also:
-
getSkippedLines
Get the lines that were skipped during the last call to nextData()
or hasMoreData().
The list is only available if setKeepSkippedLines(boolean)
has been set to true (default is false).- Returns:
- A list with the skipped lines
- See Also:
-
hasMoreSections
Check if the input stream contains more sections. If it is unknown whether there are more sections, this method will start reading more lines from the stream. Each line is checked to see if it matches the section
regular expression. The parser will continue until a section line is found or the end of the file is reached. If the method returns TRUE, the section may be retrieved with the nextSection()
method. If the section
regular expression isn't specified, the method returns FALSE and won't parse any lines.- Returns:
- TRUE if there are more sections, FALSE otherwise
- Throws:
IOException
- If there is an error reading from the input stream- See Also:
-
nextSection
Get the next line that matches the section
regular expression.- Returns:
- The line that matched the regular expression
- Throws:
IOException
- See Also:
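The scanning behaviour of hasMoreSections() and nextSection() can be mimicked with a small stand-alone reader. This is a sketch under assumptions, not the parser's code: the class `SectionScanSketch` is hypothetical, it returns the matched line as a plain String, and returning null from `nextSection()` when no section is left is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;

public class SectionScanSketch {
    private final BufferedReader reader;
    private final Pattern sectionPattern; // may be null, meaning "no sections"
    private String pendingSection;        // section line found but not yet handed out
    private int linesRead = 0;

    SectionScanSketch(BufferedReader reader, Pattern sectionPattern) {
        this.reader = reader;
        this.sectionPattern = sectionPattern;
    }

    /** Read ahead until a section line or end of input; read nothing without a pattern. */
    boolean hasMoreSections() throws IOException {
        if (sectionPattern == null) {
            return false;
        }
        if (pendingSection != null) {
            return true;
        }
        String line;
        while ((line = reader.readLine()) != null) {
            linesRead++;
            if (sectionPattern.matcher(line).matches()) {
                pendingSection = line;
                return true;
            }
        }
        return false;
    }

    /** Hand out the section line found by hasMoreSections(), or null if there is none. */
    String nextSection() throws IOException {
        if (!hasMoreSections()) {
            return null;
        }
        String section = pendingSection;
        pendingSection = null;
        return section;
    }

    int getLinesRead() {
        return linesRead;
    }

    public static void main(String[] args) throws IOException {
        String file = "# comment\nsection info\nType=GenePix Results 1.3\nsection data\n1\t1\t1\n";
        SectionScanSketch scan = new SectionScanSketch(
            new BufferedReader(new StringReader(file)), Pattern.compile("section (.*)"));
        while (scan.hasMoreSections()) {
            System.out.println(scan.nextSection());
        }
    }
}
```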
-