FlatFileParser (BASE 2.17.2 API documentation)

2.17.2: 2011-06-17

net.sf.basedb.util.parser
Class FlatFileParser

java.lang.Object
  net.sf.basedb.util.parser.FlatFileParser

public class FlatFileParser
extends Object

This class can be used to parse data from flat text files. The text file must follow a few simple rules:

Data must be organised into columns, with one record per line
Each data column must be separated by some special character or character sequence not occurring in the data, for example a tab or a comma. Data in fixed-size columns cannot be parsed.
Data may optionally be preceded by a data header, i.e. the names of the columns
The data header may optionally be preceded by file headers. A file header is something that can be split into a name-value pair.
The file may contain comments, which are ignored by the parser
The file may contain sections, where each section can contain a header and/or a data part

Example

# Example of a parsable file, not actual format of a GenePix file
section info
Type=GenePix Results 1.3
DateTime=2002/09/04 13:59:48
Scanner=GenePix 4000B [83306]

section data
Block   Column  Row     Name    ID
1       1       1       "Ly68_Lymphocyte antigen 68"    "M000205_01"
1       2       1       "Bag1_Bcl2-associated athanogene 1"     "M000209_01"
1       3       1       "Rps16_Ribosomal protein S16"   "M000213_01"
1       4       1       "Col4a1_Procollagen, type IV, alpha 1"  "M000229_01"
1       5       1       "Ace_Angiotensin converting enzyme"     "M000233_01"
1       6       1       "Cd5_CD5 antigen"       "M000237_01"
1       7       1       "Psme1_Protease (prosome, macropain) 28 s"

How to use
The parsing is controlled by regular expressions. Start by creating a new FlatFileParser object, then use the various set methods to provide the regular expressions used to match the data and headers.

Use the setInputStream(InputStream, String) method to specify a file to parse, and parseHeaders() to start the parsing. Note! Even if you know that the file doesn't contain any headers, you should always call this method, since the parser must initialize itself. If the file contains sections, use nextSection() first to select the section you are parsing from.

When the headers have been found use the hasMoreData() and nextData() methods in a loop to read all data from the section.

Example

import java.util.regex.Pattern;
import net.sf.basedb.core.Config;
import net.sf.basedb.util.FileUtil;
import net.sf.basedb.util.parser.FlatFileParser;

// Configure the parser for a tab-separated file with name=value headers
FlatFileParser ffp = new FlatFileParser();
ffp.setHeaderRegexp(Pattern.compile("(.*)=(.*)"));
ffp.setDataHeaderRegexp(Pattern.compile("Block\\tColumn\\tRow\\tName\\tID"));
ffp.setDataSplitterRegexp(Pattern.compile("\\t"));
ffp.setIgnoreRegexp(Pattern.compile("#.*"));
ffp.setMinDataColumns(5);
ffp.setMaxDataColumns(5);
ffp.setInputStream(FileUtil.getInputStream(path_to_file), Config.getCharset());

// Read the headers; always required, since it also initializes the parser
ffp.parseHeaders();
for (int i = 0; i < ffp.getLineCount(); i++)
{
   FlatFileParser.Line line = ffp.getLine(i);
   System.out.println(i+":"+line.type()+":"+line.line());
}

// Read the data lines one at a time
int i = 0;
while (ffp.hasMoreData())
{
   FlatFileParser.Data data = ffp.nextData();
   System.out.println(i+":"+data.columns()+":"+data.line());
   i++;
}

Mapping column values
With the FlatFileParser.Data object you can only access the data by column index (0-based), and all values are returned as strings. Another approach is to use mappers. A mapper takes a string template and inserts the values of the data columns at the positions you specify. Here are some examples:

\1\
\row\
Row: \row\, Col:\col\
=2 * col('Radius')

The result can be retrieved either as a string or as a numeric value. It is even possible to create expressions that perform a calculation on the value before it is returned. See the getMapper(String) method for more information.

Version:: 2.0
Author:: Nicklas, Enell

Nested Class Summary
`static class`	`FlatFileParser.Data` This class holds data about a line parsed by the `hasMoreData()` method.
`static class`	`FlatFileParser.Line` This class holds data about a line parsed by the `parseHeaders()` method.
`static class`	`FlatFileParser.LineType` Represents the type of a line matched or unmatched by the parser.

Field Summary
`private Pattern`	`bofMarker` The regular expression for matching the beginning-of-file marker
`private String`	`bofType` The value that was captured by the bofMarker pattern.
`private List<String>`	`columnHeaders` List of the column names found by splitting the data header using the data splitter regexp.
`private Pattern`	`dataFooter` The regular expression for matching the data footer line.
`private Pattern`	`dataHeader` The regular expression for matching the data header line.
`private Pattern`	`dataSplitter` The regular expression for splitting a data line.
`static int`	`DEFAULT_MAX_UNKNOWN_LINES` The default value for the number of unknown lines in a row that may be encountered by the `parseHeaders` method before it gives up.
`private boolean`	`emptyIsNull` If `null` should be returned for empty columns (instead of an empty string).
`private static Pattern`	`findColumn` Pattern used to find column mappings in a string, i.e. abc \col\ def
`private Pattern`	`header` The regular expression for matching a header line.
`private Map<String,String>`	`headers` Map of header lines parsed by the `parseHeaders()` method.
`private Pattern`	`ignore` The regular expression for matching a comment line.
`private int`	`ignoredLines` Number of ignored lines in the `nextData()` method.
`private boolean`	`ignoreNonExistingColumns` If non-existing columns should be ignored (true) or result in an exception (false)
`private boolean`	`keepSkippedLines` If unknown or ignored lines should be kept.
`private List<FlatFileParser.Line>`	`lines` List of lines parsed by the `parseHeaders()` method.
`private int`	`maxDataColumns` The maximum number of allowed data columns for a line to be considered a data line.
`private int`	`maxUnknownLines` The maximum number of unknown lines to parse before giving up.
`private int`	`minDataColumns` The minimum number of allowed data columns for a line to be considered a data line.
`private FlatFileParser.Data`	`nextData` The next available data line as parsed by the `hasMoreData()` method.
`private FlatFileParser.Line`	`nextSection` The line that last matched the `section` pattern.
`private boolean`	`nullIsNull` If `null` should be returned for the string NULL (ignoring case) or not.
`private NumberFormat`	`numberFormat` The default number formatter to use for creating mappers.
`private long`	`parsedCharacters` The total number of parsed characters so far.
`private int`	`parsedDataLines` The number of data lines parsed in the current section so far.
`private int`	`parsedLines` The total number of lines parsed so far.
`private BufferedReader`	`reader` Reads from the given input stream
`private Pattern`	`section` The regular expression for matching the first line of a section.
`private List<FlatFileParser.Line>`	`skippedLines` List for keeping ignored and unknown lines in the `nextData()` method.
`private InputStreamTracker`	`tracker` For keeping track of the number of bytes parsed.
`private boolean`	`trimQuotes` If quotes should be trimmed from data values or not.
`private int`	`unknownLines` Number of unknown lines in the `nextData()` method.
`private boolean`	`useNullIfException` If `null` should be returned if a (numeric) value can't be parsed.

Constructor Summary
`FlatFileParser()` Create a new `FlatFileParser` object.

Method Summary
`private String`	`convertToNull(String value)`
`Integer`	`findColumnHeaderIndex(String regex)` Find the index of a column header using a regular expression for pattern matching.
`String`	`getBofType()` Get the value captured by the BOF marker regular expression.
`Integer`	`getColumnHeaderIndex(String name)` Get the index of a column header with a given name.
`List<String>`	`getColumnHeaders()` Get all column headers that were found by splitting the line matching the `setDataHeaderRegexp(Pattern)` pattern using the `setDataSplitterRegexp(Pattern)` pattern.
`NumberFormat`	`getDefaultNumberFormat()` Get the default number format.
`String`	`getHeader(String name)` Get the value of the header with the specified name.
`Set<String>`	`getHeaderNames()` Get the names of all headers found by the `parseHeaders()` method.
`int`	`getIgnoredLines()` Get the number of lines that the last call to `nextData()` or `hasMoreData()` ignored because they matched the ignore regular expression.
`FlatFileParser.Line`	`getLine(int index)` Get the line with the specified number.
`int`	`getLineCount()` Get the number of lines that the `parseHeaders()` method parsed.
`List<FlatFileParser.Line>`	`getLines()` Get the lines read by `parseHeaders()`.
`Mapper`	`getMapper(String expression)` Get a mapper using the default number format.
`Mapper`	`getMapper(String expression, boolean nullIfException)` Get a mapper using the default number format.
`Mapper`	`getMapper(String expression, NumberFormat numberFormat)` Get a mapper using a specific number format.
`Mapper`	`getMapper(String expression, NumberFormat numberFormat, boolean nullIfException)` Create a mapper object that maps an expression string to a value.
`int`	`getNumSkippedLines()` Get the number of lines that the last call to `nextData()` or `hasMoreData()` ignored because they matched the ignore regular expression or couldn't be interpreted as data lines.
`long`	`getParsedBytes()` Get the number of parsed bytes so far.
`long`	`getParsedCharacters()` Get the number of parsed characters so far.
`int`	`getParsedDataLines()` Get the number of parsed data lines so far in the current section.
`int`	`getParsedLines()` Get the number of parsed lines so far.
`List<FlatFileParser.Line>`	`getSkippedLines()` Get lines that were skipped during the last call to `nextData()` or `hasMoreData()`.
`int`	`getUnknownLines()` Get the number of lines that the last call to `nextData()` or `hasMoreData()` ignored because they couldn't be interpreted as data lines.
`boolean`	`hasMoreData()` Check if the input stream contains more data.
`boolean`	`hasMoreSections()` Check if the input stream contains more sections.
`FlatFileParser.Data`	`nextData()` Get the next available data.
`FlatFileParser.Line`	`nextSection()` Get the next line that matches the `section` regular expression.
`FlatFileParser.LineType`	`parseHeaders()` Start parsing the input stream.
`boolean`	`parseToBof()` Parse the file until the beginning-of-file marker is found.
`void`	`setBofMarkerRegexp(Pattern regexp)` Set a regular expression that matches a beginning-of-file marker.
`void`	`setDataFooterRegexp(Pattern regexp)` Set a regular expression that can be matched against a data footer.
`void`	`setDataHeaderRegexp(Pattern regexp)` Set a regular expression that can be matched against the data header.
`void`	`setDataSplitterRegexp(Pattern regexp)` Set a regular expression that is used to split a data line into columns.
`void`	`setDefaultNumberFormat(NumberFormat numberFormat)` Set the default number format to use when creating mappers.
`void`	`setHeaderRegexp(Pattern regexp)` Set a regular expression that can be matched against a header.
`void`	`setIgnoreNonExistingColumns(boolean ignoreNonExistingColumns)` Specify if trying to create a mapper with one of the `getMapper(String)` methods for an expression which references a non-existing column should result in an exception or be ignored.
`void`	`setIgnoreRegexp(Pattern regexp)` Set a regular expression that is used to match a line that should be ignored.
`void`	`setInputStream(InputStream in)` Deprecated. Use `setInputStream(InputStream, String)` instead
`void`	`setInputStream(InputStream in, String charsetName)` Set the input stream that will be parsed.
`void`	`setKeepSkippedLines(boolean keep)` Specify if the `nextData()` and `hasMoreData()` methods should keep information about lines that were skipped because they matched the ignore pattern or couldn't be interpreted as data lines.
`void`	`setMaxDataColumns(int columns)` Set the maximum number of columns a data line can contain in order for it to be counted as a data line.
`void`	`setMaxUnknownLines(int lines)` The number of unknown lines in a row that can be parsed by the `parseHeaders` method before it gives up.
`void`	`setMinDataColumns(int columns)` Set the minimum number of columns a data line must contain in order for it to be counted as a data line.
`void`	`setSectionRegexp(Pattern regexp)` Set a regular expression that can be matched against the section line.
`void`	`setTrimQuotes(boolean trimQuotes)` Set if quotes around each data value should be removed or not.
`void`	`setUseNullIfEmpty(boolean emptyIsNull)` Specify if `null` values should be returned instead of empty strings for columns that don't contain any value.
`void`	`setUseNullIfException(boolean useNullIfException)` Specify if `null` should be returned if a (numeric) value can't be parsed.
`void`	`setUseNullIfNull(boolean nullIsNull)` Specify if `null` values should be returned for strings having the value "NULL" (ignoring case).
`String[]`	`trimQuotes(String[] columns)` Remove enclosing quotes (" or ') around all columns.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

DEFAULT_MAX_UNKNOWN_LINES

public static final int DEFAULT_MAX_UNKNOWN_LINES

The default value for the number of unknown lines in a row that may be encountered by the parseHeaders method before it gives up.

See Also:: setMaxUnknownLines, Constant Field Values

findColumn

private static final Pattern findColumn

Pattern used to find column mappings in a string, i.e. abc \col\ def

reader

private BufferedReader reader

Reads from the given input stream

tracker

private InputStreamTracker tracker

For keeping track of the number of bytes parsed.

bofMarker

private Pattern bofMarker

The regular expression for matching the beginning-of-file marker

header

private Pattern header

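The regular expression for matching a header line. The expression must have two capturing groups: the first for the name and the second for the value.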

section

private Pattern section

The regular expression for matching the first line of a section. The expression must have one capturing group.

dataHeader

private Pattern dataHeader

The regular expression for matching the data header line.

dataSplitter

private Pattern dataSplitter

The regular expression for splitting a data line.

trimQuotes

private boolean trimQuotes

If quotes should be trimmed from data values or not. Default true. Quotes are double or single quotes.

dataFooter

private Pattern dataFooter

The regular expression for matching the data footer line.

minDataColumns

private int minDataColumns

The minimum number of allowed data columns for a line to be considered a data line.

maxDataColumns

private int maxDataColumns

The maximum number of allowed data columns for a line to be considered a data line.

ignore

private Pattern ignore

The regular expression for matching a comment line.

maxUnknownLines

private int maxUnknownLines

The maximum number of unknown lines to parse before giving up.

emptyIsNull

private boolean emptyIsNull

If null should be returned for empty columns (instead of an empty string).

useNullIfException

private boolean useNullIfException

If null should be returned if a (numeric) value can't be parsed.

ignoreNonExistingColumns

private boolean ignoreNonExistingColumns

If non-existing columns should be ignored (true) or result in an exception (false)

nullIsNull

private boolean nullIsNull

If null should be returned for the string NULL (ignoring case) or not.

numberFormat

private NumberFormat numberFormat

The default number formatter to use for creating mappers.

bofType

private String bofType

The value that was captured by the bofMarker pattern.

lines

private List<FlatFileParser.Line> lines

List of lines parsed by the parseHeaders() method.

parsedLines

private int parsedLines

The total number of lines parsed so far.

parsedCharacters

private long parsedCharacters

The total number of parsed characters so far.

parsedDataLines

private int parsedDataLines

The number of data lines parsed in the current section so far. This value is reset at each new section.

headers

private Map<String,String> headers

Map of header lines parsed by the parseHeaders() method. The map contains name -> value pairs

columnHeaders

private List<String> columnHeaders

List of the column names found by splitting the data header using the data splitter regexp.

nextSection

private FlatFileParser.Line nextSection

The line that last matched the section pattern.

nextData

private FlatFileParser.Data nextData

The next available data line as parsed by the hasMoreData() method.

ignoredLines

private int ignoredLines

Number of ignored lines in the nextData() method.

unknownLines

private int unknownLines

Number of unknown lines in the nextData() method.

keepSkippedLines

private boolean keepSkippedLines

If unknown or ignored lines should be kept.

See Also:: getSkippedLines()

skippedLines

private List<FlatFileParser.Line> skippedLines

List for keeping ignored and unknown lines in the nextData() method.

Constructor Detail

FlatFileParser

public FlatFileParser()

Create a new FlatFileParser object.

Method Detail

setBofMarkerRegexp

public void setBofMarkerRegexp(Pattern regexp)

Set a regular expression that matches a beginning-of-file marker. This property should be set before the parsing of the file starts (otherwise it is ignored). The first method call that causes the parsing to start will invoke parseToBof() (it can also be invoked manually).

The regular expression may contain a single capturing group. The matched value is returned by getBofType().

Parameters:: regexp - A regular expression
Since:: 2.15

setHeaderRegexp

public void setHeaderRegexp(Pattern regexp)

Set a regular expression that can be matched against a header. The regular expression must contain two capturing groups, the first should capture the name and the second the value of the header. For example, the file contains headers like:

"Type=GenePix Results 1.3"
"DateTime=2002/09/04 13:59:48"

To match this we can use the following regular expression: "(.*)=(.*)".

Parameters:: regexp - A regular expression
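To see what the two capturing groups deliver, here is a minimal sketch using plain java.util.regex rather than FlatFileParser itself. The class name HeaderLineDemo is hypothetical, and it assumes the pattern must match the whole line (Matcher.matches()), which may differ from the parser's internal behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration with plain java.util.regex; not the BASE parser.
final class HeaderLineDemo {
    // Same pattern as above: group 1 = header name, group 2 = header value.
    static final Pattern HEADER = Pattern.compile("(.*)=(.*)");

    /** Returns {name, value}, or null if the line is not a header line (whole-line match assumed). */
    static String[] parseHeader(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2) };
    }
}
```

Because (.*) is greedy, a line such as a=b=c is split at the last equals sign (name a=b, value c). If header values may contain =, a pattern like ([^=]*)=(.*) splits at the first one instead.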

setSectionRegexp

public void setSectionRegexp(Pattern regexp)

Set a regular expression that can be matched against the section line. For example, the file contains a section like:

[FileInformation]

To match this we can use the following regular expression: \[(.*)\]. This will match any line enclosed in square brackets. For a file like the one in the class description, where sections start with "section ", we can use: section (.*). In both cases the section name will be in the capturing group.

Parameters:: regexp - A regular expression
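A short sketch of how the capturing group yields the section name, again with plain java.util.regex and a hypothetical SectionLineDemo class. It covers both the bracketed style, \[(.*)\], and the section (.*) style used in the example file in the class description; whole-line matching is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; the parser may apply the pattern differently.
final class SectionLineDemo {
    static final Pattern BRACKETED = Pattern.compile("\\[(.*)\\]");   // [FileInformation]
    static final Pattern PREFIXED = Pattern.compile("section (.*)"); // section info

    /** Returns the captured section name, or null if the line is not a section line. */
    static String sectionName(Pattern section, String line) {
        Matcher m = section.matcher(line);
        return m.matches() ? m.group(1) : null;
    }
}
```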

setDataHeaderRegexp

public void setDataHeaderRegexp(Pattern regexp)

Set a regular expression that can be matched against the data header. For example, the file contains a data header like:

"Block"{tab}"Column"{tab}"Row"{tab}"Name"{tab}"ID" ...and so on

To match this we can use the following regular expression: "(.*?)"(\t"(.*?)")+. This will match any line that has at least two quoted, tab-separated columns. We could also be more specific and use: "Block"\t"Column"\t"Row"\t"Name"\t"ID"...

Parameters:: regexp - A regular expression

setDataSplitterRegexp

public void setDataSplitterRegexp(Pattern regexp)

Set a regular expression that is used to split a data line into columns. To split on tabs we use: \t. This regular expression is also used to split the data header line into column names, which can then be used in the getMapper(String) method.

Parameters:: regexp - A regular expression
See Also:: setMinDataColumns, setMaxDataColumns
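Splitting is what turns a line into columns. This hypothetical SplitterDemo uses Pattern.split with a limit of -1 so that trailing empty columns are kept; whether FlatFileParser handles trailing empty columns the same way is an assumption, not something documented here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of splitting a line into columns.
final class SplitterDemo {
    static final Pattern TAB = Pattern.compile("\\t");

    /** Splits a line into columns; limit -1 keeps trailing empty columns (our choice). */
    static List<String> split(Pattern splitter, String line) {
        return Arrays.asList(splitter.split(line, -1));
    }
}
```

With the default limit (Pattern.split(CharSequence)), trailing empty strings are discarded, so a line ending in empty columns would appear to have fewer columns and could fail the setMinDataColumns check.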

setTrimQuotes

public void setTrimQuotes(boolean trimQuotes)

Set if quotes around each data value should be removed or not. A quote is either a double quote (") or a single quote ('). The default setting of this option is true.

Parameters:: trimQuotes - TRUE to remove quotes, FALSE to keep them

setMinDataColumns

public void setMinDataColumns(int columns)

Set the minimum number of columns a data line must contain in order for it to be counted as a data line.

Parameters:: columns - The minimum number of columns

setMaxDataColumns

public void setMaxDataColumns(int columns)

Set the maximum number of columns a data line can contain in order for it to be counted as a data line.

Parameters:: columns - The maximum number of columns, or 0 for an unlimited number, or -1 to disable counting the number of columns
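The column-count rule can be sketched as a small predicate. The hypothetical ColumnCountDemo.isDataLine treats a maximum of 0 as "no upper limit", as documented above; treating -1 as "skip the check entirely, including the minimum" is an assumption about what disabling the column counting means.

```java
// Hypothetical sketch of the min/max column rule; not the BASE source.
final class ColumnCountDemo {
    /**
     * Decides if a line split into the given number of columns counts as data.
     * max == 0: no upper limit. max == -1: ASSUMED to disable the check completely.
     */
    static boolean isDataLine(int columns, int min, int max) {
        if (max == -1) return true;          // assumption, see lead-in
        if (columns < min) return false;     // too few columns
        return max == 0 || columns <= max;   // 0 means unlimited
    }
}
```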

setDataFooterRegexp

public void setDataFooterRegexp(Pattern regexp)

Set a regular expression that can be matched against a data footer. If a line matching this pattern is found while the hasMoreData method is looking for data, it stops and no more data will be returned.

Parameters:: regexp - A regular expression

setIgnoreRegexp

public void setIgnoreRegexp(Pattern regexp)

Set a regular expression that is used to match a line that should be ignored. For example, the file may contain comments starting with a #: \#.*

Parameters:: regexp - A regular expression

setMaxUnknownLines

public void setMaxUnknownLines(int lines)

The number of unknown lines in a row that can be parsed by the parseHeaders method before it gives up. The default value is specified by DEFAULT_MAX_UNKNOWN_LINES. This value is ignored while parsing data.

Parameters:: lines - The number of lines

setUseNullIfEmpty

public void setUseNullIfEmpty(boolean emptyIsNull)

Specify if null values should be returned instead of empty strings for columns that don't contain any value.

Parameters:: emptyIsNull - TRUE to return null, FALSE to return an empty string

setUseNullIfNull

public void setUseNullIfNull(boolean nullIsNull)

Specify if null values should be returned for strings having the value "NULL" (ignoring case).

Parameters:: nullIsNull - TRUE to return null, FALSE to return the original string value
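The private convertToNull(String) method is not documented, but the two settings above suggest what it does. A hypothetical sketch, assuming the empty check comes first and that no whitespace trimming takes place:

```java
// Hypothetical equivalent of the undocumented private convertToNull(String).
final class NullConversionDemo {
    static String convertToNull(String value, boolean emptyIsNull, boolean nullIsNull) {
        if (value == null) return null;
        if (emptyIsNull && value.isEmpty()) return null;                // setUseNullIfEmpty(true)
        if (nullIsNull && value.equalsIgnoreCase("NULL")) return null; // setUseNullIfNull(true)
        return value;                                                   // assumed: no trimming
    }
}
```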

setKeepSkippedLines

public void setKeepSkippedLines(boolean keep)

Specify if the nextData() and hasMoreData() methods should keep information about lines that were skipped because they matched the ignore pattern or couldn't be interpreted as data lines. The default is FALSE. The number of skipped lines is always available regardless of this setting.

Parameters:: keep - TRUE to keep line information, FALSE to not
See Also:: getSkippedLines(), getIgnoredLines(), getUnknownLines(), getNumSkippedLines()

setInputStream

public void setInputStream(InputStream in)

Deprecated. Use setInputStream(InputStream, String) instead

setInputStream

public void setInputStream(InputStream in,
                           String charsetName)

Set the input stream that will be parsed.

Parameters:: in - The InputStream; charsetName - The name of the character set to use when parsing the file, or null to use the default charset specified by Config.getCharset()
Since:: 2.1.1

parseToBof

public boolean parseToBof()
                   throws IOException

Parse the file until the beginning-of-file marker is found. If no regular expression has been set with setBofMarkerRegexp(Pattern) or if the parsing of the file has already started, this method call is ignored.

Returns:: TRUE if this call resulted in parsing and the BOF marker was found, FALSE otherwise
Throws:: IOException
Since:: 2.15

getBofType

public String getBofType()

Get the value captured by the BOF marker regular expression. If no capturing group was specified in the pattern, this value is the string that matched the entire pattern.

Returns:: The matched value, or null if BOF matching has not been done
Since:: 2.15
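parseToBof() and getBofType() can be pictured as reading lines until the marker matches and then taking either the first capturing group or the whole match. A sketch over an Iterator&lt;String&gt; instead of a stream; BofDemo and the ATF marker pattern used to exercise it are illustrative assumptions.

```java
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of BOF detection; whole-line matching is assumed.
final class BofDemo {
    /** Skips lines until one matches the BOF pattern; returns the captured type, or null if none matched. */
    static String parseToBof(Iterator<String> lines, Pattern bofMarker) {
        while (lines.hasNext()) {
            Matcher m = bofMarker.matcher(lines.next());
            if (m.matches()) {
                // Group 1 if the pattern has a capturing group, otherwise the whole match.
                return m.groupCount() >= 1 ? m.group(1) : m.group();
            }
        }
        return null;
    }
}
```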

parseHeaders

public FlatFileParser.LineType parseHeaders()
                                     throws IOException

Start parsing the input stream. The parser will read a single line at a time. Each line is checked in the following order:

Does it match the section regular expression?
Does it match the header regular expression?
Does it match the data header regular expression?
Does it match the comment regular expression?
Can it be split by the data regular expression into the appropriate number of columns?

The first expression that matches stops the processing of that line. If the line matched a header or comment, the parser continues with the next line. If the line matched a section, the data header or data, the method returns. If none of the above is true, the line is recorded as FlatFileParser.LineType.UNKNOWN and processing continues with the next line. If too many unknown lines in a row have been found, the method also returns; this should be considered a failure to parse the specified file.

The method returns the type of the last line that was parsed as follows:

FlatFileParser.LineType.SECTION: The last line was a section. Header, data header or data may follow this line.
FlatFileParser.LineType.DATA_HEADER: The last line was the data header. It is expected that data should follow.
FlatFileParser.LineType.DATA: The last line was a data line. More data may follow.
FlatFileParser.LineType.UNKNOWN: The last line was of unknown format. The file could not be parsed.

Returns:: The FlatFileParser.LineType of the last parsed line
Throws:: IOException - If reading the file fails.
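The check order above can be sketched as a classifier plus a loop that stops on a section, data header or data line, or after too many unknown lines in a row. HeaderScanDemo, its Kind enum and the limit of 25 are hypothetical; for brevity only the minimum column count is checked, lines come from an Iterator instead of a stream, and whether headers and comments reset the unknown-line run is an assumption.

```java
import java.util.Iterator;
import java.util.regex.Pattern;

// Hypothetical sketch of the documented check order; not the BASE source.
final class HeaderScanDemo {
    enum Kind { SECTION, HEADER, DATA_HEADER, IGNORED, DATA, UNKNOWN }

    Pattern section, header, dataHeader, ignore, splitter; // null = check not used
    int minColumns = 2;          // assumed value, set per file format
    int maxUnknownInARow = 25;   // hypothetical limit, not BASE's default

    /** Classifies one line, trying the patterns in the documented order. */
    Kind classify(String line) {
        if (matches(section, line)) return Kind.SECTION;
        if (matches(header, line)) return Kind.HEADER;
        if (matches(dataHeader, line)) return Kind.DATA_HEADER;
        if (matches(ignore, line)) return Kind.IGNORED;
        if (splitter != null && splitter.split(line, -1).length >= minColumns) return Kind.DATA;
        return Kind.UNKNOWN;
    }

    /** Reads lines until a section, data header or data line, or too many unknown lines in a row. */
    Kind scan(Iterator<String> lines) {
        Kind kind = Kind.UNKNOWN;
        int unknownInARow = 0;
        while (lines.hasNext()) {
            kind = classify(lines.next());
            switch (kind) {
                case SECTION: case DATA_HEADER: case DATA:
                    return kind;
                case UNKNOWN:
                    if (++unknownInARow >= maxUnknownInARow) return kind; // give up
                    break;
                default:
                    unknownInARow = 0; // assumed: headers and comments break the run
            }
        }
        return kind; // end of input: type of the last line read
    }

    private static boolean matches(Pattern p, String line) {
        return p != null && p.matcher(line).matches();
    }
}
```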

convertToNull

private String convertToNull(String value)

getHeaderNames

public Set<String> getHeaderNames()

Get the names of all headers found by the parseHeaders() method. To get the value of a header, use the getHeader(String) method.

getHeader

public String getHeader(String name)

Get the value of the header with the specified name. This method should only be used after parseHeaders() has been completed.

Parameters:: name - The name of the header
Returns:: The value of the header, or null if it was not found
See Also:: getLine(int)

getLineCount

public int getLineCount()

Get the number of lines that the parseHeaders() method parsed.

Returns:: The number of lines parsed

getLine

public FlatFileParser.Line getLine(int index)

Get the line with the specified number. This method should only be used after parseHeaders() has been completed.

Parameters:: index - The line number, starting at 0
Returns:: A Line object
See Also:: getHeader(String)

getLines

public List<FlatFileParser.Line> getLines()

Get the lines read by parseHeaders().

Returns:: The lines in the order that they have been read.

getColumnHeaders

public List<String> getColumnHeaders()

Get all column headers that were found by splitting the line matching the setDataHeaderRegexp(Pattern) pattern using the setDataSplitterRegexp(Pattern) pattern. This method should only be called after parseHeaders() has been called.

Returns:: A list containing the column headers, or null if no headers have been found

getColumnHeaderIndex

public Integer getColumnHeaderIndex(String name)

Get the index of a column header with a given name. This method should only be called after parseHeaders() has been called. If more than one header with the same name exists the index of the first is returned.

Parameters:: name - The name of the column header
Returns:: The index, or null if no header with that name exists
See Also:: findColumnHeaderIndex(String)

findColumnHeaderIndex

public Integer findColumnHeaderIndex(String regex)

Find the index of a column header using a regular expression for pattern matching. This method should only be called after parseHeaders() has been called. If more than one header matches the regular expression only the first one found is returned.

Parameters:: regex - The regular expression used to match the header names
Returns:: The index, or null if no header is matching the regular expression or if the string is not a valid regular expression
Since:: 2.5
See Also:: getColumnHeaderIndex(String)
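A sketch of the two lookups over a plain List&lt;String&gt; of headers. ColumnLookupDemo is hypothetical; using Matcher.matches() (whole-name match) rather than find() is an assumption about how the regular expression is applied.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch of exact and regex-based column lookup.
final class ColumnLookupDemo {
    /** Index of the first header equal to name, or null (like getColumnHeaderIndex). */
    static Integer indexOf(List<String> headers, String name) {
        int i = headers.indexOf(name);
        return i >= 0 ? Integer.valueOf(i) : null;
    }

    /** Index of the first header matching regex, or null if none matches or the regex is invalid. */
    static Integer find(List<String> headers, String regex) {
        Pattern p;
        try {
            p = Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            return null; // invalid regular expression
        }
        for (int i = 0; i < headers.size(); i++) {
            if (p.matcher(headers.get(i)).matches()) return i; // assumed whole-name match
        }
        return null;
    }
}
```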

setDefaultNumberFormat

public void setDefaultNumberFormat(NumberFormat numberFormat)

Set the default number format to use when creating mappers.

Parameters:: numberFormat - The number format to use, or null to parse numbers with Float.valueOf or Double.valueOf
Since:: 2.2
See Also:: getMapper(String), getMapper(String, NumberFormat)
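To illustrate the difference between a null format and a locale-specific one, here is a sketch with a hypothetical NumberParsingDemo. It uses Double.parseDouble for the null case (the documentation mentions Float.valueOf or Double.valueOf) and reports parse failures as NumberFormatException, which is our choice, not necessarily the parser's.

```java
import java.text.NumberFormat;
import java.text.ParseException;

// Hypothetical sketch of number parsing with an optional NumberFormat.
final class NumberParsingDemo {
    /** Parses with the given format, or with Double.parseDouble when the format is null. */
    static double parse(String value, NumberFormat format) {
        if (format == null) return Double.parseDouble(value);
        try {
            return format.parse(value).doubleValue();
        } catch (ParseException e) {
            throw new NumberFormatException(e.getMessage()); // our choice of exception type
        }
    }
}
```

A German format reads 1,5 as 1.5, while Double.parseDouble rejects it, which is why the number format must match the file.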

getDefaultNumberFormat

public NumberFormat getDefaultNumberFormat()

Get the default number format.

Returns:: The number format, or null if none has been specified
Since:: 2.2

setUseNullIfException

public void setUseNullIfException(boolean useNullIfException)

Specify if null should be returned if a (numeric) value can't be parsed. If this setting is set to TRUE all mappers created by one of the getMapper(String) methods are wrapped in a NullIfExceptionMapper. It is not possible to log or get information about the exception.

Parameters:: useNullIfException - TRUE to return null, FALSE to throw an exception
Since:: 2.4
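The wrapping described above is a decorator. A sketch with java.util.function.Function standing in for BASE's Mapper interface; NullIfExceptionDemo is hypothetical and the real NullIfExceptionMapper is not reproduced here.

```java
import java.util.function.Function;

// Hypothetical decorator; Function stands in for BASE's Mapper interface.
final class NullIfExceptionDemo {
    /** Wraps a conversion so that any runtime exception yields null instead. */
    static <T> Function<String, T> nullIfException(Function<String, T> mapper) {
        return value -> {
            try {
                return mapper.apply(value);
            } catch (RuntimeException e) {
                return null; // the exception is swallowed and cannot be inspected
            }
        };
    }
}
```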

setIgnoreNonExistingColumns

public void setIgnoreNonExistingColumns(boolean ignoreNonExistingColumns)

Specify if trying to create a mapper with one of the getMapper(String) methods for an expression which references a non-existing column should result in an exception or be ignored.

Parameters:: ignoreNonExistingColumns - TRUE to ignore, or FALSE to throw an exception
Since:: 2.6

getMapper

public Mapper getMapper(String expression)

Get a mapper using the default number format.

See Also:: getMapper(String, NumberFormat, boolean)

getMapper

public Mapper getMapper(String expression,
                        boolean nullIfException)

Get a mapper using the default number format.

Since:: 2.4
See Also:: getMapper(String, NumberFormat, boolean)

getMapper

public Mapper getMapper(String expression,
                        NumberFormat numberFormat)

Get a mapper using a specific number format.

Since:: 2.2
See Also:: getMapper(String, NumberFormat, boolean)

getMapper

public Mapper getMapper(String expression,
                        NumberFormat numberFormat,
                        boolean nullIfException)

Create a mapper object that maps an expression string to a value. An expression string is a regular string which contains placeholders where the data column values will be inserted. For example:

\1\
\row\
Row: \row\, Col:\col\

It is also possible to use expressions that are evaluated dynamically.

=2 * col('Radius')

If no column matches the exact name, the placeholder is interpreted as a regular expression, which is checked against each of the column headers. In all cases, the first matching column header is used if there are multiple matches.

If the expression is null, a mapper returning an empty string is returned, unless setUseNullIfEmpty(boolean) has been activated; in that case the mapper returns null.

Parameters:: expression - The string containing the mapping expression; numberFormat - The number format the mapper should use for parsing numbers, or null to use Float.valueOf or Double.valueOf; nullIfException - TRUE to return a null value instead of throwing an exception when a value can't be parsed.
Returns:: A mapper object
Since:: 2.4
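The placeholder substitution can be sketched as follows. TemplateDemo is hypothetical: it handles only \name\ placeholders (not =expressions), assumes a purely numeric placeholder such as \1\ is a 0-based column index, and tries the exact name before falling back to a whole-string regular expression match, as described above. An unresolved placeholder becomes an empty string here.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of template mapping; not BASE's Mapper implementation.
final class TemplateDemo {
    // A placeholder is text between two backslashes, e.g. \Row\ or \1\.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\\\([^\\\\]+)\\\\");

    /** Finds a column: numeric index (assumed 0-based), exact name, then regex; null if none. */
    static Integer columnIndex(List<String> headers, String name) {
        if (name.matches("\\d+")) return Integer.valueOf(name);
        int exact = headers.indexOf(name);
        if (exact >= 0) return exact;
        Pattern p = Pattern.compile(name);
        for (int i = 0; i < headers.size(); i++) {
            if (p.matcher(headers.get(i)).matches()) return i; // first match wins
        }
        return null;
    }

    /** Replaces every placeholder in the template with the matching column value. */
    static String apply(String template, List<String> headers, String[] columns) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer idx = columnIndex(headers, m.group(1));
            String value = (idx == null || idx >= columns.length) ? "" : columns[idx];
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```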

hasMoreData

public boolean hasMoreData()
                    throws IOException

Check if the input stream contains more data. If it is not yet known whether there is more data, this method starts reading more lines from the stream. Each line is checked in the following order:

Does it match the ignore regular expression?
Does it match the data footer regular expression?
Does it match the section regular expression?
Can it be split by the data regular expression into the appropriate number of columns?

If the first check is true, the line is skipped (and counted as ignored) and processing continues with the next line. If the second check is true, FALSE is returned and no more data may be retrieved. If the third check is true, FALSE is returned and no more data may be retrieved, but the section may be retrieved with the nextSection method. If the fourth check is true, TRUE is returned and the data may be retrieved with the nextData method. If none of the checks is true, the line is counted as unknown and processing continues with the next line until the end of the file is reached.

Returns:: TRUE if there is more data, FALSE otherwise
Throws:: IOException - If there is an error reading from the input stream
See Also:: nextData
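The order of the checks matters: a line matching the ignore expression is never considered as data, and a footer or section line ends the data even if it could also be split into columns. The sketch below classifies a single line in the listed order. It is hypothetical (LineClassifierSketch and LineType are not BASE names) and assumes, as the descriptions of getIgnoredLines() and getUnknownLines() indicate, that IGNORED and UNKNOWN lines are skipped, DATA corresponds to hasMoreData() returning TRUE, and DATA_FOOTER or SECTION to FALSE. Full-string matching and the min/max column test are also assumptions.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of the per-line checks made by hasMoreData(); not BASE source. */
class LineClassifierSketch {

    enum LineType { IGNORED, DATA_FOOTER, SECTION, DATA, UNKNOWN }

    private final Pattern ignore;      // optional, may be null
    private final Pattern dataFooter;  // optional, may be null
    private final Pattern section;     // optional, may be null
    private final Pattern splitter;    // column separator, e.g. a tab
    private final int minColumns;
    private final int maxColumns;

    LineClassifierSketch(Pattern ignore, Pattern dataFooter, Pattern section,
                         Pattern splitter, int minColumns, int maxColumns) {
        this.ignore = ignore;
        this.dataFooter = dataFooter;
        this.section = section;
        this.splitter = splitter;
        this.minColumns = minColumns;
        this.maxColumns = maxColumns;
    }

    /** Applies the checks in the documented order; the first one that succeeds decides. */
    LineType classify(String line) {
        if (ignore != null && ignore.matcher(line).matches()) return LineType.IGNORED;
        if (dataFooter != null && dataFooter.matcher(line).matches()) return LineType.DATA_FOOTER;
        if (section != null && section.matcher(line).matches()) return LineType.SECTION;
        // A limit of -1 keeps trailing empty columns, so "a\t\t" counts as three columns.
        int columns = splitter.split(line, -1).length;
        return columns >= minColumns && columns <= maxColumns ? LineType.DATA : LineType.UNKNOWN;
    }
}
```

For example, with an ignore pattern "#.*" a line such as "#" followed by four tabs is classified as IGNORED even though it splits into five columns.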

trimQuotes

public String[] trimQuotes(String[] columns)

Remove enclosing quotes (" or ') around all columns.

Parameters:: columns - The columns
Returns:: The trimmed columns
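A minimal sketch of removing enclosing quotes, assuming that only a matching pair of " or ' at both ends of a column is stripped and that unbalanced quotes are left alone. QuoteTrimSketch is a hypothetical name, and unlike the real method (whose exact behaviour is not documented here) it returns a new array instead of possibly modifying the argument.

```java
/** Hypothetical illustration of stripping enclosing quotes; not BASE's trimQuotes code. */
class QuoteTrimSketch {

    static String[] trimQuotes(String[] columns) {
        String[] result = new String[columns.length];
        for (int i = 0; i < columns.length; i++) {
            String c = columns[i];
            // Strip only when the first and last characters are the same quote character.
            if (c != null && c.length() >= 2) {
                char first = c.charAt(0);
                if ((first == '"' || first == '\'') && c.charAt(c.length() - 1) == first) {
                    c = c.substring(1, c.length() - 1);
                }
            }
            result[i] = c;
        }
        return result;
    }
}
```

Applied to a row from the example file, "Cd5_CD5 antigen" (with its double quotes) becomes Cd5_CD5 antigen, while unquoted columns such as 1 are returned unchanged.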

getParsedLines

public int getParsedLines()

Get the number of parsed lines so far.

getParsedDataLines

public int getParsedDataLines()

Get the number of parsed data lines so far in the current section. This value is reset for each new section.

getParsedCharacters

public long getParsedCharacters()

Get the number of parsed characters so far. This value may or may not correspond to the number of parsed bytes depending on the character set of the file.

See Also:: getParsedBytes()

getParsedBytes

public long getParsedBytes()

Get the number of parsed bytes so far. This value may or may not correspond to the number of parsed characters depending on the character set of the file.

Since:: 2.5.1
See Also:: getParsedCharacters()
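Why the character and byte counts can differ is easy to show with the JDK alone. In the sketch below (CharVsByteSketch is a hypothetical name) the same 8-character string occupies 10 bytes in UTF-8, because A-ring and o-umlaut each need two bytes, but only 8 bytes in ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Shows why parsed-character and parsed-byte counts can differ. */
class CharVsByteSketch {

    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u00C5ngstr\u00F6m"; // "Angstrom" with A-ring and o-umlaut: 8 characters
        System.out.println(text.length() + " characters");                                    // 8 characters
        System.out.println(byteCount(text, StandardCharsets.UTF_8) + " bytes in UTF-8");           // 10 bytes in UTF-8
        System.out.println(byteCount(text, StandardCharsets.ISO_8859_1) + " bytes in ISO-8859-1"); // 8 bytes in ISO-8859-1
    }
}
```

For plain ASCII files the two counts coincide in both encodings; they diverge only when multi-byte characters are present.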

nextData

public FlatFileParser.Data nextData()
                             throws IOException

Get the next available data.

Returns:: A Data object, or null if there is no more data
Throws:: IOException - If there is an error reading from the input stream.
See Also:: hasMoreData

getIgnoredLines

public int getIgnoredLines()

Get the number of lines that the last call to nextData() or hasMoreData() ignored because they matched the ignore regular expression.

Returns:: The number of ignored lines
See Also:: setIgnoreRegexp(Pattern), setKeepSkippedLines(boolean)

getUnknownLines

public int getUnknownLines()

Get the number of lines that the last call to nextData() or hasMoreData() ignored because they couldn't be interpreted as data lines.

Returns:: The number of unknown lines
See Also:: setKeepSkippedLines(boolean)

getNumSkippedLines

public int getNumSkippedLines()

Get the number of lines that the last call to nextData() or hasMoreData() ignored because they matched the ignore regular expression or couldn't be interpreted as data lines.

Returns:: The number of ignored or unknown lines
See Also:: getIgnoredLines(), getUnknownLines(), getSkippedLines()

getSkippedLines

public List<FlatFileParser.Line> getSkippedLines()

Get the lines that were skipped during the last call to nextData() or hasMoreData(). The list is only available if setKeepSkippedLines(boolean) has been set to true (the default is false).

Returns:: A list with the skipped lines
See Also:: setKeepSkippedLines(boolean)
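The interplay between hasMoreData(), nextData() and the skip counters can be sketched as a small look-ahead reader. Everything here is hypothetical (SkipCountingReaderSketch is not a BASE class): lines matching the ignore pattern count as ignored, lines with too few columns count as unknown, the counters are reset at the start of each call that actually reads lines, and getSkippedLines() is assumed to return null when skipped lines are not kept. Footers, sections and the maximum number of unknown lines are not modelled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of look-ahead reading with per-call skip counters; not BASE source. */
class SkipCountingReaderSketch {
    private final BufferedReader reader;
    private final Pattern ignore;            // matching lines count as "ignored"
    private final Pattern splitter;          // column separator
    private final int minColumns;            // fewer columns => "unknown" line
    private final boolean keepSkippedLines;

    private String[] pending;                // data found by hasMoreData(), not yet returned
    private int ignoredLines;
    private int unknownLines;
    private final List<String> skippedLines = new ArrayList<>();

    SkipCountingReaderSketch(BufferedReader reader, Pattern ignore, Pattern splitter,
                             int minColumns, boolean keepSkippedLines) {
        this.reader = reader;
        this.ignore = ignore;
        this.splitter = splitter;
        this.minColumns = minColumns;
        this.keepSkippedLines = keepSkippedLines;
    }

    boolean hasMoreData() throws IOException {
        if (pending != null) return true;    // look-ahead already done; keep the counters
        ignoredLines = 0;                    // counters describe the current call only
        unknownLines = 0;
        skippedLines.clear();
        String line;
        while ((line = reader.readLine()) != null) {
            if (ignore != null && ignore.matcher(line).matches()) {
                ignoredLines++;
            } else {
                String[] columns = splitter.split(line, -1);
                if (columns.length >= minColumns) {
                    pending = columns;
                    return true;
                }
                unknownLines++;
            }
            if (keepSkippedLines) skippedLines.add(line);
        }
        return false;
    }

    String[] nextData() throws IOException {
        if (!hasMoreData()) return null;
        String[] data = pending;
        pending = null;
        return data;
    }

    int getIgnoredLines() { return ignoredLines; }
    int getUnknownLines() { return unknownLines; }
    int getNumSkippedLines() { return ignoredLines + unknownLines; }
    List<String> getSkippedLines() { return keepSkippedLines ? skippedLines : null; }

    public static void main(String[] args) throws IOException {
        String text = "# comment\n1\t2\t3\nnot data\n# another\n4\t5\t6\n";
        SkipCountingReaderSketch r = new SkipCountingReaderSketch(
                new BufferedReader(new StringReader(text)), Pattern.compile("#.*"), Pattern.compile("\t"), 3, true);
        while (r.hasMoreData()) {
            String[] data = r.nextData();
            System.out.println(String.join(",", data) + " after skipping " + r.getSkippedLines());
        }
    }
}
```

Running main prints "1,2,3 after skipping [# comment]" followed by "4,5,6 after skipping [not data, # another]". Calling nextData() right after hasMoreData() returns the cached line without reading further, so the counters still describe the hasMoreData() call.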

hasMoreSections

public boolean hasMoreSections()
                        throws IOException

Check if the input stream contains more sections. If it is not yet known whether there are more sections, this method reads more lines from the stream. Each line is checked against the section regular expression, and the parser continues until a section line is found or the end of file is reached. If the method returns TRUE, the section may be retrieved with the nextSection() method. If the section regular expression isn't specified, the method returns FALSE and doesn't parse any lines.

Returns:: TRUE if there are more sections, FALSE otherwise
Throws:: IOException - If there is an error reading from the input stream
See Also:: nextData()

nextSection

public FlatFileParser.Line nextSection()
                                throws IOException

Get the next line that matches the section regular expression.

Returns:: The line that matched the regular expression
Throws:: IOException - If there is an error reading from the input stream
See Also:: hasMoreSections()
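The section scan can be sketched as consuming lines until one matches the section expression. The sketch is hypothetical (SectionScanSketch is not a BASE class, and it works on an Iterator of strings rather than an input stream), and the pattern "section (.*)" is only an assumed expression that fits the example file at the top of this page. As documented, a missing section expression means no line is read.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of scanning for section lines; not BASE's nextSection() code. */
class SectionScanSketch {

    /** Consumes lines until one matches the section pattern; returns it, or null at end of input. */
    static String nextSection(Iterator<String> lines, Pattern section) {
        if (section == null) return null;   // no section regexp: no line is read
        while (lines.hasNext()) {
            String line = lines.next();
            if (section.matcher(line).matches()) return line;
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList(
                "# Example of a parsable file",
                "section info",
                "Type=GenePix Results 1.3",
                "",
                "section data",
                "Block\tColumn\tRow\tName\tID");
        Pattern section = Pattern.compile("section (.*)"); // assumed pattern for the example file
        Iterator<String> it = file.iterator();
        String line;
        while ((line = nextSection(it, section)) != null) {
            System.out.println("Found: " + line); // "section info", then "section data"
        }
    }
}
```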

