Package net.sf.basedb.util.charset
Class CharsetDetector
java.lang.Object
net.sf.basedb.util.charset.CharsetDetector
Utility class for testing whether a text stream can be parsed using
a given character set. The testing has two sides:
- The technical side, which checks for invalid byte sequences, etc. This works well for
UTF-8 but cannot, for example, discriminate between the different ISO-8859-? or
Windows-? encodings.
- The content side, which checks that the parsed content contains some expected text
strings. This can be used to discriminate between different ISO-8859-? or Windows-?
encodings by carefully choosing the text strings to look for.
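The technical side can be illustrated with the standard java.nio.charset API. The following is a minimal sketch, not the implementation of this class: the class name TechnicalCheckSketch and the method canDecode are hypothetical, and the approach (a CharsetDecoder configured to report errors instead of replacing them) is an assumption about how such a check could be done.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a "technical" check; not part of net.sf.basedb.
class TechnicalCheckSketch {

    // Returns true if the bytes decode without malformed or unmappable input.
    static boolean canDecode(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "café" in ISO-8859-1 ends with the single byte 0xE9,
        // which is an incomplete multi-byte sequence in UTF-8.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(canDecode(latin1, StandardCharsets.UTF_8));          // false
        System.out.println(canDecode(latin1, StandardCharsets.ISO_8859_1));     // true
        System.out.println(canDecode(latin1, Charset.forName("ISO-8859-15")));  // true
    }
}
```

The last two results show the limitation described above: the same bytes are technically valid in both ISO-8859-1 and ISO-8859-15, so this kind of check alone cannot tell them apart.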
Since: 3.15
Author: nicklas
Field Summary
Modifier and Type | Field | Description
private final Charset | charset |
private final StringDetector | lineTester |
private long | parsedBytes |
private int | parsedLines |
private IOException | parsingFailure |
Constructor Summary
Constructor | Description
CharsetDetector(Charset charset) | Create a detector for the given character set that only detects technical issues.
CharsetDetector(Charset charset, StringDetector lineTester) | Create a detector for the given character set that uses technical and content-based detection.
Method Summary
Modifier and Type | Method | Description
Charset | getCharset() | Get the character set this detector is configured to use.
long | getParsedBytes() | Get the number of bytes that the last test operation parsed.
int | getParsedLines() |
IOException | getParsingFailure() | If the last test failed, get the exception that was thrown by the parser.
boolean | testIt(InputStream in) | Test if the given input stream can be parsed with the configured character set.
boolean | testIt(InputStream in, long maxBytes, int maxLines) | Test if the given input stream can be parsed with the configured character set.
Field Details
charset
private final Charset charset
lineTester
private final StringDetector lineTester
parsingFailure
private IOException parsingFailure
parsedBytes
private long parsedBytes
parsedLines
private int parsedLines
Constructor Details
CharsetDetector
Create a detector for the given character set that only detects technical issues. Useful for UTF-8.
CharsetDetector
Create a detector for the given character set that uses technical and content-based detection. If no lineTester is given, only technical detection is used.
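The idea behind content-based detection can be sketched with plain JDK classes. This example does not use StringDetector, whose API is not described here; the class ContentCheckSketch and the method containsExpected are hypothetical stand-ins that simply look for an expected string in the decoded text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a "content" check; not part of net.sf.basedb.
class ContentCheckSketch {

    // Returns true if the text decoded with the given charset contains the expected string.
    static boolean containsExpected(byte[] data, Charset charset, String expected) {
        return new String(data, charset).contains(expected);
    }

    public static void main(String[] args) {
        Charset latin9 = Charset.forName("ISO-8859-15");

        // The euro sign is the byte 0xA4 in ISO-8859-15;
        // the same byte is the currency sign (U+00A4) in ISO-8859-1.
        byte[] data = "Price: 10 \u20ac".getBytes(latin9);

        System.out.println(containsExpected(data, latin9, "\u20ac"));                      // true
        System.out.println(containsExpected(data, StandardCharsets.ISO_8859_1, "\u20ac")); // false
    }
}
```

Both charsets decode these bytes without error, so only the choice of expected text (here a euro sign) separates them. This is why the strings to look for need to be chosen carefully.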
Method Details
getCharset
Get the character set this detector is configured to use.
testIt
Test if the given input stream can be parsed with the configured character set. The stream is read until the end is reached or until there is a decoding failure.
testIt
Test if the given input stream can be parsed with the configured character set. The stream is read until maxBytes bytes or maxLines lines have been parsed, or until there is a decoding failure.
Parameters:
maxBytes - Max number of bytes to parse or -1 to not use a limit
maxLines - Max number of lines to parse or -1 to not use a limit
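How the limits interact is easier to see in code. The sketch below is an assumption about one reasonable interpretation, not the actual implementation: the class LimitedTestSketch and the method canParse are hypothetical, the whole input is buffered in memory for simplicity, and a decoding error is ignored if at least maxLines complete lines were decoded before it. It requires Java 11 or later for InputStream.readNBytes(int).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a limited test; not part of net.sf.basedb.
class LimitedTestSketch {

    static boolean canParse(InputStream in, Charset charset, long maxBytes, int maxLines) {
        byte[] data;
        try {
            // Read at most maxBytes bytes; -1 means no limit
            data = maxBytes < 0
                ? in.readAllBytes()
                : in.readNBytes((int) Math.min(maxBytes, Integer.MAX_VALUE));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // If the byte limit cut the input, the last character may be incomplete
        boolean cutByLimit = maxBytes >= 0 && data.length == maxBytes;

        CharsetDecoder decoder = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        int capacity = (int) Math.ceil(data.length * (double) decoder.maxCharsPerByte()) + 1;
        CharBuffer out = CharBuffer.allocate(capacity);

        // With endOfInput == false a trailing incomplete sequence is not an error
        CoderResult result = decoder.decode(ByteBuffer.wrap(data), out, !cutByLimit);
        if (!result.isError()) return true;

        // Decoding stopped at the first bad byte; count the complete lines before it
        out.flip();
        long cleanLines = out.chars().filter(c -> c == '\n').count();
        return maxLines >= 0 && cleanLines >= maxLines;
    }

    public static void main(String[] args) {
        // Two clean lines followed by 0xFF, which is never valid in UTF-8
        byte[] data = { 'a', '\n', 'b', '\n', (byte) 0xFF, 'c' };
        Charset utf8 = StandardCharsets.UTF_8;

        System.out.println(canParse(new ByteArrayInputStream(data), utf8, -1, -1)); // false
        System.out.println(canParse(new ByteArrayInputStream(data), utf8, 4, -1));  // true
        System.out.println(canParse(new ByteArrayInputStream(data), utf8, -1, 2));  // true
    }
}
```

Limiting the test keeps it cheap for large files, at the cost of missing invalid bytes that occur later in the stream.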
getParsedBytes
public long getParsedBytes()
Get the number of bytes that the last test operation parsed.
getParsedLines
public int getParsedLines()
getParsingFailure
If the last test failed, get the exception that was thrown by the parser.