Class SimpleStringDetector

All Implemented Interfaces:
StringDetector

public class SimpleStringDetector
extends Object
implements StringDetector
A simple string detector implementation that works with two strings. It is designed to detect the encoding of tabular data files that use one of the ISO-8859-N encodings, or similar encodings that cannot be told apart technically. The data file is expected to contain a header line where at least one header column has a name with non-ASCII characters.

For each line of data, the detector first checks whether the 'ifFound' string can be found. If not, it returns false to request more data. If the 'ifFound' string is found, it goes on to check whether the 'thenMatch' string is also present. If so, TRUE is returned to indicate a successful encoding match; otherwise an IOException is thrown to indicate an incorrect encoding.

Note that the two strings need to be selected wisely. The 'ifFound' string should typically be an ASCII-only string and 'thenMatch' a string with one or more non-ASCII characters. For example, if the file header is Namn{tab}Ålder, we could use 'ifFound=Namn' and 'thenMatch=Ålder'. If the entire file is parsed without finding the 'ifFound' string, the eof(int) method reports the encoding as incorrect.
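The two-string check described above can be sketched roughly as follows. This is a minimal illustration, not the library's source: the class name DetectorSketch is hypothetical, the outcomes follow the checkLine signature documented below (false for "need more data", an IOException for a wrong encoding), and ISO-8859-2 is picked only as an example of a sibling encoding that maps the same byte to a different letter.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in written for illustration; not the actual SimpleStringDetector source.
class DetectorSketch {
    private final String ifFound;
    private final String thenMatch;

    DetectorSketch(String ifFound, String thenMatch) {
        this.ifFound = ifFound;
        this.thenMatch = thenMatch;
    }

    // Returns false while the ASCII marker is missing, true when both strings
    // are present, and throws when the marker is present but the non-ASCII
    // string is not (the line was most likely decoded with the wrong charset).
    boolean checkLine(int lineNo, String line) throws IOException {
        if (!line.contains(ifFound)) {
            return false;
        }
        if (line.contains(thenMatch)) {
            return true;
        }
        throw new IOException("Line " + lineNo + ": found '" + ifFound
            + "' but not '" + thenMatch + "'");
    }

    public static void main(String[] args) {
        // Header bytes as they would be stored in an ISO-8859-1 file.
        // \u00C5 is the letter A with ring above, as in the header "Ålder".
        byte[] raw = "Namn\t\u00C5lder".getBytes(StandardCharsets.ISO_8859_1);
        DetectorSketch detector = new DetectorSketch("Namn", "\u00C5lder");

        // Try the same bytes with two candidate encodings.
        String[] candidates = { "ISO-8859-1", "ISO-8859-2" };
        for (String name : candidates) {
            String header = new String(raw, Charset.forName(name));
            try {
                System.out.println(name + " matched: " + detector.checkLine(1, header));
            } catch (IOException e) {
                System.out.println(name + " rejected: " + e.getMessage());
            }
        }
    }
}
```

Both decodings contain the ASCII marker, so only the non-ASCII string separates them. This is why 'thenMatch' needs at least one character that the candidate encodings map differently.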
  • Field Details

    • ifFound

      private final String ifFound
    • thenMatch

      private final String thenMatch
  • Constructor Details

    • SimpleStringDetector

      public SimpleStringDetector(String ifFound, String thenMatch)
  • Method Details

    • checkLine

      public boolean checkLine(int lineNo, String line) throws IOException
      Description copied from interface: StringDetector
      Check the given line. The detector should return TRUE if it can be certain that the file has been decoded correctly. If it can be sure that the file has been decoded incorrectly, it should throw an IOException. If the detector cannot be sure without more data, it should return false.
      Specified by:
      checkLine in interface StringDetector
    • eof

      public void eof(int parsedLines) throws IOException
      Description copied from interface: StringDetector
      This is called when the end of file has been reached and the checkLine method has returned false for all lines. If this is considered to be an incorrect decoding condition, the detector should throw an IOException, otherwise it should simply return. Note that this method is not called if TRUE is returned from the checkLine method.
      Specified by:
      eof in interface StringDetector
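The way a caller might drive checkLine and eof across several candidate encodings can be sketched as below. Everything here is an assumption made for illustration: the nested Detector interface only mirrors the two documented method signatures, twoStrings is a hypothetical stateless detector whose eof always throws, and probe is not part of the documented API.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

class CharsetProbeSketch {

    // Hypothetical mirror of the two documented methods, so the sketch compiles on its own.
    interface Detector {
        boolean checkLine(int lineNo, String line) throws IOException;
        void eof(int parsedLines) throws IOException;
    }

    // Assumed two-string detector: never finding 'ifFound' counts as a failure at end of file.
    static Detector twoStrings(String ifFound, String thenMatch) {
        return new Detector() {
            @Override
            public boolean checkLine(int lineNo, String line) throws IOException {
                if (!line.contains(ifFound)) return false;
                if (line.contains(thenMatch)) return true;
                throw new IOException("Unexpected header text on line " + lineNo);
            }

            @Override
            public void eof(int parsedLines) throws IOException {
                throw new IOException("'" + ifFound + "' not found in " + parsedLines + " lines");
            }
        };
    }

    // Returns the first candidate charset that the detector accepts, if any.
    static Optional<Charset> probe(byte[] data, List<Charset> candidates, Detector detector) {
        for (Charset candidate : candidates) {
            if (accepts(data, candidate, detector)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    private static boolean accepts(byte[] data, Charset charset, Detector detector) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            int lineNo = 0;
            String line;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (detector.checkLine(lineNo, line)) {
                    return true; // certain match, eof is not called
                }
            }
            detector.eof(lineNo); // returns normally only if "undecided" is acceptable
            return true;
        } catch (IOException e) {
            return false; // the detector rejected this decoding
        }
    }

    public static void main(String[] args) {
        byte[] data = "Namn\t\u00C5lder\nAda\t36\n".getBytes(StandardCharsets.ISO_8859_1);
        List<Charset> candidates =
            List.of(Charset.forName("ISO-8859-2"), StandardCharsets.ISO_8859_1);
        System.out.println(probe(data, candidates, twoStrings("Namn", "\u00C5lder")));
    }
}
```

Reusing one detector instance for every candidate only works here because the sketched detector keeps no state between calls. A stateful detector would need a fresh instance per attempt.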