Class XlsxToCsvUtil

java.lang.Object
net.sf.basedb.util.excel.XlsxToCsvUtil

public class XlsxToCsvUtil
extends Object
Utility class for converting Excel worksheets to CSV-formatted InputStreams. NOTE! If the Excel worksheet contains numbers and/or date values it is recommended that a number and/or date formatter is specified. If not, the values will be converted to text using the formatting specification from the Excel workbook which may depende on locale settings and other stuff (=the results may be unpredicatable). So far we only support for one formatter of each type. Future use cases may require different formatters for different columns, but we will wait with this until it is needed.
Since:
3.15
  • Field Details

    • XLSX_MIME_TYPE

      public static final String XLSX_MIME_TYPE
      The MIME type for Excel workbook files.
      See Also:
      Constant Field Values
    • charset

      private Charset charset
    • columnSeparator

      private String columnSeparator
    • rowSeparator

      private String rowSeparator
    • writeTrailingColumns

      private boolean writeTrailingColumns
    • evaluateFormulas

      private boolean evaluateFormulas
    • encoder

      private EncoderDecoder encoder
    • dateFormat

      private Formatter<Date> dateFormat
    • timestampFormat

      private Formatter<Date> timestampFormat
    • numberFormat

      private Formatter<Number> numberFormat
    • workbook

      private org.apache.poi.ss.usermodel.Workbook workbook
    • sheetNames

      private List<String> sheetNames
    • sheets

      private Map<String,​XlsxToCsvUtil.SheetInfo> sheets
  • Constructor Details

    • XlsxToCsvUtil

      public XlsxToCsvUtil()
      Create a new instance using defult settings.
  • Method Details

    • seemsLikeAnExcelFile

      public static boolean seemsLikeAnExcelFile​(File file)
      Check the MIME type and file extension to see if the file could be an Excel file. The file contents is not checked.
    • isAnExcelFile

      public static boolean isAnExcelFile​(InputStream in)
      Check if the given stream is an Excel file by looking at the first few bytes. The stream must support seek(). Call prepareForCheck(InputStream) before calling this method if it is not sure.
    • prepareForCheck

      public static InputStream prepareForCheck​(InputStream in)
      Prepare the input stream for checking with isAnExcelFile(InputStream). If the given stream supports seek() it is returned as it is, otherwise it is wrapped by a buffered stream.
    • getColumnSeparator

      public String getColumnSeparator()
      Get the current column separator. The default is {tab}.
    • setColumnSeparator

      public void setColumnSeparator​(String columnSeparator)
      Set the column separator.
    • getRowSeparator

      public String getRowSeparator()
      Get the current row separator. The default is {newline}.
    • setRowSeparator

      public void setRowSeparator​(String rowSeparator)
      Set the row separator.
    • getWriteTrailingColumns

      public boolean getWriteTrailingColumns()
      If TRUE, trailing empty columns are written to the CSV file so that all lines have the same number of columns. If FALSE, trailing emtpy columns are skipped. The default is TRUE.
    • setWriteTrailingColumns

      public void setWriteTrailingColumns​(boolean writeTrailingColumns)
    • getEvaluateFormulas

      public boolean getEvaluateFormulas()
      If TRUE, formulas are evaluated and the result is written to the CSV file. If FALSE, the formula expression is written. The default is FALSE.
    • setEvaluateFormulas

      public void setEvaluateFormulas​(boolean evaluateFormulas)
    • getCharset

      public Charset getCharset()
      Get the current character set. The default is UTF-8.
    • setCharset

      public void setCharset​(Charset charset)
      Set the character set to use for the text.
    • getEncoder

      public EncoderDecoder getEncoder()
      Get the current encoder. The default is the ToSpaceEncoderDecoder.
    • setEncoder

      public void setEncoder​(EncoderDecoder encoder)
      Set the encoder to use for converting text values so that they do not conflict with column or row separators.
    • getDateFormat

      public Formatter<Date> getDateFormat()
      Get the date formatter that is used for formatting cells with date-like values. If null, the default implementation is used (which may not be predicatable).
    • setDateFormat

      public void setDateFormat​(Formatter<Date> dateFormat)
      Set a date formatter to use for formatting cells with date-like values. If null, the default implementation is used (which may not be predicatable).
    • getTimestampFormat

      public Formatter<Date> getTimestampFormat()
      Get the date formatter that is used for formatting cells with timestamp-like values. If null, the default implementation is used (which may not be predicatable).
    • setTimestampFormat

      public void setTimestampFormat​(Formatter<Date> timestampFormat)
      Set a date formatter to use for formatting cells with timestamp-like values. If null, the default implementation is used (which may not be predicatable).
    • getNumberFormat

      public Formatter<Number> getNumberFormat()
      Get the formatter that is used for formatting cells with numeric values. If null, the default implementation is used (which may not be predicatable).
    • setNumberFormat

      public void setNumberFormat​(Formatter<Number> numberFormat)
      Set a formatter that is used for formatting cells with numeric values. If null, the default implementation is used (which may not be predicatable).
    • readWorkbook

      public void readWorkbook​(InputStream xlsx) throws IOException
      Read an Excel workbook from the stream.
      Throws:
      IOException
    • getWorkbook

      public org.apache.poi.ss.usermodel.Workbook getWorkbook()
      Get the current workbook.
    • getNumSheets

      public int getNumSheets()
      Get the number of sheets in the workbook.
    • getSheetNames

      public List<String> getSheetNames()
      Get the names of the sheets in the workbook.
    • getSheetAsCsv

      public XlsxToCsvUtil.SheetInfo getSheetAsCsv​(String name)
      Convert the given named sheet to a CSV InputStream.
    • getSheetAsCsv

      public XlsxToCsvUtil.SheetInfo getSheetAsCsv​(int index)
      Convert the given sheet to an CSV InputStream. The first sheet has index = 0.
    • getWorkbookAsCsv

      public InputStream getWorkbookAsCsv()
      Convert all sheets in the workbook to a single CSV-like stream. Each sheet is converted as if XlsxToCsvUtil.SheetInfo.parseToCsv() has been called. To separate the sheets each section is started with [Name-of-sheet] and then followed by the sheet data.
    • close

      public void close()
      Closes the current workbook. It is safe to do this before reading the CSV streams to the end.
    • hasTimePart

      private boolean hasTimePart​(String formatString)
    • hasDatePart

      private boolean hasDatePart​(String formatString)