Class XlsxToCsvUtil


  • public class XlsxToCsvUtil
    extends Object
    Utility class for converting Excel worksheets to CSV-formatted InputStreams. NOTE! If the Excel worksheet contains numbers and/or date values it is recommended that a number and/or date formatter is specified. If not, the values will be converted to text using the formatting specification from the Excel workbook which may depende on locale settings and other stuff (=the results may be unpredicatable). So far we only support for one formatter of each type. Future use cases may require different formatters for different columns, but we will wait with this until it is needed.
    Since:
    3.15
    • Constructor Detail

      • XlsxToCsvUtil

        public XlsxToCsvUtil()
        Create a new instance using defult settings.
    • Method Detail

      • seemsLikeAnExcelFile

        public static boolean seemsLikeAnExcelFile​(File file)
        Check the MIME type and file extension to see if the file could be an Excel file. The file contents is not checked.
      • isAnExcelFile

        public static boolean isAnExcelFile​(InputStream in)
        Check if the given stream is an Excel file by looking at the first few bytes. The stream must support seek(). Call prepareForCheck(InputStream) before calling this method if it is not sure.
      • prepareForCheck

        public static InputStream prepareForCheck​(InputStream in)
        Prepare the input stream for checking with isAnExcelFile(InputStream). If the given stream supports seek() it is returned as it is, otherwise it is wrapped by a buffered stream.
      • getColumnSeparator

        public String getColumnSeparator()
        Get the current column separator. The default is {tab}.
      • setColumnSeparator

        public void setColumnSeparator​(String columnSeparator)
        Set the column separator.
      • getRowSeparator

        public String getRowSeparator()
        Get the current row separator. The default is {newline}.
      • setRowSeparator

        public void setRowSeparator​(String rowSeparator)
        Set the row separator.
      • getWriteTrailingColumns

        public boolean getWriteTrailingColumns()
        If TRUE, trailing empty columns are written to the CSV file so that all lines have the same number of columns. If FALSE, trailing emtpy columns are skipped. The default is TRUE.
      • setWriteTrailingColumns

        public void setWriteTrailingColumns​(boolean writeTrailingColumns)
      • getEvaluateFormulas

        public boolean getEvaluateFormulas()
        If TRUE, formulas are evaluated and the result is written to the CSV file. If FALSE, the formula expression is written. The default is FALSE.
      • setEvaluateFormulas

        public void setEvaluateFormulas​(boolean evaluateFormulas)
      • getCharset

        public Charset getCharset()
        Get the current character set. The default is UTF-8.
      • setCharset

        public void setCharset​(Charset charset)
        Set the character set to use for the text.
      • getEncoder

        public EncoderDecoder getEncoder()
        Get the current encoder. The default is the ToSpaceEncoderDecoder.
      • setEncoder

        public void setEncoder​(EncoderDecoder encoder)
        Set the encoder to use for converting text values so that they do not conflict with column or row separators.
      • getDateFormat

        public Formatter<Date> getDateFormat()
        Get the date formatter that is used for formatting cells with date-like values. If null, the default implementation is used (which may not be predicatable).
      • setDateFormat

        public void setDateFormat​(Formatter<Date> dateFormat)
        Set a date formatter to use for formatting cells with date-like values. If null, the default implementation is used (which may not be predicatable).
      • getTimestampFormat

        public Formatter<Date> getTimestampFormat()
        Get the date formatter that is used for formatting cells with timestamp-like values. If null, the default implementation is used (which may not be predicatable).
      • setTimestampFormat

        public void setTimestampFormat​(Formatter<Date> timestampFormat)
        Set a date formatter to use for formatting cells with timestamp-like values. If null, the default implementation is used (which may not be predicatable).
      • getNumberFormat

        public Formatter<Number> getNumberFormat()
        Get the formatter that is used for formatting cells with numeric values. If null, the default implementation is used (which may not be predicatable).
      • setNumberFormat

        public void setNumberFormat​(Formatter<Number> numberFormat)
        Set a formatter that is used for formatting cells with numeric values. If null, the default implementation is used (which may not be predicatable).
      • getWorkbook

        public org.apache.poi.ss.usermodel.Workbook getWorkbook()
        Get the current workbook.
      • getNumSheets

        public int getNumSheets()
        Get the number of sheets in the workbook.
      • getSheetNames

        public List<String> getSheetNames()
        Get the names of the sheets in the workbook.
      • getSheetAsCsv

        public XlsxToCsvUtil.SheetInfo getSheetAsCsv​(int index)
        Convert the given sheet to an CSV InputStream. The first sheet has index = 0.
      • getWorkbookAsCsv

        public InputStream getWorkbookAsCsv()
        Convert all sheets in the workbook to a single CSV-like stream. Each sheet is converted as if XlsxToCsvUtil.SheetInfo.parseToCsv() has been called. To separate the sheets each section is started with [Name-of-sheet] and then followed by the sheet data.
      • close

        public void close()
        Closes the current workbook. It is safe to do this before reading the CSV streams to the end.
      • hasTimePart

        private boolean hasTimePart​(String formatString)
      • hasDatePart

        private boolean hasDatePart​(String formatString)