Package net.sf.basedb.util.excel
Class XlsxToCsvUtil
java.lang.Object
net.sf.basedb.util.excel.XlsxToCsvUtil
Utility class for converting Excel worksheets to CSV-formatted InputStreams.
NOTE! If the Excel worksheet contains numbers and/or date values it is recommended
that a number and/or date formatter is specified. If not, the values will be converted
to text using the formatting specification from the Excel workbook which may depende on
locale settings and other stuff (=the results may be unpredicatable). So far we only
support for one formatter of each type. Future use cases may require different formatters
for different columns, but we will wait with this until it is needed.
- Since:
- 3.15
-
Nested Class Summary
Modifier and TypeClassDescription(package private) static class
class
Represents a single sheet in a workbook.(package private) class
Input stream implementation that combines streams from all sheets in the current workbook. -
Field Summary
Modifier and TypeFieldDescriptionprivate Charset
private String
private EncoderDecoder
private boolean
private String
private Map<String,
XlsxToCsvUtil.SheetInfo> private org.apache.poi.ss.usermodel.Workbook
private boolean
static final String
The MIME type for Excel workbook files. -
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionvoid
close()
Closes the current workbook.Get the current character set.Get the current column separator.Get the date formatter that is used for formatting cells with date-like values.Get the current encoder.boolean
If TRUE, formulas are evaluated and the result is written to the CSV file.Get the formatter that is used for formatting cells with numeric values.int
Get the number of sheets in the workbook.Get the current row separator.getSheetAsCsv
(int index) Convert the given sheet to an CSV InputStream.getSheetAsCsv
(String name) Convert the given named sheet to a CSV InputStream.Get the names of the sheets in the workbook.Get the date formatter that is used for formatting cells with timestamp-like values.org.apache.poi.ss.usermodel.Workbook
Get the current workbook.Convert all sheets in the workbook to a single CSV-like stream.boolean
If TRUE, trailing empty columns are written to the CSV file so that all lines have the same number of columns.private boolean
hasDatePart
(String formatString) private boolean
hasTimePart
(String formatString) static boolean
Check if the given stream is an Excel file by looking at the first few bytes.static InputStream
Prepare the input stream for checking withisAnExcelFile(InputStream)
.void
readWorkbook
(InputStream xlsx) Read an Excel workbook from the stream.static boolean
seemsLikeAnExcelFile
(File file) Check the MIME type and file extension to see if the file could be an Excel file.void
setCharset
(Charset charset) Set the character set to use for the text.void
setColumnSeparator
(String columnSeparator) Set the column separator.void
setDateFormat
(Formatter<Date> dateFormat) Set a date formatter to use for formatting cells with date-like values.void
setEncoder
(EncoderDecoder encoder) Set the encoder to use for converting text values so that they do not conflict with column or row separators.void
setEvaluateFormulas
(boolean evaluateFormulas) void
setNumberFormat
(Formatter<Number> numberFormat) Set a formatter that is used for formatting cells with numeric values.void
setRowSeparator
(String rowSeparator) Set the row separator.void
setTimestampFormat
(Formatter<Date> timestampFormat) Set a date formatter to use for formatting cells with timestamp-like values.void
setWriteTrailingColumns
(boolean writeTrailingColumns)
-
Field Details
-
XLSX_MIME_TYPE
The MIME type for Excel workbook files.- See Also:
-
charset
-
columnSeparator
-
rowSeparator
-
writeTrailingColumns
private boolean writeTrailingColumns -
evaluateFormulas
private boolean evaluateFormulas -
encoder
-
dateFormat
-
timestampFormat
-
numberFormat
-
workbook
private org.apache.poi.ss.usermodel.Workbook workbook -
sheetNames
-
sheets
-
-
Constructor Details
-
XlsxToCsvUtil
public XlsxToCsvUtil()Create a new instance using defult settings.
-
-
Method Details
-
seemsLikeAnExcelFile
Check the MIME type and file extension to see if the file could be an Excel file. The file contents is not checked. -
isAnExcelFile
Check if the given stream is an Excel file by looking at the first few bytes. The stream must support seek(). CallprepareForCheck(InputStream)
before calling this method if it is not sure. -
prepareForCheck
Prepare the input stream for checking withisAnExcelFile(InputStream)
. If the given stream supports seek() it is returned as it is, otherwise it is wrapped by a buffered stream. -
getColumnSeparator
Get the current column separator. The default is {tab}. -
setColumnSeparator
Set the column separator. -
getRowSeparator
Get the current row separator. The default is {newline}. -
setRowSeparator
Set the row separator. -
getWriteTrailingColumns
public boolean getWriteTrailingColumns()If TRUE, trailing empty columns are written to the CSV file so that all lines have the same number of columns. If FALSE, trailing emtpy columns are skipped. The default is TRUE. -
setWriteTrailingColumns
public void setWriteTrailingColumns(boolean writeTrailingColumns) -
getEvaluateFormulas
public boolean getEvaluateFormulas()If TRUE, formulas are evaluated and the result is written to the CSV file. If FALSE, the formula expression is written. The default is FALSE. -
setEvaluateFormulas
public void setEvaluateFormulas(boolean evaluateFormulas) -
getCharset
Get the current character set. The default is UTF-8. -
setCharset
Set the character set to use for the text. -
getEncoder
Get the current encoder. The default is the ToSpaceEncoderDecoder. -
setEncoder
Set the encoder to use for converting text values so that they do not conflict with column or row separators. -
getDateFormat
Get the date formatter that is used for formatting cells with date-like values. If null, the default implementation is used (which may not be predicatable). -
setDateFormat
Set a date formatter to use for formatting cells with date-like values. If null, the default implementation is used (which may not be predicatable). -
getTimestampFormat
Get the date formatter that is used for formatting cells with timestamp-like values. If null, the default implementation is used (which may not be predicatable). -
setTimestampFormat
Set a date formatter to use for formatting cells with timestamp-like values. If null, the default implementation is used (which may not be predicatable). -
getNumberFormat
Get the formatter that is used for formatting cells with numeric values. If null, the default implementation is used (which may not be predicatable). -
setNumberFormat
Set a formatter that is used for formatting cells with numeric values. If null, the default implementation is used (which may not be predicatable). -
readWorkbook
Read an Excel workbook from the stream.- Throws:
IOException
-
getWorkbook
public org.apache.poi.ss.usermodel.Workbook getWorkbook()Get the current workbook. -
getNumSheets
public int getNumSheets()Get the number of sheets in the workbook. -
getSheetNames
Get the names of the sheets in the workbook. -
getSheetAsCsv
Convert the given named sheet to a CSV InputStream. -
getSheetAsCsv
Convert the given sheet to an CSV InputStream. The first sheet has index = 0. -
getWorkbookAsCsv
Convert all sheets in the workbook to a single CSV-like stream. Each sheet is converted as ifXlsxToCsvUtil.SheetInfo.parseToCsv()
has been called. To separate the sheets each section is started with [Name-of-sheet] and then followed by the sheet data. -
close
public void close()Closes the current workbook. It is safe to do this before reading the CSV streams to the end. -
hasTimePart
-
hasDatePart
-