Package net.sf.basedb.util.excel
Class XlsxToCsvUtil
- java.lang.Object
-
- net.sf.basedb.util.excel.XlsxToCsvUtil
-
public class XlsxToCsvUtil extends Object
Utility class for converting Excel worksheets to CSV-formatted InputStreams. NOTE! If the Excel worksheet contains numbers and/or date values it is recommended that a number and/or date formatter is specified. If not, the values will be converted to text using the formatting specification from the Excel workbook which may depende on locale settings and other stuff (=the results may be unpredicatable). So far we only support for one formatter of each type. Future use cases may require different formatters for different columns, but we will wait with this until it is needed.- Since:
- 3.15
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description (package private) static class
XlsxToCsvUtil.CellTypeWithInfo
class
XlsxToCsvUtil.SheetInfo
Represents a single sheet in a workbook.(package private) class
XlsxToCsvUtil.WorkbookStream
Input stream implementation that combines streams from all sheets in the current workbook.
-
Field Summary
Fields Modifier and Type Field Description private Charset
charset
private String
columnSeparator
private Formatter<Date>
dateFormat
private EncoderDecoder
encoder
private boolean
evaluateFormulas
private Formatter<Number>
numberFormat
private String
rowSeparator
private List<String>
sheetNames
private Map<String,XlsxToCsvUtil.SheetInfo>
sheets
private Formatter<Date>
timestampFormat
private org.apache.poi.ss.usermodel.Workbook
workbook
private boolean
writeTrailingColumns
static String
XLSX_MIME_TYPE
The MIME type for Excel workbook files.
-
Constructor Summary
Constructors Constructor Description XlsxToCsvUtil()
Create a new instance using defult settings.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description void
close()
Closes the current workbook.Charset
getCharset()
Get the current character set.String
getColumnSeparator()
Get the current column separator.Formatter<Date>
getDateFormat()
Get the date formatter that is used for formatting cells with date-like values.EncoderDecoder
getEncoder()
Get the current encoder.boolean
getEvaluateFormulas()
If TRUE, formulas are evaluated and the result is written to the CSV file.Formatter<Number>
getNumberFormat()
Get the formatter that is used for formatting cells with numeric values.int
getNumSheets()
Get the number of sheets in the workbook.String
getRowSeparator()
Get the current row separator.XlsxToCsvUtil.SheetInfo
getSheetAsCsv(int index)
Convert the given sheet to an CSV InputStream.XlsxToCsvUtil.SheetInfo
getSheetAsCsv(String name)
Convert the given named sheet to a CSV InputStream.List<String>
getSheetNames()
Get the names of the sheets in the workbook.Formatter<Date>
getTimestampFormat()
Get the date formatter that is used for formatting cells with timestamp-like values.org.apache.poi.ss.usermodel.Workbook
getWorkbook()
Get the current workbook.InputStream
getWorkbookAsCsv()
Convert all sheets in the workbook to a single CSV-like stream.boolean
getWriteTrailingColumns()
If TRUE, trailing empty columns are written to the CSV file so that all lines have the same number of columns.private boolean
hasDatePart(String formatString)
private boolean
hasTimePart(String formatString)
static boolean
isAnExcelFile(InputStream in)
Check if the given stream is an Excel file by looking at the first few bytes.static InputStream
prepareForCheck(InputStream in)
Prepare the input stream for checking withisAnExcelFile(InputStream)
.void
readWorkbook(InputStream xlsx)
Read an Excel workbook from the stream.static boolean
seemsLikeAnExcelFile(File file)
Check the MIME type and file extension to see if the file could be an Excel file.void
setCharset(Charset charset)
Set the character set to use for the text.void
setColumnSeparator(String columnSeparator)
Set the column separator.void
setDateFormat(Formatter<Date> dateFormat)
Set a date formatter to use for formatting cells with date-like values.void
setEncoder(EncoderDecoder encoder)
Set the encoder to use for converting text values so that they do not conflict with column or row separators.void
setEvaluateFormulas(boolean evaluateFormulas)
void
setNumberFormat(Formatter<Number> numberFormat)
Set a formatter that is used for formatting cells with numeric values.void
setRowSeparator(String rowSeparator)
Set the row separator.void
setTimestampFormat(Formatter<Date> timestampFormat)
Set a date formatter to use for formatting cells with timestamp-like values.void
setWriteTrailingColumns(boolean writeTrailingColumns)
-
-
-
Field Detail
-
XLSX_MIME_TYPE
public static final String XLSX_MIME_TYPE
The MIME type for Excel workbook files.- See Also:
- Constant Field Values
-
charset
private Charset charset
-
columnSeparator
private String columnSeparator
-
rowSeparator
private String rowSeparator
-
writeTrailingColumns
private boolean writeTrailingColumns
-
evaluateFormulas
private boolean evaluateFormulas
-
encoder
private EncoderDecoder encoder
-
workbook
private org.apache.poi.ss.usermodel.Workbook workbook
-
sheets
private Map<String,XlsxToCsvUtil.SheetInfo> sheets
-
-
Method Detail
-
seemsLikeAnExcelFile
public static boolean seemsLikeAnExcelFile(File file)
Check the MIME type and file extension to see if the file could be an Excel file. The file contents is not checked.
-
isAnExcelFile
public static boolean isAnExcelFile(InputStream in)
Check if the given stream is an Excel file by looking at the first few bytes. The stream must support seek(). CallprepareForCheck(InputStream)
before calling this method if it is not sure.
-
prepareForCheck
public static InputStream prepareForCheck(InputStream in)
Prepare the input stream for checking withisAnExcelFile(InputStream)
. If the given stream supports seek() it is returned as it is, otherwise it is wrapped by a buffered stream.
-
getColumnSeparator
public String getColumnSeparator()
Get the current column separator. The default is {tab}.
-
setColumnSeparator
public void setColumnSeparator(String columnSeparator)
Set the column separator.
-
getRowSeparator
public String getRowSeparator()
Get the current row separator. The default is {newline}.
-
setRowSeparator
public void setRowSeparator(String rowSeparator)
Set the row separator.
-
getWriteTrailingColumns
public boolean getWriteTrailingColumns()
If TRUE, trailing empty columns are written to the CSV file so that all lines have the same number of columns. If FALSE, trailing emtpy columns are skipped. The default is TRUE.
-
setWriteTrailingColumns
public void setWriteTrailingColumns(boolean writeTrailingColumns)
-
getEvaluateFormulas
public boolean getEvaluateFormulas()
If TRUE, formulas are evaluated and the result is written to the CSV file. If FALSE, the formula expression is written. The default is FALSE.
-
setEvaluateFormulas
public void setEvaluateFormulas(boolean evaluateFormulas)
-
getCharset
public Charset getCharset()
Get the current character set. The default is UTF-8.
-
setCharset
public void setCharset(Charset charset)
Set the character set to use for the text.
-
getEncoder
public EncoderDecoder getEncoder()
Get the current encoder. The default is the ToSpaceEncoderDecoder.
-
setEncoder
public void setEncoder(EncoderDecoder encoder)
Set the encoder to use for converting text values so that they do not conflict with column or row separators.
-
getDateFormat
public Formatter<Date> getDateFormat()
Get the date formatter that is used for formatting cells with date-like values. If null, the default implementation is used (which may not be predicatable).
-
setDateFormat
public void setDateFormat(Formatter<Date> dateFormat)
Set a date formatter to use for formatting cells with date-like values. If null, the default implementation is used (which may not be predicatable).
-
getTimestampFormat
public Formatter<Date> getTimestampFormat()
Get the date formatter that is used for formatting cells with timestamp-like values. If null, the default implementation is used (which may not be predicatable).
-
setTimestampFormat
public void setTimestampFormat(Formatter<Date> timestampFormat)
Set a date formatter to use for formatting cells with timestamp-like values. If null, the default implementation is used (which may not be predicatable).
-
getNumberFormat
public Formatter<Number> getNumberFormat()
Get the formatter that is used for formatting cells with numeric values. If null, the default implementation is used (which may not be predicatable).
-
setNumberFormat
public void setNumberFormat(Formatter<Number> numberFormat)
Set a formatter that is used for formatting cells with numeric values. If null, the default implementation is used (which may not be predicatable).
-
readWorkbook
public void readWorkbook(InputStream xlsx) throws IOException
Read an Excel workbook from the stream.- Throws:
IOException
-
getWorkbook
public org.apache.poi.ss.usermodel.Workbook getWorkbook()
Get the current workbook.
-
getNumSheets
public int getNumSheets()
Get the number of sheets in the workbook.
-
getSheetAsCsv
public XlsxToCsvUtil.SheetInfo getSheetAsCsv(String name)
Convert the given named sheet to a CSV InputStream.
-
getSheetAsCsv
public XlsxToCsvUtil.SheetInfo getSheetAsCsv(int index)
Convert the given sheet to an CSV InputStream. The first sheet has index = 0.
-
getWorkbookAsCsv
public InputStream getWorkbookAsCsv()
Convert all sheets in the workbook to a single CSV-like stream. Each sheet is converted as ifXlsxToCsvUtil.SheetInfo.parseToCsv()
has been called. To separate the sheets each section is started with [Name-of-sheet] and then followed by the sheet data.
-
close
public void close()
Closes the current workbook. It is safe to do this before reading the CSV streams to the end.
-
hasTimePart
private boolean hasTimePart(String formatString)
-
hasDatePart
private boolean hasDatePart(String formatString)
-
-