Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#2161 closed task (fixed)

Import data from Excel files

Reported by: Nicklas Nordborg Owned by: Nicklas Nordborg
Priority: major Milestone: BASE 3.15
Component: core Version:
Keywords: Cc:

Description

There are a lot of importer plug-ins that can import data from CSV files. Typically, the CSV files are starting out as Excel files on some computer before being save as CSV and uploaded to BASE. The problem is that Excel is not very transparent when it comes to things such as character encoding, date formats, etc. It is not uncommon that users have to try multiple times before the file can be imported by BASE.

It would be nice to have support for reading data directly from Excel files. An approach to this has already been done in Reggie (http://baseplugins.thep.lu.se/ticket/1110). But this is for a special case only. It would be nice to, for example, add support directly to the FlatFileParser implementation. This would make it possible for all existing plug-in (I think they are all based on the FlatFileParser) to use CSV or Excel files without any code changes as long as the data is in the first sheet. With some minor changes it would also be possible to read multi-sheet Excel files and to provide direct access to cells so that data doesn't have to be converted via strings.

Change History (24)

comment:1 by Nicklas Nordborg, 6 years ago

Owner: changed from everyone to Nicklas Nordborg
Status: newaccepted

comment:2 by Nicklas Nordborg, 6 years ago

In 7652:

References #2161: Import data from Excel files

Added common-math JAR file. It is needed when evaluating formulas.

comment:3 by Nicklas Nordborg, 6 years ago

In 7653:

References #2161: Import data from Excel files

Added MIME type for xlsx files to installation script.

comment:4 by Nicklas Nordborg, 6 years ago

In 7654:

References #2161: Import data from Excel files

Started to implement a utility for parsing an Excel file like it was a CSV file. The XlsxToCsvUtil is inspired by the one originally developed by Reggie, but it need to support a wider range of Excel features.

The "Test with file" function is currently used as a testbed.

comment:5 by Nicklas Nordborg, 6 years ago

In 7655:

References #2161: Import data from Excel files

Moved auto-detection of Excel files into the FlatFileParser and added some methods for date/timestamp formatting with sensible defaults. This should make it possible for more or less all plug-ins or code that uses the FlatFileParser to import from Excel files (via auto-conversion to CSV) as long as the data is in the first sheet. To get access to other sheets the 'charset' must specify the sheet name or index instead.

comment:6 by Nicklas Nordborg, 6 years ago

In 7656:

References #2161: Import data from Excel files

Added support for selecting a specific sheet when importing to the AbstractItemImporter class which means that all batch item importers now should support this.

comment:7 by Nicklas Nordborg, 6 years ago

In 7657:

References #2161: Import data from Excel files

Added support for selecting a specific sheet when importing to the AnnotationFlatFileImporter. Also made it possible to specify this in the configuration.

comment:8 by Nicklas Nordborg, 6 years ago

In 7658:

References #2161: Import data from Excel files

The number formatter was swallowing all decimals except one. Adding more # to the format pattern should fix this.

comment:9 by Nicklas Nordborg, 6 years ago

In 7664:

References #2161: Import data from Excel files

The FlatFileParser can now pass requests for data values on to the actual Excel worksheet object with help of the ExcelData class that is a subclass of Data.

For code that is getting data via the Mapper interface the ColumnMapper has been updated (getInt() and getFloat()). There are some remaining problems:

  • The Mapper interface has no support for date, double or long values
  • A lot of code is not using the Mapper interface but go directly to Data.get() which is returning a string that is then parsed to get a date or number.


So, the Mapper.getValue() and Data.get() methods have been deprecated so that we can more easily find all places in our code that is using those methods. The Data class already has replacement methods in place and we need to updated the Mapper interface as well.

comment:10 by Nicklas Nordborg, 6 years ago

In 7665:

References #2161: Import data from Excel files

Replaced all calls to Data.get() with some of the new methods.

comment:11 by Nicklas Nordborg, 6 years ago

In 7666:

References #2161: Import data from Excel files

Introduced Mapper.getString() method and changed all places in the code that calls Mapper.getValue() to the new method where it is apparent that a string value is wanted. Most remaining places that call Mapper.getValue() need dates or a value based on a generic Type.

comment:12 by Nicklas Nordborg, 6 years ago

In 7667:

References #2161: Import data from Excel files

Introduced Mapper.getDate() method and changed all places in the code that uses/parses string values to use the new method (where this is apparent). Most remaining places that call Mapper.getValue() need a value based on a generic Type.

comment:13 by Nicklas Nordborg, 6 years ago

In 7668:

References #2161: Import data from Excel files

Introduced Mapper.getByType() method as a utility method that delegates to one of the other methods depedning on the type parameter. Also added Mapper.getDouble() and Mapper.getLong() for completeness.

Updated the remaining places in the code that uses a mapper.

comment:14 by Nicklas Nordborg, 6 years ago

In 7670:

References #2161: Import data from Excel files

Extended the TestAnnotationFlatFileImporter and TestFlatFileParser with test cases that read data from Excel.

comment:15 by Nicklas Nordborg, 6 years ago

In 7671:

References #2161: Import data from Excel files

Fixes a problem with some columns that are missing if the some rows have empty cells at the end of the line.

comment:16 by Nicklas Nordborg, 6 years ago

In 7672:

References #2161: Import data from Excel files

The ColFunction must assume that it is doing numerical computations unless the value is not possible to parse as a number (this is how it behaved before and the value will be converted later if possible).

comment:17 by Nicklas Nordborg, 6 years ago

In 7673:

References #2161: Import data from Excel files

Added functionality for parsing all worksheets in an Excel file as a single CSV-like stream. It is made possible by sections with the sheet name in brackets. For example:

[Sheet1]
Data from sheet
[Sheet2]
Data from sheet 2

The section functionality in the FlatFileParser has been updated to allow for this kind of file.

comment:18 by Nicklas Nordborg, 6 years ago

In 7674:

References #2161: Import data from Excel files

Added functionality for setting a base style that is used when creating new cells on a worksheet. The intention is that it can be used to, for example, set a bold font the header line, or to style different lines with different background colors.

comment:19 by Nicklas Nordborg, 6 years ago

In 7675:

References #2161: Import data from Excel files

Added XlsxTableWriter which is an extension of the TableWriter that write data to an excel file instead of to an CSV file. Existing code should be able to use the either writer with only minor changes to the code.

comment:20 by Nicklas Nordborg, 6 years ago

In 7676:

References #2161: Import data from Excel files

The FlatFileParser need to keep track of how many lines that has been parsed when switching worksheet in Excel mode or the mapping will not point to the correct row in the current sheet.

comment:21 by Nicklas Nordborg, 6 years ago

In 7677:

References #2161: Import data from Excel files

Added test code for testing the XlsxTableWriter and XlsxToCsvUtil. The latter is tested both standalone and via the FlatFileParser.

comment:22 by Nicklas Nordborg, 6 years ago

In 7678:

References #2161: Import data from Excel files

Implemented all print(), println() and write() methods in the XlsxTableWriter to make it almost a drop-in replacement for any PrintWriter usage that want to support writing to both CSV and Excel files with the same code.

comment:23 by Nicklas Nordborg, 6 years ago

Resolution: fixed
Status: acceptedclosed

comment:24 by Nicklas Nordborg, 6 years ago

In 7685:

References #2161: Import data from Excel files

Switched from deprecated method.

Note: See TracTickets for help on using tickets.