Opened 9 years ago

Closed 9 years ago

#1947 closed enhancement (wontfix)

Auto-detect character encoding of uploaded text files

Reported by: Nicklas Nordborg Owned by: everyone
Priority: minor Milestone:
Component: web Version:
Keywords: Cc:

Description

See http://baseplugins.thep.lu.se/ticket/788 for a recent issue with an incorrect character set. BASE does not set a character set on files unless one is selected manually. However, a character set is required for parsing, and if none is set, the default is used. Typically the default is UTF-8 or ISO-8859-1.
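A minimal Java sketch of why falling back to a default matters: the same bytes decode to different text depending on the charset used. `resolveCharset` is a hypothetical helper that mimics the "use the selected charset, else the default" behaviour described above. It is not BASE's actual API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallbackDemo {

    // Hypothetical helper (not BASE code): use the charset the user selected,
    // or fall back to a default when none was set.
    static Charset resolveCharset(String selected, Charset fallback) {
        if (selected == null || selected.isEmpty()) {
            return fallback;
        }
        return Charset.forName(selected);
    }

    public static void main(String[] args) {
        // U+00C5 ("Å") encoded as UTF-8 is the two bytes 0xC3 0x85.
        byte[] utf8Bytes = "\u00C5".getBytes(StandardCharsets.UTF_8);

        // With the correct charset selected, the original single character is recovered.
        String ok = new String(utf8Bytes, resolveCharset("UTF-8", StandardCharsets.ISO_8859_1));
        // With no charset selected, the ISO-8859-1 fallback turns it into two characters.
        String wrong = new String(utf8Bytes, resolveCharset(null, StandardCharsets.ISO_8859_1));

        System.out.println(ok.length());    // 1
        System.out.println(wrong.length()); // 2
    }
}
```

Nothing fails loudly here: ISO-8859-1 accepts every byte sequence, so a wrong default silently produces garbled text rather than an error.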

http://site.icu-project.org/ seems to have some code for guessing the character set based on the file content. Code examples: http://www.programcreek.com/java-api-examples/index.php?api=com.ibm.icu.text.CharsetDetector

It might be worth investing some time in checking this out and seeing whether we can attach some clever functionality to the file upload in BASE.
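For comparison, here is a much simpler guess that uses only the JDK. This is a swapped-in heuristic, not the statistical algorithm used by ICU's `CharsetDetector`: it checks for a byte-order mark, then tries a strict UTF-8 decode, and falls back to ISO-8859-1. `SimpleCharsetGuesser.guess` is a hypothetical name used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SimpleCharsetGuesser {

    // Hypothetical helper, NOT ICU's CharsetDetector: a naive guess based on a
    // byte-order mark and on whether the bytes form valid UTF-8.
    static Charset guess(byte[] data) {
        // A UTF-8 BOM (EF BB BF) is a strong hint.
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // UTF-16 BOMs: FE FF (big-endian) or FF FE (little-endian).
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
            if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE;
        }
        // A strict UTF-8 decoder throws on malformed or truncated byte sequences.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // Every byte sequence is valid ISO-8859-1, so it is a safe last resort.
            return StandardCharsets.ISO_8859_1;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00C5 Nordborg".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "\u00C5 Nordborg".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guess(utf8));   // UTF-8
        System.out.println(guess(latin1)); // ISO-8859-1
    }
}
```

A heuristic like this cannot tell ISO-8859-1 apart from similar single-byte encodings such as windows-1252. It also reports any pure-ASCII file as UTF-8, which includes UTF-7 files. ICU's statistical detector is meant to narrow exactly these cases, but it still only returns a guess with a confidence value.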

Change History (1)

comment:1 by Nicklas Nordborg, 9 years ago

Milestone: BASE 3.6
Resolution: wontfix
Status: new → closed

Hmmm... this would not have solved the immediate problem, since Java has no support for UTF-7 encoded files. Closing this as wontfix for now. Feel free to re-open if it becomes a big problem in the future.
