Opened 10 years ago
Closed 10 years ago
#1947 closed enhancement (wontfix)
Auto-detect character encoding of uploaded text files
Reported by: | Nicklas Nordborg | Owned by: | everyone |
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | web | Version: | |
Keywords: | | Cc: | |
Description
See http://baseplugins.thep.lu.se/ticket/788 for a recent issue caused by an incorrect character set. BASE does not set a character set on a file unless one is selected manually. A character set is required for parsing, however, so if none is set the default is used. Typically the default is UTF-8 or ISO-8859-1.
http://site.icu-project.org/ seems to have some code for making a guess based on the file content. Code examples: http://www.programcreek.com/java-api-examples/index.php?api=com.ibm.icu.text.CharsetDetector
It might be worth investing some time in checking this out to see whether we can attach some clever functionality to the file upload in BASE.
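To make the idea of a content-based guess concrete, here is a minimal sketch that uses only the Java standard library. It is not the ICU detector and not BASE code: the class `EncodingGuess` and its `guess` method are hypothetical names, and the heuristic (byte-order mark first, then strict UTF-8 decoding, then ISO-8859-1 as fallback) is an assumption chosen because those are the two defaults named above. A statistical detector such as ICU's would cover far more encodings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of BASE or ICU.
final class EncodingGuess {

    /** Returns a best guess for the character set of the given bytes. */
    static Charset guess(byte[] data) {
        // 1. A byte-order mark, if present, is the strongest hint.
        //    (UTF-32 marks are not handled in this sketch.)
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // 2. Strict UTF-8 decoding: malformed input raises an exception
        //    instead of being silently replaced.
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // 3. Every byte sequence decodes as ISO-8859-1, so use it as the fallback.
            return StandardCharsets.ISO_8859_1;
        }
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "Malm\u00f6"; // contains one non-ASCII character
        System.out.println(guess(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(guess(text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

Pure ASCII content is reported as UTF-8 here, which is harmless because ASCII decodes identically in both character sets. A guess like this could only ever be a suggestion at upload time; the manual selection would still need to take precedence.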
Change History (1)
comment:1 by , 10 years ago
Milestone: | BASE 3.6 |
---|---|
Resolution: | → wontfix |
Status: | new → closed |
Hmmm... this would not have solved the immediate problem, since Java has no support for UTF-7 encoded files. Closing this as wontfix for now. Feel free to re-open if it becomes a big problem in the future.
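For anyone revisiting this, whether a given JVM can decode a named character set can be checked at runtime with `java.nio.charset.Charset.isSupported`. The sketch below wraps that call; the class `CharsetSupport` and method `isUsable` are hypothetical names, not BASE code. On a stock JDK without an additional charset provider on the classpath, the UTF-7 check is expected to report that the character set is unavailable, though that depends on the runtime.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of BASE.
final class CharsetSupport {

    /** True if this JVM can provide a Charset for the given name. */
    static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for a null or syntactically illegal charset name.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-8", "ISO-8859-1", "UTF-7"}) {
            System.out.println(name + " supported: " + isUsable(name));
        }
    }
}
```

A detector that reports a character set the JVM cannot decode would therefore still leave the file unparseable, which is the situation described in this comment.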