#1947 closed enhancement (wontfix)

Auto-detect character encoding of uploaded text files

Reported by:	Nicklas Nordborg	Owned by:	everyone
Priority:	minor	Milestone:
Component:	web	Version:
Keywords:		Cc:

Description

See http://baseplugins.thep.lu.se/ticket/788 for a recent issue caused by an incorrect character set. BASE does not set a character set on a file unless one is manually selected. A character set is required for parsing, however, and if none is set the default is used, which is typically UTF-8 or ISO-8859-1.

http://site.icu-project.org/ seems to have some code for making a guess based on the file content. Code examples: http://www.programcreek.com/java-api-examples/index.php?api=com.ibm.icu.text.CharsetDetector

It might be worth investing some time in checking this out to see whether we can attach some clever functionality to the file upload in BASE.
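
To illustrate what "making a guess based on the file content" could look like, here is a minimal sketch that uses only the Java standard library. It is not the ICU detector mentioned above and it is not BASE code: it is a much simpler heuristic that tries a strict UTF-8 decode and falls back to ISO-8859-1 when the bytes are not valid UTF-8. The class name CharsetGuess and the method guess are hypothetical, chosen for this example only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of BASE or ICU.
final class CharsetGuess {

    private CharsetGuess() {
    }

    /**
     * Guess between the two usual defaults: returns UTF-8 if the bytes
     * decode as strict UTF-8, otherwise ISO-8859-1.
     */
    static Charset guess(byte[] content) {
        try {
            // REPORT makes the decoder throw on invalid input instead of
            // silently substituting replacement characters.
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(content));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // Every byte sequence is valid ISO-8859-1, so this is only a
            // fallback, not a positive identification.
            return StandardCharsets.ISO_8859_1;
        }
    }
}
```

Pure ASCII content is reported as UTF-8, which is harmless because both encodings agree on ASCII. If only the first part of a file is sampled, the sample may end in the middle of a multi-byte sequence and be misjudged, so a real implementation would need to handle that case.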

Change History (1)

comment:1 by Nicklas Nordborg, 10 years ago

Milestone:	BASE 3.6
Resolution:	→ wontfix
Status:	new → closed

Hmmm... this would not have solved the immediate problem since Java has no support for UTF-7 encoded files. Closing this as wontfix for now. Feel free to re-open if it becomes a big problem in the future.
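
The limitation noted in this comment can be checked at runtime: even if a detector reports a charset name, the JVM still has to provide a decoder for it. The following is a rough sketch, assuming a hypothetical helper class CharsetSupport (not part of BASE), that maps a detected name to a Charset and falls back when the name is unsupported or malformed. Whether a given name such as UTF-7 is available depends on the JVM and on any charset providers on the classpath.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of BASE.
final class CharsetSupport {

    private CharsetSupport() {
    }

    /**
     * Returns the charset with the given name if this JVM supports it,
     * otherwise the supplied fallback.
     */
    static Charset resolveOrDefault(String detectedName, Charset fallback) {
        try {
            return Charset.isSupported(detectedName)
                ? Charset.forName(detectedName)
                : fallback;
        } catch (IllegalArgumentException e) {
            // Thrown for a null name or a syntactically illegal name
            // (IllegalCharsetNameException is a subclass).
            return fallback;
        }
    }
}
```

A caller could pass the name reported by a detector together with the configured default, for example resolveOrDefault(name, StandardCharsets.UTF_8), and would then know that the returned charset can actually be used for parsing.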
