Opened 15 years ago

Last modified 15 years ago

#1168 closed task

Require UTF-8 to be used as database character set — at Initial Version

Reported by: Nicklas Nordborg
Owned by: everyone
Priority: blocker
Milestone: BASE 2.9
Component: documentation
Version:
Keywords:
Cc:

Description

A side effect of #792 is that we must be able to store non-latin1 characters in the database, e.g. µ (micro) and Ω (ohm). Since MySQL supports using different character sets on different columns, we originally thought that this would not be a problem, since it was possible to force the UnitSymbols.symbol column to use UTF-8. In practice there is a problem, since we must now set characterEncoding=UTF-8 in the connection url (db.url setting in base.config). This works fine for the UnitSymbols.symbol column, but not for 'text'-type columns that are latin1 columns.

Trying to insert a description containing, for example, 'åäö' results in 'Ã¥Ã¤Ã¶' being stored in the database. I.e. it seems like the UTF-8-encoded string is inserted without first being converted to latin1. This would probably be ok if there was no conversion when reading the value back, but in this case the bad string is "converted" to UTF-8 encoding again, resulting in: Ã¥Ã¤Ã¶. My feeling is that this might be a MySQL bug, since there is no problem for 'varchar'-type columns, though I don't know if this is because there is a conversion in both directions or in neither.
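
The suspected mechanism can be reproduced outside the database. The following is only a sketch in Python, not BASE code and not a statement about what MySQL or the JDBC driver actually does internally: the two helper functions are hypothetical models of "UTF-8 bytes written into a latin1 column without conversion" and "stored value converted to UTF-8 when read back".

```python
def store_utf8_in_latin1_column(text: str) -> str:
    """Hypothetical model of the write path: the UTF-8 bytes are stored
    as-is, and the latin1 column treats every byte as one character."""
    return text.encode("utf-8").decode("latin-1")


def read_back_as_utf8(stored: str) -> bytes:
    """Hypothetical model of the read path: the stored latin1 characters
    are converted to UTF-8 once more."""
    return stored.encode("utf-8")


original = "åäö"
stored = store_utf8_in_latin1_column(original)
returned = read_back_as_utf8(stored).decode("utf-8")

print(stored)                # Ã¥Ã¤Ã¶
print(returned == original)  # False

# The value has been encoded twice: 6 bytes on the way in, 12 on the way out.
print(len(original.encode("utf-8")), len(read_back_as_utf8(stored)))  # 6 12
```

Under this model the damage is reversible, since `stored.encode("latin-1").decode("utf-8")` gives back the original string, but only as long as the bytes were stored unmodified.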

This ticket is about documenting that UTF-8 is required and making sure that the default configuration files, etc. reflect this. A separate ticket will be created for a script that converts an existing database to UTF-8.

Change History (0)
