Opened 15 years ago

Closed 15 years ago

#1168 closed task (fixed)

Require UTF-8 to be used as database character set

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: blocker
Milestone: BASE 2.9
Component: documentation
Version:
Keywords:
Cc:

Description (last modified by Nicklas Nordborg)

A side effect of #792 is that we must be able to store non-latin1 characters in the database, e.g. µ (micro) and Ω (ohm). Since MySQL supports using different character sets on different columns, we originally thought that this would not be problematic since it was possible to force the UnitSymbols.symbol column to use UTF-8. In practice there is a problem since we must now set characterEncoding=UTF-8 in the connection URL (the db.url setting in base.config). This works fine for the UnitSymbols.symbol column, but not for 'text'-type columns that are latin1 columns. Trying to insert a description containing, for example, 'åäö' results in 'Ã¥Ã¤Ã¶' being stored in the database. I.e. it seems like the UTF-8-encoded string is being inserted without first converting it to latin1. This would probably be ok if there was no conversion when reading the value back. But in this case the bad string is "converted" to UTF-8 encoding again, resulting in an even more garbled string instead of 'åäö'. My feeling is that this might be a MySQL bug since there is no problem for 'varchar'-type columns, though I don't know whether that is because the conversion happens in both directions or in neither.
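
The symptom described above can be imitated outside the database. The following is only a rough sketch in Python, assuming the fault amounts to UTF-8 bytes being read as latin1 with no conversion step. It does not talk to MySQL or the JDBC driver, the function name is hypothetical, and Python's latin-1 codec is used as a stand-in for MySQL's latin1.

```python
# Illustration only: simulates the suspected mis-conversion in plain Python.
# Nothing here touches MySQL; store_without_conversion is a hypothetical name.

def store_without_conversion(text: str) -> str:
    """Encode as UTF-8, then read the raw bytes back as if they were latin1."""
    return text.encode("utf-8").decode("latin-1")

original = "åäö"

# First pass: what would end up in a latin1 'text' column.
stored = store_without_conversion(original)

# Second pass: the bad string gets "converted" once more on the way back.
read_back = store_without_conversion(stored)

# ascii() keeps the output readable on any console encoding.
print(ascii(stored))
print(len(original), len(stored), len(read_back))

# A single mis-conversion is reversible by doing the opposite steps.
repaired = stored.encode("latin-1").decode("utf-8")
assert repaired == original
```

Each pass doubles the number of characters, which matches the observation that the value read back is worse than the value stored. The last two lines suggest why a single mis-conversion is recoverable in principle, although whether that holds for data already in a real database is not something this sketch can show.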

This ticket is about documenting the fact that UTF-8 is needed and making sure that default configuration files, etc. reflect this. A separate ticket (#1169) has been created for writing a script that changes an existing database to use UTF-8 encoding.

Change History (2)

comment:1 by Nicklas Nordborg, 15 years ago

Description: modified (diff)
Owner: changed from everyone to Nicklas Nordborg
Status: new → assigned

comment:2 by Nicklas Nordborg, 15 years ago

Resolution: fixed
Status: assigned → closed

(In [4637]) Fixes #1168: Require UTF-8 to be used as database character set
