Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#1169 closed task (fixed)

Create a function that changes an existing BASE database to use UTF-8

Reported by: Nicklas Nordborg Owned by: Nicklas Nordborg
Priority: blocker Milestone: BASE 2.9
Component: install Version:
Keywords: Cc:

Description

See #1168 for background info.

The script provided with the BASE installation creates a database that uses UTF-8, but the examples in the documentation doesn't do this. Since this will probably create the database with 'latin1' encoding instead we need to have a function that can convert an existing database to use UTF-8 instead.

In theory it is as simply as executing the SQL below for each table:

alter table <table-name>
convert to character set utf8,
default character set utf8;

and for the database:

alter database <db-name> 
default character set utf8;

We need to build a function in the installation/update program that finds all tables and generates a script file that changes the character set. If possible, only tables that has one or more string columns need the 'convert to character set utf8' part, but all tables must be changed to 'default character set utf8'.

For safety reasons I think the function should output a SQL script file that an admin executes manually on the database. There may be exotic server configurations that needs modifications to the generated script. Optionally, we may provide a flag to indicate that the script should be executed by the function.

PostgreSQL users

PostgreSQL uses UTF-8 as the default character set so in most cases this should not be a problem. In PostgreSQL the character set is a database-wide setting that can't be changed later, so if there are PostgreSQL users with a BASE database that uses a different character set this will be a big problem for them. The only thing I can think of is to dump the entire database to some kind of SQL script for creating a new database that uses UTF-8 instead. Hopefully this can be done with the pg_dump tool as described here: http://www.postgresql.org/docs/8.1/static/backup.html

Change History (8)

comment:1 Changed 14 years ago by Nicklas Nordborg

Owner: changed from everyone to Nicklas Nordborg
Status: newassigned

comment:2 Changed 14 years ago by Nicklas Nordborg

(In [4636]) References #1169: Create a function that changes an existing BASE database to use UTF-8

The script should now be functional. It is for MySQL only and is used like this:

cd <base-dir>/bin
./onetimefix.sh utf8 -x

The -x option makes the script convert the database immediately. Skip this option to only generate a file (convert-to-utf8.sql) that can be executed manually.

comment:3 Changed 14 years ago by Nicklas Nordborg

(In [4645]) References #1169: Create a function that changes an existing BASE database to use UTF-8

Fixes a NullPointerException?.

comment:4 Changed 14 years ago by Nicklas Nordborg

Resolution: fixed
Status: assignedclosed

Tested on backup of demo-database.

  • Database size: 600+600 MB.
  • Conversion time: 12 min (on my slow development box).

comment:5 in reply to:  2 Changed 14 years ago by Jari Häkkinen

Replying to nicklas:

(In [4636]) References #1169: Create a function that changes an existing BASE database to use UTF-8

The script should now be functional. It is for MySQL only and is used like this:

cd <base-dir>/bin
./onetimefix.sh utf8 -x

The -x option makes the script convert the database immediately. Skip this option to only generate a file (convert-to-utf8.sql) that can be executed manually.

I ran the update successfully on my server but the units still have strange characters. I suppose I need to update the units table, how do I do that?

comment:6 Changed 14 years ago by Nicklas Nordborg

The utf8 fix must be executed against a 2.8 database BEFORE ./updatedb.sh is used to update the database to 2.9. It will not fix a database that has been updated against the trunk.

comment:7 Changed 14 years ago by Nicklas Nordborg

A possible workaround if things have been done in the wrong order is to manually delete some information from the database. This procedure only works if units has not been used by any annotations or annotation types yet.

  1. Delete the UnitSymbols? table: DROP TABLE `UnitSymbols`
  2. Empty the Units table: DELETE FROM `Units`
  3. Run the updatedb.sh script again to re-create the deleted table/data

comment:8 in reply to:  7 Changed 14 years ago by Jari Häkkinen

Replying to nicklas:

A possible workaround if things have been done in the wrong order is to manually delete some information from the database. This procedure only works if units has not been used by any annotations or annotation types yet.

  1. Delete the UnitSymbols? table: DROP TABLE `UnitSymbols`
  2. Empty the Units table: DELETE FROM `Units`
  3. Run the updatedb.sh script again to re-create the deleted table/data

I usually live on the trunk so I run the above recipe and it works great.

Note: See TracTickets for help on using tickets.