#1169 closed task (fixed)
Create a function that changes an existing BASE database to use UTF-8
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | blocker | Milestone: | BASE 2.9 |
Component: | install | Version: | |
Keywords: | Cc: |
Description
See #1168 for background info.
The script provided with the BASE installation creates a database that uses UTF-8, but the examples in the documentation doesn't do this. Since this will probably create the database with 'latin1' encoding instead we need to have a function that can convert an existing database to use UTF-8 instead.
In theory it is as simply as executing the SQL below for each table:
alter table <table-name> convert to character set utf8, default character set utf8;
and for the database:
alter database <db-name> default character set utf8;
We need to build a function in the installation/update program that finds all tables and generates a script file that changes the character set. If possible, only tables that has one or more string columns need the 'convert to character set utf8' part, but all tables must be changed to 'default character set utf8'.
For safety reasons I think the function should output a SQL script file that an admin executes manually on the database. There may be exotic server configurations that needs modifications to the generated script. Optionally, we may provide a flag to indicate that the script should be executed by the function.
PostgreSQL users
PostgreSQL uses UTF-8 as the default character set so in most cases this should not be a problem. In PostgreSQL the character set is a database-wide setting that can't be changed later, so if there are PostgreSQL users with a BASE database that uses a different character set this will be a big problem for them. The only thing I can think of is to dump the entire database to some kind of SQL script for creating a new database that uses UTF-8 instead. Hopefully this can be done with the pg_dump tool as described here: http://www.postgresql.org/docs/8.1/static/backup.html
Change History (8)
comment:1 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
follow-up: 5 comment:2 by , 16 years ago
comment:3 by , 16 years ago
(In [4645]) References #1169: Create a function that changes an existing BASE database to use UTF-8
Fixes a NullPointerException.
comment:4 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Tested on backup of demo-database.
- Database size: 600+600 MB.
- Conversion time: 12 min (on my slow development box).
comment:5 by , 16 years ago
Replying to nicklas:
(In [4636]) References #1169: Create a function that changes an existing BASE database to use UTF-8
The script should now be functional. It is for MySQL only and is used like this:
cd <base-dir>/bin ./onetimefix.sh utf8 -xThe -x option makes the script convert the database immediately. Skip this option to only generate a file (convert-to-utf8.sql) that can be executed manually.
I ran the update successfully on my server but the units still have strange characters. I suppose I need to update the units table, how do I do that?
comment:6 by , 16 years ago
The utf8 fix must be executed against a 2.8 database BEFORE ./updatedb.sh is used to update the database to 2.9. It will not fix a database that has been updated against the trunk.
follow-up: 8 comment:7 by , 16 years ago
A possible workaround if things have been done in the wrong order is to manually delete some information from the database. This procedure only works if units has not been used by any annotations or annotation types yet.
- Delete the UnitSymbols table:
DROP TABLE `UnitSymbols`
- Empty the Units table:
DELETE FROM `Units`
- Run the updatedb.sh script again to re-create the deleted table/data
comment:8 by , 16 years ago
Replying to nicklas:
A possible workaround if things have been done in the wrong order is to manually delete some information from the database. This procedure only works if units has not been used by any annotations or annotation types yet.
- Delete the UnitSymbols table:
DROP TABLE `UnitSymbols`
- Empty the Units table:
DELETE FROM `Units`
- Run the updatedb.sh script again to re-create the deleted table/data
I usually live on the trunk so I run the above recipe and it works great.
(In [4636]) References #1169: Create a function that changes an existing BASE database to use UTF-8
The script should now be functional. It is for MySQL only and is used like this:
The -x option makes the script convert the database immediately. Skip this option to only generate a file (convert-to-utf8.sql) that can be executed manually.