Re: Converting a database from LATIN1 to UTF-8

From: Kai Hessing <kai(dot)hessing(at)hobsons(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Converting a database from LATIN1 to UTF-8
Date: 2006-03-30 14:45:53
Message-ID: 492992Fmc96eU1@individual.net
Lists: pgsql-general

Tormod Omholt-Jensen wrote:
> We are upgrading our systems to handle new languages and therefore we
> need to change the encoding of our postgres database from LATIN1 to UTF-8.
>
> I have pg_dumped the database and imported it into the new UTF-8 base.
> It seems like this worked just fine.
>
> Can anyone confirm that this is the correct way to do it?

We did the same two months ago, this way:

pg_dump -Fp -p 5612 -f forumdb_export forumdb
iconv -f WINDOWS-1252 -t UTF-8 forumdb_export > hdb_admin_import
psql -q -p 5612 -f hdb_admin_import hdb_admin
rm forumdb_export
rm hdb_admin_import

forumdb was the old db and hdb_admin is the new one. You will notice
that we don't convert from LATIN-1 or ISO to UTF-8; instead we use
WINDOWS-1252 as the source encoding. Our problem was bad programming in
the past: lots of entries had been written from the HTML form to the
database without any encoding information. Windows-1252 looks very much
like ISO 8859-1 except for the range from 0x80 to 0x9F, which holds
printable characters in Windows-1252 (for example the Euro sign at
0x80) but only control characters in ISO 8859-1. (Btw. LATIN-1 is the
same as ISO 8859-1. A few characters are different in ISO 8859-15.)
If you convert from ISO to UTF-8, the bytes in that range silently
become control characters instead of the characters the client meant.
The database doesn't see the difference between these two encodings,
so no error is produced. The problem is that you can have two (or more)
encodings in your database, depending on the client. 99.9% of the data
is converted correctly either way, but because of the remaining 0.1% we
decided to use the Windows encoding, since most clients had been
Windows clients with IE. Even with this encoding we had to edit 4 or 5
places manually where some differently encoded French text had slipped
into the database.
Fixing the software bugs and using UTF-8 from now on should have solved
this problem for good.
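
To illustrate the difference, here is a small sketch (not part of our
original procedure). It assumes an iconv that knows the encoding names
WINDOWS-1252 and ISO-8859-1; the helper to_hex is made up for this
example. It converts the single byte 0x80 to UTF-8 under both source
encodings and prints the resulting bytes in hex:

```shell
# Hypothetical helper: print stdin as a plain hex string.
to_hex() { od -An -tx1 | tr -d ' \n'; }

# 0x80 (octal 200) read as Windows-1252 is the Euro sign, U+20AC.
euro_cp1252=$(printf '\200' | iconv -f WINDOWS-1252 -t UTF-8 | to_hex)

# The same byte read as ISO 8859-1 is the control character U+0080.
euro_latin1=$(printf '\200' | iconv -f ISO-8859-1 -t UTF-8 | to_hex)

echo "WINDOWS-1252: $euro_cp1252"   # e282ac
echo "ISO-8859-1:   $euro_latin1"   # c280
```

Both conversions succeed without an error, which is why a wrong choice
of source encoding can go unnoticed until someone looks at the data.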

Another issue is large objects (BLOBs). I had no idea how to convert
them, so I converted the database without the BLOBs and
exported/imported them with a PHP script.
