Converting a database from LATIN1 to UTF-8

Lists: pgsql-general
From: Tormod Omholt-Jensen <tormod(at)boostcom(dot)no>
To: pgsql-general(at)postgresql(dot)org
Subject: Converting a database from LATIN1 to UTF-8
Date: 2006-03-27 11:32:33
Message-ID: 4427CD51.4050404@boostcom.no

We are upgrading our systems to handle new languages and therefore we
need to change the encoding of our postgres database from LATIN1 to UTF-8.

I have pg_dumped the database and imported it into the new UTF-8 base.
It seems like this worked just fine.
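
A minimal sketch of such a dump and restore, not necessarily the exact commands used here. The names olddb and newdb and the helper function migrate_to_utf8 are hypothetical. It assumes a plain-text dump, which records the source encoding so that the server can convert the data while loading it into the UTF-8 database.

```shell
# Hypothetical helper: copy a LATIN1 database into a new UTF-8 one.
# Usage: migrate_to_utf8 olddb newdb
migrate_to_utf8() {
    old_db=$1
    new_db=$2

    # Create the target database with UTF-8 encoding. Using template0
    # avoids inheriting anything from a template1 that may have been
    # set up with a different encoding.
    createdb -E UTF8 -T template0 "$new_db" || return 1

    # A plain-text dump contains a SET client_encoding line naming the
    # source encoding, so the data is converted to UTF-8 on import.
    pg_dump -Fp "$old_db" | psql -q -d "$new_db"
}
```

Nothing runs until the function is called, for example as migrate_to_utf8 olddb newdb.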

Can anyone confirm that this is the correct way to do it?

--
Tormod Omholt-Jensen
Developer
Boost Communications AS
www.boostcom.no


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tormod Omholt-Jensen <tormod(at)boostcom(dot)no>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Converting a database from LATIN1 to UTF-8
Date: 2006-03-27 11:42:25
Message-ID: 20060327114225.GE30791@svana.org

On Mon, Mar 27, 2006 at 01:32:33PM +0200, Tormod Omholt-Jensen wrote:
> We are upgrading our systems to handle new languages and therefore we
> need to change the encoding of our postgres database from LATIN1 to UTF-8.
>
> I have pg_dumped the database and imported it into the new UTF-8 base.
> It seems like this worked just fine.
>
> Can anyone confirm that this is the correct way to do it?

Yep. The only thing you need to check is the clients connecting to the
database. They will start receiving their output in UNICODE (UTF-8) also. If
they don't want that, they need to set their client encoding explicitly.
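
As a rough sketch of how a client could ask for its old encoding, assuming a legacy client that still expects LATIN1 and connects through libpq (psql does):

```shell
# Assumption: this client still expects LATIN1 text from the server.
# libpq-based clients read the desired encoding from this variable.
export PGCLIENTENCODING=LATIN1

# The same can be done per session from SQL:
#   SET client_encoding TO 'LATIN1';
# or interactively inside psql:
#   \encoding LATIN1
```

The server then converts between the database's UTF-8 and the client's LATIN1 on the fly. Characters that have no LATIN1 equivalent cannot be sent to such a client.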

Other than that you're all set.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Kai Hessing <kai(dot)hessing(at)hobsons(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Converting a database from LATIN1 to UTF-8
Date: 2006-03-30 14:45:53
Message-ID: 492992Fmc96eU1@individual.net

Tormod Omholt-Jensen wrote:
> We are upgrading our systems to handle new languages and therefore we
> need to change the encoding of our postgres database from LATIN1 to UTF-8.
>
> I have pg_dumped the database and imported it into the new UTF-8 base.
> It seems like this worked just fine.
>
> Can anyone confirm that this is the correct way to do it?

We did the same two months ago, this way:

pg_dump -Fp -p 5612 -f forumdb_export forumdb
iconv -f WINDOWS-1252 -t UTF-8 forumdb_export > hdb_admin_import
psql -q -p 5612 -f hdb_admin_import hdb_admin
rm forumdb_export
rm hdb_admin_import

forumdb was the old db and hdb_admin is the new one. You will notice that we
didn't convert from LATIN-1 or ISO to UTF-8; instead we used WINDOWS-1252.
The problem we had was bad programming in the past: lots of entries
had been written from the HTML form to the database without any encoding
information. Windows-1252 looks very much like ISO 8859-1 except for
the range from 0x80 to 0x9F, in which for example the Euro sign is
located (btw. LATIN-1 is the same as ISO 8859-1; a few characters are
different in ISO 8859-15). If you convert from ISO to UTF-8, the bytes in
that range do not come out as the characters the Windows clients meant.
The database doesn't see the difference between these two encodings
and so no error is produced. The problem is that you can have two (or
more) encodings in your database depending on the client. 99.9% are
converted correctly, but because of the 0.1% we decided to use the
Windows encoding, because most clients had been Windows clients with IE.
Even using this encoding we had to edit 4 or 5 places manually where some
French-encoded text had slipped into the database.
Fixing the software bugs and now using UTF-8 should finally have solved this
problem.
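
To illustrate the difference, here is a small sketch that converts the single byte 0x80 with both source encodings and prints the resulting UTF-8 bytes in hex. The helper hex_of is only for illustration.

```shell
# Byte 0x80 (octal 200) is the Euro sign in Windows-1252,
# but a C1 control character in ISO 8859-1.

# Print stdin as a plain hex string without spaces or newlines.
hex_of() { od -An -tx1 | tr -d ' \n'; }

as_cp1252=$(printf '\200' | iconv -f WINDOWS-1252 -t UTF-8 | hex_of)
as_latin1=$(printf '\200' | iconv -f ISO-8859-1 -t UTF-8 | hex_of)

echo "WINDOWS-1252: $as_cp1252"   # e282ac = U+20AC, the Euro sign
echo "ISO-8859-1:   $as_latin1"   # c280   = U+0080, a control character
```

Both conversions succeed without an error, which is why a wrong choice of source encoding goes unnoticed until someone looks at the affected characters.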

Another issue is the large objects (BLOBs). I didn't have any idea how
to convert them, so I converted the database without these blobs and
exported/imported the blobs with a PHP script.