Latin1 to UTF-8 ?

From: Aarni Ruuhimäki <aarni(at)kymi(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Latin1 to UTF-8 ?
Date: 2007-08-03 12:37:20
Message-ID: 200708031537.20276.aarni@kymi.com
Lists: pgsql-general

Hi,

I've set up a new CentOS server with PostgreSQL 8.2.4 and initdb'ed it with
UTF-8.

That went OK, and it runs fine.

I have a problem with encodings, however, mainly with the Russian Cyrillic
characters.

When I test-dumped some databases from the old FC / Pg 8.0.2 server, all
Latin1, I noticed that some of the dumps show in the Konqueror file browser as
'Plain Text Documents' and some as 'C++ Source Files'. Both have Latin1 as
client encoding at the top of the files. Changing that gives errors, as
expected.

Looking into the plain text dumps, I see all Cyrillic characters as numeric
character references, &#1056;..., and these go in and display fine from the
new server's UTF-8 environment.
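
For illustration, a minimal Python sketch of what those references stand for.
The sample string is made up for the example (only &#1056; appears in my
dumps as quoted here), and whether the references should be converted at all
or left for the browser to render is an open question:

```python
import html


def decode_entities(text: str) -> str:
    """Turn numeric character references such as &#1056; into real characters."""
    return html.unescape(text)


# Hypothetical sample, not copied from the real dumps.
sample = "&#1056;&#1091;&#1089;"
decoded = decode_entities(sample)

print(decoded)                  # Cyrillic text
print(decoded.encode("utf-8"))  # the bytes a UTF-8 database would store
```

The references are plain ASCII, so they pass through a Latin1 to UTF-8
conversion unchanged, which would explain why these dumps load without
trouble.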

Some of the 'C++' files have the Cyrillic characters as 'îñåòèòåëåé'. Some
have both 'îñåòèòåëåé' and &#1056;..., and of course the 'îñåò' characters
come out wrong and unreadable in the browser. (Not sure if you can see the
single-quoted ones, but they look something like Hebrew or similar.)
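
A small Python sketch of one possible explanation, under the assumption (not
confirmed anywhere) that the bytes were really entered as Windows-1251 and
only labelled Latin1 by the database. The function name and defaults are
invented for the example:

```python
def reinterpret(text: str, actual: str = "cp1251", labelled: str = "latin-1") -> str:
    """Recover the original bytes via the labelled encoding, then decode
    them with the encoding they are assumed to have been written in."""
    return text.encode(labelled).decode(actual)


garbled = "îñåòèòåëåé"           # as shown when the bytes are read as Latin1
repaired = reinterpret(garbled)  # same bytes read as Windows-1251 (assumption)

print(repaired)
print(repaired.encode("utf-8"))  # what a UTF-8 database would store
```

If the assumption holds, the string turns into readable Cyrillic. Other
candidates such as KOI8-R or ISO-8859-5 can be tried by passing a different
value for actual. ASCII text, including the &#1056;... references, is not
affected by the round trip.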

I have no idea what browsers, encodings or even keyboard layouts were used
when the data was inserted by users through their web interfaces ...

I tried the -F p switch as the earlier version has no -E for dumps. Same
output. Also with pg_dumpall.

I tried various encodings with iconv too.

So, what would be the proper way to convert the dumps to UTF-8 ? Or any other
solution ? Any other tool to work with the problem files ?

BR,

Aarni
--
Aarni Ruuhimäki
