Lists: | pgsql-general |
---|
From: | Tom Hart <tomhart(at)coopfed(dot)org> |
---|---|
To: | Postgres General List <pgsql-general(at)postgresql(dot)org> |
Subject: | utf8 issue |
Date: | 2008-02-26 19:40:25 |
Message-ID: | 47C46B29.9050507@coopfed.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Hello everybody. I recently converted my db from ASCII encoding to UTF8
(we have a lot of spanish-speaking members, and need the extra character
support). Everything was working great, but I noticed this error, while
trying to COPY one of our tables from a csv.
ERROR: invalid byte sequence for encoding "UTF8": 0xb9
SQL state: 22021
Hint: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
Context: COPY transaction_import, line 59358
Is there anything I can do to look for these sorts of errors and fix
them? I'm really not great with character encodings and I'm not sure
where to go on this. Any help?
--
Tom Hart
IT Specialist
Cooperative Federal
723 Westcott St.
Syracuse, NY 13210
(315) 471-1116 ext. 202
(315) 476-0567 (fax)
From: | Richard Huxton <dev(at)archonet(dot)com> |
---|---|
To: | Tom Hart <tomhart(at)coopfed(dot)org> |
Cc: | Postgres General List <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: utf8 issue |
Date: | 2008-02-26 20:42:54 |
Message-ID: | 47C479CE.5020509@archonet.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Tom Hart wrote:
> Hello everybody. I recently converted my db from ASCII encoding to UTF8
> (we have a lot of spanish-speaking members, and need the extra character
> support). Everything was working great, but I noticed this error, while
> trying to COPY one of our tables from a csv.
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xb9
Not converted, I fear.
Why not use latin9 rather than utf8 - that should cover everything and
is probably what it's in anyway.
> Is there anything I can do to look for these sorts of errors and fix
> them? I'm really not great with character encodings and I'm not sure
> where to go on this. Any help?
Google a bit for iconv (and postgresql) to get your characters valid.
--
Richard Huxton
Archonet Ltd
From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Tom Hart <tomhart(at)coopfed(dot)org> |
Cc: | Postgres General List <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: utf8 issue |
Date: | 2008-02-26 21:00:28 |
Message-ID: | 20080226210028.GM5763@alvh.no-ip.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Tom Hart wrote:
> Hello everybody. I recently converted my db from ASCII encoding to UTF8
> (we have a lot of spanish-speaking members, and need the extra character
> support). Everything was working great, but I noticed this error, while
> trying to COPY one of our tables from a csv.
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xb9
> SQL state: 22021
> Hint: This error can also happen if the byte sequence does not match the
> encoding expected by the server, which is controlled by
> "client_encoding".
> Context: COPY transaction_import, line 59358
Did you try setting client_encoding beforehand? I think a simple
PGOPTIONS="client_encoding=latin9" pg_restore ...
should suffice.
Now, if you have mixed UTF8 and Latin1/Latin9 data, you are hosed
(meaning you'll have to wade through the stuff and figure out what data
is in what encoding, and fix it).
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
From: | Tom Hart <tomhart(at)coopfed(dot)org> |
---|---|
To: | Richard Huxton <dev(at)archonet(dot)com> |
Cc: | Postgres General List <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: utf8 issue |
Date: | 2008-02-26 21:04:05 |
Message-ID: | 47C47EC5.2030602@coopfed.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Richard Huxton wrote:
> Tom Hart wrote:
>> Hello everybody. I recently converted my db from ASCII encoding to
>> UTF8 (we have a lot of spanish-speaking members, and need the extra
>> character support). Everything was working great, but I noticed this
>> error, while trying to COPY one of our tables from a csv.
>>
>> ERROR: invalid byte sequence for encoding "UTF8": 0xb9
>
> Not converted, I fear.
>
> Why not use latin9 rather than utf8 - that should cover everything and
> is probably what it's in anyway.
I was specifically requested to use UTF-8, and the work's already been
done, so converting again is not my favorite option.
>
>> Is there anything I can do to look for these sorts of errors and fix
>> them? I'm really not great with character encodings and I'm not sure
>> where to go on this. Any help?
>
> Google a bit for iconv (and postgresql) to get your characters valid.
>
I already have a php script that does some data scrubbing before the
copy. I added this line to the script and things seem to be working
better now
$line = iconv("ISO-8859-1", "UTF-8", $line);
Thanks for the help guys :-)
--
Tom Hart
IT Specialist
Cooperative Federal
723 Westcott St.
Syracuse, NY 13210
(315) 471-1116 ext. 202
(315) 476-0567 (fax)
From: | "Dean Gibson (DB Administrator)" <postgresql(at)ultimeth(dot)com> |
---|---|
To: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: utf8 issue |
Date: | 2008-02-26 21:55:56 |
Message-ID: | 47C48AEC.5020809@ultimeth.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
On 2008-02-26 13:04, Tom Hart wrote:
>>
> I already have a php script that does some data scrubbing before the
> copy. I added this line to the script and things seem to be working
> better now
>
> $line = iconv("ISO-8859-1", "UTF-8", $line);
>
> Thanks for the help guys :-)
>
Read up on the difference between PostgreSQL's server_encoding and
client_encoding.
The "server_encoding" is how the data is stored in the server, and can
be anything compatible (UTF-8, ISO-8859-1, whatever will hold your
character set).
The "client_encoding" is how the incoming (or outgoing) data is
treated/assumed. PostgreSQL does the necessary conversion for you.
You can set/change the "client_encoding" in so many ways, it gives you
total flexibility, in order of increasing priority:
1. You can set it as the default for any database (see ALTER DATABASE ...).
2. You can set it in an environment variable, which means the client
utilities (and I believe the libraries) use that.
3. In PSQL, you can set it with the "\encoding" statement (which applies
to the session or until changed), or the "SET [SESSION | LOCAL ]
client_encoding TO ...", which will set it for the session or just the
current transaction.
I just went through this, and while I initially used "iconv" to get up
and running, I've removed most of those in my scripts and just use the
PostgreSQL conversion instead.
--
Mail to my list address MUST be sent via the mailing list.
All other mail to my list address will bounce.