Re: UTF8 encoding problem

Lists: pgsql-general
From: Garry Saddington <garry(at)schoolteachers(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: UTF8 encoding problem
Date: 2008-06-17 21:48:34
Message-ID: 200806172248.34142.garry@schoolteachers.co.uk

I am getting illegal UTF8 encoding errors and I have traced it to the £ sign.
I have set lc_monetary = 'en_GB.UTF-8' in postgresql.conf, but this has no
effect. How can I sort out this problem? client_encoding = UTF8.
Regards
Garry


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Garry Saddington <garry(at)schoolteachers(dot)co(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UTF8 encoding problem
Date: 2008-06-18 01:04:10
Message-ID: 20080618010409.GA30622@winnie.fuhr.org

On Tue, Jun 17, 2008 at 10:48:34PM +0100, Garry Saddington wrote:
> I am getting illegal UTF8 encoding errors and I have traced it to the £ sign.

What's the exact error message?

> I have set lc_monetary to "lc_monetary = 'en_GB.UTF-8'" in postgresql.conf but
> this has no effect. How can I sort this problem? Client_encoding =UTF8.

Is the data UTF-8? If the error is 'invalid byte sequence for encoding
"UTF8": 0xa3' then you probably need to set client_encoding to latin1,
latin9, or win1252.
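One way to see why the client_encoding setting matters is a small Python simulation of the server's input conversion: the raw bytes are interpreted in the declared client_encoding and then converted to the database encoding, assumed here to be UTF8. The function server_receive and the name mapping below are hypothetical illustrations, not PostgreSQL's code, and the error text only mimics the server message quoted above.

```python
# Hypothetical mapping from PostgreSQL encoding names to Python codec names
PG_TO_PYTHON_CODEC = {
    "UTF8": "utf-8",
    "LATIN1": "latin-1",
    "LATIN9": "iso8859-15",
    "WIN1252": "cp1252",
}


def server_receive(raw: bytes, client_encoding: str) -> bytes:
    """Hypothetical stand-in for the server's input conversion.

    Decodes the client's bytes using the declared client_encoding and
    re-encodes them as UTF-8 for a UTF8 database. Raises ValueError on
    bytes that are invalid in the declared encoding.
    """
    name = client_encoding.upper()
    codec = PG_TO_PYTHON_CODEC[name]
    try:
        text = raw.decode(codec)
    except UnicodeDecodeError as exc:
        # Report the first offending byte, imitating the server's wording
        raise ValueError(
            f'invalid byte sequence for encoding "{name}": 0x{raw[exc.start]:02x}'
        ) from exc
    return text.encode("utf-8")


if __name__ == "__main__":
    data = b"Price: \xa3100"  # a pound sign as a Latin-1 client would send it
    try:
        server_receive(data, "UTF8")
    except ValueError as err:
        print(err)  # invalid byte sequence for encoding "UTF8": 0xa3
    print(server_receive(data, "LATIN1"))  # b'Price: \xc2\xa3100'
```

Declared as UTF8, the Latin-1 byte 0xa3 is rejected; declared as LATIN1, the same byte is accepted and converted to the two-byte UTF-8 form before it is stored.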

--
Michael Fuhr


From: Giorgio Valoti <giorgio_v(at)mac(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UTF8 encoding problem
Date: 2008-06-18 06:25:07
Message-ID: BD241A58-9AA3-4255-AE32-43EA665EAB0B@mac.com


On 18/giu/08, at 03:04, Michael Fuhr wrote:

> On Tue, Jun 17, 2008 at 10:48:34PM +0100, Garry Saddington wrote:
>> I am getting illegal UTF8 encoding errors and I have traced it to
>> the £ sign.
>
> What's the exact error message?
>
>> I have set lc_monetary to "lc_monetary = 'en_GB.UTF-8'" in
>> postgresql.conf but this has no effect. How can I sort this problem?
>> Client_encoding =UTF8.
>
> Is the data UTF-8? If the error is 'invalid byte sequence for
> encoding "UTF8": 0xa3' then you probably need to set client_encoding
> to latin1, latin9, or win1252.

Why?

--
Giorgio Valoti


From: Garry Saddington <garry(at)schoolteachers(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UTF8 encoding problem
Date: 2008-06-18 06:30:41
Message-ID: 200806180730.41355.garry@schoolteachers.co.uk

On Wednesday 18 June 2008 02:04, Michael Fuhr wrote:
> On Tue, Jun 17, 2008 at 10:48:34PM +0100, Garry Saddington wrote:
> > I am getting illegal UTF8 encoding errors and I have traced it to the £
> > sign.
>
> What's the exact error message?
>
> > I have set lc_monetary to "lc_monetary = 'en_GB.UTF-8'" in
> > postgresql.conf but this has no effect. How can I sort this problem?
> > Client_encoding =UTF8.
>
> Is the data UTF-8? If the error is 'invalid byte sequence for encoding
> "UTF8": 0xa3' then you probably need to set client_encoding to latin1,
> latin9, or win1252.
>
Thanks, that's fixed it.
Garry


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Giorgio Valoti <giorgio_v(at)mac(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UTF8 encoding problem
Date: 2008-06-18 13:00:35
Message-ID: 20080618130034.GA17837@winnie.fuhr.org

On Wed, Jun 18, 2008 at 08:25:07AM +0200, Giorgio Valoti wrote:
> On 18/giu/08, at 03:04, Michael Fuhr wrote:
> > Is the data UTF-8? If the error is 'invalid byte sequence for
> > encoding "UTF8": 0xa3' then you probably need to set client_encoding
> > to latin1, latin9, or win1252.
>
> Why?

UTF-8 has rules about what byte values can occur in sequence;
violations of those rules mean that the data isn't valid UTF-8.
This particular error says that the database received a byte with
the value 0xa3 (163) in a sequence of bytes that wasn't valid UTF-8.
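Those sequencing rules can be sketched in a few lines of Python. This is a simplified checker written for illustration (the function name is hypothetical): it only checks how lead bytes and continuation bytes follow each other and does not reject overlong forms, surrogates, or code points above U+10FFFF. That is enough to show why a lone 0xa3 fails: its bit pattern 10100011 marks it as a continuation byte, which can never begin a character.

```python
from typing import Optional


def first_invalid_utf8_byte(data: bytes) -> Optional[int]:
    """Hypothetical helper: index of the first byte that breaks UTF-8's
    lead/continuation structure, or None if the structure is valid.

    Simplified: does not reject overlong forms, surrogates, or values
    above U+10FFFF.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:            # 0xxxxxxx: single-byte ASCII
            length = 1
        elif 0xC0 <= b < 0xE0:  # 110xxxxx: starts a 2-byte sequence
            length = 2
        elif 0xE0 <= b < 0xF0:  # 1110xxxx: starts a 3-byte sequence
            length = 3
        elif 0xF0 <= b < 0xF8:  # 11110xxx: starts a 4-byte sequence
            length = 4
        else:                   # 10xxxxxx (0x80-0xBF) or 0xF8-0xFF cannot start one
            return i
        for j in range(i + 1, i + length):
            if j >= n:
                return i        # sequence truncated at end of input
            if (data[j] & 0xC0) != 0x80:
                return j        # expected a continuation byte 10xxxxxx here
        i += length
    return None


if __name__ == "__main__":
    print(first_invalid_utf8_byte(b"Price: \xc2\xa3100"))  # None
    print(first_invalid_utf8_byte(b"Price: \xa3100"))      # 7
```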

The UTF-8 byte sequence for the pound sign (£) is 0xc2 0xa3. If
Garry got this error (I don't know if he did; I was asking) then
the byte 0xa3 must have appeared in some other sequence that wasn't
valid UTF-8. The usual reason for that is that the data is in some
encoding other than UTF-8.
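The two bytes follow from the UTF-8 bit layout: a code point between U+0080 and U+07FF is split into 5 + 6 bits and stored as 110xxxxx 10xxxxxx. Below is a minimal Python sketch of that arithmetic for the pound sign (U+00A3); the helper name is hypothetical. It also shows that the second byte happens to equal 0xa3, and that the same byte on its own is rejected by Python's UTF-8 decoder.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Hypothetical helper: encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point is outside the two-byte UTF-8 range")
    lead = 0xC0 | (cp >> 6)    # 110xxxxx carries the top 5 of 11 payload bits
    cont = 0x80 | (cp & 0x3F)  # 10xxxxxx carries the low 6 payload bits
    return bytes([lead, cont])


if __name__ == "__main__":
    pound = ord("£")                        # 0xA3
    encoded = utf8_two_byte(pound)
    print(encoded.hex(" "))                 # c2 a3
    print(encoded == "£".encode("utf-8"))   # True

    # A lone 0xa3 is a continuation byte with no lead byte in front of it.
    try:
        b"\xa3".decode("utf-8")
    except UnicodeDecodeError as err:
        print(err.reason)                   # invalid start byte
```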

Common encodings for Western European languages are Latin-1
(ISO-8859-1), Latin-9 (ISO-8859-15), and Windows-1252. All three
of these encodings use a lone 0xa3 to represent the pound sign. If
the data has a pound sign as 0xa3 and the database complains that
it isn't part of a valid UTF-8 sequence then the data is likely to
be in one of these other encodings.
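A short Python check (standard codecs only; the helper name decode_all is hypothetical) confirms that Latin-1, Latin-9 and Windows-1252 all decode the single byte 0xa3 as the pound sign. The three are not identical, though: byte 0xa4, for example, is the currency sign ¤ in Latin-1 and Windows-1252 but the euro sign € in Latin-9, so which client_encoding is right still depends on what else is in the data.

```python
# Python codec names for Latin-1, Latin-9 and Windows-1252
CODECS = ("latin-1", "iso8859-15", "cp1252")


def decode_all(raw: bytes) -> dict:
    """Hypothetical helper: decode the same bytes with each single-byte codec."""
    return {codec: raw.decode(codec) for codec in CODECS}


if __name__ == "__main__":
    print(decode_all(b"\xa3"))  # every codec yields the pound sign
    print(decode_all(b"\xa4"))  # currency sign for latin-1 and cp1252, euro for iso8859-15
```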

--
Michael Fuhr


From: Garry Saddington <garry(at)schoolteachers(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: UTF8 encoding problem
Date: 2008-06-18 15:53:15
Message-ID: 200806181653.15484.garry@schoolteachers.co.uk

On Wednesday 18 June 2008 14:00, Michael Fuhr wrote:
> On Wed, Jun 18, 2008 at 08:25:07AM +0200, Giorgio Valoti wrote:
> > On 18/giu/08, at 03:04, Michael Fuhr wrote:
> > > Is the data UTF-8? If the error is 'invalid byte sequence for
> > > encoding "UTF8": 0xa3' then you probably need to set client_encoding
> > > to latin1, latin9, or win1252.
> >
> > Why?
>
> UTF-8 has rules about what byte values can occur in sequence;
> violations of those rules mean that the data isn't valid UTF-8.
> This particular error says that the database received a byte with
> the value 0xa3 (163) in a sequence of bytes that wasn't valid UTF-8.
>
> The UTF-8 byte sequence for the pound sign (£) is 0xc2 0xa3. If
> Garry got this error (I don't know if he did; I was asking) then
> the byte 0xa3 must have appeared in some other sequence that wasn't
> valid UTF-8. The usual reason for that is that the data is in some
> encoding other than UTF-8.
>
> Common encodings for Western European languages are Latin-1
> (ISO-8859-1), Latin-9 (ISO-8859-15), and Windows-1252. All three
> of these encodings use a lone 0xa3 to represent the pound sign. If
> the data has a pound sign as 0xa3 and the database complains that
> it isn't part of a valid UTF-8 sequence then the data is likely to
> be in one of these other encodings.
>
Thanks, I traced it to a client_encoding problem and set it to latin1,
which has cured the problem.
regards
garry


From: Giorgio Valoti <giorgio_v(at)mac(dot)com>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UTF8 encoding problem
Date: 2008-06-19 05:56:22
Message-ID: B03E7678-25C9-4D9D-8805-42F59A88E515@mac.com


On 18/giu/08, at 15:00, Michael Fuhr wrote:

> On Wed, Jun 18, 2008 at 08:25:07AM +0200, Giorgio Valoti wrote:
>> On 18/giu/08, at 03:04, Michael Fuhr wrote:
>>> Is the data UTF-8? If the error is 'invalid byte sequence for
>>> encoding "UTF8": 0xa3' then you probably need to set client_encoding
>>> to latin1, latin9, or win1252.
>>
>> Why?
>
> UTF-8 has rules about what byte values can occur in sequence;
> violations of those rules mean that the data isn't valid UTF-8.
> This particular error says that the database received a byte with
> the value 0xa3 (163) in a sequence of bytes that wasn't valid UTF-8.
>
> The UTF-8 byte sequence for the pound sign (£) is 0xc2 0xa3. If
> Garry got this error (I don't know if he did; I was asking) then
> the byte 0xa3 must have appeared in some other sequence that wasn't
> valid UTF-8. The usual reason for that is that the data is in some
> encoding other than UTF-8.
>
> Common encodings for Western European languages are Latin-1
> (ISO-8859-1), Latin-9 (ISO-8859-15), and Windows-1252. All three
> of these encodings use a lone 0xa3 to represent the pound sign. If
> the data has a pound sign as 0xa3 and the database complains that
> it isn't part of a valid UTF-8 sequence then the data is likely to
> be in one of these other encodings.

Much clearer now, thank you Michael.

--
Giorgio Valoti