Re: The server's LC_CTYPE locale

Lists: pgsql-general
From: Michael Ben-Nes <miki(at)canaan(dot)co(dot)il>
To: postgresql <pgsql-general(at)postgresql(dot)org>
Subject: The server's LC_CTYPE locale
Date: 2006-05-28 18:00:33
Message-ID: 4479E541.9070206@canaan.co.il

Hello

I got the following error when the query string was one of the Hebrew
chars:

SELECT upper('ש');
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the
database encoding.

After a few minutes, while gathering info, I stopped getting the previous
error and started to get:

#SELECT lower('ש');
ERROR: invalid UTF-8 byte sequence detected near byte 0xf9

# SELECT upper('ש');
ERROR: invalid UTF-8 byte sequence detected near byte 0xf9

#SELECT version();
PostgreSQL 8.1.3 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.3
(Debian 4.0.3-1)

#show lc_ctype ;
he_IL.utf8

#SHOW SERVER_ENCODING;
UTF8

Any ideas what the problem is?

--

--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Ben-Nes <miki(at)canaan(dot)co(dot)il>
Cc: postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: The server's LC_CTYPE locale
Date: 2006-05-28 22:04:31
Message-ID: 381.1148853871@sss.pgh.pa.us

Michael Ben-Nes <miki(at)canaan(dot)co(dot)il> writes:
> I got the following error when the query string was one of the Hebrew
> chars:

> SELECT upper('ש');
> ERROR: invalid multibyte character for locale
> HINT: The server's LC_CTYPE locale is probably incompatible with the
> database encoding.

Hmph. I can't reproduce that here (using Fedora 4's version of he_IL.utf8
anyway). I assume your client_encoding was also UTF8? The troublesome
character came through in your email as \327\251 (D7 A9) ... is that
what you were actually entering? The reference to F9 in the other error
message makes me think the character got munged somewhere in the email
chain ...
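
One way to test the "munged" theory is to encode the character under more than one encoding and compare the bytes with those named in the error messages. The Python sketch below is an illustration, not part of the original exchange. It assumes the character is U+05E9 (HEBREW LETTER SHIN) and that the competing encoding is ISO-8859-8; both are assumptions at this point in the thread.

```python
# Assumption: the troublesome character is U+05E9 (HEBREW LETTER SHIN).
SHIN = "\u05e9"

# Bytes the character becomes under the two candidate encodings.
utf8_bytes = SHIN.encode("utf-8")
iso_hebrew_bytes = SHIN.encode("iso8859_8")  # assumed competing encoding

print("UTF-8:     ", utf8_bytes.hex())
print("ISO-8859-8:", iso_hebrew_bytes.hex())

# Check whether the single ISO-8859-8 byte is acceptable as UTF-8 input.
try:
    iso_hebrew_bytes.decode("utf-8")
    single_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    single_byte_is_valid_utf8 = False

print("single byte valid as UTF-8:", single_byte_is_valid_utf8)
```

If the single-byte form turns out to be F9, that would be consistent with the character having been sent or interpreted as ISO-8859-8 somewhere along the way, although it does not show where.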

regards, tom lane


From: Michael Ben-Nes <miki(at)canaan(dot)co(dot)il>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: The server's LC_CTYPE locale
Date: 2006-05-29 11:15:11
Message-ID: 447AD7BF.8010100@canaan.co.il

Tom Lane wrote:

> Michael Ben-Nes <miki(at)canaan(dot)co(dot)il> writes:
>
>> I got the following error when the query string was one of the Hebrew
>> chars:
>>
>
>
>> SELECT upper('ש');
>> ERROR: invalid multibyte character for locale
>> HINT: The server's LC_CTYPE locale is probably incompatible with the
>> database encoding.
>>
>
> Hmph. I can't reproduce that here (using Fedora 4's version of he_IL.utf8
> anyway). I assume your client_encoding was also UTF8? The troublesome
> character came through in your email as \327\251 (D7 A9) ... is that
> what you were actually entering? The reference to F9 in the other error
> message makes me think the character got munged somewhere in the email
> chain ...
>
The client encoding is UTF8.

Strangely I no longer get the second error:
ERROR: invalid UTF-8 byte sequence detected near byte 0xf9

The first error returned:
# SELECT lower('ש');
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the
database encoding.

The character that I sent is:
[ש‎] U+05E9 &#1513; HEBREW LETTER SHIN

I'm out of ideas. What else should I check?
> regards, tom lane
>

--

--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Ben-Nes <miki(at)canaan(dot)co(dot)il>
Cc: postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: The server's LC_CTYPE locale
Date: 2006-05-29 14:20:33
Message-ID: 13195.1148912433@sss.pgh.pa.us

Michael Ben-Nes <miki(at)canaan(dot)co(dot)il> writes:
> The character that I sent is:
> [ש] U+05E9 &#1513; HEBREW LETTER SHIN

Well, that does work out to D7 A9 in UTF8, if I'm doing the arithmetic
correctly.
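
The arithmetic can be spelled out as a short Python sketch. It is an illustration rather than anything from the original message: the helper name `encode_two_byte_utf8` is made up, and it only handles the two-byte UTF-8 range (U+0080 to U+07FF), which is where U+05E9 falls.

```python
def encode_two_byte_utf8(code_point):
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes (hypothetical helper)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte UTF-8 range is handled here")
    lead = 0xC0 | (code_point >> 6)      # 110xxxxx carries the top 5 bits
    trail = 0x80 | (code_point & 0x3F)   # 10xxxxxx carries the low 6 bits
    return bytes([lead, trail])


encoded = encode_two_byte_utf8(0x05E9)

# The same two bytes in hex and in the octal escape form quoted above.
hex_form = " ".join(f"{b:02X}" for b in encoded)
octal_form = "".join(f"\\{b:o}" for b in encoded)

print(hex_form)
print(octal_form)
```

The result can be cross-checked against Python's built-in codec with `"\u05e9".encode("utf-8")`.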

I can't replicate any problem in either 8.1.4 or HEAD. It's possible
that this is a bug that's been fixed since 8.1.3, but I don't recall
any change in that area. I think more likely the difference is between
the he_IL.utf8 locale definitions in Fedora 4 and Debian. Perhaps you
should check for available updates to the locale.

regards, tom lane


From: Michael Ben-Nes <miki(at)canaan(dot)co(dot)il>
To: postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: The server's LC_CTYPE locale
Date: 2006-09-05 11:56:21
Message-ID: 44FD65E5.1090403@canaan.co.il

For the record:

Those are the records in my locale.gen

# cat /etc/locale.gen.old
en_US ISO-8859-1
he_IL UTF-8
he_IL ISO-8859-8

I found out that removing "he_IL ISO-8859-8" fixed the problem.

Why? I have no idea (maybe some collision because of the duplicate he_IL?).

Cheers

Michael Ben-Nes wrote:

> Hello
>
>
> I got the following error when the query string was one of the Hebrew
> chars:
>
>
> SELECT upper('ש');
> ERROR: invalid multibyte character for locale
> HINT: The server's LC_CTYPE locale is probably incompatible with the
> database encoding.
>
>
> After a few minutes, while gathering info, I stopped getting the previous
> error and started to get:
>
>
> #SELECT lower('ש');
> ERROR: invalid UTF-8 byte sequence detected near byte 0xf9
>
> # SELECT upper('ש');
> ERROR: invalid UTF-8 byte sequence detected near byte 0xf9
>
>
> #SELECT version();
> PostgreSQL 8.1.3 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.3
> (Debian 4.0.3-1)
>
>
> #show lc_ctype ;
> he_IL.utf8
>
>
> #SHOW SERVER_ENCODING;
> UTF8
>
> Any ideas what the problem is?
>
>

--

--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Michael Ben-Nes <miki(at)canaan(dot)co(dot)il>
Cc: postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: The server's LC_CTYPE locale
Date: 2006-09-05 12:20:31
Message-ID: 20060905122031.GG14312@svana.org

On Tue, Sep 05, 2006 at 02:56:21PM +0300, Michael Ben-Nes wrote:
> For the record:
>
> Those are the records in my locale.gen
>
> # cat /etc/locale.gen.old
> en_US ISO-8859-1
> he_IL UTF-8
> he_IL ISO-8859-8

Yeah, that's wrong. The first column is the identifier, so the last
entry should be something like:

he_IL.ISO-8859-8 ISO-8859-8

> Why? I have no idea (maybe some collision because of the duplicate he_IL?).

You can't do that.
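
A check for this kind of clash could be sketched in Python as follows. This is an illustration only, not a tool that ships with any distribution: the function name `find_duplicate_locales` is made up, and the sketch assumes each entry has the form "<identifier> <charset>", with "#" starting a comment.

```python
from collections import defaultdict


def find_duplicate_locales(locale_gen_text):
    """Return identifiers listed more than once, mapped to their charsets."""
    charsets_by_name = defaultdict(list)
    for raw_line in locale_gen_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            continue  # not an "<identifier> <charset>" entry; ignored in this sketch
        name, charset = fields
        charsets_by_name[name].append(charset)
    return {name: sets for name, sets in charsets_by_name.items() if len(sets) > 1}


# The entries posted in this thread.
posted = """en_US ISO-8859-1
he_IL UTF-8
he_IL ISO-8859-8
"""

# The same file with the last entry renamed as suggested above.
renamed = """en_US ISO-8859-1
he_IL UTF-8
he_IL.ISO-8859-8 ISO-8859-8
"""

print(find_duplicate_locales(posted))
print(find_duplicate_locales(renamed))
```

With the posted file, he_IL is reported with both UTF-8 and ISO-8859-8; with the renamed entry, nothing is reported.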

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.