Re: BUG #2261: ILIKE seems to be buggy on koi8 input

Lists: pgsql-bugs
From: "Evgeny Gridasov" <eugrid(at)fpm(dot)kubsu(dot)ru>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #2261: ILIKE seems to be buggy on koi8 input
Date: 2006-02-14 17:39:46
Message-ID: 20060214173946.3661AF0B05@svr2.postgresql.org


The following bug has been logged online:

Bug reference: 2261
Logged by: Evgeny Gridasov
Email address: eugrid(at)fpm(dot)kubsu(dot)ru
PostgreSQL version: 8.1.2
Operating system: Debian Linux
Description: ILIKE seems to be buggy on koi8 input
Details:

my terminal is RU_ru.KOI8-R,
template1's encoding is UTF8.
ILIKE seems to be buggy when comparing russian strings,
while UPPER/LOWER works OK.

template1=# \encoding koi8;

try to get uppercase of some russian letters:
template1=# select upper('');
upper
-------

(1 row)

result is OK!

next, try to compare uppercase and lowercase using
ILIKE:
template1=# select true where '' ilike '';
bool
------
(0 rows)

OOPS! Nothing happened. But why?

try the same but with latin charset letters:

template1=# select true where 'asdf' ilike 'ASDF';
bool
------
t
(1 row)

Try to compare lowercase with lowercase (russian):

template1=# select true where '' ilike '';
bool
------
t
(1 row)

it works.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Evgeny Gridasov" <eugrid(at)fpm(dot)kubsu(dot)ru>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2261: ILIKE seems to be buggy on koi8 input
Date: 2006-02-15 17:44:18
Message-ID: 14626.1140025458@sss.pgh.pa.us

"Evgeny Gridasov" <eugrid(at)fpm(dot)kubsu(dot)ru> writes:
> my terminal is RU_ru.KOI8-R,
> template1's encoding is UTF8.
> ILIKE seems to be buggy when comparing russian strings,
> while UPPER/LOWER works OK.

I'll bet that the database's locale setting is expecting some encoding
other than UTF8 :-(. You need to have compatible locale and encoding
settings inside the database. You didn't say exactly what the database
LC_COLLATE value is, but if it's RU_ru.KOI8-R, that definitely does not
match UTF8.

regards, tom lane
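
To illustrate why the encoding and the locale have to agree, here is a minimal Python sketch. The sample word is a made-up example, not the string from the report, and the code only shows how the same Cyrillic text is laid out under KOI8-R and UTF-8; it makes no claim about what PostgreSQL does internally.

```python
# Hypothetical sample text; not the string used in the bug report.
word = "привет"

koi8_bytes = word.encode("koi8_r")
utf8_bytes = word.encode("utf-8")

# KOI8-R stores each Cyrillic letter in one byte, UTF-8 in two.
print(len(koi8_bytes), len(utf8_bytes))

# Interpreting UTF-8 bytes by KOI8-R rules yields a different string,
# so character classification based on the wrong encoding cannot
# recognise the letters.
misread = utf8_bytes.decode("koi8_r")
print(misread == word)
```

A locale whose character tables assume KOI8-R would see the UTF-8 data the way `misread` looks here, which is why the two settings must match.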


From: Evgeny Gridasov <eugrid(at)fpm(dot)kubsu(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2261: ILIKE seems to be buggy on koi8 input
Date: 2006-02-20 18:07:46
Message-ID: 20060220210746.750ff85b.eugrid@fpm.kubsu.ru

postgresql server starts with environment:

LC_COLLATE=en_US.UTF-8
LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8

I've tried to set different LC_COLLATE/LC_ALL/LANG settings
but it did not help.

I've also tried changing my psql input to Unicode Russian, but that did not help either.

'show all' says I've got lc_collate and other lc_* set to en_US.UTF-8.
initdb was run with this locale.
It cannot be modified setting it in postgresql.conf (creation db constant?)
Should I reinit database to get this working or what?
If I should reinit db, what locale should I choose?

BTW, the ~* operator does not work with upper/lower case Russian letters either,
while upper()/lower() still work OK.

On Wed, 15 Feb 2006 12:44:18 -0500
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Evgeny Gridasov" <eugrid(at)fpm(dot)kubsu(dot)ru> writes:
> > my terminal is RU_ru.KOI8-R,
> > template1's encoding is UTF8.
> > ILIKE seems to be buggy when comparing russian strings,
> > while UPPER/LOWER works OK.
>
> I'll bet that the database's locale setting is expecting some encoding
> other than UTF8 :-(. You need to have compatible locale and encoding
> settings inside the database. You didn't say exactly what the database
> LC_COLLATE value is, but if it's RU_ru.KOI8-R, that definitely does not
> match UTF8.
>
> regards, tom lane

--
Evgeny Gridasov
Software Engineer
I-Free, Russia


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Evgeny Gridasov <eugrid(at)fpm(dot)kubsu(dot)ru>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2261: ILIKE seems to be buggy on koi8 input
Date: 2006-02-20 22:05:59
Message-ID: 28946.1140473159@sss.pgh.pa.us

Evgeny Gridasov <eugrid(at)fpm(dot)kubsu(dot)ru> writes:
> postgresql server starts with environment:
> LC_COLLATE=en_US.UTF-8
> LC_ALL=en_US.UTF-8
> LANG=en_US.UTF-8

Well, that setting shouldn't translate much except A-Z/a-z. If you want
Cyrillic upper/lower case conversions you need the database's LC_CTYPE to be
ru_RU.something.

regards, tom lane
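
The symptom can be mimicked outside the database with a short Python sketch. This is only an analogy, assuming a case-insensitive comparison that folds nothing but A-Z/a-z; the function names and the sample word are hypothetical, and this is not PostgreSQL's ILIKE code.

```python
def ascii_only_equal(a: str, b: str) -> bool:
    """Case-insensitive equality that folds only A-Z/a-z.

    bytes.lower() changes ASCII letters only, so the UTF-8 bytes of
    Cyrillic letters pass through unchanged.
    """
    return a.encode("utf-8").lower() == b.encode("utf-8").lower()


def unicode_equal(a: str, b: str) -> bool:
    """Case-insensitive equality using full Unicode case mapping."""
    return a.lower() == b.lower()


# Hypothetical sample strings.
print(ascii_only_equal("asdf", "ASDF"))      # Latin letters match
print(ascii_only_equal("привет", "ПРИВЕТ"))  # mixed-case Cyrillic does not
print(ascii_only_equal("привет", "привет"))  # same-case Cyrillic does
print(unicode_equal("привет", "ПРИВЕТ"))     # Cyrillic-aware folding matches
```

The first three calls show the same pattern as the report: Latin letters match regardless of case, Cyrillic matches only when the case is already identical.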


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Evgeny Gridasov <eugrid(at)fpm(dot)kubsu(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #2261: ILIKE seems to be buggy on koi8 input
Date: 2006-02-21 12:54:58
Message-ID: 200602211354.58896.peter_e@gmx.net

Evgeny Gridasov wrote:
> It cannot be modified setting it in postgresql.conf (creation db
> constant?) Should I reinit database to get this working or what?

Yes.

> If I should reinit db, what locale should I choose?

Something like ru_RU.utf8.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
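
Before re-running initdb it may help to confirm that the operating system actually provides the suggested locale. The following is a rough Python sketch under that assumption; the helper name is hypothetical, and the exact spelling of the locale name (ru_RU.utf8 versus ru_RU.UTF-8) varies between systems.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the OS accepts `name` as an LC_CTYPE locale."""
    previous = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, previous)


# Result depends on which locales are installed on the machine.
print(locale_available("ru_RU.UTF-8"))
```

If this prints False, the locale has to be generated on the system first; otherwise initdb cannot use it.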