Lists: | pgsql-general |
---|
From: | "Vyacheslav Kalinin" <vka(at)mgcp(dot)com> |
---|---|
To: | PGSQL <pgsql-general(at)postgresql(dot)org> |
Subject: | Regular expression |
Date: | 2008-04-26 19:48:06 |
Message-ID: | 9b1af80e0804261248g68e3993cx6a1d2f9174fd73ed@mail.gmail.com |
Hello,
Case-insensitive pattern matching gives strange results for non-ASCII
characters (such as UTF-8 encoded Cyrillic letters):
test=# select 'б' ~* 'Б' ;
?column?
----------
f
(1 row)
('б' and 'Б' are the lowercase and uppercase forms of the Cyrillic letter 'B')
At the same time:
test=# select 'б' ilike 'Б' ;
?column?
----------
t
(1 row)
(PG 8.3 on Linux, UTF-8 locale)
Also, why are Cyrillic letters not treated by the regexp engine as part of
the [:alpha:], [:alnum:], \w, etc. classes? Or were they never meant to
be?
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Vyacheslav Kalinin" <vka(at)mgcp(dot)com> |
Cc: | PGSQL <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Regular expression |
Date: | 2008-04-26 20:02:56 |
Message-ID: | 21081.1209240176@sss.pgh.pa.us |
"Vyacheslav Kalinin" <vka(at)mgcp(dot)com> writes:
> Case-insensitive pattern matching gives strange results for non-ASCII
> characters (such as UTF-8 encoded Cyrillic letters):
Yeah, the regex locale support doesn't work well in multibyte character
sets; it basically will not recognize that non-ASCII characters have
any case variants. Fixing this has been on the TODO list for a while...
regards, tom lane
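A possible workaround is sketched below. It is not taken from the thread, and it assumes a UTF-8 database whose locale (for example ru_RU.UTF-8) lets lower() fold Cyrillic letters. The idea is to fold case on both sides with lower() and then match case-sensitively with ~, since the case-folding done by lower() and ILIKE does handle non-ASCII text. For character classes, the sketch lists the Cyrillic letters as explicit ranges instead of relying on [:alpha:] or \w. One caveat: applying lower() to a pattern also rewrites escapes such as \W or \D into \w or \d, so only fold patterns that are plain literal text.

```sql
-- Sketch only: assumes a UTF-8 database and a locale (e.g. ru_RU.UTF-8)
-- in which lower() knows how to fold Cyrillic letters.

-- Fold both operands to lowercase, then use the case-sensitive ~
-- (in place of 'б' ~* 'Б'):
SELECT lower('б') ~ lower('Б');

-- ILIKE already folds case for non-ASCII text, as shown in the thread:
SELECT 'б' ILIKE 'Б';

-- In place of [[:alpha:]] or \w, spell out the Cyrillic letters.
-- а-я covers U+0430..U+044F; ё (U+0451) and Ё (U+0401) lie outside
-- those ranges, so they are listed separately:
SELECT 'привет' ~ '^[а-яёА-ЯЁ]+$';
```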