UNICODE string collating, case insensitive matching

From: "Cestmir Hybl Jr(dot)" <cestmir(at)nustep(dot)net>
To: <pgsql-general(at)postgresql(dot)org>
Subject: UNICODE string collating, case insensitive matching
Date: 2003-03-04 19:47:19
Message-ID: 020d01c2e286$de2336e0$0200a8c0@stratos
Lists: pgsql-general

Hello,

(1) I have a question about multibyte support in PostgreSQL:

Why do collation and character case operations (Upper, Lower, ILIKE) in Postgres use libc locales instead of the Unicode specification when the database encoding is UTF-8? This is useless in a real multilingual environment, where strings in multiple languages are stored in the same database. Such strings cannot be handled by a single locale.

Several Unicode technical reports are relevant here:
http://www.unicode.org/reports/tr10/ - Unicode Collation Algorithm
http://www.unicode.org/reports/tr21/ - Case Mappings
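
To illustrate what locale-independent behaviour would look like, here is a minimal sketch in Python, whose built-in string methods use the Unicode case-mapping tables rather than a libc locale. It is only an illustration, not PostgreSQL code: the helper names `unicode_upper` and `unicode_ilike_contains` are hypothetical, and the sketch covers case mapping and caseless matching only, not collation.

```python
import unicodedata


def unicode_upper(s: str) -> str:
    """Uppercase using the Unicode case mappings; no locale is consulted."""
    return s.upper()


def unicode_ilike_contains(haystack: str, needle: str) -> bool:
    """Rough analogue of: haystack ILIKE '%needle%' (no wildcard handling)."""
    # Normalize first so composed and decomposed forms compare equal,
    # then case-fold both sides for caseless comparison.
    h = unicodedata.normalize("NFC", haystack).casefold()
    n = unicodedata.normalize("NFC", needle).casefold()
    return n in h


if __name__ == "__main__":
    # Strings from several languages, handled without switching locales.
    for word in ["čaj", "Ärger", "żółw", "привет"]:
        print(word, unicode_upper(word))
    print(unicode_ilike_contains("Dobrý ČAJ", "čaj"))
```

The same two helpers handle Slovak, German, Polish, and Russian text in a single run, which is the property a single libc locale cannot provide.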

(2) Does anyone have a pgsql database cluster with UTF-8 encoding and a *.UTF-8 locale in which the Upper, Lower, and ILIKE functions work properly?

I have compiled the sk_SK.UTF-8 locale, and string collation works fine (a /select ... order by some_field/ query returns a properly collated dataset), but /select Upper(some_field), Lower(some_field)/ and /select ... where some_field ILIKE '%...some non-ASCII text...%'/ do not work.

All of this works fine in the sk_SK.ISO-8859-2 locale.
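
One possible explanation for this difference, which nothing in this message confirms, is that case conversion is applied one byte at a time: that can work in a single-byte encoding such as ISO-8859-2 but cannot see a character that UTF-8 spreads over several bytes. The Python sketch below only mimics that assumed behaviour with `bytes.upper()`, which maps ASCII letters and leaves all other bytes alone; `bytewise_upper` is a hypothetical helper, not a PostgreSQL function.

```python
def bytewise_upper(data: bytes) -> bytes:
    """Uppercase byte by byte; only ASCII a-z are changed."""
    return data.upper()


text = "čaj"

# In UTF-8 the letter "č" takes two bytes, in ISO-8859-2 only one.
utf8_len = len("č".encode("utf-8"))
latin2_len = len("č".encode("iso-8859-2"))

# Byte-wise conversion leaves the multibyte "č" untouched.
bytewise_result = bytewise_upper(text.encode("utf-8")).decode("utf-8")

# Character-wise conversion handles it.
charwise_result = text.upper()

if __name__ == "__main__":
    print(utf8_len, latin2_len)
    print(bytewise_result, charwise_result)
```

If this is what happens, a fix would need to decode the string into characters before applying the case mapping, rather than change the locale.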

Cestmir Hybl
