Collation at database level

Lists: pgsql-hackers
From: Radek Strnad <radek(dot)strnad(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Collation at database level
Date: 2008-04-16 12:36:03
Message-ID: 1208349363.5694.58.camel@random-laptop

Hi,

I'm working on my bachelor thesis. Its goal is to implement collation
at the database level based on POSIX locales and to lay the foundations
for further national language support development. Users will be able
to set the collation when creating a database or change the collation
of an existing one, in particular with the commands
CREATE DATABASE … COLLATE … and ALTER DATABASE … COLLATE …, as defined
by the ANSI standard.

The work will also make it possible for users to create their own
collations with the commands CREATE COLLATION … FROM … USING and
DROP COLLATION, as defined by the ANSI standard. Additional features
such as ascending and descending ordering and key sensitivity will be
included (these are not in the ANSI standard).

The initial part of my work has been completed and submitted as part of
a patch contributed by Alexey Slynko
(http://www.activebait.net/msg00019.html). I'm now at the stage of
adding collation catalogs, which will be important for further
multi-language support. The problem with POSIX locales is that you never
know which locales a user has installed. I've discovered that some Linux
distributions don't even ship locales other than UTF-8-based ones.
Because ANSI defines collations defined by ISO-8859-1 and UTF-*, we need
to implement these collations somehow. From my point of view, creating a
catalog would be extremely slow, so I'm thinking of writing two
collation functions that will use both system locales and some
hard-coded collations. Any suggestions?

Radek Strnad


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Radek Strnad" <radek(dot)strnad(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation at database level
Date: 2008-04-16 12:48:47
Message-ID: 87zlruvuc0.fsf@oxford.xeocode.com

"Radek Strnad" <radek(dot)strnad(at)gmail(dot)com> writes:

> The problem with POSIX locales is that you never know which
> locales a user has installed. I've discovered that some Linux distros
> don't even have other than UTF-8 based locales.

On Debian you're even deeper in it. The user can configure which locales he's
actually interested in having on a machine. They're listed in /etc/locale.gen
but I wouldn't suggest looking there. I think you have to try switching
locales and see if setlocale returns NULL.

> Because ANSI defines collations defined by ISO-8859-1 and UTF-* we need
> to somehow implement these collations.

These are encodings. What ANSI spec are you referring to, SQL? What does it
actually say?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!