Re: WIP patch: Collation support

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Radek Strnad <radek(dot)strnad(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: Collation support
Date: 2008-09-16 13:30:15
Message-ID: 48CFB4E7.3030301@enterprisedb.com
Lists: pgsql-hackers

Martijn van Oosterhout wrote:
> On Wed, Sep 10, 2008 at 12:51:02PM +0300, Heikki Linnakangas wrote:
>>> Since the set of collations isn't exactly denumerable, we need some way
>>> to allow the user to specify the collation they want. The only
>>> collation PostgreSQL knows about is the C collation. Anything else is
>>> user-defined.
>> Let's just use the name of the OS locale, like we do now. Having a
>> pg_collation catalog just moves the problem elsewhere: we'd still need
>> something in pg_collation to tie the collation to the OS locale.
>
> There's not a one-to-one mapping between collation and locale name. A
> locale name includes information about the charset and a collation may
> have parameters like case-sensitivity and pad-attribute which are not
> present in the locale name. You need a mapping anyway, which is what
> this table is for.

Ideally, we would delegate the case-sensitivity and padding to the
collation implementation (ie. OS setlocale() or ICU). That said, I don't
think operating systems normally ship case-insensitive variants of
locales by default, so I agree it would be nice if we could implement
that ourselves. Still, we could identify case-insensitive locale names,
for example by a suffix like "en_GB.UTF8.case-insensitive".
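
To make the suffix idea concrete, here is a minimal sketch in C of how such
a name could be split into the OS locale part and a case-insensitivity
flag. The function name parse_collation_name and the CI_SUFFIX macro are
hypothetical and are not taken from the patch; the suffix spelling is just
the example used above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical suffix; the spelling is only the example from the text. */
#define CI_SUFFIX ".case-insensitive"

/*
 * Split a collation name into the OS locale name and a flag saying whether
 * the case-insensitive suffix was present.
 *
 * Returns false if the remaining locale name is empty or does not fit
 * into the caller-supplied buffer.
 */
static bool
parse_collation_name(const char *name, char *os_locale,
                     size_t os_locale_size, bool *case_insensitive)
{
    size_t name_len = strlen(name);
    size_t suffix_len = strlen(CI_SUFFIX);
    size_t base_len = name_len;

    *case_insensitive = false;

    /* Strip the suffix if the name ends with it. */
    if (name_len >= suffix_len &&
        strcmp(name + name_len - suffix_len, CI_SUFFIX) == 0)
    {
        *case_insensitive = true;
        base_len = name_len - suffix_len;
    }

    if (base_len == 0 || base_len >= os_locale_size)
        return false;

    memcpy(os_locale, name, base_len);
    os_locale[base_len] = '\0';
    return true;
}
```

The resulting os_locale string is what would be handed to the OS or ICU,
while the flag would have to be honored by our own comparison code.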

I agree we will eventually need a way to give shorthand names for
collations, and a pg_collation catalog will then come in handy. But that
can wait until we have the basic infrastructure ready to support column
and query-level collation.

>>> But that put us back where we started: every database having the same
>>> collation. We're trying to move away from that. Just reindex everything
>>> and be done with it.
>> That's easier said than done, unfortunately.
>
> I don't see an alternative.

Well, I proposed disallowing a collation different from that of the
source database, except when template0 is the source. That's pretty
limited, but it is trivial to implement and still lets you have databases
with different collations in the same cluster.
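
The rule is simple enough to sketch in a few lines of C. This is only an
illustration of the check as described above, not code from the patch; the
function name and its parameters are hypothetical, and it assumes that
omitting the collation means inheriting the one from the source database.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Decide whether a new database may be created with the requested
 * collation, given the source (template) database.
 *
 * requested_collation may be NULL, meaning "inherit from the template"
 * (an assumption of this sketch).
 */
static bool
collation_allowed_for_new_db(const char *template_name,
                             const char *template_collation,
                             const char *requested_collation)
{
    /* Nothing requested: the template's collation is used as-is. */
    if (requested_collation == NULL)
        return true;

    /* template0 is the one source for which any collation is accepted. */
    if (strcmp(template_name, "template0") == 0)
        return true;

    /* Otherwise the collation must match the source database exactly. */
    return strcmp(requested_collation, template_collation) == 0;
}
```

With this check, copying template1 under a different collation is
rejected, so no existing index would need to be rebuilt after the copy.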

I worked a bit on Radek's patch, stripping out all the pg_collate and
pg_charset catalog changes and commands, leaving just the core
functionality of database-level collations. It needs some cleanup and
documentation, but I'd like to commit something like this in this commit
fest. The new catalogs can wait until we have a real need for them.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
wip-collation-nocatalogs-1.patch text/x-diff 27.2 KB
