Proof of concept COLLATE support with patch

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Proof of concept COLLATE support with patch
Date: 2005-09-02 13:04:21
Message-ID: 20050902130420.GA15466@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Supports any glibc platform and possibly Win32.

Adds:
SELECT ... ORDER BY expr COLLATE 'locale'
CREATE INDEX locale_index ON table(expr COLLATE 'locale')
Index scan used when COLLATE order permits

This is just a proof of concept patch. I didn't send it to -patches
because as Tom pointed out, there's no hope of it getting in due to
platform dependant behaviour.

This patch does not use setlocale and is completely orthoganal to any
locale support already in the backend.

As it turns out, meaningful locale support only needs a handful of
support functions to work. These are listed at the bottom. My patch
only uses the first two, but the third will be needed at some stage.
The use of the last one depends on how the backend ends up support
locales. Both glibc and wine32 have locale sensetive versions of many
functions including:

toupper_l, tolower_l, strfmon_l, strtoul_l, strtof_l, strftime_l, is*_l

A windows function list is at:
http://msdn2.microsoft.com/library/wyzd2bce(en-us,vs.80).aspx

Patch available here:
http://svana.org/kleptog/pgsql/collate1.patch

Implementation notes follow and table of functions is at the bottom.

I hope this helps whenever someone gets around to full COLLATE support.

Have a nice day,

Notes:
* It works by replacing (expr COLLATE 'locale') with
pg_strxfrm(expr, pg_findlocale(locale))
in the parsetree.

pg_findlocale returns an opaque pointer to the locale. It is
STRICT IMMUTABLE and is optimised away in the final query.

pg_strxfrm takes the string and the locale and returns a bytea.
bytea comparison uses memcmp so is safe from other locale effects
in the backend.

* Use of COLLATE for an index will probably double the diskspace
required for that index due to the strxfrm.

* I had to add the functions to pg_proc.h because CREATE FUNCTION
couldn't find them. So they have OIDs I made up. You may need to
initdb, I'm not sure.

You can compile pg_xlocale.c as an shared object and load them
that way too if you want to avoid the initdb.

* Internally they are defined as taking and returning "internal".
CREATE FUNCTION doesn't like that so specify opaque or oid
instead. The declarations are:

create function pg_findlocale(text) returns oid as 'pg_findlocale' language internal strict immutable;
create function pg_strxfrm(text,oid) returns bytea as 'pg_strxfrm' language internal strict immutable;

* The clause ORDER BY 1 COLLATE 'en_AU' breaks, it treats the 1 like
a constant. I couldn't quickly work out how to reference the
columns the right way. Long term that code should be in the
sorting code anyway.

* The locale needs to be in quotes, otherwise the parser converts it
to lower-case. Locale names are case-sensetive on many systems.

* There is a text function strcoll_l for testing collation:

create function pg_strcoll_l(text,text,text) returns int4 as 'pg_strcoll_l' language internal strict immutable;

* Yes this is the easy way out, implementing the inheritence of the
COLLATE attribute will be much more invasive. This gives most
people what they want though.

* Although these functions are documented on Windows, they are not
for glibc, so it is an unstable insterface.

Function Needed glibc Win32
---------------------------------------------------------------------
Function returing opaque newlocale _create_locale
pointer to locale data

strxfrm with locale parameter strxfrm_l _strxfrm_l

Method finding encoding for nl_langinfo_l ???
locale

strcoll with locale parameter strcoll_l _strcoll_l

--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2005-09-02 14:02:36 Re: Call for 7.5 feature completion
Previous Message Koichi Suzuki 2005-09-02 10:36:42 Re: A couple of patches for PostgreSQL 64bit support