Per-column collation, proof of concept

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Per-column collation, proof of concept
Date: 2010-07-13 18:25:31
Message-ID: 1279045531.32647.14.camel@vanquo.pezone.net
Lists: pgsql-hackers

Here is a proof of concept for per-column collation support.

Here is how it works: When creating a table, an optional COLLATE clause
can specify a collation name, which is stored (by OID) in pg_attribute.
This becomes part of the type information and is propagated through the
expression parse analysis, like typmod. When an operator or function
call is parsed (transformed), the collations of the arguments are
unified, using some rules (like type analysis, but different in detail).
The collations of the function/operator arguments come either from Var
nodes which in turn got them from pg_attribute, or from other
function and operator calls, or you can override them with explicit
COLLATE clauses (not yet implemented, but will work a bit like
RelabelType). At the end, each function or operator call gets one
collation to use.
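
To make the unification step concrete, here is a minimal sketch in C. The message does not spell out the actual rules, so everything here is an assumption for illustration only: the names ArgCollation, CollStrength and unify_collations() are hypothetical and not taken from the patch, and the rules (an explicit COLLATE overrides implicit collations; two different implicit collations with no explicit override are a conflict) are only loosely modelled on the explicit/implicit distinction in the SQL standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;            /* stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)

/* Hypothetical: how an argument came by its collation. */
typedef enum
{
    COLL_NONE,       /* argument carries no collation (e.g. an integer) */
    COLL_IMPLICIT,   /* inherited from a column via pg_attribute */
    COLL_EXPLICIT    /* set by an explicit COLLATE clause */
} CollStrength;

typedef struct
{
    Oid          collid;    /* must be valid unless strength is COLL_NONE */
    CollStrength strength;
} ArgCollation;

/*
 * Unify the collations of nargs arguments into the one collation the
 * function or operator call should use.  Returns false if the
 * collations conflict; otherwise stores the result (possibly
 * InvalidOid, if no argument has a collation) in *result.
 */
static bool
unify_collations(const ArgCollation *args, int nargs, Oid *result)
{
    Oid  explicit_coll = InvalidOid;
    Oid  implicit_coll = InvalidOid;
    bool implicit_conflict = false;

    assert(args != NULL || nargs == 0);
    assert(result != NULL);

    for (int i = 0; i < nargs; i++)
    {
        Oid c = args[i].collid;

        switch (args[i].strength)
        {
            case COLL_NONE:
                break;
            case COLL_EXPLICIT:
                /* two different explicit collations can never be reconciled */
                if (explicit_coll != InvalidOid && explicit_coll != c)
                    return false;
                explicit_coll = c;
                break;
            case COLL_IMPLICIT:
                /* remember the conflict; an explicit collation may still win */
                if (implicit_coll != InvalidOid && implicit_coll != c)
                    implicit_conflict = true;
                implicit_coll = c;
                break;
        }
    }

    if (explicit_coll != InvalidOid)
    {
        *result = explicit_coll;
        return true;
    }
    if (implicit_conflict)
        return false;

    *result = implicit_coll;
    return true;
}
```

Under these assumed rules, comparing two columns with different collations fails unless one side carries an explicit COLLATE, in which case that collation is used for the whole call.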

The function call itself can then look up the collation using the
fcinfo->flinfo->fn_expr field. (This works for operator calls, but it
doesn't work for sort operations yet; that needs more thought.)

A collation is in this implementation defined as an lc_collate string
and an lc_ctype string. The implementation of functions interested in
that information, such as comparison operators, or upper and lower
functions, will take the collation OID that is passed in, look up the
locale string, and use the xlocale.h interface (newlocale(),
strcoll_l()) to compute the result.
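
As a rough illustration of the newlocale()/strcoll_l() usage described above, here is a small standalone sketch. It is not code from the patch: the helper name compare_with_locale() is hypothetical, and it takes a locale string directly rather than a collation OID. On glibc the functions are declared in locale.h and string.h (newer glibc releases no longer ship a separate xlocale.h, as far as I know); the xlocale.h include is kept only for platforms such as macOS.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif

/*
 * Compare a and b under the locale named by lc_collate.
 * Returns <0, 0 or >0 like strcoll().  *ok is set to 0 if the locale
 * could not be loaded (the return value is then meaningless), else 1.
 */
static int
compare_with_locale(const char *lc_collate,
                    const char *a, const char *b, int *ok)
{
    locale_t loc;
    int      result;

    assert(lc_collate != NULL && a != NULL && b != NULL && ok != NULL);

    /* Build a locale object that only overrides the collation category. */
    loc = newlocale(LC_COLLATE_MASK, lc_collate, (locale_t) 0);
    if (loc == (locale_t) 0)
    {
        *ok = 0;
        return 0;
    }

    result = strcoll_l(a, b, loc);

    /* Free it again; a real implementation would want to cache this. */
    freelocale(loc);

    *ok = 1;
    return result;
}
```

A call such as compare_with_locale("sv_SE.utf8", x, y, &ok) would compare under Swedish rules if that locale is installed; the locale name here is just an example, and whether it exists depends on the system.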

(Note that the xlocale-specific code is only 10 or so lines in this
patch. It should be feasible to allow other suitable locale libraries
to be used.)

Loose ends:

- Support function calls (currently only operator calls) (easy)

- Implementation of sort clauses

- Indexing support/integration

- Domain support (should be straightforward)

- Make all expression node types deal with collation information
appropriately

- Explicit COLLATE clause on expressions

- Caching and not leaking memory of locale lookups

- I have typcollatable to mark which types can accept collation
information, but perhaps there should also be proicareaboutcollation
to skip collation resolution when none of the functions in the
expression tree care.
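
For the caching item in the list above, one possible shape is a small fixed-size cache keyed by collation OID, sketched below. This is only an illustration under assumptions: lookup_locale(), reset_locale_cache() and the cache layout are hypothetical names, the round-robin eviction is an arbitrary choice, and the caller passes the lc_collate and lc_ctype strings directly where the patch would look them up from the catalog.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif

typedef unsigned int Oid;            /* stand-in for PostgreSQL's Oid */

#define LOCALE_CACHE_SIZE 8

typedef struct
{
    Oid      collid;                 /* 0 marks an empty slot */
    locale_t loc;
} LocaleCacheEntry;

static LocaleCacheEntry locale_cache[LOCALE_CACHE_SIZE];
static int locale_cache_next = 0;    /* next slot to (re)use, round-robin */

/*
 * Return a locale_t for the collation, building it from the lc_collate
 * and lc_ctype strings on a cache miss.  Returns (locale_t) 0 on
 * failure.  The cache owns the returned object; callers must not free it.
 */
static locale_t
lookup_locale(Oid collid, const char *lc_collate, const char *lc_ctype)
{
    locale_t          base, loc;
    LocaleCacheEntry *slot;

    assert(collid != 0 && lc_collate != NULL && lc_ctype != NULL);

    for (int i = 0; i < LOCALE_CACHE_SIZE; i++)
        if (locale_cache[i].collid == collid)
            return locale_cache[i].loc;

    base = newlocale(LC_COLLATE_MASK, lc_collate, (locale_t) 0);
    if (base == (locale_t) 0)
        return (locale_t) 0;

    /* On success this modifies or consumes base; on failure base is untouched. */
    loc = newlocale(LC_CTYPE_MASK, lc_ctype, base);
    if (loc == (locale_t) 0)
    {
        freelocale(base);            /* avoid leaking the half-built locale */
        return (locale_t) 0;
    }

    /* Evict whatever occupied the slot, freeing its locale object. */
    slot = &locale_cache[locale_cache_next];
    if (slot->collid != 0)
        freelocale(slot->loc);
    slot->collid = collid;
    slot->loc = loc;
    locale_cache_next = (locale_cache_next + 1) % LOCALE_CACHE_SIZE;

    return loc;
}

/* Free every cached locale object, e.g. at backend exit. */
static void
reset_locale_cache(void)
{
    for (int i = 0; i < LOCALE_CACHE_SIZE; i++)
    {
        if (locale_cache[i].collid != 0)
            freelocale(locale_cache[i].loc);
        locale_cache[i].collid = 0;
    }
    locale_cache_next = 0;
}
```

Failed lookups are deliberately not cached here, so a locale that gets installed later is picked up on the next call; whether that is the right trade-off is an open question.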

You can start by reading the collate.sql regression test file to see
what it can do. Note that the regression tests only work with "make
check MULTIBYTE=UTF8", and the patch (probably) only works with glibc
for now.

Attachment: collate.patch (text/x-patch, 130.4 KB)
