Re: Per-column collation, proof of concept

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-07-14 17:35:20
Message-ID: AANLkTiloowmvpPEkcrxJ4WhxfJterBwS8wETHQjTo6it@mail.gmail.com
Lists: pgsql-hackers

Hello

I have only one question. If I understand correctly, collation can
currently be used only for sorting. What is your plan for range
searches? Sorting is interesting, and I am sure it is important for
multilingual applications, but for me filtering that is case-sensitive
or case-insensitive, accent-sensitive or accent-insensitive, is more
important. Do you have a plan for that?

Regards

Pavel Stehule

2010/7/13 Peter Eisentraut <peter_e(at)gmx(dot)net>:
> Here is a proof of concept for per-column collation support.
>
> Here is how it works: When creating a table, an optional COLLATE clause
> can specify a collation name, which is stored (by OID) in pg_attribute.
> This becomes part of the type information and is propagated through the
> expression parse analysis, like typmod.  When an operator or function
> call is parsed (transformed), the collations of the arguments are
> unified, using some rules (like type analysis, but different in detail).
> The collations of the function/operator arguments come either from Var
> nodes which in turn got them from pg_attribute, or from other
> function and operator calls, or you can override them with explicit
> COLLATE clauses (not yet implemented, but will work a bit like
> RelabelType).  At the end, each function or operator call gets one
> collation to use.
>
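
To make the unification step concrete, here is a minimal sketch. The
message does not spell out the actual rules, so everything below is an
assumption: the rule set is loosely modeled on the SQL standard's idea
that an explicit COLLATE beats a collation inherited from a column, and
the names CollationState, CollationStrength and unify_collations are
hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for the PostgreSQL Oid type */
#define InvalidOid ((Oid) 0)

/* Hypothetical classification of where a collation came from. */
typedef enum
{
    COLLATE_NONE,               /* argument carries no collation */
    COLLATE_IMPLICIT,           /* inherited, e.g. from a column */
    COLLATE_CONFLICT,           /* two different implicit collations met */
    COLLATE_EXPLICIT            /* given by an explicit COLLATE clause */
} CollationStrength;

typedef struct
{
    Oid               collation;
    CollationStrength strength;
} CollationState;

/*
 * Combine the collation states of two arguments (assumed rules).
 * Returns 0 and fills *out on success, or -1 when two different
 * explicit collations clash, which this sketch treats as an error.
 */
static int
unify_collations(CollationState a, CollationState b, CollationState *out)
{
    assert(out != NULL);

    /* An argument without a collation never influences the result. */
    if (a.strength == COLLATE_NONE) { *out = b; return 0; }
    if (b.strength == COLLATE_NONE) { *out = a; return 0; }

    /* Two explicit collations must agree. */
    if (a.strength == COLLATE_EXPLICIT && b.strength == COLLATE_EXPLICIT)
    {
        if (a.collation != b.collation)
            return -1;
        *out = a;
        return 0;
    }

    /* A single explicit collation overrides anything implicit. */
    if (a.strength == COLLATE_EXPLICIT) { *out = a; return 0; }
    if (b.strength == COLLATE_EXPLICIT) { *out = b; return 0; }

    /* Only implicit or already-conflicting states remain. */
    if (a.strength == COLLATE_CONFLICT || b.strength == COLLATE_CONFLICT ||
        a.collation != b.collation)
    {
        out->collation = InvalidOid;
        out->strength = COLLATE_CONFLICT;
        return 0;
    }

    *out = a;
    return 0;
}
```

Under these assumed rules, comparing two columns with different
collations yields a conflict state rather than an immediate error, so an
explicit COLLATE clause further up the expression can still resolve it.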

What about the DISTINCT clause, and maybe the GROUP BY clause?

regards

Pavel

> The function call itself can then look up the collation using the
> fcinfo->flinfo->fn_expr field.  (Works for operator calls, but doesn't
> work for sort operations, needs more thought.)
>
> A collation is in this implementation defined as an lc_collate string
> and an lc_ctype string.  The implementation of functions interested in
> that information, such as comparison operators, or upper and lower
> functions, will take the collation OID that is passed in, look up the
> locale string, and use the xlocale.h interface (newlocale(),
> strcoll_l()) to compute the result.
>
> (Note that the xlocale stuff is only 10 or so lines in this patch.  It
> should be feasible to allow other appropriate locale libraries to be
> used.)
>
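
For readers unfamiliar with the per-locale interface, the following is a
minimal sketch of a comparison using newlocale() and strcoll_l(). It is
not the code from the patch. It assumes a POSIX.1-2008 system (such as a
current glibc, where these functions are declared in <locale.h> and
<string.h>), and the helper name compare_with_collation is hypothetical.
The two locale-name parameters mirror the lc_collate and lc_ctype strings
mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>     /* newlocale(), freelocale(), locale_t */
#include <string.h>     /* strcoll_l() */

/*
 * Compare a and b under a locale built from separate LC_COLLATE and
 * LC_CTYPE names (hypothetical helper, not from the patch).
 * Returns 0 and stores the strcoll_l() result in *result, or returns -1
 * if either locale name cannot be loaded.
 */
static int
compare_with_collation(const char *lc_collate, const char *lc_ctype,
                       const char *a, const char *b, int *result)
{
    locale_t base;
    locale_t loc;

    assert(result != NULL);

    /* Start with a locale object that only sets the collation category. */
    base = newlocale(LC_COLLATE_MASK, lc_collate, (locale_t) 0);
    if (base == (locale_t) 0)
        return -1;

    /* Add the ctype category; on failure base is untouched and still ours. */
    loc = newlocale(LC_CTYPE_MASK, lc_ctype, base);
    if (loc == (locale_t) 0)
    {
        freelocale(base);
        return -1;
    }

    *result = strcoll_l(a, b, loc);

    /* Free the locale object so repeated calls do not leak. */
    freelocale(loc);
    return 0;
}
```

Creating and freeing a locale object on every call is wasteful, which is
why caching of locale lookups appears among the loose ends.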
> Loose ends:
>
> - Support function calls (currently only operator calls) (easy)
>
> - Implementation of sort clauses
>
> - Indexing support/integration
>
> - Domain support (should be straightforward)
>
> - Make all expression node types deal with collation information
>  appropriately
>
> - Explicit COLLATE clause on expressions
>
> - Caching and not leaking memory of locale lookups
>
> - I have typcollatable to mark which types can accept collation
>  information, but perhaps there should also be proicareaboutcollation
>  to skip collation resolution when none of the functions in the
>  expression tree care.
>
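
As an illustration of the "caching and not leaking memory" item, here is
a small sketch of a cache that maps a collation OID to a locale_t. It is
an assumption about one possible shape, not the patch's design: the fixed
array, the names cached_locale and locale_cache_reset, and passing the
locale name in directly (instead of looking it up in a catalog) are all
hypothetical simplifications.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>     /* newlocale(), freelocale(), locale_t */
#include <stddef.h>

typedef unsigned int Oid;           /* stand-in for the PostgreSQL Oid type */

#define LOCALE_CACHE_SIZE 16        /* arbitrary size for the sketch */

typedef struct
{
    int      used;                  /* nonzero when the slot holds a locale */
    Oid      collid;                /* collation OID used as the cache key */
    locale_t loc;                   /* locale object owned by the cache */
} LocaleCacheEntry;

static LocaleCacheEntry locale_cache[LOCALE_CACHE_SIZE];

/*
 * Return the cached locale for collid, creating it from locale_name on
 * the first request. Returns (locale_t) 0 if the locale cannot be
 * created or the cache is full (a real cache would evict or grow).
 */
static locale_t
cached_locale(Oid collid, const char *locale_name)
{
    int      free_slot = -1;
    locale_t loc;

    for (int i = 0; i < LOCALE_CACHE_SIZE; i++)
    {
        if (locale_cache[i].used)
        {
            if (locale_cache[i].collid == collid)
                return locale_cache[i].loc;     /* cache hit */
        }
        else if (free_slot < 0)
            free_slot = i;                      /* remember first empty slot */
    }

    if (free_slot < 0)
        return (locale_t) 0;

    loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, locale_name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return (locale_t) 0;

    locale_cache[free_slot].used = 1;
    locale_cache[free_slot].collid = collid;
    locale_cache[free_slot].loc = loc;
    return loc;
}

/* Free every cached locale object and mark all slots empty. */
static void
locale_cache_reset(void)
{
    for (int i = 0; i < LOCALE_CACHE_SIZE; i++)
    {
        if (locale_cache[i].used)
        {
            freelocale(locale_cache[i].loc);
            locale_cache[i].used = 0;
        }
    }
}
```

Because the cache owns the locale objects, callers must not call
freelocale() on the returned value; the only release point is
locale_cache_reset().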
> You can start by reading the collate.sql regression test file to see
> what it can do.  Btw., regression tests only work with "make check
> MULTIBYTE=UTF8".  And it (probably) only works with glibc for now.
>
