Per-column collation, proof of concept

Lists: pgsql-hackers
From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Per-column collation, proof of concept
Date: 2010-07-13 18:25:31
Message-ID: 1279045531.32647.14.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is a proof of concept for per-column collation support.

Here is how it works: When creating a table, an optional COLLATE clause
can specify a collation name, which is stored (by OID) in pg_attribute.
This becomes part of the type information and is propagated through the
expression parse analysis, like typmod. When an operator or function
call is parsed (transformed), the collations of the arguments are
unified, using some rules (like type analysis, but different in detail).
The collations of the function/operator arguments come either from Var
nodes which in turn got them from pg_attribute, or from other
function and operator calls, or you can override them with explicit
COLLATE clauses (not yet implemented, but will work a bit like
RelabelType). At the end, each function or operator call gets one
collation to use.

The function call itself can then look up the collation using the
fcinfo->flinfo->fn_expr field. (Works for operator calls, but doesn't
work for sort operations, needs more thought.)

A collation is in this implementation defined as an lc_collate string
and an lc_ctype string. The implementation of functions interested in
that information, such as comparison operators, or upper and lower
functions, will take the collation OID that is passed in, look up the
locale string, and use the xlocale.h interface (newlocale(),
strcoll_l()) to compute the result.

(Note that the xlocale stuff is only 10 or so lines in this patch. It
should be feasible to allow other appropriate locale libraries to be
used.)

Loose ends:

- Support function calls (currently only operator calls) (easy)

- Implementation of sort clauses

- Indexing support/integration

- Domain support (should be straightforward)

- Make all expression node types deal with collation information
appropriately

- Explicit COLLATE clause on expressions

- Caching and not leaking memory of locale lookups

- I have typcollatable to mark which types can accept collation
information, but perhaps there should also be proicareaboutcollation
to skip collation resolution when none of the functions in the
expression tree care.

You can start by reading the collate.sql regression test file to see
what it can do. Btw., regression tests only work with "make check
MULTIBYTE=UTF8". And it (probably) only works with glibc for now.

Attachment Content-Type Size
collate.patch text/x-patch 130.4 KB

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Per-column collation, proof of concept
Date: 2010-07-14 17:21:26
Message-ID: 4C3DABC602000025000335B1@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

> Here is a proof of concept for per-column collation support.

Did you want a WIP review of that patch? (CF closing to new
submissions soon....)

-Kevin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-07-14 17:35:20
Message-ID: AANLkTiloowmvpPEkcrxJ4WhxfJterBwS8wETHQjTo6it@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

I have only one question - If I understand well you can use collate
just for sort. What is your plan for range search operation? Sort is
interesting and I am sure important for multilangual applications, for
me - more important is case sensitive, case insensitive, accent
sensitive, insensitive filtering - do you have a plan for it?

Regards

Pavel Stehule

2010/7/13 Peter Eisentraut <peter_e(at)gmx(dot)net>:
> Here is a proof of concept for per-column collation support.
>
> Here is how it works: When creating a table, an optional COLLATE clause
> can specify a collation name, which is stored (by OID) in pg_attribute.
> This becomes part of the type information and is propagated through the
> expression parse analysis, like typmod.  When an operator or function
> call is parsed (transformed), the collations of the arguments are
> unified, using some rules (like type analysis, but different in detail).
> The collations of the function/operator arguments come either from Var
> nodes which in turn got them from pg_attribute, or from other
> function and operator calls, or you can override them with explicit
> COLLATE clauses (not yet implemented, but will work a bit like
> RelabelType).  At the end, each function or operator call gets one
> collation to use.
>

what about DISTINCT clause, maybe GROUP BY clause ?

regards

Pavel

> The function call itself can then look up the collation using the
> fcinfo->flinfo->fn_expr field.  (Works for operator calls, but doesn't
> work for sort operations, needs more thought.)
>
> A collation is in this implementation defined as an lc_collate string
> and an lc_ctype string.  The implementation of functions interested in
> that information, such as comparison operators, or upper and lower
> functions, will take the collation OID that is passed in, look up the
> locale string, and use the xlocale.h interface (newlocale(),
> strcoll_l()) to compute the result.
>
> (Note that the xlocale stuff is only 10 or so lines in this patch.  It
> should be feasible to allow other appropriate locale libraries to be
> used.)
>
> Loose ends:
>
> - Support function calls (currently only operator calls) (easy)
>
> - Implementation of sort clauses
>
> - Indexing support/integration
>
> - Domain support (should be straightforward)
>
> - Make all expression node types deal with collation information
>  appropriately
>
> - Explicit COLLATE clause on expressions
>
> - Caching and not leaking memory of locale lookups
>
> - I have typcollatable to mark which types can accept collation
>  information, but perhaps there should also be proicareaboutcollation
>  to skip collation resolution when none of the functions in the
>  expression tree care.
>
> You can start by reading the collate.sql regression test file to see
> what it can do.  Btw., regression tests only work with "make check
> MULTIBYTE=UTF8".  And it (probably) only works with glibc for now.
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-07-14 21:11:44
Message-ID: 1279141904.30100.4.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On ons, 2010-07-14 at 19:35 +0200, Pavel Stehule wrote:
> I have only one question - If I understand well you can use collate
> just for sort. What is your plan for range search operation?

My patch does range searches. Sorting uses the same operators, so both
will be supported. (Sorting is not yet implemented, as I had written.)

> Sort is
> interesting and I am sure important for multilangual applications, for
> me - more important is case sensitive, case insensitive, accent
> sensitive, insensitive filtering - do you have a plan for it?

You may be able to do some of these by using appropriate locale
definitions. I'd need some examples to be able to tell for sure.

> what about DISTINCT clause, maybe GROUP BY clause ?

DISTINCT and GROUP BY work with equality, which is not affected by
locales (at least under the current rules).


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-07-15 03:57:08
Message-ID: AANLkTimPLCtsqgYJiAbezFNybkbQxVr2NKQEJyVLFbhK@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/14 Peter Eisentraut <peter_e(at)gmx(dot)net>:
> On ons, 2010-07-14 at 19:35 +0200, Pavel Stehule wrote:
>> I have only one question - If I understand well you can use collate
>> just for sort. What is your plan for range search operation?
>
> My patch does range searches.  Sorting uses the same operators, so both
> will be supported.  (Sorting is not yet implemented, as I had written.)
>
>> Sort is
>> interesting and I am sure important for multilangual applications, for
>> me - more important is case sensitive, case insensitive, accent
>> sensitive, insensitive filtering - do you have a plan for it?
>
> You may be able to do some of these by using appropriate locale
> definitions.  I'd need some examples to be able to tell for sure.
>
>> what about DISTINCT clause, maybe GROUP BY clause ?
>
> DISTINCT and GROUP BY work with equality, which is not affected by
> locales (at least under the current rules).
>

:( maybe we have to enhance a locales - or do some work in this way.
In Czech's IS is relative often operation some like

name = 'Stěhule' COLLATION cs_CZ_cs_ai -- compare case insensitive
accent insensitive

PostgreSQL is last db, that doesn't integreated support for it

Regards

Pavel

>
>


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-07-15 08:44:26
Message-ID: 1279183466.28822.11.camel@fsopti579.F-Secure.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tor, 2010-07-15 at 05:57 +0200, Pavel Stehule wrote:
> :( maybe we have to enhance a locales - or do some work in this way.
> In Czech's IS is relative often operation some like
>
> name = 'Stěhule' COLLATION cs_CZ_cs_ai -- compare case insensitive
> accent insensitive
>
> PostgreSQL is last db, that doesn't integreated support for it

Well, the comparison function varstr_cmp() contains this comment:

/*
* In some locales strcoll() can claim that nonidentical strings are
* equal. Believing that would be bad news for a number of reasons,
* so we follow Perl's lead and sort "equal" strings according to
* strcmp().
*/

This might not be strictly necessary, seeing that citext obviously
doesn't work that way, but resolving this is really an orthogonal issue.
If you fix that and you have a locale that does what you want, my patch
will help you get your example working.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-07-15 15:24:21
Message-ID: 5575.1279207461@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Well, the comparison function varstr_cmp() contains this comment:

> /*
> * In some locales strcoll() can claim that nonidentical strings are
> * equal. Believing that would be bad news for a number of reasons,
> * so we follow Perl's lead and sort "equal" strings according to
> * strcmp().
> */

> This might not be strictly necessary, seeing that citext obviously
> doesn't work that way, but resolving this is really an orthogonal issue.

The problem with not doing that is it breaks hashing --- hash joins and
hash aggregation being the real pain points.

citext works around this in a rather klugy fashion by decreeing that two
strings are equal iff their str_tolower() conversions are bitwise equal.
So it can hash the str_tolower() representation. But that's kinda slow
and it fails in the general case anyhow, I think.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-07-15 17:04:19
Message-ID: AANLkTikAXop_9dqQeVzC-NfeY0hpHTwCPjJPMTAsavmM@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 15, 2010 at 4:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The problem with not doing that is it breaks hashing --- hash joins and
> hash aggregation being the real pain points.
>
> citext works around this in a rather klugy fashion by decreeing that two
> strings are equal iff their str_tolower() conversions are bitwise equal.
> So it can hash the str_tolower() representation.  But that's kinda slow
> and it fails in the general case anyhow, I think.

I think the general equivalent would be to call strxfrm and hash the
result of that.

--
greg


From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-08-02 06:43:40
Message-ID: AANLkTi=SBCk+zvGAjzrg8uTAk-w2+ipQ5=6VjSXR==0i@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 13, 2010 at 1:25 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> Here is a proof of concept for per-column collation support.
>

Hi,

i was looking at this.

nowadays, CREATE DATABASE has a lc_collate clause. is the new collate
clause similar as the lc_collate?
i mean, is lc_collate what we will use as a default?

if yes, then probably we need to use pg_collation there too because
lc_collate and the new collate clause use different collation names.
"""
postgres=# create database test with lc_collate 'en_US.UTF-8';
CREATE DATABASE
test=# create table t1 (col1 text collate "en_US.UTF-8");
ERROR: collation "en_US.UTF-8" does not exist
test=# create table t1 (col1 text collate "en_US.utf8");
CREATE TABLE
"""

also i got errors from regression tests when MULTIBYTE=UTF8
(attached). it seems i was trying to create locales that weren't
defined on locales.txt (from were was fed that file?). i added a line
to that file (for es_EC.utf8) then i create a table with a column
using that collate and execute "select * from t2 where col1 > 'n'; "
and i got this error: "ERROR: could not create locale "es_EC.utf8""
(of course, that last part was me messing the things up, but it show
we shouldn't be using a file locales.txt, i think)

i can attach a collate to a domain but i can't see where are we
storing that info (actually it says it's not collatable):

--
Jaime Casanova         www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

Attachment Content-Type Size
regression.diffs application/octet-stream 1.6 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-08-03 16:32:13
Message-ID: 1280853133.9554.7.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On mån, 2010-08-02 at 01:43 -0500, Jaime Casanova wrote:
> nowadays, CREATE DATABASE has a lc_collate clause. is the new collate
> clause similar as the lc_collate?
> i mean, is lc_collate what we will use as a default?

Yes, if you do not specify anything per column, the database default is
used.

How to integrate the per-database or per-cluster configuration with the
new system is something to figure out in the future.

> if yes, then probably we need to use pg_collation there too because
> lc_collate and the new collate clause use different collation names.
> """
> postgres=# create database test with lc_collate 'en_US.UTF-8';
> CREATE DATABASE
> test=# create table t1 (col1 text collate "en_US.UTF-8");
> ERROR: collation "en_US.UTF-8" does not exist
> test=# create table t1 (col1 text collate "en_US.utf8");
> CREATE TABLE
> """

This is something that libc does for you. The locale as listed by
locale -a is called "en_US.utf8", but apparently libc takes
"en_US.UTF-8" as well.

> also i got errors from regression tests when MULTIBYTE=UTF8
> (attached). it seems i was trying to create locales that weren't
> defined on locales.txt (from were was fed that file?). i added a line
> to that file (for es_EC.utf8) then i create a table with a column
> using that collate and execute "select * from t2 where col1 > 'n'; "
> and i got this error: "ERROR: could not create locale "es_EC.utf8""
> (of course, that last part was me messing the things up, but it show
> we shouldn't be using a file locales.txt, i think)

It might be that you don't have those locales installed in your system.
locales.txt is created by using locale -a. Check what that gives you.

> i can attach a collate to a domain but i can't see where are we
> storing that info (actually it says it's not collatable):

Domain support is not done yet.


From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-08-14 07:05:30
Message-ID: AANLkTin5aq1Xq-eetWSM6OjK88j7b_+STWUTa4P8xL8Y@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

sorry for the delay...
btw, the patch no longer apply cleanly but most are just hunks the
worst it's in src/backend/catalog/namespace.c because
FindConversionByName() is now called get_conversion_oid()... so maybe
this function should be named get_collation_oid(), i guess

On Tue, Aug 3, 2010 at 11:32 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On mån, 2010-08-02 at 01:43 -0500, Jaime Casanova wrote:
>> nowadays, CREATE DATABASE has a lc_collate clause. is the new collate
>> clause similar as the lc_collate?
>> i mean, is lc_collate what we will use as a default?
>
> Yes, if you do not specify anything per column, the database default is
> used.
>
> How to integrate the per-database or per-cluster configuration with the
> new system is something to figure out in the future.
>

well at least pg_collation should be a shared catalog, no?
and i think we shouldn't be thinking in this without think first how
to integrate this with at least per-database configuration

>> if yes, then probably we need to use pg_collation there too because
>> lc_collate and the new collate clause use different collation names.
>> """
>> postgres=# create database test with lc_collate 'en_US.UTF-8';
>> CREATE DATABASE
>> test=# create table t1 (col1 text collate "en_US.UTF-8");
>> ERROR:  collation "en_US.UTF-8" does not exist
>> test=# create table t1 (col1 text collate "en_US.utf8");
>> CREATE TABLE
>> """
>
> This is something that libc does for you.  The locale as listed by
> locale -a is called "en_US.utf8", but apparently libc takes
> "en_US.UTF-8" as well.
>

ok, but at least this is confusing

also, it doesn't recognize C collate although it is in the locales.txt
"""
test3=# create database test4 with template=template0 encoding 'utf-8'
lc_collate='C';
CREATE DATABASE
test3=# create table t3 (col1 text collate "C" );
ERROR: collation "C" does not exist
"""

BTW, why the double quotes?

>> also i got errors from regression tests when MULTIBYTE=UTF8
>> (attached). it seems i was trying to create locales that weren't
>> defined on locales.txt (from were was fed that file?). i added a line
>> to that file (for es_EC.utf8) then i create a table with a column
>> using that collate and execute "select * from t2 where col1 > 'n'; "
>> and i got this error: "ERROR:  could not create locale "es_EC.utf8""
>> (of course, that last part was me messing the things up, but it show
>> we shouldn't be using a file locales.txt, i think)
>
> It might be that you don't have those locales installed in your system.
> locales.txt is created by using locale -a.  Check what that gives you.
>

sorry to state the obvious but this doesn't work on windows, does it?
and for some reason it also didn't work on a centos 5 (this error
ocurred when initdb'ing)
"""
loading system objects' descriptions ... ok
creating collations ...FATAL: invalid byte sequence for encoding
"UTF8": 0xe56c09
CONTEXT: COPY tmp_pg_collation, line 86
STATEMENT: COPY tmp_pg_collation FROM
E'/usr/local/pgsql/9.1/share/locales.txt';
"""

--
Jaime Casanova         www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-08-17 03:56:02
Message-ID: 1282017362.10562.4.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On lör, 2010-08-14 at 02:05 -0500, Jaime Casanova wrote:
> btw, the patch no longer apply cleanly but most are just hunks the
> worst it's in src/backend/catalog/namespace.c because
> FindConversionByName() is now called get_conversion_oid()... so maybe
> this function should be named get_collation_oid(), i guess

OK, that will need to be adjusted.

> well at least pg_collation should be a shared catalog, no?
> and i think we shouldn't be thinking in this without think first how
> to integrate this with at least per-database configuration

Good point. But one might also want to create "private" collations, so
a collation in a schema would also be useful. Tricky.

> also, it doesn't recognize C collate although it is in the locales.txt
> """
> test3=# create database test4 with template=template0 encoding 'utf-8'
> lc_collate='C';
> CREATE DATABASE
> test3=# create table t3 (col1 text collate "C" );
> ERROR: collation "C" does not exist
> """

I've fixed this in the meantime. Your version of the patch doesn't
support the C locale yet.

> BTW, why the double quotes?

Because the name contains upper case letters?

> sorry to state the obvious but this doesn't work on windows, does it?

Probably not, but hopefully there is some similar API that could be used
on Windows.

> and for some reason it also didn't work on a centos 5 (this error
> ocurred when initdb'ing)
> """
> loading system objects' descriptions ... ok
> creating collations ...FATAL: invalid byte sequence for encoding
> "UTF8": 0xe56c09
> CONTEXT: COPY tmp_pg_collation, line 86
> STATEMENT: COPY tmp_pg_collation FROM
> E'/usr/local/pgsql/9.1/share/locales.txt';
> """

Hmm, what is in that file on that line?


From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-08-17 06:16:14
Message-ID: AANLkTimnD2DbS71VjBdA4sgKRrfKJV8c3Gke7qSWGJxG@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 16, 2010 at 10:56 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On lör, 2010-08-14 at 02:05 -0500, Jaime Casanova wrote:
>
>> BTW, why the double quotes?
>
> Because the name contains upper case letters?
>

why everything seems so obvious once someone else state it? :)

>> sorry to state the obvious but this doesn't work on windows, does it?
>
> Probably not, but hopefully there is some similar API that could be used
> on Windows.
>

good luck with that! ;)
seriously, maybe this helps
http://msdn.microsoft.com/en-us/library/system.windows.forms.inputlanguage.installedinputlanguages.aspx
but probably you will need to write the code yourself... at least i
don't think there is something like "locale -a"

>> and for some reason it also didn't work on a centos 5 (this error
>> ocurred when initdb'ing)
>> """
>> loading system objects' descriptions ... ok
>> creating collations ...FATAL:  invalid byte sequence for encoding
>> "UTF8": 0xe56c09
>> CONTEXT:  COPY tmp_pg_collation, line 86
>> STATEMENT:  COPY tmp_pg_collation FROM
>> E'/usr/local/pgsql/9.1/share/locales.txt';
>> """
>
> Hmm, what is in that file on that line?
>
>

bokmål ISO-8859-1

(i'm attaching the locales.txt just in case)

--
Jaime Casanova         www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

Attachment Content-Type Size
locales.txt text/plain 12.4 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-08-18 16:29:25
Message-ID: 1282148965.27661.1.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tis, 2010-08-17 at 01:16 -0500, Jaime Casanova wrote:
> >> creating collations ...FATAL: invalid byte sequence for encoding
> >> "UTF8": 0xe56c09
> >> CONTEXT: COPY tmp_pg_collation, line 86
> >> STATEMENT: COPY tmp_pg_collation FROM
> >> E'/usr/local/pgsql/9.1/share/locales.txt';
> >> """
> >
> > Hmm, what is in that file on that line?
> >
> >
>
> bokmål ISO-8859-1

Hey, that borders on genius: Use a non-ASCII letter in the name of a
locale whose purpose it is to configure how non-ASCII letters are
interpreted. :-/

Interestingly, I don't see this on a Debian system. Good thing to know
that this needs separate testing on different Linux variants.


From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-08-18 17:55:58
Message-ID: AANLkTi=cAmemjG16qPYXtyuOi9UDe222+LFrL7wTv7pp@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 18, 2010 at 11:29 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On tis, 2010-08-17 at 01:16 -0500, Jaime Casanova wrote:
>> >> creating collations ...FATAL:  invalid byte sequence for encoding
>> >> "UTF8": 0xe56c09
>> >> CONTEXT:  COPY tmp_pg_collation, line 86
>> >> STATEMENT:  COPY tmp_pg_collation FROM
>> >> E'/usr/local/pgsql/9.1/share/locales.txt';
>> >> """
>> >
>> > Hmm, what is in that file on that line?
>> >
>> >
>>
>> bokmål  ISO-8859-1
>
> Hey, that borders on genius: Use a non-ASCII letter in the name of a
> locale whose purpose it is to configure how non-ASCII letters are
> interpreted. :-/
>
> Interestingly, I don't see this on a Debian system.  Good thing to know
> that this needs separate testing on different Linux variants.
>
>

Yeah! and when installing centos 5 i don't have a chance to choose
what locales i want, it just installs all of them

--
Jaime Casanova         www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL