Re: Per-column collation, proof of concept

From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, proof of concept
Date: 2010-08-14 07:05:30
Message-ID: AANLkTin5aq1Xq-eetWSM6OjK88j7b_+STWUTa4P8xL8Y@mail.gmail.com
Lists: pgsql-hackers

Hi,

sorry for the delay...
btw, the patch no longer applies cleanly, but most of the failures are just
hunks; the worst one is in src/backend/catalog/namespace.c, because
FindConversionByName() is now called get_conversion_oid()... so maybe
this function should be named get_collation_oid(), i guess

On Tue, Aug 3, 2010 at 11:32 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On Mon, 2010-08-02 at 01:43 -0500, Jaime Casanova wrote:
>> nowadays, CREATE DATABASE has a lc_collate clause. is the new collate
>> clause similar as the lc_collate?
>> i mean, is lc_collate what we will use as a default?
>
> Yes, if you do not specify anything per column, the database default is
> used.
>
> How to integrate the per-database or per-cluster configuration with the
> new system is something to figure out in the future.
>

well, at least pg_collation should be a shared catalog, no?
and i think we shouldn't be working on this without first thinking about
how to integrate it with at least the per-database configuration

>> if yes, then probably we need to use pg_collation there too because
>> lc_collate and the new collate clause use different collation names.
>> """
>> postgres=# create database test with lc_collate 'en_US.UTF-8';
>> CREATE DATABASE
>> test=# create table t1 (col1 text collate "en_US.UTF-8");
>> ERROR:  collation "en_US.UTF-8" does not exist
>> test=# create table t1 (col1 text collate "en_US.utf8");
>> CREATE TABLE
>> """
>
> This is something that libc does for you.  The locale as listed by
> locale -a is called "en_US.utf8", but apparently libc takes
> "en_US.UTF-8" as well.
>

ok, but at least this is confusing
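
to illustrate why the two spellings end up meaning the same locale, here is a minimal Python sketch of the kind of codeset normalization libc appears to do. this is only my assumption about the behavior, not code from the patch or from libc; normalize_codeset() and normalize_locale_name() are hypothetical names made up for the example.

```python
import re


def normalize_codeset(codeset: str) -> str:
    """Lowercase the codeset and drop non-alphanumerics, e.g. 'UTF-8' -> 'utf8'."""
    return re.sub(r"[^0-9a-z]", "", codeset.lower())


def normalize_locale_name(name: str) -> str:
    """Normalize only the codeset part of 'lang_TERRITORY.codeset[@modifier]'."""
    base, dot, rest = name.partition(".")
    if not dot:
        # No codeset at all, e.g. "C" or "en_US": leave the name untouched.
        return name
    codeset, at, modifier = rest.partition("@")
    return base + "." + normalize_codeset(codeset) + at + modifier


# Both spellings from the session above map to the same name.
print(normalize_locale_name("en_US.UTF-8"))
print(normalize_locale_name("en_US.utf8"))
```

if the collate clause did a lookup on a normalized name like this, both spellings would find the same pg_collation entry. whether that is the right place to do it is a separate question.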

also, it doesn't recognize the C collation although it is in locales.txt
"""
test3=# create database test4 with template=template0 encoding 'utf-8' lc_collate='C';
CREATE DATABASE
test3=# create table t3 (col1 text collate "C" );
ERROR: collation "C" does not exist
"""

BTW, why the double quotes?

>> also i got errors from regression tests when MULTIBYTE=UTF8
>> (attached). it seems i was trying to create locales that weren't
>> defined in locales.txt (where was that file fed from?). i added a line
>> to that file (for es_EC.utf8), then i created a table with a column
>> using that collation and executed "select * from t2 where col1 > 'n'; "
>> and i got this error: "ERROR:  could not create locale "es_EC.utf8""
>> (of course, that last part was me messing things up, but it shows
>> we shouldn't be using a locales.txt file, i think)
>
> It might be that you don't have those locales installed in your system.
> locales.txt is created by using locale -a.  Check what that gives you.
>

sorry to state the obvious, but this doesn't work on windows, does it?
and for some reason it also didn't work on centos 5 (this error
occurred when initdb'ing)
"""
loading system objects' descriptions ... ok
creating collations ...FATAL: invalid byte sequence for encoding "UTF8": 0xe56c09
CONTEXT: COPY tmp_pg_collation, line 86
STATEMENT: COPY tmp_pg_collation FROM E'/usr/local/pgsql/9.1/share/locales.txt';
"""
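
my guess (only a guess) is that locale -a printed a name that is not encoded in UTF-8: the bytes 0xe5 0x6c would be "ål" in Latin-1. a minimal Python sketch of how the list could be filtered before it is loaded with COPY follows. split_by_encoding() and the sample names are hypothetical, they are not taken from the patch or from my locales.txt.

```python
def split_by_encoding(raw_names, encoding="utf-8"):
    """Split raw locale names into those that decode cleanly and those that do not."""
    loadable, rejected = [], []
    for raw in raw_names:
        try:
            loadable.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # This name would make COPY fail on a UTF8 database.
            rejected.append(raw)
    return loadable, rejected


# Hypothetical sample: the third name is Latin-1 encoded, not UTF-8.
sample = [b"C", b"en_US.utf8", b"bokm\xe5l", b"es_EC.utf8"]
ok, bad = split_by_encoding(sample)
print(ok)
print(bad)
```

something like this would at least let initdb skip the names it cannot load instead of failing with a FATAL error.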

--
Jaime Casanova         www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL
