Re: Per-column collation, work in progress

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, work in progress
Date: 2010-09-22 10:44:24
Message-ID: AANLkTindnuQ1Yipo-u-O3=mimncrrrtdoZ85xRthhJfp@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 16, 2010 at 5:46 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> Following up on my previous patch [0], here is a fairly complete
> implementation of this feature.  The general description and
> implementation outline of the previous message still apply.  This patch
> contains documentation and regression tests, which can serve as further
> explanations.

I tested the patch on a database with encoding=UTF8 and locale=C.
I have a couple of questions and comments.

* CREATE TABLE (LIKE table_with_collation) doesn't inherit collations.
We need to either copy collations by default or add an INCLUDING COLLATE option.

* upper() doesn't work if a column has a collation.
It still works if a column doesn't have a collation.
postgres=# \d tbl
          Table "public.tbl"
 Column | Type |     Modifiers
--------+------+--------------------
 c      | text | collate C
 ja     | text | collate ja_JP.utf8

postgres=# SELECT name, setting FROM pg_settings WHERE name IN
('lc_ctype', 'lc_collate');
    name    | setting
------------+---------
 lc_collate | C
 lc_ctype   | C
(2 rows)

postgres=# SELECT upper(c) FROM tbl;
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the
database encoding.
postgres=# SELECT upper(ja) FROM tbl;
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the
database encoding.

* Comparison of strings with different collations is forbidden,
but assignment is allowed, right?

postgres=# SELECT * FROM tbl WHERE c = ja;
ERROR: collation mismatch between implicit collations "C" and "ja_JP.utf8"
LINE 1: SELECT * FROM tbl WHERE c = ja;
^
HINT: You can override the collation by applying the COLLATE clause
to one or both expressions.
postgres=# INSERT INTO tbl(c, ja) SELECT ja, c FROM tbl;
INSERT 0 6

* psql \d needs a separator between collate and not null modifiers.
postgres=# ALTER TABLE tbl ALTER COLUMN c SET NOT NULL;
ALTER TABLE
postgres=# \d tbl
          Table "public.tbl"
 Column | Type |     Modifiers
--------+------+--------------------
 c      | text | collate Cnot null   <= HERE
 ja     | text | collate ja_JP.utf8

> the feature overall only works on Linux/glibc.

We could also support it on MSVC.
http://msdn.microsoft.com/en-us/library/a7cwbx4t(v=VS.90).aspx -- _strcoll_l
http://msdn.microsoft.com/en-us/library/45119yx3(v=VS.90).aspx -- _towupper_l
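
To illustrate, a thin portability layer over the two APIs might look roughly like the sketch below. This is only a sketch under assumptions: the names collation_t, collation_open, collation_close, strcoll_impl, towupper_impl, compare_with_collation and upper_with_collation are made up for illustration and do not come from the patch. It assumes glibc (newlocale, strcoll_l, towupper_l) on one side and the MSVC CRT (_create_locale, _strcoll_l, _towupper_l) on the other, and it opens a locale on every call, which a real implementation would presumably cache.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc: expose newlocale(), strcoll_l(), towupper_l() */
#endif
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* All names below are hypothetical; they are not taken from the patch. */
#ifdef _MSC_VER
/* MSVC spells the per-locale API with a leading underscore. */
typedef _locale_t collation_t;
#define collation_open(name)    _create_locale(LC_ALL, (name))
#define collation_close(loc)    _free_locale(loc)
#define strcoll_impl            _strcoll_l
#define towupper_impl           _towupper_l
#else
typedef locale_t collation_t;
#define collation_open(name)    newlocale(LC_ALL_MASK, (name), (locale_t) 0)
#define collation_close(loc)    freelocale(loc)
#define strcoll_impl            strcoll_l
#define towupper_impl           towupper_l
#endif

/*
 * Compare a and b under the named locale without touching the global
 * locale. Returns 0 on success and stores the strcoll-style result in
 * *cmp; returns -1 if the locale cannot be opened.
 */
int
compare_with_collation(const char *a, const char *b,
                       const char *locname, int *cmp)
{
    collation_t loc;

    assert(a != NULL && b != NULL && locname != NULL && cmp != NULL);

    loc = collation_open(locname);
    if (loc == (collation_t) 0)
        return -1;

    *cmp = strcoll_impl(a, b, loc);
    collation_close(loc);
    return 0;
}

/*
 * Upper-case one wide character under the named locale, which is the
 * per-character step that upper() would need. Returns 0 on success,
 * -1 if the locale cannot be opened.
 */
int
upper_with_collation(wint_t ch, const char *locname, wint_t *out)
{
    collation_t loc;

    assert(locname != NULL && out != NULL);

    loc = collation_open(locname);
    if (loc == (collation_t) 0)
        return -1;

    *out = towupper_impl(ch, loc);
    collation_close(loc);
    return 0;
}
```

With something like this, compare_with_collation("B", "a", "C", &cmp) would give a negative cmp on either platform, since the C locale compares by byte value. Locale names themselves differ between glibc and Windows, so a name such as ja_JP.utf8 would still need mapping.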

--
Itagaki Takahiro
