Re: Range Types and extensions

From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-13 04:47:28
Message-ID: 4DF59660.8080206@darrenduncan.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:
> On Sun, Jun 12, 2011 at 7:53 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>>> I think the collation is going to have to be baked into the type
>>> definition, no? You can't just up and change the collation of the
>>> column as you could for a straight text column, if that might cause
>>> the contents of some rows to be viewed as invalid.
>> Now you've lost me. If a text range is simply a pair of strings,
>> as I suggested, and collations are applied only during comparison
>> and RANGE_EMPTY(), why would the collation have to be baked into
>> the type?
>>
>> If you're referring to the case
>> (1) Create table with text-range column and collation C1
>> (2) Add check constraint containing RANGE_EMPTY()
>> (3) Add data
>> (4) Alter column to have collation C2, possibly changing
>> the result of RANGE_EMPTY() for existing ranges.
>> then that points to a problem with ALTER COLUMN.
>
> No, I'm saying that you might have a column containing '[a, Z)', and
> someone might change the collation of the column from en_US to C.
> When the collation was en_US, the column could legally contain that
> value, but now that the collation is C, it can't. ALTER TABLE isn't
> going to recheck the validity of the data when someone changes the
> collation: that's only supposed to affect the sort order, not the
> definition of what is a legal value.

You can have the same collation problem even without range types.

Consider the following:
(1) Create table with the 2 text columns {L,R} and both columns have the
collation en_US.
(2) Add check constraint requiring "L <= R".
(3) Add a record with the value 'a' for L and 'Z' for R.
(4) Alter the columns to have the collation C.

Good language design principles demand that the semantics for this simplified
case and the semantics for replacing {L,R} with a single range-of-text-typed
column be the same, including what happens with CHECK and ALTER TABLE.

Likewise, anything that affects ORDER BY should affect {<,>,<=,>=} and friends
the same way and vice-versa and likewise should affect range validity.

It makes sense for collation to be considered part of text data types, and
changing collation is casting from one text type to another. Generally
speaking, any inherent or applied aspect of a text or other value (such as
collation) that affects the results of any deterministic operations on those
values (such as sorting) should be considered part of the data type of those values.

-- Darren Duncan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-06-13 04:58:28 Re: Range Types and extensions
Previous Message Greg Smith 2011-06-13 04:38:52 Re: pgbench--new transaction type