Re: sortsupport for text

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Peter Geoghegan" <peter(at)2ndquadrant(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Greg Stark" <stark(at)mit(dot)edu>, "PG Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: sortsupport for text
Date: 2012-06-19 16:45:36
Message-ID: 4FE0666002000025000486D5@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
> Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

>> Since we appear to be questioning everything in this area, I'll
>> raise something which has been bugging me for a while: in some
>> other systems I've used, the "tie-breaker" comparison for
>> equivalent values comes after equivalence sorting on *all* sort
>> keys, rather than *each* sort key.
>
> Are you sure that they actually have a tie-breaker, and don't just
> make the distinction between equality and equivalence (if only
> internally)?

I'm pretty sure that when I was using Sybase ASE the order for
non-equal values was always predictable, and it behaved in the
manner I describe below. I'm less sure about any other product.

> I don't see why you'd want a tie-breaker across multiple keys. I
> mean, you could, I just don't see any reason to.

So that you can have entirely repeatable results across multiple
runs?

>> test=# select * from c order by 2;
>> last_name | first_name
>> -----------+------------
>> smith | bob
>> SMITH | EDWARD
>> smith | peter
>> (3 rows)
>>
>> This seems completely wrong:
>>
>> test=# select * from c order by 1,2;
>> last_name | first_name
>> -----------+------------
>> smith | bob
>> smith | peter
>> SMITH | EDWARD
>> (3 rows)
>
> Agreed. Definitely a POLA violation.
>
>> I'm sure the latter is harder to do and slower to execute; but
>> the former just doesn't seem defensible as correct.
>
> This same gripe is held by the author of that sorting document I
> linked to from the Unicode consortium, with a very similar
> example. So it seems like this could be a win from several
> perspectives, as it would enable the strxfrm() optimisation. I'm
> pretty sure that pg_upgrade wouldn't be very happy about this, so
> we'd have to have a legacy compatibility mode.

At a minimum, it could require rebuilding indexes with multiple
columns where any indexed value before the last index column can be
equivalent but not equal.

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-06-19 16:48:39 Near-duplicate RI NO ACTION and RESTRICT triggers
Previous Message Boszormenyi Zoltan 2012-06-19 16:44:04 Re: [PATCH] lock_timeout and common SIGALRM framework