Quick Links

Re: sortsupport for text

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Peter Geoghegan" <peter(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Robert Haas" <robertmhaas(at)gmail(dot)com>, "Greg Stark" <stark(at)mit(dot)edu>, "PG Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: sortsupport for text
Date:	2012-06-19 15:17:37
Message-ID:	4FE051C102000025000486B4@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:

> So, just to give a bit more weight to my argument that we should
> recognise that equivalent strings ought to be treated identically

Since we appear to be questioning everything in this area, I'll
raise something which has been bugging me for a while: in some other
systems I've used, the "tie-breaker" comparison for equivalent
values comes after equivalence sorting on *all* sort keys, rather
than *each* sort key. For example, this much makes sense with
lc_collate = 'en_US.UTF-8':

test=# create table c (last_name text not null, first_name text);
CREATE TABLE
test=# insert into c values ('smith', 'bob'), ('smith', 'peter'),
('SMITH', 'EDWARD');
INSERT 0 3
test=# select * from c order by 2;
last_name | first_name
-----------+------------
smith | bob
SMITH | EDWARD
smith | peter
(3 rows)

This seems completely wrong:

test=# select * from c order by 1,2;
last_name | first_name
-----------+------------
smith | bob
smith | peter
SMITH | EDWARD
(3 rows)

I have seen other databases which get it in the order I would expect
-- where the C compare only matters within groups of equivalent
rows. It seems that PostgreSQL effectively orders by:

last_name using collation 'en_US.UTF-8'
last_name using collation 'C'
first_name using collation 'en_US.UTF-8'
first_name using collation 'C'

while some other products order by:

last_name using collation 'en_US.UTF-8'
first_name using collation 'en_US.UTF-8'
last_name using collation 'C'
first_name using collation 'C'

I'm sure the latter is harder to do and slower to execute; but the
former just doesn't seem defensible as correct.

-Kevin

In response to

Re: sortsupport for text at 2012-06-19 14:41:17 from Peter Geoghegan

Responses

Re: sortsupport for text at 2012-06-19 16:17:29 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David Fetter	2012-06-19 15:33:30	Re: initdb and fsync
Previous Message	Kohei KaiGai	2012-06-19 15:15:34	Re: pgsql_fdw in contrib