Re: sortsupport for text

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sortsupport for text
Date: 2012-06-17 16:43:17
Message-ID: CAEYLb_XpBGAUA=UJ=W2rK8eyMBYYAwBbqy0Q7W-ziMb+qwn1-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17 June 2012 17:01, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm not sure I agree with this decision; why should we presume to know
>> better than the glibc locale what constitutes equality?
>
> The killer reason why it must be like that is that you can't use hash
> methods on text if text equality is some unknown condition subtly
> different from bitwise equality.

Fair enough, but I doubt that we need to revert the changes made in
this commit to texteq in addition to the changes I'd like to see in
order to be semantically self-consistent. That is because there is
often a distinction made between equality and equivalence, and we
could adopt this distinction. strcoll() could be said to be just
making a representation that its two arguments are equivalent (and not
necessarily equal) when it returns 0. This distinction is explicitly
made in the C++ standard library, and failing to understand it can
result in bugs:

http://www.cplusplus.com/reference/algorithm/equal_range/

Note the use of the word "equivalent" rather than "equal" in the text.
equal_range is a bit of a misnomer.

This distinction is important enough to have warranted an entire
subsection of the book "Effective STL" by Scott Meyers, a
well-respected expert on the language. This comes up more often than
you'd think - "std::set::insert" determines if an element already
exists (to know if it must replace it) based on equivalency (usually,
though not necessarily, defined in terms of operator< ), whereas the
"find" algorithm finds elements based on equality (operator==).

> My recollection is that there were
> some other problems as well, but I'm too lazy to search the archives
> for you.

Fair enough. I'll search for it myself later. I'm about to head out now.

>> It's seems very likely that the main
>> one was the then-need to guard against poor quality qsort()
>> implementations that went quadratic in the face of lots of duplicates,
>
> No, I don't recall that that had anything to do with it.

Oh, okay. It looked very much like the "avoid equality at all costs"
thing you still see some of in tuplesort.c .

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Euler Taveira 2012-06-17 16:49:04 Re: libpq compression
Previous Message Tom Lane 2012-06-17 16:29:53 Re: libpq compression