Re: Warts with SELECT DISTINCT

From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Warts with SELECT DISTINCT
Date: 2006-05-04 14:06:11
Message-ID: 20060504140611.GA19321@wolff.to
Lists: pgsql-hackers

On Thu, May 04, 2006 at 02:39:33 -0400,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruno Wolff III <bruno(at)wolff(dot)to> writes:
> > ... it would be OK to rewrite
> > SELECT DISTINCT x ORDER BY foo(x)
> > as
> > SELECT DISTINCT ON (foo(x), x) x ORDER BY foo(x)
>
> This assumes that x = y implies foo(x) = foo(y), which is something
> that's not necessarily the case, mainly because a datatype's "="
> function need not have a lot to do with the behavior of arbitrary
> functions foo(), especially if foo() yields a different datatype.
> The citext datatype is an easy counterexample: it thinks "foo" = "Foo",
> but md5() of those values will not yield the same answers.
>
> The bottom line here is that this sort of deduction requires more
> understanding of the properties of datatypes and functions than
> our existing catalogs allow the planner to obtain.

Thanks for pointing that out. I should have realized that this was the same
issue (or at least close to it) that I initially thought would be a problem.
I then talked myself into believing that '=' promised more than it does, and
assumed that x = y implies foo(x) = foo(y), which, as you point out, isn't
always true.
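
For anyone following along, the citext counterexample can be sketched outside
the database. This is only an illustration under stated assumptions: CIText is
a hypothetical stand-in for a case-insensitive type (not the real citext
implementation), and foo() is assumed to hash the raw stored string the way
md5() would.

```python
import hashlib


class CIText:
    """Hypothetical stand-in for a case-insensitive text type like citext."""

    def __init__(self, raw):
        self.raw = raw

    def __eq__(self, other):
        # The type's "=" ignores case.
        return isinstance(other, CIText) and self.raw.lower() == other.raw.lower()

    def __hash__(self):
        # Must agree with __eq__ so that equal values collapse in a set.
        return hash(self.raw.lower())


def foo(x):
    # Assumed to behave like md5(): it sees the raw characters, case included.
    return hashlib.md5(x.raw.encode("utf-8")).hexdigest()


a, b = CIText("foo"), CIText("Foo")

equal_under_eq = (a == b)              # x = y holds
same_foo = (foo(a) == foo(b))          # but foo(x) = foo(y) does not
distinct_rows = {a, b}                 # DISTINCT x keeps a single row
candidate_sort_keys = {foo(a), foo(b)} # yet two different ORDER BY keys exist
```

DISTINCT leaves one row, but there are two candidate values of foo(x) to sort
it by, so the proposed rewrite to DISTINCT ON (foo(x), x) would not return the
same rows as the original query.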
