Re: ORDER BY vs. volatile functions

Lists: pgsql-hackers
From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: ORDER BY vs. volatile functions
Date: 2009-11-16 05:28:15
Message-ID: 87my2nhwkp.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This query:

select random() from generate_series(1,10) order by random();

produces sorted output. Should it?

--
Andrew (irc:RhodiumToad)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ORDER BY vs. volatile functions
Date: 2009-11-16 05:44:00
Message-ID: 2654.1258350240@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> This query:
> select random() from generate_series(1,10) order by random();
> produces sorted output.
> Should it?

It always has; we'd doubtless break some apps if we changed that.

regards, tom lane


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane), pgsql-hackers(at)postgresql(dot)org
Subject: Re: ORDER BY vs. volatile functions
Date: 2009-11-16 06:04:15
Message-ID: 87einzhuv3.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>> This query:
>> select random() from generate_series(1,10) order by random();
>> produces sorted output.
>> Should it?

Tom> It always has; we'd doubtless break some apps if we changed that.

For bonus weirdness:

select distinct random(),random() from generate_series(1,10);
set enable_hashagg=off;
select distinct random(),random() from generate_series(1,10);

I think _that_ one is a bug.

--
Andrew (irc:RhodiumToad)


From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ORDER BY vs. volatile functions
Date: 2009-11-16 12:19:53
Message-ID: 4B014369.9040609@cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth wrote:
> This query:
>
> select random() from generate_series(1,10) order by random();
> produces sorted output. Should it?

I recall a workaround from a different thread[1] if specifically
were looking for random ordering of random numbers is:
select random() from foo order by random()+1;

The thread has more odd corner cases with multiple calls
to random() and sorts as well.

[1] http://archives.postgresql.org/pgsql-general/2006-11/msg01544.php


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ORDER BY vs. volatile functions
Date: 2009-11-16 15:00:47
Message-ID: 18797.1258383647@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> For bonus weirdness:

> select distinct random(),random() from generate_series(1,10);
> set enable_hashagg=off;
> select distinct random(),random() from generate_series(1,10);

> I think _that_ one is a bug.

Hmm. I think the first one is a bug --- the two invocations of
random() in the tlist shouldn't be folded together.

regards, tom lane


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ORDER BY vs. volatile functions
Date: 2009-11-16 17:49:16
Message-ID: 87my2mgy4s.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>> For bonus weirdness:

>> select distinct random(),random() from generate_series(1,10);
>> set enable_hashagg=off;
>> select distinct random(),random() from generate_series(1,10);

>> I think _that_ one is a bug.

Tom> Hmm. I think the first one is a bug --- the two invocations of
Tom> random() in the tlist shouldn't be folded together.

That's what I meant.

If you try it using nextval(), you'll notice that the function does
in fact get called twice per row, but one of the results is thrown
away and replaced with the other one.

--
Andrew.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ORDER BY vs. volatile functions
Date: 2009-11-16 17:52:35
Message-ID: 22884.1258393955@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> If you try it using nextval(), you'll notice that the function does
> in fact get called twice per row, but one of the results is thrown
> away and replaced with the other one.

Yeah. The problem is that setrefs.c is generating a tlist for the
hashagg node in which both output expressions point to the first
output of the underlying scan node, because it's just relying on
equal() to match up the expressions. I'm testing a fix now ...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ORDER BY vs. volatile functions
Date: 2009-11-16 18:31:24
Message-ID: 23538.1258396284@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> writes:
> [1] http://archives.postgresql.org/pgsql-general/2006-11/msg01544.php

FWIW, the behavior has changed from the time of that discussion ---
we now track sort ordering using EquivalenceClasses, which are able
to distinguish different instances of textually equal() volatile
expressions. The particular cases of
select random() from foo order by 1;
select random() from foo order by random();
still behave the same, but that's intentional for backwards
compatibility (and because SQL99 forbids the first, which would mean
there's no way to get the behavior except via deprecated syntax).
Beyond the case of order by/group by items being matched to tlist
items, I'd generally expect that the system should act as though
different textual instances of random() are evaluated separately.

regards, tom lane


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane), Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ORDER BY vs. volatile functions
Date: 2009-11-16 19:10:23
Message-ID: 87d43igv4w.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

Tom> Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> writes:
>> [1] http://archives.postgresql.org/pgsql-general/2006-11/msg01544.php

Tom> FWIW, the behavior has changed from the time of that discussion ---
Tom> we now track sort ordering using EquivalenceClasses, which are able
Tom> to distinguish different instances of textually equal() volatile
Tom> expressions. The particular cases of
Tom> select random() from foo order by 1;
Tom> select random() from foo order by random();
Tom> still behave the same, but that's intentional for backwards
Tom> compatibility (and because SQL99 forbids the first, which would mean
Tom> there's no way to get the behavior except via deprecated syntax).

SQL99 doesn't forbid:

select random() as r from foo order by r;

or

select r from (select random() as r from foo) as s order by r;

--
Andrew (irc:RhodiumToad)