Re: Strange behavior of some volatile function like random(), nextval()

Lists: pgsql-hackers
From: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Strange behavior of some volatile function like random(), nextval()
Date: 2016-06-29 10:43:32
Message-ID: 414eda7d-739e-0fea-cb50-2ab7c1c0152f@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello!

Got some strange behavior of random() function:

postgres=# select (select random() ) from generate_series(1,10) as i;
random
-------------------
0.831577288918197
0.831577288918197
0.831577288918197
0.831577288918197
0.831577288918197
0.831577288918197
0.831577288918197
0.831577288918197
0.831577288918197
0.831577288918197
(10 rows)

postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;
?column?
--------------------
0.97471913928166
0.0532126761972904
0.331358563620597
0.0573496259748936
0.321165383327752
0.48836630070582
0.444201893173158
0.0729857799597085
0.661443184129894
0.706566562876105
(10 rows)

postgres=# explain select (select random() ) from generate_series(1,10)
as i;
QUERY PLAN
--------------------------------------------------------------------------
Function Scan on generate_series i (cost=0.02..10.01 rows=1000 width=0)
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=0)
(3 rows)

postgres=# explain select (select random()+i*0 ) from
generate_series(1,10) as i;
QUERY PLAN
--------------------------------------------------------------------------
Function Scan on generate_series i (cost=0.00..30.00 rows=1000 width=4)
SubPlan 1
-> Result (cost=0.00..0.02 rows=1 width=0)
(3 rows)

postgres=# \df+ random();
List of functions
Schema | Name | Result data type | Argument data types | Type |
Security | Volatility | Owner | Language | Source code | Description
------------+--------+------------------+---------------------+--------+----------+------------+----------+----------+-------------+--------------
pg_catalog | random | double precision | | normal
| invoker | volatile | postgres | internal | drandom | random value
(1 row)

Also:

postgres=# create sequence test;
CREATE SEQUENCE
postgres=# SELECT (SELECT nextval('test')) FROM generate_series(1,10) as i;
nextval
---------
1
1
1
1
1
1
1
1
1
1
(10 rows)

postgres=# SELECT (SELECT nextval('test')+i*0) FROM
generate_series(1,10) as i;
?column?
----------
2
3
4
5
6
7
8
9
10
11
(10 rows)

postgres=# \df+ nextval() ;

List of functions
Schema | Name | Result data type | Argument data types | Type
| Security | Volatility | Owner | Language | Source code |
Description
------------+---------+------------------+---------------------+--------+----------+------------+----------+----------+-------------+---------------------
pg_catalog | nextval | bigint | regclass | normal
| invoker | volatile | postgres | internal | nextval_oid | sequence
next value
(1 row)

Both function is volatile so from docs :

"A VOLATILE function can do anything, including modifying the database.
It can return different results on successive calls with the same
arguments. The optimizer makes no assumptions about the behavior of such
functions. A query using a volatile function will re-evaluate the
function at every row where its value is needed."

Something wrong with executor? Is it bug or executor feature related
with subquery?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Strange behavior of some volatile function like random(), nextval()
Date: 2016-06-29 11:34:59
Message-ID: CAB7nPqQYO+4HYmrYGBXE3V9G8fJuUsdZ4U-gMgx91QVPUMmRVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jun 29, 2016 at 7:43 PM, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru> wrote:
> Hello!
>
> Got some strange behavior of random() function:
>
> postgres=# select (select random() ) from generate_series(1,10) as i;
> random
> -------------------
> 0.831577288918197
> [...]
> (10 rows)

I recall that this is treated as an implicit LATERAL, meaning that
random() is calculated only once.

> postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;
> ?column?
> --------------------
> 0.97471913928166
> [...]
> (10 rows)

But not that. So those results do not surprise me.
--
Michael


From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Strange behavior of some volatile function like random(), nextval()
Date: 2016-06-29 12:30:01
Message-ID: CAKFQuwbxcFcCN-FYqwSeH0-GkwLhC6rzVOWSDG=v9vYJLR0e0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

More specifically...
On Wed, Jun 29, 2016 at 7:34 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Wed, Jun 29, 2016 at 7:43 PM, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
> wrote:
> > Hello!
> >
> > Got some strange behavior of random() function:
> >
> > postgres=# select (select random() ) from generate_series(1,10) as i;
> > random
> > -------------------
> > 0.831577288918197
> > [...]
> > (10 rows)
>
> I recall that this is treated as an implicit LATERAL, meaning that
> random() is calculated only once.
>

A non-correlated (i.e., does not refer to outer variables) subquery placed
into the target-list need only have its value computed once - so that is
what happens. The fact that a volatile function can return different
values given the same arguments doesn't mean much when the function is only
ever called a single time.​

> > postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;
> > ?column?
> > --------------------
> > 0.97471913928166
> > [...]
> > (10 rows)
>
> But not that. So those results do not surprise me.
>
>
​A correlated subquery, on the other hand, has to be called once for every
row and is evaluated within the context supplied by said row​. Each time
random is called it returns a new value.

Section 4.2.11 (9.6 docs)
https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-SYNTAX-SCALAR-SUBQUERIES

Maybe this could be worded better but the first part talks about a single
execution while "any one execution" is mentioned in reference to "the
surrounding query".

​I do think that defining "correlated" and "non-correlated" subqueries
within this section would be worthwhile.

David J.


From: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Strange behavior of some volatile function like random(), nextval()
Date: 2016-06-29 14:05:59
Message-ID: 4c9f73eb-f0da-1d21-e529-8ec5e42b887c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 29.06.2016 15:30, David G. Johnston wrote:
> More specifically...
> On Wed, Jun 29, 2016 at 7:34 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com <mailto:michael(dot)paquier(at)gmail(dot)com>>wrote:
>
> On Wed, Jun 29, 2016 at 7:43 PM, Alex Ignatov
> <a(dot)ignatov(at)postgrespro(dot)ru <mailto:a(dot)ignatov(at)postgrespro(dot)ru>> wrote:
> > Hello!
> >
> > Got some strange behavior of random() function:
> >
> > postgres=# select (select random() ) from generate_series(1,10)
> as i;
> > random
> > -------------------
> > 0.831577288918197
> > [...]
> > (10 rows)
>
> I recall that this is treated as an implicit LATERAL, meaning that
> random() is calculated only once.
>
>
> A non-correlated (i.e., does not refer to outer variables) subquery
> placed into the target-list need only have its value computed once -
> so that is what happens. The fact that a volatile function can return
> different values given the same arguments doesn't mean much when the
> function is only ever called a single time.​
>
>
> > postgres=# select (select random()+i*0 ) from
> generate_series(1,10) as i;
> > ?column?
> > --------------------
> > 0.97471913928166
> > [...]
> > (10 rows)
>
> But not that. So those results do not surprise me.
>
>
> ​A correlated subquery, on the other hand, has to be called once for
> every row and is evaluated within the context supplied by said row​.
> Each time random is called it returns a new value.
>
> Section 4.2.11 (9.6 docs)
> https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-SYNTAX-SCALAR-SUBQUERIES
>
> Maybe this could be worded better but the first part talks about a
> single execution while "any one execution" is mentioned in reference
> to "the surrounding query".
>
> ​I do think that defining "correlated" and "non-correlated" subqueries
> within this section would be worthwhile.
>
> David J.
> ​
>
>
In this subquery(below) we have reference to outer variables but it is
not working as it should(or i dont understand something):

postgres=# postgres=# select id, ( select string_agg('a','') from
generate_series(1,trunc(10*random()+1)::int) where id=id) from
generate_series(1,10) as id;
id | string_agg
----+------------
1 | aaa
2 | aaa
...
but this query(with reference to outer var) working perfectly:
postgres=# select id,(select random() where id=id) from
generate_series(1,10) as id;
id | random
----+--------------------
1 | 0.974509597290307
2 | 0.219822214450687
...

Also this query is working good( (id-id) do the job):
postgres=# select id, ( select string_agg('a','') from
generate_series(1,trunc(10*random()+1)::int+(id-id)) ) from
generate_series(1,10) as id;
id | string_agg
----+------------
1 | aaaaaaa
2 | aaaaa
...

It means that even reference to outer variables doesn't mean that
executor execute volatile function from subquery every time. Or there is
something else what i should know?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Strange behavior of some volatile function like random(), nextval()
Date: 2016-06-29 14:48:09
Message-ID: 7445.1467211689@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ Please do not quote the entire thread in each followup. That's
disrespectful of your readers' time, and will soon cause people to
stop reading the thread, meaning you don't get answers. ]

Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru> writes:
> In this subquery(below) we have reference to outer variables but it is
> not working as it should(or i dont understand something):

> postgres=# postgres=# select id, ( select string_agg('a','') from
> generate_series(1,trunc(10*random()+1)::int) where id=id) from
> generate_series(1,10) as id;

The inner generate_series() call does not contain any outer references, so
it doesn't get recomputed. There's a comment in ExecReScanFunctionScan
about that:

* Here we have a choice whether to drop the tuplestores (and recompute
* the function outputs) or just rescan them. We must recompute if an
* expression contains changed parameters, else we rescan.
*
* XXX maybe we should recompute if the function is volatile? But in
* general the executor doesn't conditionalize its actions on that.

So you get some random number of generate_series output rows on the first
call, but then each subsequent run of the subquery just rescans those same
rows. I do not think this is wrong or bad, really; if it was done
differently, examples such as this same generate_series call on the inside
of a nestloop join would behave very strangely.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Strange behavior of some volatile function like random(), nextval()
Date: 2016-06-29 15:04:51
Message-ID: 7986.1467212691@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> A correlated subquery, on the other hand, has to be called once for every
> row and is evaluated within the context supplied by said row. Each time
> random is called it returns a new value.

> Section 4.2.11 (9.6 docs)
> https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-SYNTAX-SCALAR-SUBQUERIES

> Maybe this could be worded better but the first part talks about a single
> execution while "any one execution" is mentioned in reference to "the
> surrounding query".

> I do think that defining "correlated" and "non-correlated" subqueries
> within this section would be worthwhile.

Hmm ... a quick look around says we don't define or use those terms
anywhere. I agree this could stand to be addressed somewhere, but I'm
not sure if 4.2.11 is the most appropriate place. I don't think the
issue is unique to scalar subqueries.

regards, tom lane