Common Table Expressions applied; some issues remain

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Common Table Expressions applied; some issues remain
Date: 2008-10-04 22:22:23
Message-ID: 9623.1223158943@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've applied the latest version of the CTE patch. Congratulations on
making that happen!

There are still some loose ends that need to be considered, though.

1. As committed, the patch takes an extremely hard line about WITH
queries being evaluated independently of the main query and only once
per main query execution. This could be seen as a good thing --- it
provides much more determinism for execution of volatile functions
within complex queries than was really available in the past. It could
also be seen as a bad thing --- in particular, we won't push any
limiting qualifications from the main query into the WITH queries.
So for instance

WITH q AS ( SELECT * FROM foo )
SELECT * FROM q WHERE key = 42;

is going to be executed quite inefficiently; it won't use an index on
foo.key. I think we don't have much choice about this in the case of
recursive WITH queries: it would be pretty difficult to determine
whether pushing a restriction into a recursive WITH would change the
results incorrectly. However, for plain non-recursive WITHs it's all
a matter of definition. I gather from
http://www.oracle-developer.net/display.php?id=212
that Oracle chooses to treat WITH-queries as if they were plain
sub-selects if they're non-recursive and only referenced once.
That is, Oracle would rewrite the above into

SELECT * FROM ( SELECT * FROM foo ) AS q WHERE key = 42;

and then flatten the sub-select and optimize normally. It would
not be hard to make Postgres do the same, but then we would lose
some guarantees about predictable execution of volatile functions.

I'm inclined to think that there is no reason to provide two
different syntaxes to do the same thing, and so having the WITH
syntax behave like this is okay. But it could well result in
performance surprises for people who are used to Oracle.

Any thoughts on what to do? One possibility is to flatten only
if the subquery doesn't contain any volatile functions.

2. The patch didn't touch the implicit-RTE code, which means that

WITH q AS ( SELECT ... )
SELECT q.*

will fail even if you've got add_missing_from enabled. I'm inclined
to think that this violates the principle of least surprise. On
the other hand, add_missing_from is certainly a legacy thing and maybe
we shouldn't bother expending any extra code to make it work with
new features. Thoughts?

3. ruleutils.c's get_name_for_var_field() hasn't implemented the
RTE_CTE case, which means that it doesn't work to reverse-list
examples like this:

explain verbose with qq as (select x from (values(1,2),(3,4)) as x(c1,c2))
select * from (select (x).c2 from qq offset 0) ss;

The reason I let this go is that while poking into it I found out that
get_name_for_var_field is pretty broken already; this fails in HEAD:

explain verbose select (x).c2 from
(select x from (values(1,2),(3,4)) as x(c1,c2) offset 0) ss ;

and this fails even in the back branches:

explain select * from
(select x from (values(1,2),(3,4)) as x(c1,c2) offset 0) ss
where (x).c2 > 0;

It seems we need some redesign in and around EXPLAIN to make that work
nicely, so I figured it would be reasonable to tackle that stuff as a
separate patch.

regards, tom lane


From: "Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Yoshiyuki Asaba" <y-asaba(at)sraoss(dot)co(dot)jp>, "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-04 23:21:51
Message-ID: 3073cc9b0810041621p168fef24t11a7cda7c0a99135@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Oct 4, 2008 at 5:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> that Oracle chooses to treat WITH-queries as if they were plain
> sub-selects if they're non-recursive and only referenced once.
> That is, Oracle would rewrite the above into
>
> SELECT * FROM ( SELECT * FROM foo ) AS q WHERE key = 42;
>
> and then flatten the sub-select and optimize normally. It would
> not be hard to make Postgres do the same, but then we would lose
> some guarantees about predictable execution of volatile functions.
>
[...]
>
> Any thoughts on what to do? One possibility is to flatten only
> if the subquery doesn't contain any volatile functions.
>

maybe i'm missing something but AFAIR postgres will not try to
optimize (push down/pull up) if it see any volatile function.

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: y-asaba(at)sraoss(dot)co(dot)jp, t-ishii(at)sra(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-04 23:44:36
Message-ID: 20081005.084436.82511246.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I've applied the latest version of the CTE patch. Congratulations on
> making that happen!

Great! and thanks, Tom. Without your great help, we cannot make it
reality. I also would like to thank to everyone who helped this
project!

> There are still some loose ends that need to be considered, though.

I think in addition to them, we need to update ecpg.

> 1. As committed, the patch takes an extremely hard line about WITH
> queries being evaluated independently of the main query and only once
> per main query execution. This could be seen as a good thing --- it
> provides much more determinism for execution of volatile functions
> within complex queries than was really available in the past. It could
> also be seen as a bad thing --- in particular, we won't push any
> limiting qualifications from the main query into the WITH queries.
> So for instance
>
> WITH q AS ( SELECT * FROM foo )
> SELECT * FROM q WHERE key = 42;
>
> is going to be executed quite inefficiently; it won't use an index on
> foo.key. I think we don't have much choice about this in the case of
> recursive WITH queries: it would be pretty difficult to determine
> whether pushing a restriction into a recursive WITH would change the
> results incorrectly. However, for plain non-recursive WITHs it's all
> a matter of definition. I gather from
> http://www.oracle-developer.net/display.php?id=212
> that Oracle chooses to treat WITH-queries as if they were plain
> sub-selects if they're non-recursive and only referenced once.
> That is, Oracle would rewrite the above into
>
> SELECT * FROM ( SELECT * FROM foo ) AS q WHERE key = 42;
>
> and then flatten the sub-select and optimize normally. It would
> not be hard to make Postgres do the same, but then we would lose
> some guarantees about predictable execution of volatile functions.
>
> I'm inclined to think that there is no reason to provide two
> different syntaxes to do the same thing, and so having the WITH
> syntax behave like this is okay. But it could well result in
> performance surprises for people who are used to Oracle.
>
> Any thoughts on what to do? One possibility is to flatten only
> if the subquery doesn't contain any volatile functions.
>
>
> 2. The patch didn't touch the implicit-RTE code, which means that
>
> WITH q AS ( SELECT ... )
> SELECT q.*
>
> will fail even if you've got add_missing_from enabled. I'm inclined
> to think that this violates the principle of least surprise. On
> the other hand, add_missing_from is certainly a legacy thing and maybe
> we shouldn't bother expending any extra code to make it work with
> new features. Thoughts?
>
>
> 3. ruleutils.c's get_name_for_var_field() hasn't implemented the
> RTE_CTE case, which means that it doesn't work to reverse-list
> examples like this:
>
> explain verbose with qq as (select x from (values(1,2),(3,4)) as x(c1,c2))
> select * from (select (x).c2 from qq offset 0) ss;
>
> The reason I let this go is that while poking into it I found out that
> get_name_for_var_field is pretty broken already; this fails in HEAD:
>
> explain verbose select (x).c2 from
> (select x from (values(1,2),(3,4)) as x(c1,c2) offset 0) ss ;
>
> and this fails even in the back branches:
>
> explain select * from
> (select x from (values(1,2),(3,4)) as x(c1,c2) offset 0) ss
> where (x).c2 > 0;
>
> It seems we need some redesign in and around EXPLAIN to make that work
> nicely, so I figured it would be reasonable to tackle that stuff as a
> separate patch.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-05 01:01:03
Message-ID: 48E811CF.6030800@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> I've applied the latest version of the CTE patch. Congratulations on
> making that happen!

This is great news. A big thanks to all the people involved in making
this happen.

I've had several people come up to me during the conference that I've
been at that just ended asking if I thought this would make it into 8.4.
It's certainly a much asked-for feature!

//Magnus


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-05 06:11:11
Message-ID: 48E85A7F.5000806@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> that Oracle chooses to treat WITH-queries as if they were plain
> sub-selects if they're non-recursive and only referenced once.
> That is, Oracle would rewrite the above into
>
> SELECT * FROM ( SELECT * FROM foo ) AS q WHERE key = 42;
>
> and then flatten the sub-select and optimize normally. It would
> not be hard to make Postgres do the same, but then we would lose
> some guarantees about predictable execution of volatile functions.
>
> I'm inclined to think that there is no reason to provide two
> different syntaxes to do the same thing, and so having the WITH
> syntax behave like this is okay. But it could well result in
> performance surprises for people who are used to Oracle.
>
> Any thoughts on what to do? One possibility is to flatten only
> if the subquery doesn't contain any volatile functions.

I don't think we should overload syntax choices with optimization hints.
We don't really know why or how people will be using this syntax, and
labeling it from the start as "will have unusual performance behavior"
isn't a good sell.

As a precedent, consider the JOIN syntax, which is obviously redundant
and in its first implementation contained an implicit optimization hint
with regard to join order that later had to be done away with because it
confused users (I think). The CTE case is quite similar, and maybe the
GUC answer of old could apply here as well. But I think by default we
should abide by SQL's declarative approach of "Tell me what you want and
I'll execute it any way I like."

Also, why is predictability about volatile function executation a
requirement? Is there some typical use case that involves sequences
functions here or something?

> 2. The patch didn't touch the implicit-RTE code, which means that
>
> WITH q AS ( SELECT ... )
> SELECT q.*
>
> will fail even if you've got add_missing_from enabled. I'm inclined
> to think that this violates the principle of least surprise. On
> the other hand, add_missing_from is certainly a legacy thing and maybe
> we shouldn't bother expending any extra code to make it work with
> new features. Thoughts?

Yes, it's legacy. I wouldn't bother.


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-05 08:25:55
Message-ID: 87prmfo3cc.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:

> Tom Lane wrote:
>
>> 2. The patch didn't touch the implicit-RTE code, which means that
>>
>> WITH q AS ( SELECT ... )
>> SELECT q.*
>>
>> will fail even if you've got add_missing_from enabled. I'm inclined
>> to think that this violates the principle of least surprise. On
>> the other hand, add_missing_from is certainly a legacy thing and maybe
>> we shouldn't bother expending any extra code to make it work with
>> new features. Thoughts?
>
> Yes, it's legacy. I wouldn't bother.

The results would be even more suprising if there *is* a table named "q"
though...

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-05 08:32:11
Message-ID: 87iqs7o31w.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Tom Lane wrote:
>> that Oracle chooses to treat WITH-queries as if they were plain
>> sub-selects if they're non-recursive and only referenced once.
>> That is, Oracle would rewrite the above into
>>
>> SELECT * FROM ( SELECT * FROM foo ) AS q WHERE key = 42;
>>
>> and then flatten the sub-select and optimize normally. It would
>> not be hard to make Postgres do the same, but then we would lose
>> some guarantees about predictable execution of volatile functions.
>>
>> I'm inclined to think that there is no reason to provide two
>> different syntaxes to do the same thing, and so having the WITH
>> syntax behave like this is okay. But it could well result in
>> performance surprises for people who are used to Oracle.
>>
>> Any thoughts on what to do? One possibility is to flatten only
>> if the subquery doesn't contain any volatile functions.

I think we should always inline the view if there's a single call site. If
people want to control the subsequent flattening they can do it the same way
they can do today for inline views using OFFSET 0.

The question in my mind is if we can do better for CTEs with multiple call
sites. If we have no volatile function calls in them then we should be free to
inline some or all call sites. I'm not sure we have enough information early
enough to make the decision though.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-05 15:19:33
Message-ID: 2500.1223219973@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>> Tom Lane wrote:
>>> 2. The patch didn't touch the implicit-RTE code, which means that
>>>
>>> WITH q AS ( SELECT ... )
>>> SELECT q.*
>>>
>>> will fail even if you've got add_missing_from enabled.
>>
>> Yes, it's legacy. I wouldn't bother.

> The results would be even more suprising if there *is* a table named "q"
> though...

Yeah, the real problem is not so much that it might fail as that it
might silently do something quite different from what you would expect.

CVS HEAD documentation states

(In fact, the WITH query hides any real table of the same name for the
purposes of the primary query. If necessary, you can refer to a real
table of the same name by schema-qualifying the table's name.)

If we don't fix this, I think we'd have to add some disclaimer about
how WITH clauses *don't* hide real tables in the case of implicit RTE
additions. That seems much uglier than fixing it.

(Hmm, memo to self: I'll bet ruleutils.c's decision about whether it
needs to schema-qualify a reverse-listed table name doesn't take this
into account.)

regards, tom lane


From: "Dickson S(dot) Guedes" <guediz(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Yoshiyuki Asaba" <y-asaba(at)sraoss(dot)co(dot)jp>, "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-05 20:55:33
Message-ID: f0b79c6f0810051355m29b48fd8h76701a8a375dc130@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

While i'm testing the HEAD version of CVS with this new feature, i
found a possible bug and like that more persons could try it in you
own box.

The attached file is a log of my test and I'm using a unprivileged
user to do it.

Thanks.
--
[]s
Dickson S. Guedes
-------------------------------------
Projeto Colmeia - Florianopolis, SC
(48) 3322-1185 ramal: 27
http://makeall.wordpress.com/
http://pgcon.postgresql.org.br/
http://planeta.postgresql.org.br/

Attachment Content-Type Size
bug_pgsql_cte.txt text/plain 4.9 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dickson S(dot) Guedes" <guediz(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Yoshiyuki Asaba" <y-asaba(at)sraoss(dot)co(dot)jp>, "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-05 22:52:01
Message-ID: 14950.1223247121@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Dickson S. Guedes" <guediz(at)gmail(dot)com> writes:
> While i'm testing the HEAD version of CVS with this new feature, i
> found a possible bug and like that more persons could try it in you
> own box.

Yeah, that's a bug (two different ones in fact). Fixed --- thanks for
the report!

regards, tom lane


From: Decibel! <decibel(at)decibel(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-06 15:36:32
Message-ID: 814D6CF7-5AD9-4B43-90F4-C8775B4B2DD7@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Oct 5, 2008, at 1:11 AM, Peter Eisentraut wrote:
> I don't think we should overload syntax choices with optimization
> hints. We don't really know why or how people will be using this
> syntax, and labeling it from the start as "will have unusual
> performance behavior" isn't a good sell.
>
> As a precedent, consider the JOIN syntax, which is obviously
> redundant and in its first implementation contained an implicit
> optimization hint with regard to join order that later had to be
> done away with because it confused users (I think). The CTE case
> is quite similar, and maybe the GUC answer of old could apply here
> as well. But I think by default we should abide by SQL's
> declarative approach of "Tell me what you want and I'll execute it
> any way I like."

Agreed. It's already horrible that we suggest people use OFFSET 0,
only because we don't want to define formal optimizer hints (and
that's *exactly* what OFFSET 0 is).
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: Decibel! <decibel(at)decibel(dot)org>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Yoshiyuki Asaba" <y-asaba(at)sraoss(dot)co(dot)jp>, "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-06 20:04:23
Message-ID: 603c8f070810061304n55bb5ca7i9dbf80cfa58de998@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Agreed. It's already horrible that we suggest people use OFFSET 0, only
> because we don't want to define formal optimizer hints (and that's *exactly*
> what OFFSET 0 is).

Yes, especially since TFM says:

"OFFSET 0 is the same as omitting the OFFSET clause."

Unless I'm looking at the wrong part of the manual?

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-07 00:39:15
Message-ID: 1563.1223339955@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ back to the when-to-inline-WITHs discussion ]

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> Any thoughts on what to do? One possibility is to flatten only
>>> if the subquery doesn't contain any volatile functions.

> I think we should always inline the view if there's a single call site. If
> people want to control the subsequent flattening they can do it the same way
> they can do today for inline views using OFFSET 0.

That's certainly a defensible choice, and in fact was what I had
intended to do at one point (that's why CommonTableExpr.cterefcount
is in there). However, the extent to which you can prevent duplicate
evaluation in an inline view is actually pretty limited. As an example
consider

select ... from table1,
(select expensive_function(...) from table2 offset 0) ss
where table1.key = table2.key;

If the planner chooses to do this as a nestloop with table2 on the
inside, then expensive_function() can get evaluated multiple times on
the same row of table2. We really don't make very many guarantees about
what will happen with functions inside inlined views, even with "offset
0" as an optimization fence. So I was thinking that taking a strong
reading of the spec's wording about single evaluation of WITH clauses
might provide useful leverage for people who need to control evaluation
of expensive or volatile functions better than they can now.

Another possibility that we could think about is: if a CTE is only
referenced once, then push down any restriction clauses that are
available at the single call site, but still execute it using the
CteScan materialization logic. The evaluation guarantee would then
look like "no row of the CTE's result is evaluated twice, but some rows
might not be evaluated at all". What we'd pay for this is that the CTE
could not be the inside of a nestloop with inner indexscan using a join
condition, since we don't have any way to keep track of which rows were
already fetched in that case.

regards, tom lane


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Yoshiyuki Asaba" <y-asaba(at)sraoss(dot)co(dot)jp>, "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-07 04:18:10
Message-ID: 603c8f070810062118u5fad359dlbb1bfc2d129ace2a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> If the planner chooses to do this as a nestloop with table2 on the
> inside, then expensive_function() can get evaluated multiple times on
> the same row of table2. We really don't make very many guarantees about
> what will happen with functions inside inlined views, even with "offset
> 0" as an optimization fence. So I was thinking that taking a strong
> reading of the spec's wording about single evaluation of WITH clauses
> might provide useful leverage for people who need to control evaluation
> of expensive or volatile functions better than they can now.

+1 for a strong reading. I think the ability to prevent multiple
evaluations of expensive functions is key here.

> Another possibility that we could think about is: if a CTE is only
> referenced once, then push down any restriction clauses that are
> available at the single call site, but still execute it using the
> CteScan materialization logic. The evaluation guarantee would then
> look like "no row of the CTE's result is evaluated twice, but some rows
> might not be evaluated at all".

Assuming a perfectly intelligent optimizer, the only advantage of the
=1 guarantee over the <=1 guarantee is that you can evaluate the
entire CTE for side-effects and then fetch back only a subset of the
data to return to the user. This seems likely to be a pretty rare use
case, though, and the rest of the time you'd presumably prefer for
performance reasons to have as little of the CTE as possible
executed.... so +1 for <=1.

> What we'd pay for this is that the CTE
> could not be the inside of a nestloop with inner indexscan using a join
> condition, since we don't have any way to keep track of which rows were
> already fetched in that case.

Is it not possible to consider both plans? That is, compare the cost
of evaluating every row and then doing a nestloop with inner indexscan
versus using some other plan and evaluating only the rows meeting the
quals?

As a side note, in theory, I think you could generalize this to CTEs
with multiple call sites by taking the logical OR of the available
quals. This might not be worth it, though unless the quals are highly
selective.

...Robert


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2009-01-08 03:30:55
Message-ID: 200901080330.n083UtB15496@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Is this a TODO?

---------------------------------------------------------------------------

Tom Lane wrote:
> [ back to the when-to-inline-WITHs discussion ]
>
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> >> Tom Lane wrote:
> >>> Any thoughts on what to do? One possibility is to flatten only
> >>> if the subquery doesn't contain any volatile functions.
>
> > I think we should always inline the view if there's a single call site. If
> > people want to control the subsequent flattening they can do it the same way
> > they can do today for inline views using OFFSET 0.
>
> That's certainly a defensible choice, and in fact was what I had
> intended to do at one point (that's why CommonTableExpr.cterefcount
> is in there). However, the extent to which you can prevent duplicate
> evaluation in an inline view is actually pretty limited. As an example
> consider
>
> select ... from table1,
> (select expensive_function(...) from table2 offset 0) ss
> where table1.key = table2.key;
>
> If the planner chooses to do this as a nestloop with table2 on the
> inside, then expensive_function() can get evaluated multiple times on
> the same row of table2. We really don't make very many guarantees about
> what will happen with functions inside inlined views, even with "offset
> 0" as an optimization fence. So I was thinking that taking a strong
> reading of the spec's wording about single evaluation of WITH clauses
> might provide useful leverage for people who need to control evaluation
> of expensive or volatile functions better than they can now.
>
> Another possibility that we could think about is: if a CTE is only
> referenced once, then push down any restriction clauses that are
> available at the single call site, but still execute it using the
> CteScan materialization logic. The evaluation guarantee would then
> look like "no row of the CTE's result is evaluated twice, but some rows
> might not be evaluated at all". What we'd pay for this is that the CTE
> could not be the inside of a nestloop with inner indexscan using a join
> condition, since we don't have any way to keep track of which rows were
> already fetched in that case.
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2009-01-08 03:40:15
Message-ID: 5579.1231386015@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Is this a TODO?

I'm inclined to leave it as-is, at least till we get some field
feedback about how people want it to behave.

regards, tom lane


From: Greg Stark <stark(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2009-05-22 15:59:03
Message-ID: 4136ffa0905220859i74168759n8237b103cbbed369@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(quoting more than usual to provide context because this is such an old thread)

On Sat, Oct 4, 2008 at 11:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I've applied the latest version of the CTE patch.  Congratulations on
> making that happen!
>
> There are still some loose ends that need to be considered, though.
>
> 1. As committed, the patch takes an extremely hard line about WITH
> queries being evaluated independently of the main query and only once
> per main query execution.  This could be seen as a good thing --- it
> provides much more determinism for execution of volatile functions
> within complex queries than was really available in the past.  It could
> also be seen as a bad thing --- in particular, we won't push any
> limiting qualifications from the main query into the WITH queries.
> So for instance
>
>        WITH q AS ( SELECT * FROM foo )
>        SELECT * FROM q WHERE key = 42;
>
> is going to be executed quite inefficiently; it won't use an index on
> foo.key.  I think we don't have much choice about this in the case of
> recursive WITH queries: it would be pretty difficult to determine
> whether pushing a restriction into a recursive WITH would change the
> results incorrectly.  However, for plain non-recursive WITHs it's all
> a matter of definition.  I gather from
> http://www.oracle-developer.net/display.php?id=212
> that Oracle chooses to treat WITH-queries as if they were plain
> sub-selects if they're non-recursive and only referenced once.
> That is, Oracle would rewrite the above into
>
>        SELECT * FROM ( SELECT * FROM foo ) AS q WHERE key = 42;
>
> and then flatten the sub-select and optimize normally.  It would
> not be hard to make Postgres do the same, but then we would lose
> some guarantees about predictable execution of volatile functions.
>
> I'm inclined to think that there is no reason to provide two
> different syntaxes to do the same thing, and so having the WITH
> syntax behave like this is okay.  But it could well result in
> performance surprises for people who are used to Oracle.
>
> Any thoughts on what to do?  One possibility is to flatten only
> if the subquery doesn't contain any volatile functions.
>

One possibility would be to not flatten the query but find these quals
and copy them onto the cte when planning the cte. So we would still
materialize the result and avoid duplicate execution but only fetch
the records which we know a caller will need. We could even do that
for multiple callers if we join their quals with an OR -- that still
might allow a bitmap index scan.

I'm not sure we will work out with the order of in which the various
phases of analysis are done on the outer query compared to the
subquery.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)enterprisedb(dot)com>
Cc: Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2009-05-26 23:47:45
Message-ID: 21843.1243381665@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> writes:
> [ point 1 here remains unresolved:
> http://archives.postgresql.org/message-id/9623.1223158943@sss.pgh.pa.us ]

> One possibility would be to not flatten the query but find these quals
> and copy them onto the cte when planning the cte. So we would still
> materialize the result and avoid duplicate execution but only fetch
> the records which we know a caller will need. We could even do that
> for multiple callers if we join their quals with an OR -- that still
> might allow a bitmap index scan.

I'm not too thrilled about that solution because it still eliminates
predictability of execution of volatile functions. It's really just a
partial form of subquery pullup, so we're paying all the disadvantages
for only a subset of the advantages.

I could still see doing what I mentioned in the prior message, which is
to flatten CTEs as if they are plain sub-selects when

1. they are non-recursive,
2. they are referenced only once, and
3. they contain no volatile functions.

Restriction #3 is what we need to ensure we aren't causing visible
semantics changes. You could argue #2 either way, I guess, but my
feeling is that if someone is using a doubly referenced CTE then he's
probably doing something more complex than we are currently prepared
to optimize well. I think we should let that case go until we
understand typical usage and possible optimizations better.

regards, tom lane


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2009-05-27 03:29:58
Message-ID: e08cc0400905262029x2e08051fvc622500939a95255@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/5/27 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Greg Stark <stark(at)enterprisedb(dot)com> writes:
>> [ point 1 here remains unresolved:
>>   http://archives.postgresql.org/message-id/9623.1223158943@sss.pgh.pa.us ]
>
>> One possibility would be to not flatten the query but find these quals
>> and copy them onto the cte when planning the cte. So we would still
>> materialize the result and avoid duplicate execution but only fetch
>> the records which we know a caller will need. We could even do that
>> for multiple callers if we join their quals with an OR -- that still
>> might allow a bitmap index scan.
>
> I'm not too thrilled about that solution because it still eliminates
> predictability of execution of volatile functions.  It's really just a
> partial form of subquery pullup, so we're paying all the disadvantages
> for only a subset of the advantages.
>
> I could still see doing what I mentioned in the prior message, which is
> to flatten CTEs as if they are plain sub-selects when
>
> 1. they are non-recursive,
> 2. they are referenced only once, and
> 3. they contain no volatile functions.
>

And 4. only if the sub-selects use index scan? Or in other cases would
it be effective?

Regards,

--
Hitoshi Harada


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2009-05-27 03:36:34
Message-ID: 26465.1243395394@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
> 2009/5/27 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> I could still see doing what I mentioned in the prior message, which is
>> to flatten CTEs as if they are plain sub-selects when
>>
>> 1. they are non-recursive,
>> 2. they are referenced only once, and
>> 3. they contain no volatile functions.

> And 4. only if the sub-selects use index scan? Or in other cases would
> it be effective?

Uh ... you've got the causality backwards, and I don't see the point of
such a restriction anyway.

regards, tom lane


From: Greg Stark <stark(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2009-05-27 11:10:50
Message-ID: 4136ffa0905270410w57cb6c4ald8662f50370683ff@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 27, 2009 at 12:47 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm not too thrilled about that solution because it still eliminates
> predictability of execution of volatile functions.

How so? It means the volatile function might only be executed for the
matching rows but the rows will still have the same value for the same
rows for all references to the CTE which seems like the key property
to me.

--
greg


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2009-05-27 11:20:01
Message-ID: 4A1D21E1.1000309@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> On Wed, May 27, 2009 at 12:47 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm not too thrilled about that solution because it still eliminates
>> predictability of execution of volatile functions.
>
> How so? It means the volatile function might only be executed for the
> matching rows but the rows will still have the same value for the same
> rows for all references to the CTE which seems like the key property
> to me.

A volatile function could have side-effects, ie. insert rows to another
table. I would not recommend a design that relies on such behavior, but
it should be predictable how often the volatile function is run if you
do that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)enterprisedb(dot)com>
Cc: Yoshiyuki Asaba <y-asaba(at)sraoss(dot)co(dot)jp>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2009-05-27 13:44:33
Message-ID: 7826.1243431873@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> writes:
> On Wed, May 27, 2009 at 12:47 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm not too thrilled about that solution because it still eliminates
>> predictability of execution of volatile functions.

> How so? It means the volatile function might only be executed for the
> matching rows

Exactly. If the point of the CTE is to ensure that nextval() is
executed N times, and it actually gets executed less than that,
then we've broken the semantics in a visible way.

regards, tom lane