CTE inlining

Lists: pgsql-hackers
From: Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: CTE inlining
Date: 2017-04-29 17:45:00
Message-ID: 5351711493487900@web53g.yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello, dear hackers!

There is task in todo list about optional CTE optimization fence
disabling.

I am not interested at this point in disabling mechanism
implementation, but I would like to discuss the optimization
mechanism, that should work when the fence is disabled.

It seems, that we can replace CTE with subquery, so the optimizer
can do all available optimizations. This idea is quite
straightforward, but I could not find a discussion of it.
(Maybe it is so, because everyone knows that the idea is bad and it is
not worth to discuss. But I hope it is not, so I start this thread. =))

First of all, to such replacement to be valid, the CTE must be
1. non-writable (e.g. be of form: SELECT ...),
2. do not use VOLATILE or STABLE functions,
3. ... (maybe there must be more restrictions?)

Also, before inlining, we should check that some optimization
can be applied, using functions from
'pull_up_subqueries_recurse' and 'subquery_push_qual'.

If it is true, and there only one reference to CTE,
we can inline it immediately.

What it is not clear is how we should estimate whether it is worth
to inline, when there is multiple references. Here are my preliminary
ideas.

Let consider "pull up subquery" and "push down qualifiers" cases
separately.

For "push down qualifiers", if `subquery_push_qual` is `true`,
we can do the following:
1. copy CTE subquery,
2. push down quals,
3. find paths,
3. inline if cost of
(CTE scan) > (cheapest_path(subquery) + subquery scan)

Probably, this approach is not feasible, because it involves subquery
replaning, and we should consider a more "lightweight" heuristic.

For "pull up subquery" similar approach may lead to duplicate planning
of the whole query, that almost sure is too expensive.
So I wonder, is it possible to estimate a join predicate selectivity
against CTE subquery result and inline it if selectivity is "high" enough?
(If it is possible the same can be applied to the first case.)

I would be glad to hear feedback on described approach.

Ilya Shkuratov


From: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
To: Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-04-30 03:51:18
Message-ID: CAMsr+YFFkbkp7NW0J5QykX-raEJCDaHaqXGpDJmNHW0ah3jw8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 30 Apr. 2017 07:56, "Ilya Shkuratov" <motr(dot)ilya(at)ya(dot)ru> wrote:

Hello, dear hackers!

There is task in todo list about optional CTE optimization fence
disabling.

I am not interested at this point in disabling mechanism
implementation, but I would like to discuss the optimization
mechanism, that should work when the fence is disabled.

It's looking at what other DBMSes do.

Notably MS SQL Server. AFAIK its CTEs are a lot like query-scoped views.
They are simply updatable where possible, so you can write

WITH x AS (...)
UPDATE x SET ...

I do not know how MS SQL handles inlining and pullup/pushdown vs
materialization, handles multiple evaluation costs, etc.

This is the model I would want to aim for.

It seems, that we can replace CTE with subquery, so the optimizer
can do all available optimizations. This idea is quite
straightforward, but I could not find a discussion of it.
(Maybe it is so, because everyone knows that the idea is bad and it is
not worth to discuss. But I hope it is not, so I start this thread. =))

It's not bad for SELECT.

But there are complexities.

- CTE terms may contain data-mutating functions people are relying on not
multiply executing;

- we document that in postgres CTEs act as optimisation fences even with
the standard syntax. So users rely on this as a query hint. Personally I
want to relnotes this and tell people to use our OFFSET 0 hint instead, or
add a NOINLINE option to our CTEs, then make pg allow inlining by default.
This is a BC break, but not a big one if we restrict inlining of volatile.
And since we don't have query hints (*cough*) by project policy, we can't
really object to removing one can we?

- as you noted, it is hard to decide when it's worth inlining vs
materializing for CTE terms referenced more than once. We should possibly
start small and only inline single reference terms in the first release.
We'd continue to force materializing of multiple reference terms.

That'd at least help people who use CTEs to write clearer queries not
suffer for it. And it'd give us experience to help with conservatively
introducing multiple reference inlining.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
Cc: Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-04-30 04:28:46
Message-ID: 19072.1493526526@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> writes:
> - as you noted, it is hard to decide when it's worth inlining vs
> materializing for CTE terms referenced more than once.

[ raised eyebrow... ] Please explain why the answer isn't trivially
"never".

There's already a pretty large hill to climb here in the way of
breaking peoples' expectations about CTEs being optimization
fences. Breaking the documented semantics about CTEs being
single-evaluation seems to me to be an absolute non-starter.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-04-30 05:19:21
Message-ID: CAFj8pRDS4PoM2inJ9V981OxgD+0h3kg5vrAm5EW7V7SoKt_g_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2017-04-30 6:28 GMT+02:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> writes:
> > - as you noted, it is hard to decide when it's worth inlining vs
> > materializing for CTE terms referenced more than once.
>
> [ raised eyebrow... ] Please explain why the answer isn't trivially
> "never".
>
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences. Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.
>
>
why we cannot to introduce GUC option - enable_cteoptfence ?

Regards

Pavel

> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-04-30 05:28:16
Message-ID: 20170430052816.n2j52g4za37emp2f@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-04-30 00:28:46 -0400, Tom Lane wrote:
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences. Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.

If all referenced functions are non-volatile, I don't quite see the
problem? Personally I believe we'll have to offer a proper
anti-inlining workaround anyway, and in that case there's really nothing
that should stop us from inlining CTE without volatile functions twice?

- Andres


From: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-04-30 05:58:14
Message-ID: CAMsr+YF-9M51ujw-MRXHtT61rW2d+XUU_u9RaHYSRjg4wG7CqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 30 Apr. 2017 13:28, "Andres Freund" <andres(at)anarazel(dot)de> wrote:

On 2017-04-30 00:28:46 -0400, Tom Lane wrote:
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences. Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.

If all referenced functions are non-volatile, I don't quite see the
problem? Personally I believe we'll have to offer a proper
anti-inlining workaround anyway, and in that case there's really nothing
that should stop us from inlining CTE without volatile functions twice?

Exactly.

The initial implementation had limitations. So they got documented as
features, not bugs or possible future enhancements. Yay? So we're stuck
with it forever?

I agree we shouldn't break working, correct queries such that they return
different results. But if someone is lying about volatility they don't get
the expectation of correctness. And we have a policy against hints, so
surely we should be keen to remove this hack that serves as a hint - right?

We have OFFSET 0 for anyone really depending on it, and at least when you
read that you know to go "wtf" and look at the manual, wheras the CTE fence
behaviour is invisible and silent.

Yes, experienced and established postgres users expect the optimisation
fence behaviour. They abuse it as a query hint or work around it with
subqueries in FROM. They also know OFFSET 0 ... and ideally should even
read the relnotes. Users from other DMBSes looking to migrate, and new
users, are regularly surprised by our CTEs. I see it a lot on Stack
Overflow and other places outside our comfortable walls.

Personally I find it very annoying when I'd like to use CTEs to structure
queries more readably, but land up having to use subqueries in FROM instead.

Like the work Andes has been doing on our bizarre handing of SRFs in the
SELECT target list I really think it's just something that needs to be done.


From: Andres Freund <andres(at)anarazel(dot)de>
To: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-04-30 07:46:33
Message-ID: 20170430074633.k2w5pvtze7ra2kto@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-04-30 13:58:14 +0800, Craig Ringer wrote:
> We have OFFSET 0 for anyone really depending on it, and at least when you
> read that you know to go "wtf" and look at the manual, wheras the CTE fence
> behaviour is invisible and silent.

I don't think that's a good idea. What if you need to prevent inlining
of something that actually needs an offset? What if the behaviour of
offset is ever supposed to change? Relying more on that seems to just
be repeating the mistake around CTEs.

> Like the work Andes has been doing on our bizarre handing of SRFs in the
> SELECT target list I really think it's just something that needs to be
> done.

With help from Tom, luckily...

Greetings,

Andres Freund


From: Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>
To: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-04-30 08:34:48
Message-ID: 5949171493541288@web26j.yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

<div> </div><div> </div><div>30.04.2017, 08:58, "Craig Ringer" &lt;craig(dot)ringer(at)2ndquadrant(dot)com&gt;:</div><blockquote type="cite"><div><div> <div> <div>On 30 Apr. 2017 13:28, "Andres Freund" &lt;<a href="mailto:andres(at)anarazel(dot)de">andres(at)anarazel(dot)de</a>&gt; wrote:<blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex;"><div>On <span>2017-04-30 00</span>:28:46 -0400, Tom Lane wrote:<br />&gt; There's already a pretty large hill to climb here in the way of<br />&gt; breaking peoples' expectations about CTEs being optimization<br />&gt; fences.  Breaking the documented semantics about CTEs being<br />&gt; single-evaluation seems to me to be an absolute non-starter.<br /> </div>If all referenced functions are non-volatile, I don't quite see the<br />problem?  Personally I believe we'll have to offer a proper<br />anti-inlining workaround anyway, and in that case there's really nothing<br />that should stop us from inlining CTE without volatile functions twice?</blockquote></div></div></div><div> </div><div>Exactly.</div><div> </div><div>The initial implementation had limitations. So they got documented as features, not bugs or possible future enhancements. Yay? So we're stuck with it forever?</div><div> </div><div>I agree we shouldn't break working, correct queries such that they return different results. But if someone is lying about volatility they don't get the expectation of correctness. And we have a policy against hints, so surely we should be keen to remove this hack that serves as a hint - right? </div><div> </div><div>We have OFFSET 0 for anyone really depending on it, and at least when you read that you know to go "wtf" and look at the manual, wheras the CTE fence behaviour is invisible and silent.</div><div> </div><div><div style="font-family:sans-serif;">Yes, experienced and established postgres users expect the optimisation fence behaviour. They abuse it as a query hint or work around it with subqueries in FROM. They also know OFFSET 0 ... and ideally should even read the relnotes. Users from other DMBSes looking to migrate, and new users, are regularly surprised by our CTEs. I see it a lot on Stack Overflow and other places outside our comfortable walls. </div><div style="font-family:sans-serif;"> </div><div style="font-family:sans-serif;">Personally I find it very annoying when I'd like to use CTEs to structure queries more readably, but land up having to use subqueries in FROM instead.</div><div style="font-family:sans-serif;"> </div><div style="font-family:sans-serif;">Like the work Andes has been doing on our bizarre handing of SRFs in the SELECT target list I really think it's just something that needs to be done.</div></div><div> </div></div></blockquote><div> </div><div>Also, I would like to remind that the disabling optimization fence is suggested to be OPTIONAL.</div><div>So we don't break peoples' expectations, nor documented semantics.</div>

Attachment Content-Type Size
unknown_filename text/html 2.9 KB

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
Cc: Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-04-30 21:54:48
Message-ID: 03074a71-b6b5-9fb8-0ac2-e63a0732ca6e@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/30/2017 06:28 AM, Tom Lane wrote:
> Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> writes:
>> - as you noted, it is hard to decide when it's worth inlining vs
>> materializing for CTE terms referenced more than once.
>
> [ raised eyebrow... ] Please explain why the answer isn't trivially
> "never".
>
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences. Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.
>

I'm not sure that's a universal expectation, though. I know there are
people who actually do rely on that intentionally, no doubt about that.
And we'd nee to make it work for them.

But I keep running into people who face serious performance issues
exactly because not realizing this, and using CTEs as named subqueries.
And when I tell them "optimization fence" they react "Whaaaaaaat?"

If I had to make up some numbers, I'd say the "Whaaaaat?" group is about
10x the group of people who intentionally rely on CTEs being
optimization fences.

FWIW I don't know how to do this. There were multiple attempts at this
in the past, none of them succeeded. But perhaps we could at least
propagate some of the CTE features, so that the outside query can
benefit from that (e.g. when the CTE is sorted, we could set the
sortkeys). That wouldn't break the fence thing, but it would allow other
stuff.

regard

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-04-30 22:00:18
Message-ID: d3fdf85c-46ae-41fe-8e90-bf4edc5c9e81@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/30/2017 09:46 AM, Andres Freund wrote:
> Hi,
>
> On 2017-04-30 13:58:14 +0800, Craig Ringer wrote:
>> We have OFFSET 0 for anyone really depending on it, and at least when you
>> read that you know to go "wtf" and look at the manual, wheras the CTE fence
>> behaviour is invisible and silent.
>
> I don't think that's a good idea. What if you need to prevent inlining
> of something that actually needs an offset? What if the behaviour of
> offset is ever supposed to change? Relying more on that seems to just
> be repeating the mistake around CTEs.
>

I agree with this. But OFFSET 0 would force people to modify the queries
anyway, so why not just introduce a new keyword instead? Something like:

WITH FENCED a (SELECT ...)

But I think something like that was proposed not too long ago, and did
not make it for some reason.

There's a lot of other CTE improvements that would be great. Say, being
able to define indexes on them, but that's really a separate topic.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: David Fetter <david(at)fetter(dot)org>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-04-30 22:25:23
Message-ID: 20170430222523.GB9169@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Apr 30, 2017 at 11:54:48PM +0200, Tomas Vondra wrote:
> On 04/30/2017 06:28 AM, Tom Lane wrote:
> > Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> writes:
> > > - as you noted, it is hard to decide when it's worth inlining vs
> > > materializing for CTE terms referenced more than once.
> >
> > [ raised eyebrow... ] Please explain why the answer isn't trivially
> > "never".
> >
> > There's already a pretty large hill to climb here in the way of
> > breaking peoples' expectations about CTEs being optimization
> > fences. Breaking the documented semantics about CTEs being
> > single-evaluation seems to me to be an absolute non-starter.
> >
>
> I'm not sure that's a universal expectation, though. I know there
> are people who actually do rely on that intentionally, no doubt
> about that. And we'd nee to make it work for them.
>
> But I keep running into people who face serious performance issues
> exactly because not realizing this, and using CTEs as named
> subqueries. And when I tell them "optimization fence" they react
> "Whaaaaaaat?"
>
> If I had to make up some numbers, I'd say the "Whaaaaat?" group is
> about 10x the group of people who intentionally rely on CTEs being
> optimization fences.

I suspect you're off by at least a couple of orders of magnitude here,
which make this even more important to deal with.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Andres Freund <andres(at)anarazel(dot)de>
To: Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>
Cc: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-04-30 23:15:30
Message-ID: 20170430231530.dfasuem2kwrfipxb@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-04-30 11:34:48 +0300, Ilya Shkuratov wrote:
> Also, I would like to remind that the disabling optimization fence is suggested
> to be OPTIONAL.
> So we don't break peoples' expectations, nor documented semantics.

I think however is that that's not good enough, because it'll surprise
people forever, whereas using something as an intentional barrier
usually is, well, intentional.

Greetings,

Andres Freund


From: Andres Freund <andres(at)anarazel(dot)de>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-04-30 23:21:33
Message-ID: 20170430232133.o7tcwxja7w6ah4jv@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
> why we cannot to introduce GUC option - enable_cteoptfence ?

Doesn't really solve the issue, and we've generally shied away from GUCs
that influence behaviour after a few bad experiences. What if you want
one CTE inlined, but another one not?

- Andres


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-01 04:22:50
Message-ID: CAFj8pRAFWxQTQJJ5Fr3F85oFWATTyHt0BoU+0-pXsO6u0nUviA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2017-05-01 1:21 GMT+02:00 Andres Freund <andres(at)anarazel(dot)de>:

> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
> > why we cannot to introduce GUC option - enable_cteoptfence ?
>
> Doesn't really solve the issue, and we've generally shied away from GUCs
> that influence behaviour after a few bad experiences. What if you want
> one CTE inlined, but another one not?
>

It change behave in same sense like enable_nestloop, enable_hashjoin, ...
with same limits.

Regards

Pavel

>
> - Andres
>


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-01 13:05:06
Message-ID: 3d54c456-a094-380d-88b4-08ed6df177da@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/01/2017 06:22 AM, Pavel Stehule wrote:
>
>
> 2017-05-01 1:21 GMT+02:00 Andres Freund <andres(at)anarazel(dot)de
> <mailto:andres(at)anarazel(dot)de>>:
>
> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
> > why we cannot to introduce GUC option - enable_cteoptfence ?
>
> Doesn't really solve the issue, and we've generally shied away from GUCs
> that influence behaviour after a few bad experiences. What if you want
> one CTE inlined, but another one not?
>
>
> It change behave in same sense like enable_nestloop, enable_hashjoin,
> ... with same limits.
>

Those (and also the other enable_*) GUCs are a great example why we
should not use GUCs for tweaking planner behavior, except perhaps for
the purpose of investigation. It's an extremely blunt tool.

You typically want to affect just a single node in the query plan (say,
one join), but those options don't allow you to do that. It's all or
nothing thing.

Even if you're OK with affecting the whole query, it's a separate
control channel - it's not embedded in the query, the user has to set it
somehow. So you either set it for the whole session (affecting all the
other queries that don't really need it), or you set it before each
query. Which however sucks for a number of reasons, e.g. if you have a
slow query in the log, how do you know with what GUC values it was
executed? (You don't, and there's no way to find out.)

Exactly the same issues would affect this new GUC. It would be
impossible to use multiple CTEs in the query with different fencing
behavior, and it would be just as difficult to investigate.

So no more planner-affecting GUCs, please, particularly if we expect
regular users to use them.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-01 13:22:42
Message-ID: 7c784fd6-36ff-dd30-aad6-6cac3b6292cc@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/01/2017 09:05 AM, Tomas Vondra wrote:
> On 05/01/2017 06:22 AM, Pavel Stehule wrote:
>>
>>
>> 2017-05-01 1:21 GMT+02:00 Andres Freund <andres(at)anarazel(dot)de
>> <mailto:andres(at)anarazel(dot)de>>:
>>
>> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>> > why we cannot to introduce GUC option - enable_cteoptfence ?
>>
>> Doesn't really solve the issue, and we've generally shied away
>> from GUCs
>> that influence behaviour after a few bad experiences. What if
>> you want
>> one CTE inlined, but another one not?
>>
>>
>> It change behave in same sense like enable_nestloop, enable_hashjoin,
>> ... with same limits.
>>
>
> Those (and also the other enable_*) GUCs are a great example why we
> should not use GUCs for tweaking planner behavior, except perhaps for
> the purpose of investigation. It's an extremely blunt tool.
>
> You typically want to affect just a single node in the query plan
> (say, one join), but those options don't allow you to do that. It's
> all or nothing thing.
>
> Even if you're OK with affecting the whole query, it's a separate
> control channel - it's not embedded in the query, the user has to set
> it somehow. So you either set it for the whole session (affecting all
> the other queries that don't really need it), or you set it before
> each query. Which however sucks for a number of reasons, e.g. if you
> have a slow query in the log, how do you know with what GUC values it
> was executed? (You don't, and there's no way to find out.)
>
> Exactly the same issues would affect this new GUC. It would be
> impossible to use multiple CTEs in the query with different fencing
> behavior, and it would be just as difficult to investigate.
>
> So no more planner-affecting GUCs, please, particularly if we expect
> regular users to use them.
>
>

+1

I still see users wanting to use the enable_foo settings in production.

Having had years of telling users that CTEs are an optimization fence it
doesn't seem at all nice for us to turn around and change our mind about
that. I have relied on it in the past and I'm sure I'm very far from
alone in that.

Maybe we could allow a "decorator" that would tell the planner the CTE
could be inlined?

WITH INLINE mycte AS ( ...)

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-01 13:46:02
Message-ID: CAMsr+YEkOjHEsGbHAPy0ki0WSD1-KkQEiq8LO_EJFcoyodMTqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1 May 2017 at 21:22, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:

> Having had years of telling users that CTEs are an optimization fence it
> doesn't seem at all nice for us to turn around and change our mind about
> that. I have relied on it in the past and I'm sure I'm very far from
> alone in that.
>
> Maybe we could allow a "decorator" that would tell the planner the CTE
> could be inlined?
>
> WITH INLINE mycte AS ( ...)

I'd rather reverse that so we behave like other implementations by
default, and have extension syntax for our no-inline query hint. And
yes, that's what it is, because we'd only inline when we could produce
semantically equivalent results anyway.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: David Fetter <david(at)fetter(dot)org>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-01 14:17:34
Message-ID: 20170501141734.GD28992@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, May 01, 2017 at 09:22:42AM -0400, Andrew Dunstan wrote:
> > So no more planner-affecting GUCs, please, particularly if we expect
> > regular users to use them.
>
> +1
>
> I still see users wanting to use the enable_foo settings in production.
>
> Having had years of telling users that CTEs are an optimization fence it
> doesn't seem at all nice for us to turn around and change our mind about
> that. I have relied on it in the past and I'm sure I'm very far from
> alone in that.

You are certainly not alone, but I believe that in this you're missing
the vast majority (we hope) of PostgreSQL users. These are the users
who have yet to adopt PostgreSQL, and have the quite reasonable
expectation that ordinary-looking grammar *isn't* an optimization
fence.

> Maybe we could allow a "decorator" that would tell the planner the CTE
> could be inlined?
>
> WITH INLINE mycte AS ( ...)

+1 for a decorator, -1 for this one.

We already have an explicit optimization fence with OFFSET 0, and I
think making optimization fences explicit is how we should continue.
I'd be more in favor of something along the lines of

WITH FENCED /* Somewhat fuzzy. What fence? */
or
WITH AT_MOST_ONCE /* Clearer, but not super precise */
or
WITH UNIQUE_ATOMIC /* More descriptive, but not super clear without the docs in hand */

or something along that line.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: David Fetter <david(at)fetter(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-01 14:26:22
Message-ID: b9ff8365-95af-e1ed-3c41-f2a778b78d77@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/01/2017 04:17 PM, David Fetter wrote:
>> Maybe we could allow a "decorator" that would tell the planner the CTE
>> could be inlined?
>>
>> WITH INLINE mycte AS ( ...)
>
> +1 for a decorator, -1 for this one.

I am not sure I like decorators since this means adding an ad hoc query
hint directly into the SQL syntax which is something which I requires
serious consideration.

> We already have an explicit optimization fence with OFFSET 0, and I
> think making optimization fences explicit is how we should continue.
> I'd be more in favor of something along the lines of
>
> WITH FENCED /* Somewhat fuzzy. What fence? */
> or
> WITH AT_MOST_ONCE /* Clearer, but not super precise */
> or
> WITH UNIQUE_ATOMIC /* More descriptive, but not super clear without the docs in hand */
>
> or something along that line.

What about WITH MATERIALIZED, borrowing from the MySQL terminology
"materialized subquery"?

Andreas


From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: David Fetter <david(at)fetter(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-01 14:33:26
Message-ID: CAKFQuwayrvy8F00DJh-P-PXMG7Vjv3Qzb=o-gr89s=SjZ_OZ=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, May 1, 2017 at 7:26 AM, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:

> On 05/01/2017 04:17 PM, David Fetter wrote:
>
>> Maybe we could allow a "decorator" that would tell the planner the CTE
>>> could be inlined?
>>>
>>> WITH INLINE mycte AS ( ...)
>>>
>>
>> +1 for a decorator, -1 for this one.
>>
>
> I am not sure I like decorators since this means adding an ad hoc query
> hint directly into the SQL syntax which is something which I requires
> serious consideration.

​Given that we already have
​"​
prevent optimization
​"​
syntax why do we need a decorator on the CTE?

>
>
> We already have an explicit optimization fence with OFFSET 0, and I
>> think making optimization fences explicit is how we should continue.
>> I'd be more in favor of something along the lines of
>>
>> WITH FENCED /* Somewhat fuzzy. What fence? */
>> or
>> WITH AT_MOST_ONCE /* Clearer, but not super precise */
>> or
>> WITH UNIQUE_ATOMIC /* More descriptive, but not super clear without
>> the docs in hand */
>>
>> or something along that line.
>>
>
> What about WITH MATERIALIZED, borrowing from the MySQL terminology
> "materialized subquery"?

​I would shorten that to "WITH MAT" except that I don't think that having
two way to introduce an optimization fence is worthwhile.

If we don't optimize SRFs-in-target-list, and thus avoid multiple function
evaluation for (composite_col).*, I believe a significant number of
intentional optimization fence uses will be safe but behavioral changes.

David J.


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-01 14:43:11
Message-ID: 4b7bdda9-a94a-684a-9059-7a423afe03d3@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/01/2017 04:33 PM, David G. Johnston wrote:
> On Mon, May 1, 2017 at 7:26 AM, Andreas Karlsson <andreas(at)proxel(dot)se
> I am not sure I like decorators since this means adding an ad hoc
> query hint directly into the SQL syntax which is something which I
> requires serious consideration.
>
> ​Given that we already have
> ​"​
> prevent optimization
> ​"​
> syntax why do we need a decorator on the CTE?

I do not think I follow. Me and some other people here would ideally
allow CTEs to be inlined by default. Some people today use CTEs as
optimization fences, to for example control join order, and the
suggestion here is to add new syntax for CTEs to allow them to
selectively be used as optimization fences.

> ​I would shorten that to "WITH MAT" except that I don't think that
> having two way to introduce an optimization fence is worthwhile.

You mean OFFSET 0? I have never been a fan of using it as an
optimization fence. I do not think OFFSET 0 conveys clearly enough to
the reader that is is an optimization fence.

Andreas


From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: David Fetter <david(at)fetter(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-01 14:44:44
Message-ID: CADkLM=diezey-2kAiyMTX3Fq_PnYiDQ3mMFp+Nh3gQVdabnRCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, May 1, 2017 at 10:26 AM, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:

> What about WITH MATERIALIZED, borrowing from the MySQL terminology
> "materialized subquery"?

+1, you beat me to it.


From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: David Fetter <david(at)fetter(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-01 14:53:06
Message-ID: CAKFQuwaxWAOB8kCwwJjAx=co+Nmmy5_HOgs37_-jRoUqc09=Rg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, May 1, 2017 at 7:43 AM, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:

> On 05/01/2017 04:33 PM, David G. Johnston wrote:
> > On Mon, May 1, 2017 at 7:26 AM, Andreas Karlsson <andreas(at)proxel(dot)se
> > I am not sure I like decorators since this means adding an ad hoc
> > query hint directly into the SQL syntax which is something which I
> > requires serious consideration.
>
> > ​I would shorten that to "WITH MAT" except that I don't think that
> > having two way to introduce an optimization fence is worthwhile.
>
> You mean OFFSET 0? I have never been a fan of using it as an optimization
> fence. I do not think OFFSET 0 conveys clearly enough to the reader that is
> is an optimization fence.

​I think I was being too literal in my thinking. Proposing that we
effectively do away with OFFSET 0 and instead recommend "WITH MAT name AS
()" for subqueries requiring standalone evaluation is something I can agree
with. I too have always though of OFFSET 0 as hack-ish which is why my SRF
usage examples have always used CTEs.

David J.​


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-01 15:04:47
Message-ID: d2a3c0b7-568a-b729-cb82-19132b9f54ea@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/01/2017 10:17 AM, David Fetter wrote:
> On Mon, May 01, 2017 at 09:22:42AM -0400, Andrew Dunstan wrote:
>>> So no more planner-affecting GUCs, please, particularly if we expect
>>> regular users to use them.
>> +1
>>
>> I still see users wanting to use the enable_foo settings in production.
>>
>> Having had years of telling users that CTEs are an optimization fence it
>> doesn't seem at all nice for us to turn around and change our mind about
>> that. I have relied on it in the past and I'm sure I'm very far from
>> alone in that.
> You are certainly not alone, but I believe that in this you're missing
> the vast majority (we hope) of PostgreSQL users. These are the users
> who have yet to adopt PostgreSQL, and have the quite reasonable
> expectation that ordinary-looking grammar *isn't* an optimization
> fence.
>

I am not in favor of seriously inconveniencing a significant number of
our existing users for the sake of a bunch of people who don't now and
might never in the future use Postgres. I think the bar for silent
behaviour changes needs to be a bit higher than this one is.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: David Fetter <david(at)fetter(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 02:38:02
Message-ID: CAMsr+YFRVCavAxeVAqbwMXsv=ZrgW5h4Wjhzx9NUnKB11G4ZKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1 May 2017 at 22:26, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:

> I am not sure I like decorators since this means adding an ad hoc query hint
> directly into the SQL syntax which is something which I requires serious
> consideration.

And mangling the semantics of existing syntax doesn't?

That's what we do right now so we can pretend we don't have query
hints while still having query hints.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 02:42:18
Message-ID: CA+TgmoYEQpxrpLvPeJhjH85z2qnw2qiTCtuN8TeUt2iBFAghvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Apr 30, 2017 at 5:54 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> But I keep running into people who face serious performance issues exactly
> because not realizing this, and using CTEs as named subqueries. And when I
> tell them "optimization fence" they react "Whaaaaaaat?"
>
> If I had to make up some numbers, I'd say the "Whaaaaat?" group is about 10x
> the group of people who intentionally rely on CTEs being optimization
> fences.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 02:45:40
Message-ID: CAMsr+YFZbNEYfYV9GNgL_y4ETtsHqeJm_U3+0Ebodjfrewf+Vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1 May 2017 at 21:05, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 05/01/2017 06:22 AM, Pavel Stehule wrote:
>>
>>
>>
>> 2017-05-01 1:21 GMT+02:00 Andres Freund <andres(at)anarazel(dot)de
>> <mailto:andres(at)anarazel(dot)de>>:
>>
>> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>> > why we cannot to introduce GUC option - enable_cteoptfence ?
>>
>> Doesn't really solve the issue, and we've generally shied away from
>> GUCs
>> that influence behaviour after a few bad experiences. What if you
>> want
>> one CTE inlined, but another one not?
>>
>>
>> It change behave in same sense like enable_nestloop, enable_hashjoin,
>> ... with same limits.
>>
>
> Those (and also the other enable_*) GUCs are a great example why we should
> not use GUCs for tweaking planner behavior, except perhaps for the purpose
> of investigation. It's an extremely blunt tool.

Definitely agree. You can't use them to affect only a portion of a
query. In fact there's no way to automatically scope them to one query
at all. They're also very big hammers, and as we introduce new types
of plan nodes they won't offer comprehensive control without being
amended. They're a tool of last resort for dealing with problems.

> Exactly the same issues would affect this new GUC. It would be impossible to
> use multiple CTEs in the query with different fencing behavior, and it would
> be just as difficult to investigate.

Yeah, I think a GUC is a non-starter.

If we want fence behaviour, we should require people to declare their
desire for fence behaviour, rather than treating it as a sort of
hint-as-a-bug that we grandfather in because we're so desperate not to
admit we have hints.

Before forming a strong view on this, I strongly everyone stick your
head outside the postgresql.org lists for a while. In my experience
even regular postgres users I see pop up on places like Stack Overflow
tend to react to this behaviour with "WTF?!" and wonder why we haven't
fixed this limitation yet, viewing it as a bug not a feature.

The same logic being applied here should've prevented us from ever introducing:

* inlining of SQL functions
* inlining of views
* inlining of subqueries

... but somehow, this one is different.

We're not even removing functionality. You can still use the OFFSET 0
hack. If you need a nonzero offset that's fine, because we don't
inline over OFFSET anyway.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 06:23:26
Message-ID: CAMsr+YGFX3ohSsaZOb3JvRBwd03BBVvjwz8rZFcLV7TmJttt_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2 May 2017 at 10:45, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> If we want fence behaviour, we should require people to declare their
> desire for fence behaviour, rather than treating it as a sort of
> hint-as-a-bug that we grandfather in because we're so desperate not to
> admit we have hints.

Answers to my question here https://dba.stackexchange.com/q/27425/7788
suggest that there's no justification in the standard for treating it
as a mandatory fence.

Here https://www.postgresql.org/message-id/29918.1320244719@sss.pgh.pa.us
Tom says

"CTEs are also treated as optimization fences; this is not so much an
optimizer limitation as to keep the semantics sane when the CTE contains
a writable query."

In my view, if we address that by walking the query tree for volatile
functions, we should be perfectly fine to inline single-reference
CTEs. Is that too simplistic?

Is the sticking point simply this docs text:

"A useful property of WITH queries is that they are evaluated only
once per execution of the parent query, even if they are referred to
more than once by the parent query or sibling WITH queries. Thus,
expensive calculations that are needed in multiple places can be
placed within a WITH query to avoid redundant work. Another possible
application is to prevent unwanted multiple evaluations of functions
with side-effects. However, the other side of this coin is that the
optimizer is less able to push restrictions from the parent query down
into a WITH query than an ordinary subquery. The WITH query will
generally be evaluated as written, without suppression of rows that
the parent query might discard afterwards. (But, as mentioned above,
evaluation might stop early if the reference(s) to the query demand
only a limited number of rows.)"

?

... because to me, that seems to offer a great deal of wiggle room in
"less able", "generally", etc, even if we treated the docs as a hard
specification of future behaviour. (Which we don't, since we change
things and document the changes).

We may also stop evaluation early, so there's already no guarantee the
full query is run.

Is there any *technical* rather than policy reason we could not safely
inline a CTE term referenced by only one query, where the CTE term is
a SELECT, contains no volatile function calls, is not RECURSIVE, and
does not its self reference another non-inlineable CTE term?

Josh Kupershmidt, in a comment on my blog post on this topic some time
ago (https://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/),
pointed out that

"One notable exception to the “(non-writeable) CTEs are always
materialized” rule is that if a non-writeable CTE node is not
referenced anywhere, it won’t actually be evaluated. For example, this
query returns 1 instead of bombing out:

WITH not_executed AS (SELECT 1/0),

executed AS (SELECT 1)

SELECT * FROM executed;"

Prior discussions:

* https://www.postgresql.org/message-id/201209191305.44674.db@kavod.com
* http://archives.postgresql.org/pgsql-performance/2011-10/msg00208.php
* https://www.postgresql.org/message-id/29918.1320244719@sss.pgh.pa.us

Relevant posts where users get confused by our behaviour:

* https://dba.stackexchange.com/q/127828/7788
* https://news.ycombinator.com/item?id=7023907
* https://dba.stackexchange.com/q/84760/7788
* https://dba.stackexchange.com/q/97393/7788
* http://stackoverflow.com/q/20403792/398670
* http://stackoverflow.com/q/33731068/398670
* ....

Also, comments on my post:

* https://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 12:40:55
Message-ID: 570bfb75-43ac-b423-8d4a-2a6f55219d92@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/02/2017 04:38 AM, Craig Ringer wrote:
> On 1 May 2017 at 22:26, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:
>> I am not sure I like decorators since this means adding an ad hoc query hint
>> directly into the SQL syntax which is something which I requires serious
>> consideration.
>
> And mangling the semantics of existing syntax doesn't?
>
> That's what we do right now so we can pretend we don't have query
> hints while still having query hints.

I am in favor of removing the optimization fence from CTEs, and strongly
prefer no fence being the default behavior since SQL is a declarative
language and I think it is reasonable to assume that CTEs can be
inlined. But the question is how to best remove the fence while taking
into account that quite many use them as optimization fences today.

I see some alternatives, none of them perfect.

1. Just remove the optimization fence and let people add OFFSET 0 to
their queries if they want an optimization fence. This lets us keep
pretending that we do not have query hints (and therefore do not have to
formalize any syntax for them) while still allowing people to add
optimization fences.

2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
explicit optimization fence. This will for the first time add official
support for a query hint in the syntax which is a quite big precedent.

3. Add a new GUC which can enable and disable the optimization fence.
This is a very clumsy tool, but maybe good enough for some users and
some people here in this thread have complained about our similar GUCs.

4. Add some new more generic query hinting facility. This is a lot of
work and something which would be very hard to get consensus for.

Andreas


From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 13:06:08
Message-ID: CAF4Au4x1WfsuwK2au9UACemar2ZkkCyp8TfA0mTXE3MRgvcaMA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, May 1, 2017 at 7:22 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>
> 2017-05-01 1:21 GMT+02:00 Andres Freund <andres(at)anarazel(dot)de>:
>>
>> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>> > why we cannot to introduce GUC option - enable_cteoptfence ?
>>
>> Doesn't really solve the issue, and we've generally shied away from GUCs
>> that influence behaviour after a few bad experiences. What if you want
>> one CTE inlined, but another one not?
>
>
> It change behave in same sense like enable_nestloop, enable_hashjoin, ...
> with same limits.

And then we recall plan hints :)

>
> Regards
>
> Pavel
>
>>
>>
>> - Andres
>
>


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 14:13:08
Message-ID: CAHyXU0yOHGWAaTaHN9PUEFLCqGbksEUtLvuH4ruHANH4SovYWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Apr 30, 2017 at 6:21 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>> why we cannot to introduce GUC option - enable_cteoptfence ?
>
> Doesn't really solve the issue, and we've generally shied away from GUCs
> that influence behaviour after a few bad experiences. What if you want
> one CTE inlined, but another one not?

Yeah. Are we absolutely opposed to SQL syntax against WITH that
allows or disallows fencing? for example,

WITH [MATERIALIZED]

Pushing people to OFFSET 0 is a giant step backwards IMO, and as in
implementation detail is also subject to change.

merlin


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 14:44:33
Message-ID: f4a6e5be-b322-e224-e9fc-8fed9bff284c@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/02/2017 10:13 AM, Merlin Moncure wrote:
> On Sun, Apr 30, 2017 at 6:21 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>>> why we cannot to introduce GUC option - enable_cteoptfence ?
>> Doesn't really solve the issue, and we've generally shied away from GUCs
>> that influence behaviour after a few bad experiences. What if you want
>> one CTE inlined, but another one not?
> Yeah. Are we absolutely opposed to SQL syntax against WITH that
> allows or disallows fencing? for example,
>
> WITH [MATERIALIZED]
>
> Pushing people to OFFSET 0 is a giant step backwards IMO, and as in
> implementation detail is also subject to change.
>
>

Agreed, it's an ugly as sin and completely non-obvious hack.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 15:43:45
Message-ID: 2333be52-7cb8-e722-d30e-f42e37fd66c0@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5/2/17 4:44 PM, Andrew Dunstan wrote:
>
>
> On 05/02/2017 10:13 AM, Merlin Moncure wrote:
>> On Sun, Apr 30, 2017 at 6:21 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>>>> why we cannot to introduce GUC option - enable_cteoptfence ?
>>> Doesn't really solve the issue, and we've generally shied away from GUCs
>>> that influence behaviour after a few bad experiences. What if you want
>>> one CTE inlined, but another one not?
>> Yeah. Are we absolutely opposed to SQL syntax against WITH that
>> allows or disallows fencing? for example,
>>
>> WITH [MATERIALIZED]
>>
>> Pushing people to OFFSET 0 is a giant step backwards IMO, and as in
>> implementation detail is also subject to change.
>>
>>
>
> Agreed, it's an ugly as sin and completely non-obvious hack.
>

Isn't OFFSET 0 an implementation detail anyway? Who says the planner
couldn't get smarter in the future, realize OFFSET 0 is no-op?

In that case replacing CTE optimization fence with "OFFSET 0" would be
akin to painting yourself into a corner, waiting for the pain to dry,
walking over to another corner and painting yourself into that one.

cheers

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-02 16:01:14
Message-ID: 1493740874227-5959313.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Relevant posts where users get confused by our behaviour:
> ....

And Markus Winand's blog: http://modern-sql.com/feature/with/performance

Databases generally obey this principle, although PostgreSQL represents
a big exception

and

Besides PostgreSQL, all tested databases optimize with clauses in the
same way
that they optimize views or derived tables"

--
View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5959313.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


From: David Fetter <david(at)fetter(dot)org>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 16:34:07
Message-ID: 20170502163407.GE9099@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote:
> On 05/02/2017 04:38 AM, Craig Ringer wrote:
> > On 1 May 2017 at 22:26, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:
> > > I am not sure I like decorators since this means adding an ad hoc query hint
> > > directly into the SQL syntax which is something which I requires serious
> > > consideration.
> >
> > And mangling the semantics of existing syntax doesn't?
> >
> > That's what we do right now so we can pretend we don't have query
> > hints while still having query hints.
>
> I am in favor of removing the optimization fence from CTEs, and strongly
> prefer no fence being the default behavior since SQL is a declarative
> language and I think it is reasonable to assume that CTEs can be inlined.
> But the question is how to best remove the fence while taking into account
> that quite many use them as optimization fences today.
>
> I see some alternatives, none of them perfect.
>
> 1. Just remove the optimization fence and let people add OFFSET 0 to their
> queries if they want an optimization fence. This lets us keep pretending
> that we do not have query hints (and therefore do not have to formalize any
> syntax for them) while still allowing people to add optimization fences.

+1

I get that people with gigantic PostgreSQL installations with
stringent performance requirements sometimes need to do odd things to
squeeze out the last few percentage points of performance. As the
people (well, at least the people close to the ground) at these
organizations are fully aware, performance optimizations are extremely
volatile with respect to new versions of software, whether it's
PostgreSQL, Oracle, the Linux kernel, or what have you. They expect
this, and they have processes in place to handle it. If they don't,
it's pilot error.

We should not be penalizing all our other users to maintain the
fiction that people can treat performance optimizations as a "fire and
forget" matter.

> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
> explicit optimization fence. This will for the first time add official
> support for a query hint in the syntax which is a quite big precedent.

Yep. It's one we should think very carefully before we introduce.

> 3. Add a new GUC which can enable and disable the optimization fence. This
> is a very clumsy tool, but maybe good enough for some users and some people
> here in this thread have complained about our similar GUCs.

Any GUC would be unable to distinguish one WITH clause from another.
The hammer would then be guaranteed to be too big for precisely the
cases where it's most needed.

> 4. Add some new more generic query hinting facility. This is a lot
> of work and something which would be very hard to get consensus for.

Just the design of the thing would be the work of months at a minimum,
assuming we got to some consensus at all. Maybe it's worth doing.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 16:51:35
Message-ID: CADkLM=eQDjYqAe_e0nQm_c-rEx9UsCiAn+P0=96cuHkBZn-xpw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> I get that people with gigantic PostgreSQL installations with
> stringent performance requirements sometimes need to do odd things to
> squeeze out the last few percentage points of performance. As the
> people (well, at least the people close to the ground) at these
> organizations are fully aware, performance optimizations are extremely
> volatile with respect to new versions of software, whether it's
> PostgreSQL, Oracle, the Linux kernel, or what have you. They expect
> this, and they have processes in place to handle it. If they don't,
> it's pilot error.
>

Well put. People on the ground in those situations go to great lengths to
freeze the query plan as-is. For them, an upgrade is something that is done
after months of planning. They might be surprised by the dropping of this
optimization fence, but the surprise won't be in production, and they've
got just as good of chance of being pleasantly surprised.

> > 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
> > explicit optimization fence. This will for the first time add official
> > support for a query hint in the syntax which is a quite big precedent.
>
> Yep. It's one we should think very carefully before we introduce.
>

There's a tiny, oblique precedence for this with Oracle's WITH [FUNCTION |
PROCEDURE] syntax. In both cases, the user is able to create an ephemeral
object that can be referenced later in the query. Good idea or bad, it's a
sign that others have been fine with that conceptual path.

Personally, I'm fine with WITH MATERIALIZED, but I'm also ok with just not
promising the fence. I think there is value in letting users break up a
complex query into understandable WITH-chunks, and that value shouldn't
prevent good performance. The fence will probably still be there anyway in
the case of INSERT/UPDATE RETURNING and cases where a CTE is referenced
more than once in the query that follows.


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Fetter <david(at)fetter(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 17:05:10
Message-ID: 5f1657e1-6412-c117-def1-a039e3483f76@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5/2/17 6:34 PM, David Fetter wrote:
> On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote:
>> On 05/02/2017 04:38 AM, Craig Ringer wrote:
>>> On 1 May 2017 at 22:26, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:
>>
>> ...
>>
>> I see some alternatives, none of them perfect.
>>
>> 1. Just remove the optimization fence and let people add OFFSET 0 to their
>> queries if they want an optimization fence. This lets us keep pretending
>> that we do not have query hints (and therefore do not have to formalize any
>> syntax for them) while still allowing people to add optimization fences.
>
> +1
>
> I get that people with gigantic PostgreSQL installations with
> stringent performance requirements sometimes need to do odd things to
> squeeze out the last few percentage points of performance. As the
> people (well, at least the people close to the ground) at these
> organizations are fully aware, performance optimizations are extremely
> volatile with respect to new versions of software, whether it's
> PostgreSQL, Oracle, the Linux kernel, or what have you. They expect
> this, and they have processes in place to handle it. If they don't,
> it's pilot error.
>
> We should not be penalizing all our other users to maintain the
> fiction that people can treat performance optimizations as a "fire and
> forget" matter.
>

Agreed.

>> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
>> explicit optimization fence. This will for the first time add official
>> support for a query hint in the syntax which is a quite big precedent.
>
> Yep. It's one we should think very carefully before we introduce.
>

I think it's a mistake to see this as an introduction of query hits.

Firstly, it's a question whether it qualifies as a hint. I wouldn't call
it a hint, but let's assume there is a definition of query hints that
includes WITH MATERIALIZED.

More importantly, however, this is not introducing anything new. It's
just a different name for the current "WITH" semantics, and you can
achieve the same behavior by "OFFSET 0". And people are already using
these as hints, so I fail to see how this introduces anything new.

In fact, if you see the optimization fence as an implicit query hint,
this actually *removes* a hint (although most users are unaware of that
behavior and use it unintentionally).

>> 3. Add a new GUC which can enable and disable the optimization fence. This
>> is a very clumsy tool, but maybe good enough for some users and some people
>> here in this thread have complained about our similar GUCs.
>
> Any GUC would be unable to distinguish one WITH clause from another.
> The hammer would then be guaranteed to be too big for precisely the
> cases where it's most needed.
>

If I could, I'd give -1 million to a GUC-based approach, as that would
make it entirely unusable in practice, I think.

Actually, I can give -1 million, so I'm giving it.

>>
>> 4. Add some new more generic query hinting facility. This is a lot
>> of work and something which would be very hard to get consensus for.
>
> Just the design of the thing would be the work of months at a minimum,
> assuming we got to some consensus at all. Maybe it's worth doing.
>

While I came to conclusion that query hints may be quite useful in some
situations, I'm pretty sure this is not a battle you'd like to fight.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 21:23:16
Message-ID: CAHyXU0zWtJnhx4K3T11MafYJ7ARBpt5KqpeNvY2SY6VHePJ53w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

\On Tue, May 2, 2017 at 12:05 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 5/2/17 6:34 PM, David Fetter wrote:
>>
>> On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote:
>>>
>>> On 05/02/2017 04:38 AM, Craig Ringer wrote:
>>>>
>>>> On 1 May 2017 at 22:26, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:
>
>>>
>>>
>>> ...
>>>
>>> I see some alternatives, none of them perfect.
>>>
>>> 1. Just remove the optimization fence and let people add OFFSET 0 to
>>> their
>>> queries if they want an optimization fence. This lets us keep pretending
>>> that we do not have query hints (and therefore do not have to formalize
>>> any
>>> syntax for them) while still allowing people to add optimization fences.
>>
>>
>> +1
>>
>> I get that people with gigantic PostgreSQL installations with
>> stringent performance requirements sometimes need to do odd things to
>> squeeze out the last few percentage points of performance. As the
>> people (well, at least the people close to the ground) at these
>> organizations are fully aware, performance optimizations are extremely
>> volatile with respect to new versions of software, whether it's
>> PostgreSQL, Oracle, the Linux kernel, or what have you. They expect
>> this, and they have processes in place to handle it. If they don't,
>> it's pilot error.
>>
>> We should not be penalizing all our other users to maintain the
>> fiction that people can treat performance optimizations as a "fire and
>> forget" matter.
>>
>
> Agreed.
>
>>> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
>>> explicit optimization fence. This will for the first time add official
>>> support for a query hint in the syntax which is a quite big precedent.
>>
>>
>> Yep. It's one we should think very carefully before we introduce.
>>
>
> I think it's a mistake to see this as an introduction of query hits.
>
> Firstly, it's a question whether it qualifies as a hint. I wouldn't call it
> a hint, but let's assume there is a definition of query hints that includes
> WITH MATERIALIZED.
>
> More importantly, however, this is not introducing anything new. It's just a
> different name for the current "WITH" semantics, and you can achieve the
> same behavior by "OFFSET 0". And people are already using these as hints, so
> I fail to see how this introduces anything new.
>
> In fact, if you see the optimization fence as an implicit query hint, this
> actually *removes* a hint (although most users are unaware of that behavior
> and use it unintentionally).

+1 down the line. More to the point, for several years now we've (or
at least I, but I'm not the only one) have been advocating for the
usage of CTE to avoid the undocumented and bizarre OFFSET 0 trick.
Jerking this out from users without giving a simple mechanic to get
the same behavior minus a major query rewrite is blatantly user
hostile. I can't believe we're even contemplating it. Also a GUC is
not a solution for pretty obvious reasons I think.

merlin


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-02 23:00:38
Message-ID: c12d6f9b-f0eb-6b0d-164f-4de897bdc459@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5/2/17 11:23 PM, Merlin Moncure wrote:
> \On Tue, May 2, 2017 at 12:05 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> On 5/2/17 6:34 PM, David Fetter wrote:
>>>
>>> On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote:
>>>>
>>>> On 05/02/2017 04:38 AM, Craig Ringer wrote:
>>>>>
>>>>> On 1 May 2017 at 22:26, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:
>>
>>>>
>>>>
>>>> ...
>>>>
>>>> I see some alternatives, none of them perfect.
>>>>
>>>> 1. Just remove the optimization fence and let people add OFFSET 0 to
>>>> their
>>>> queries if they want an optimization fence. This lets us keep pretending
>>>> that we do not have query hints (and therefore do not have to formalize
>>>> any
>>>> syntax for them) while still allowing people to add optimization fences.
>>>
>>>
>>> +1
>>>
>>> I get that people with gigantic PostgreSQL installations with
>>> stringent performance requirements sometimes need to do odd things to
>>> squeeze out the last few percentage points of performance. As the
>>> people (well, at least the people close to the ground) at these
>>> organizations are fully aware, performance optimizations are extremely
>>> volatile with respect to new versions of software, whether it's
>>> PostgreSQL, Oracle, the Linux kernel, or what have you. They expect
>>> this, and they have processes in place to handle it. If they don't,
>>> it's pilot error.
>>>
>>> We should not be penalizing all our other users to maintain the
>>> fiction that people can treat performance optimizations as a "fire and
>>> forget" matter.
>>>
>>
>> Agreed.
>>
>>>> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
>>>> explicit optimization fence. This will for the first time add official
>>>> support for a query hint in the syntax which is a quite big precedent.
>>>
>>>
>>> Yep. It's one we should think very carefully before we introduce.
>>>
>>
>> I think it's a mistake to see this as an introduction of query hits.
>>
>> Firstly, it's a question whether it qualifies as a hint. I wouldn't call it
>> a hint, but let's assume there is a definition of query hints that includes
>> WITH MATERIALIZED.
>>
>> More importantly, however, this is not introducing anything new. It's just a
>> different name for the current "WITH" semantics, and you can achieve the
>> same behavior by "OFFSET 0". And people are already using these as hints, so
>> I fail to see how this introduces anything new.
>>
>> In fact, if you see the optimization fence as an implicit query hint, this
>> actually *removes* a hint (although most users are unaware of that behavior
>> and use it unintentionally).
>
> +1 down the line. More to the point, for several years now we've (or
> at least I, but I'm not the only one) have been advocating for the
> usage of CTE to avoid the undocumented and bizarre OFFSET 0 trick.
> Jerking this out from users without giving a simple mechanic to get
> the same behavior minus a major query rewrite is blatantly user
> hostile. I can't believe we're even contemplating it. Also a GUC is
> not a solution for pretty obvious reasons I think.
>

I'm not sure what you mean by "jerking this out from users". Isn't most
of this thread about how to allow CTE inlining without hurting users
unnecessarily?

I think we agree that:

* Just removing the optimization fence and telling users to use OFFSET 0
instead is a no-go, just like removing the fencing and not providing any
sensible replacement.

* GUC is not the solution.

Which leaves us with either WITH INLINE or WITH MATERIALIZE, or
something along those lines.

If we go with WITH INLINE then we're likely not solving anything,
because most people will simply use WITH just like now, and will be
subject to the fencing without realizing it.

Or we will choose WITH MATERIALIZE, and then the users aware of the
fencing (and using the CTEs for that purpose) will have to modify the
queries. But does adding MATERIALIZE quality as major query rewrite?

Perhaps combining this with a GUC would be a solution. I mean, a GUC
specifying the default behavior, and then INLINE / MATERIALIZE for
individual CTEs in a query?

If you have an application intentionally using CTEs as a fence, just do

ALTER DATABASE x SET enable_guc_fencing = on

and you don't have to rewrite the queries.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-03 05:19:22
Message-ID: CAMsr+YHPKb-JzhscZnHsYyYR_mi78GP-EfOQ8H4txGr=PwnWKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3 May 2017 at 07:00, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

> I'm not sure what you mean by "jerking this out from users". Isn't most of
> this thread about how to allow CTE inlining without hurting users
> unnecessarily?

He's referring to

Andreas Karlsson <andreas(at)proxel(dot)se> wrote:
> 1. Just remove the optimization fence and let people add OFFSET 0 to their queries if they want an optimization fence. This lets us keep pretending that we do not have query hints (and therefore do not have to formalize any syntax for them) while still allowing people to add optimization fences.

... and the various agreement expressed with it.

I wish that's what had been done in the first place, but it's a bit
harsh on users to do it now.

We can keep living in fantasty-land where we deny hints while telling
people to use a washy-worded semi-documented hint to work around
issues.

Or we can take responsibility and admit we've had to grandfather a
limitation in as a blessed hint. Then unblock our way toward this
performance enhancement / user foot-gun without yanking the rug out
from under users who've been relying on our odd implementation detail
to date:

* Add "WITH MATERIALIZED" or "WITH NO_INLINE" or whatever
* Change "WITH" to inline where safe, initially only for single reference terms
* Mention the change in the release notes

Note that I'm in no way advocating real hints. I won't say I love all
aspects of project policy there, but it's clear there are benefits,
and that hints bring plenty of problems. If this weren't already
documented and widely advocated as a way to "fix" poor
pushdown/pullup/inlining decisions I'd be firmly in the "just fix it"
camp.

> * Just removing the optimization fence and telling users to use OFFSET 0
> instead is a no-go, just like removing the fencing and not providing any
> sensible replacement.

I could tolerate telling people to use OFFSET 0 (and documenting it!)
as a workaround if we can't get something more friendly in.

At least then we're back down to one hint-not-a-hint instead of two,
and it doesn't really block any useful optimisations.

> * GUC is not the solution.

Strong agreement.

> If we go with WITH INLINE then we're likely not solving anything, because
> most people will simply use WITH just like now, and will be subject to the
> fencing without realizing it.

Yes, and we're missing the opportunity to confirm with what other
systems do, and the spirit of the SQL language's declare what I want,
not how to do it, model.

> Or we will choose WITH MATERIALIZE, and then the users aware of the fencing
> (and using the CTEs for that purpose) will have to modify the queries. But
> does adding MATERIALIZE quality as major query rewrite?

Hardly.

> Perhaps combining this with a GUC would be a solution. I mean, a GUC
> specifying the default behavior, and then INLINE / MATERIALIZE for
> individual CTEs in a query?

It'd be nice if we could do that for a couple of releases as an
interim measure, but people will then get locked into relying on it,
and we'll never be able to remove it.

It's not like we don't make users do other things for upgrades, and
don't change performance in other ways. We don't even HAVE a
performance-test farm to look at regressions, planner behaviour
changes, etc. Yes, I know it's hard and nobody's volunteering, the
point is, we're hardly free of undocumented and unintentional
query-specific regressions let alone documented and relnoted ones.

So a sad -1 to me for a GUC.

Anyone big enough to be significantly upset by this planner change
will have a QA/staging deployment system anyway. Or should, because we
make enough other changes in a major release to make their life way
more interesting than this!

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-03 13:54:39
Message-ID: a91929f6-d3da-fa53-c42c-31b32062e88e@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/02/2017 07:00 PM, Tomas Vondra wrote:
>
>
>
> I think we agree that:
>
> * Just removing the optimization fence and telling users to use OFFSET
> 0 instead is a no-go, just like removing the fencing and not providing
> any sensible replacement.
>
> * GUC is not the solution.

yes

>
> Which leaves us with either WITH INLINE or WITH MATERIALIZE, or
> something along those lines.
>
> If we go with WITH INLINE then we're likely not solving anything,
> because most people will simply use WITH just like now, and will be
> subject to the fencing without realizing it.

In many cases it won't actually matter much.

We're going to penalize some group of users no matter what we do. It
just seems a pity that it might be the group who actually took us at our
word. It's no skin off my nose - I will happily spend time finding
places where this will make things worse for customers.

>
> Or we will choose WITH MATERIALIZE, and then the users aware of the
> fencing (and using the CTEs for that purpose) will have to modify the
> queries. But does adding MATERIALIZE quality as major query rewrite?

It's not a major rewrite. But I can think of at least one former
customer who will have to go through a heck of a lot of code finding
where to add that one word.

>
> Perhaps combining this with a GUC would be a solution. I mean, a GUC
> specifying the default behavior, and then INLINE / MATERIALIZE for
> individual CTEs in a query?
>
> If you have an application intentionally using CTEs as a fence, just do
>
> ALTER DATABASE x SET enable_guc_fencing = on
>
> and you don't have to rewrite the queries.
>

ITYM enable_cte_fencing.

I'm not sure it will help all that much unless we provide both decorator
variants, so people can remediate their code one query at a time, and
this guc would just govern the default.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-03 14:26:27
Message-ID: CAGTBQpb0196WWCLHTUzqZUFy5s-ayRptQD=nGVyrw3vEyWHamg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 3, 2017 at 2:19 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> Or we will choose WITH MATERIALIZE, and then the users aware of the fencing
>> (and using the CTEs for that purpose) will have to modify the queries. But
>> does adding MATERIALIZE quality as major query rewrite?
>
> Hardly.
>
>> Perhaps combining this with a GUC would be a solution. I mean, a GUC
>> specifying the default behavior, and then INLINE / MATERIALIZE for
>> individual CTEs in a query?
>
> It'd be nice if we could do that for a couple of releases as an
> interim measure, but people will then get locked into relying on it,
> and we'll never be able to remove it.

The proposed guc seems like a good idea, without which ORMs that
support CTEs would be at a loss. People using those ORMs that need
materialized behavior would have to wait for the ORM to catch up with
postgres syntax before upgrading, and that wouldn't be a nice thing.

It's not about requiring testing before upgrading, of course users
should/will do that. But if said testing says inlined CTEs perform
horribly, and the ORM has no support for the materialized keyword, the
only option is to not upgrade. With the CTE, people can upgrade,
changing the default behavior back to what it was.

That seems to me a useful thing to have.


From: David Fetter <david(at)fetter(dot)org>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-03 14:31:58
Message-ID: 20170503143158.GF14241@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 03, 2017 at 11:26:27AM -0300, Claudio Freire wrote:
> On Wed, May 3, 2017 at 2:19 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> >> Or we will choose WITH MATERIALIZE, and then the users aware of
> >> the fencing (and using the CTEs for that purpose) will have to
> >> modify the queries. But does adding MATERIALIZE quality as major
> >> query rewrite?
> >
> > Hardly.
> >
> >> Perhaps combining this with a GUC would be a solution. I mean, a
> >> GUC specifying the default behavior, and then INLINE /
> >> MATERIALIZE for individual CTEs in a query?
> >
> > It'd be nice if we could do that for a couple of releases as an
> > interim measure, but people will then get locked into relying on
> > it, and we'll never be able to remove it.
>
> The proposed guc seems like a good idea, without which ORMs that
> support CTEs would be at a loss.

Are you aware of such an ORM which both supports WITH and doesn't also
closely track PostgreSQL development? I'm not.

Even assuming that such a thing exists, it's not at all obvious to me
that we should be stalling and/or putting in what will turn out to be
misfeatures to accommodate it.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-03 15:51:46
Message-ID: 1493826706018-5959509.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I could tolerate telling people to use OFFSET 0 (and documenting it!)
> as a workaround if we can't get something more friendly in.

I agree with that.

> If we go with WITH INLINE then we're likely not solving anything, because
> most people will simply use WITH just like now, and will be subject to the
> fencing without realizing it.

I agree - the default behaviour should be change to match what everybody
expects. The current behaviour should be the exception.

> Yes, and we're missing the opportunity to confirm with what other
> systems do, and the spirit of the SQL language's declare what I want,
> not how to do it, model.

Essentially *all* other systems optimize CTEs the same way they optimize
derived tables. I think even MySQL does it like that in the upcoming 8.0
release.

I have never heard anyone saying that the Postgres implementation is an
advantage and that they would hate to see this disappear. I usually hear
"Why is Postgres doing that? Can't they change that?"

Maybe I have a limited view on this, but from where I stand, simply changing
it would help everybody I know and would not break anything. I don't even
think a replacement for the old behaviour would be necessary.

--
View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5959509.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-03 16:27:38
Message-ID: CAGTBQpZ46J1U+p6=_qpo_668vMiw1bvQVy6CzD=6oSMPPeFwOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 3, 2017 at 11:31 AM, David Fetter <david(at)fetter(dot)org> wrote:
> On Wed, May 03, 2017 at 11:26:27AM -0300, Claudio Freire wrote:
>> On Wed, May 3, 2017 at 2:19 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> >> Or we will choose WITH MATERIALIZE, and then the users aware of
>> >> the fencing (and using the CTEs for that purpose) will have to
>> >> modify the queries. But does adding MATERIALIZE quality as major
>> >> query rewrite?
>> >
>> > Hardly.
>> >
>> >> Perhaps combining this with a GUC would be a solution. I mean, a
>> >> GUC specifying the default behavior, and then INLINE /
>> >> MATERIALIZE for individual CTEs in a query?
>> >
>> > It'd be nice if we could do that for a couple of releases as an
>> > interim measure, but people will then get locked into relying on
>> > it, and we'll never be able to remove it.
>>
>> The proposed guc seems like a good idea, without which ORMs that
>> support CTEs would be at a loss.
>
> Are you aware of such an ORM which both supports WITH and doesn't also
> closely track PostgreSQL development? I'm not.
>
> Even assuming that such a thing exists, it's not at all obvious to me
> that we should be stalling and/or putting in what will turn out to be
> misfeatures to accommodate it.

I know SQLAlchemy does support CTEs, and lags quite considerably in
its support of the latest syntactic elements.

For instance, it took them 8 months to support the "skip locked" option.

Not sure whether that qualifies as "closely tracking" postgres for
you. Clearly they do track it, but that doesn't mean they're fast or
as fast as one would like/need.

Sure, that might not be enough to warrant the GUC. I would think so,
those are my 2 cents. YMMV.


From: David Fetter <david(at)fetter(dot)org>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-03 16:54:32
Message-ID: 20170503165432.GL14241@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 03, 2017 at 01:27:38PM -0300, Claudio Freire wrote:
> On Wed, May 3, 2017 at 11:31 AM, David Fetter <david(at)fetter(dot)org> wrote:
> > Are you aware of such an ORM which both supports WITH and doesn't
> > also closely track PostgreSQL development? I'm not.
> >
> > Even assuming that such a thing exists, it's not at all obvious to
> > me that we should be stalling and/or putting in what will turn out
> > to be misfeatures to accommodate it.
>
> I know SQLAlchemy does support CTEs, and lags quite considerably in
> its support of the latest syntactic elements.
>
> For instance, it took them 8 months to support the "skip locked"
> option.

That is pretty strictly their problem.

> Not sure whether that qualifies as "closely tracking" postgres for
> you. Clearly they do track it, but that doesn't mean they're fast or
> as fast as one would like/need.

We can NOT make their tardiness a driver of our development.

> Sure, that might not be enough to warrant the GUC. I would think so,
> those are my 2 cents. YMMV.

When we add a "temporary" GUC, we're taking on a gigantic burden.
Either we support it forever somehow, or we put it on a deprecation
schedule immediately and expect to be answering questions about it for
years after it's been removed.

-1 for the GUC.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-03 16:58:49
Message-ID: CAFj8pRD0FfeaQFjaq+718O5sesdUxnicg6kur7eDs2OrR=kfSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2017-05-03 18:54 GMT+02:00 David Fetter <david(at)fetter(dot)org>:

> On Wed, May 03, 2017 at 01:27:38PM -0300, Claudio Freire wrote:
> > On Wed, May 3, 2017 at 11:31 AM, David Fetter <david(at)fetter(dot)org> wrote:
> > > Are you aware of such an ORM which both supports WITH and doesn't
> > > also closely track PostgreSQL development? I'm not.
> > >
> > > Even assuming that such a thing exists, it's not at all obvious to
> > > me that we should be stalling and/or putting in what will turn out
> > > to be misfeatures to accommodate it.
> >
> > I know SQLAlchemy does support CTEs, and lags quite considerably in
> > its support of the latest syntactic elements.
> >
> > For instance, it took them 8 months to support the "skip locked"
> > option.
>
> That is pretty strictly their problem.
>
> > Not sure whether that qualifies as "closely tracking" postgres for
> > you. Clearly they do track it, but that doesn't mean they're fast or
> > as fast as one would like/need.
>
> We can NOT make their tardiness a driver of our development.
>
> > Sure, that might not be enough to warrant the GUC. I would think so,
> > those are my 2 cents. YMMV.
>
> When we add a "temporary" GUC, we're taking on a gigantic burden.
> Either we support it forever somehow, or we put it on a deprecation
> schedule immediately and expect to be answering questions about it for
> years after it's been removed.
>
> -1 for the GUC.
>

Is possible to find consensus without GUC? I understand well, why GUC is
wrong, but I don't see any possible solution how to change current behave
and don't break lot of applications.

Regards

Pavel

> Best,
> David.
> --
> David Fetter <david(at)fetter(dot)org> http://fetter.org/
> Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
> Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-03 17:33:05
Message-ID: 20170503173305.fetj4tz7kd56tjlr@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David Fetter wrote:

> When we add a "temporary" GUC, we're taking on a gigantic burden.
> Either we support it forever somehow, or we put it on a deprecation
> schedule immediately and expect to be answering questions about it for
> years after it's been removed.
>
> -1 for the GUC.

Absolutely.

So ISTM we have three choices:

1) we switch unmarked CTEs as inlineable by default in pg11. What seems
likely to happen for a user that upgrades to pg11 is that 5 out of 10
CTE-using queries are going to become faster than with pg10, and they
are going to be happy; 4 out of five are going to see no difference, but
they didn't have to do anything about it; and the remaining query is
going to become slower, either indistinguishably so (in which case they
don't care and they remain happy because of the other improvements) or
notably so, in which case they can easily figure where to add the
MATERIALIZED option and regain the original performance.

2) unmarked CTEs continue to be an optimization barrier, but we add
"WITH INLINED" so that they're inlineable. Some users may wonder about
it and waste a lot of time trying to figure out which CTEs to add it to.
They see a benefit in half the queries, which makes them happy, but they
are angry that they had to waste all that time on the other queries.

3) We don't do anything, because we all agree that GUCs are not
suitable. No progress. No anger, but nobody is happy either.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-03 17:44:17
Message-ID: CAFj8pRC5T-ucUeiP4dgqO0FXS7wuGDQMdUDKX10kgW1RmXMO=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2017-05-03 19:33 GMT+02:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> David Fetter wrote:
>
> > When we add a "temporary" GUC, we're taking on a gigantic burden.
> > Either we support it forever somehow, or we put it on a deprecation
> > schedule immediately and expect to be answering questions about it for
> > years after it's been removed.
> >
> > -1 for the GUC.
>
> Absolutely.
>
> So ISTM we have three choices:
>
> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.
>
>
> 2) unmarked CTEs continue to be an optimization barrier, but we add
> "WITH INLINED" so that they're inlineable. Some users may wonder about
> it and waste a lot of time trying to figure out which CTEs to add it to.
> They see a benefit in half the queries, which makes them happy, but they
> are angry that they had to waste all that time on the other queries.
>
>
> 3) We don't do anything, because we all agree that GUCs are not
> suitable. No progress. No anger, but nobody is happy either.
>

yes, these variants are all.

@2 is only little bit better than bad @3 - is not nice to have default
behave different than any other and than what 90% developers expects.

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-03 18:22:06
Message-ID: 20170503182206.GJ19317@aart.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 03, 2017 at 02:33:05PM -0300, Alvaro Herrera wrote:
> David Fetter wrote:
>
> > When we add a "temporary" GUC, we're taking on a gigantic burden.
> > Either we support it forever somehow, or we put it on a deprecation
> > schedule immediately and expect to be answering questions about it for
> > years after it's been removed.
> >
> > -1 for the GUC.
>
> Absolutely.
>
> So ISTM we have three choices:
>
> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.
>
>
> 2) unmarked CTEs continue to be an optimization barrier, but we add
> "WITH INLINED" so that they're inlineable. Some users may wonder about
> it and waste a lot of time trying to figure out which CTEs to add it to.
> They see a benefit in half the queries, which makes them happy, but they
> are angry that they had to waste all that time on the other queries.
>
>
> 3) We don't do anything, because we all agree that GUCs are not
> suitable. No progress. No anger, but nobody is happy either.
>

+1 for option 1. I just finished rewriting a well written CTE query to
avoid the optimization fence and get reasonable performance.

Regards,
Ken


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-03 19:54:50
Message-ID: a5908071-fdfd-a0e3-e438-60229aa4a3fd@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/03/2017 07:33 PM, Alvaro Herrera wrote:
> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.
>
>
> 2) unmarked CTEs continue to be an optimization barrier, but we add
> "WITH INLINED" so that they're inlineable. Some users may wonder about
> it and waste a lot of time trying to figure out which CTEs to add it to.
> They see a benefit in half the queries, which makes them happy, but they
> are angry that they had to waste all that time on the other queries.
>
>
> 3) We don't do anything, because we all agree that GUCs are not
> suitable. No progress. No anger, but nobody is happy either.

+1 for option 1. And while I would not like if we had to combine it with
a backwards compatibility GUC which enables the old behavior to get it
merged I still personally would prefer that over option 2 and 3.

Andreas


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-03 21:23:11
Message-ID: d846a031-f9b8-9611-8af2-80b38d915f41@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5/3/17 9:54 PM, Andreas Karlsson wrote:
> On 05/03/2017 07:33 PM, Alvaro Herrera wrote:
>> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
>> likely to happen for a user that upgrades to pg11 is that 5 out of 10
>> CTE-using queries are going to become faster than with pg10, and they
>> are going to be happy; 4 out of five are going to see no difference, but
>> they didn't have to do anything about it; and the remaining query is
>> going to become slower, either indistinguishably so (in which case they
>> don't care and they remain happy because of the other improvements) or
>> notably so, in which case they can easily figure where to add the
>> MATERIALIZED option and regain the original performance.
>>
>>
>> 2) unmarked CTEs continue to be an optimization barrier, but we add
>> "WITH INLINED" so that they're inlineable. Some users may wonder about
>> it and waste a lot of time trying to figure out which CTEs to add it to.
>> They see a benefit in half the queries, which makes them happy, but they
>> are angry that they had to waste all that time on the other queries.
>>
>>
>> 3) We don't do anything, because we all agree that GUCs are not
>> suitable. No progress. No anger, but nobody is happy either.
>
> +1 for option 1. And while I would not like if we had to combine it with
> a backwards compatibility GUC which enables the old behavior to get it
> merged I still personally would prefer that over option 2 and 3.
>
> Andreas
>

+1 to what Andreas says

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-03 21:24:05
Message-ID: CAHyXU0xcbctKzRoenpJL8fj01GDCy0KzmFCSus3qHAjOyB-bAA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> David Fetter wrote:
>
>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>> Either we support it forever somehow, or we put it on a deprecation
>> schedule immediately and expect to be answering questions about it for
>> years after it's been removed.
>>
>> -1 for the GUC.
>
> Absolutely.
>
> So ISTM we have three choices:
>
> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.

+1 for option 1. This change will be welcome for a large number of
queries, but forced materialization is a real need and I use it often.
This comes off as a very reasonable compromise in my opinion unless it
requires major coding gymnastics to implement.

merlin


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-03 21:33:54
Message-ID: 30673162-d741-5d74-2c64-5018083a5f20@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/05/17 05:33, Alvaro Herrera wrote:
> David Fetter wrote:
>
>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>> Either we support it forever somehow, or we put it on a deprecation
>> schedule immediately and expect to be answering questions about it for
>> years after it's been removed.
>>
>> -1 for the GUC.
> Absolutely.
>
> So ISTM we have three choices:
>
> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.
+1

I've watched a colleague spend hours trying to optimise a complicated
query with nested views, then find that this 'optimisation fence' was
the heart of the problem.

Cheers,
Gavin


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-03 21:34:47
Message-ID: f50f9212-1939-d42f-d218-14b26a0f466c@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/03/2017 05:24 PM, Merlin Moncure wrote:
> On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
>> David Fetter wrote:
>>
>>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>>> Either we support it forever somehow, or we put it on a deprecation
>>> schedule immediately and expect to be answering questions about it for
>>> years after it's been removed.
>>>
>>> -1 for the GUC.
>> Absolutely.
>>
>> So ISTM we have three choices:
>>
>> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
>> likely to happen for a user that upgrades to pg11 is that 5 out of 10
>> CTE-using queries are going to become faster than with pg10, and they
>> are going to be happy; 4 out of five are going to see no difference, but
>> they didn't have to do anything about it; and the remaining query is
>> going to become slower, either indistinguishably so (in which case they
>> don't care and they remain happy because of the other improvements) or
>> notably so, in which case they can easily figure where to add the
>> MATERIALIZED option and regain the original performance.
> +1 for option 1. This change will be welcome for a large number of
> queries, but forced materialization is a real need and I use it often.
> This comes off as a very reasonable compromise in my opinion unless it
> requires major coding gymnastics to implement.
>

The only thing I am totally dead set against is making people go back to
using OFFSET 0. It's ugly and completely non-intuitive.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-04 06:34:16
Message-ID: 68170cb4-7f14-7d67-a64e-82124a2c77a1@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/05/17 23:24, Merlin Moncure wrote:
> On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
>> David Fetter wrote:
>>
>>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>>> Either we support it forever somehow, or we put it on a deprecation
>>> schedule immediately and expect to be answering questions about it for
>>> years after it's been removed.
>>>
>>> -1 for the GUC.
>>
>> Absolutely.
>>
>> So ISTM we have three choices:
>>
>> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
>> likely to happen for a user that upgrades to pg11 is that 5 out of 10
>> CTE-using queries are going to become faster than with pg10, and they
>> are going to be happy; 4 out of five are going to see no difference, but
>> they didn't have to do anything about it; and the remaining query is
>> going to become slower, either indistinguishably so (in which case they
>> don't care and they remain happy because of the other improvements) or
>> notably so, in which case they can easily figure where to add the
>> MATERIALIZED option and regain the original performance.
>
> +1 for option 1. This change will be welcome for a large number of
> queries, but forced materialization is a real need and I use it often.
> This comes off as a very reasonable compromise in my opinion unless it
> requires major coding gymnastics to implement.
>

+1 to this

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-04 07:25:31
Message-ID: 1493882731645-5959615.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> 1) we switch unmarked CTEs as inlineable by default in pg11.

+1 from me for option 1

--
View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5959615.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 09:37:36
Message-ID: 070e1221-f4a6-dc84-3c72-b9ffa06f65ce@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/05/17 05:33, Alvaro Herrera wrote:
> David Fetter wrote:
>
>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>> Either we support it forever somehow, or we put it on a deprecation
>> schedule immediately and expect to be answering questions about it for
>> years after it's been removed.
>>
>> -1 for the GUC.
>
> Absolutely.
>
> So ISTM we have three choices:
>
> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.

+1 (Mario, proxied by me)

I've been asked to pass on comments by my colleague Mario of True Group
regarding the pain of the current behaviour of CTE's being optimisation
fences (as he doesn't normally follow this email group, I'll pass on any
feedback):

"Greetings,

Jumping onto the bandwagon here. At True Group in Auckland, we use
PostgreSQL as the platform for large-scale software development, often
with extensive and complex schemas, views and queries.

We frequently encounter poor query performance due to CTEs being an
optimisation barrier. We are forced to use workarounds such as a
set-returning function (parameterised view) in place of a view, and
manually placing the needed quals (WHERE clauses) into each CTE. This
creates headaches in large systems, where writing expressive SQL is
essential, rather than employing workarounds that make code harder to
understand and reuse.

Our general assumption is that we should write SQL that describes what
we want to achieve. The planner's job is to determine how to do that
efficiently.

There is an argument that pushing quals into CTEs could reduce
performance for some queries, especially if the qual is expensive. My
answer is twofold:

a) most queries will either experience no change or benefit from qual
push-down, as usually the expensive part is the query subtree, not the
top-level qual.

b) if a small proportion of queries are negatively affected, this is
better addressed by improving the planner's cost estimation. At worst,
an explicit OPTIMIZATION BARRIER hint could be added. But I know there
is much philosophical objection in the PostgreSQL community to planner
hints. The irony is that a main argument for retaining current CTE
behaviour is that people rely on CTEs as implicit barrier hints!

As an aside, we also encounter several other instances where qual
push-down fails, including where rewording a query in a way that is
semantically identical can change whether push-down takes place. But
probably the greatest bugbear is inability to push-down a qual into
more
than one subtree, meaning if a query has two or more subtrees, each of
which is expensive, but only one row is needed from each, determined by
one qual at the top level, one subtree will get the qual pushed into it
and run fast, while the others will do a full (expensive) table
scan and
be filtered only afterwards. There are also gains to be had in
improving
qual push-down in sub-selects with aggregates, and others. But I think
these issues are only due to lack of resource to implement, rather than
philosophical/political objections.

We would like to see a Postgres version in the future that does much
better planning in the areas I've mentioned, including but not limited
to the present issue of CTEs. Our organisation may be able to
contribute
materially to this, if the political will is there, and others affected
pitch in to achieve common goals. Not being expert in Postgres
internals, I am not certain how difficult each of the problems is.

I hope we have contributed usefully to the conversation, and invite
feedback.

Sincerely,
Mario"

Cheers,
Gavin


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
Cc: Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 10:02:06
Message-ID: 2dab0187-42fe-3b43-eed9-921aca564ec2@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 30/04/17 16:28, Tom Lane wrote:
> Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> writes:
>> - as you noted, it is hard to decide when it's worth inlining vs
>> materializing for CTE terms referenced more than once.
> [ raised eyebrow... ] Please explain why the answer isn't trivially
> "never".
>
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences. Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.
>
> regards, tom lane
>
>
Could not each CTE be only evaluated once, but restricted (as far as is
practicable) to the rows actually needed by the body of the SELECT?

Cheers,
Gavin


From: Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-04 11:45:01
Message-ID: 5582f012-1f79-695f-1aad-9b95cb558c0e@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/05/2017 08:34, Petr Jelinek wrote:
> On 03/05/17 23:24, Merlin Moncure wrote:
>> On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera
>> <alvherre(at)2ndquadrant(dot)com> wrote:
>>> David Fetter wrote:
>>>
>>>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>>>> Either we support it forever somehow, or we put it on a deprecation
>>>> schedule immediately and expect to be answering questions about it for
>>>> years after it's been removed.
>>>>
>>>> -1 for the GUC.
>>>
>>> Absolutely.
>>>
>>> So ISTM we have three choices:
>>>
>>> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
>>> likely to happen for a user that upgrades to pg11 is that 5 out of 10
>>> CTE-using queries are going to become faster than with pg10, and they
>>> are going to be happy; 4 out of five are going to see no difference, but
>>> they didn't have to do anything about it; and the remaining query is
>>> going to become slower, either indistinguishably so (in which case they
>>> don't care and they remain happy because of the other improvements) or
>>> notably so, in which case they can easily figure where to add the
>>> MATERIALIZED option and regain the original performance.
>>
>> +1 for option 1. This change will be welcome for a large number of
>> queries, but forced materialization is a real need and I use it often.
>> This comes off as a very reasonable compromise in my opinion unless it
>> requires major coding gymnastics to implement.
>>
>
> +1 to this
>

+1 too

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org


From: Serge Rielau <serge(at)rielau(dot)com>
To: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 15:04:09
Message-ID: 651A1657-3645-40B2-8202-3BE0B2A35FD1@rielau.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> On May 4, 2017, at 3:02 AM, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> wrote:
>
> On 30/04/17 16:28, Tom Lane wrote:
>> Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> writes:
>>> - as you noted, it is hard to decide when it's worth inlining vs
>>> materializing for CTE terms referenced more than once.
>> [ raised eyebrow... ] Please explain why the answer isn't trivially
>> "never".
>>
>> There's already a pretty large hill to climb here in the way of
>> breaking peoples' expectations about CTEs being optimization
>> fences. Breaking the documented semantics about CTEs being
>> single-evaluation seems to me to be an absolute non-starter.
>>
>> regards, tom lane
>>
>>
> Could not each CTE be only evaluated once, but restricted (as far as is practicable) to the rows actually needed by the body of the SELECT?
>
Tom,

Are you worried about semantics or performance?
With proper detection of mutating functions and snapshot isolation I do not see how a user would detect “lack of” single evaluation.
As for performance we’d be talking about what? An uncorrelated inner of a nested loop join?

Anyway it seems to me that there a multiple properties at play here which are quite orthogonal.

1. Full materialization/Slow materialization/pipelining
I cannot come up with any scenario where full materialization would be beneficial from a performance point of view (which speaks to Gavin’s view).
I can see it from a semantic point of view when order of execution may matter (for example with embedded DML and triggers present).
As soon as semantics are at play having syntax is absolutely the right thing: +1 for MATERIALIZE
2.Pushing predicates (or other operators) into the CTE.
All this can ever do is reduce the number of rows being looked at.
As long as the optimizer is careful, not to do what it isn’t supposed to do in a nested query (push past a mutating function) I don’t see the harm
3. Replicating the CTE to push distinct operators from different consumers.
Again this can only be done if there are no mutators or non deterministic operators.

To summarize: big +1 to preserve the existing behavior with MATERIALIZE and to set CTE’s free by default with the onus on the optimizer to prove semantic equivalence.

Cheers
Serge Rielau
Salesforce.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Serge Rielau <serge(at)rielau(dot)com>
Cc: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 15:36:50
Message-ID: 9846.1493912210@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Serge Rielau <serge(at)rielau(dot)com> writes:
>> On May 4, 2017, at 3:02 AM, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> wrote:
>> On 30/04/17 16:28, Tom Lane wrote:
>>> There's already a pretty large hill to climb here in the way of
>>> breaking peoples' expectations about CTEs being optimization
>>> fences. Breaking the documented semantics about CTEs being
>>> single-evaluation seems to me to be an absolute non-starter.

> Are you worried about semantics or performance?
> With proper detection of mutating functions and snapshot isolation I do not see how a user would detect “lack of” single evaluation.

I do not think a user will have any trouble "detecting" what we did
if his query runs for two hours, rather than two minutes, because we
executed some expensive function 100 times rather than the one time
he expected it to run.

Now you could argue that that's user error because he should have
marked the expensive function with a sufficiently high cost to
discourage us from flattening the CTE. But not everyone will have
done that (and I'm not sure we have any planner smarts that would
respond to such cases anyway). So what I'm saying is that if you're
promising there will be no visible bad consequences, you're wrong.

It may be worth breaking some peoples' queries to make other peoples'
queries faster, but I think we need to tread pretty carefully there.
And we definitely can't change any case where there would be visible
semantic differences.

regards, tom lane


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>
Cc: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 16:22:36
Message-ID: 5990fc09-f3ad-8966-7e10-12550f52bfc1@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/04/2017 11:36 AM, Tom Lane wrote:
> Serge Rielau <serge(at)rielau(dot)com> writes:
>>> On May 4, 2017, at 3:02 AM, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> wrote:
>>> On 30/04/17 16:28, Tom Lane wrote:
>>>> There's already a pretty large hill to climb here in the way of
>>>> breaking peoples' expectations about CTEs being optimization
>>>> fences. Breaking the documented semantics about CTEs being
>>>> single-evaluation seems to me to be an absolute non-starter.
>> Are you worried about semantics or performance?
>> With proper detection of mutating functions and snapshot isolation I do not see how a user would detect “lack of” single evaluation.
> I do not think a user will have any trouble "detecting" what we did
> if his query runs for two hours, rather than two minutes, because we
> executed some expensive function 100 times rather than the one time
> he expected it to run.
>
> Now you could argue that that's user error because he should have
> marked the expensive function with a sufficiently high cost to
> discourage us from flattening the CTE. But not everyone will have
> done that (and I'm not sure we have any planner smarts that would
> respond to such cases anyway). So what I'm saying is that if you're
> promising there will be no visible bad consequences, you're wrong.
>
> It may be worth breaking some peoples' queries to make other peoples'
> queries faster, but I think we need to tread pretty carefully there.
> And we definitely can't change any case where there would be visible
> semantic differences.
>

Yeah, the idea that this won't cause possibly significant pain is quite wrong. Quite by accident I came across an example just this morning where rewriting as a CTE makes a big improvement.

I wrote this query:

select (json_populate_record(null::mytype, myjson)).*
from mytable;

It turned out that this was an order of magnitude faster:

with r as
(
select json_populate_record(null::mytype, myjson) as x
from mytable
)
select (x).*
from r;

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 16:34:19
Message-ID: CAKFQuwawh+p0-XR6_wcOuHsi7ppk7_=q8ZRLsD4qbd0NDxyGwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 4, 2017 at 9:22 AM, Andrew Dunstan <
andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:

>
> Yeah, the idea that this won't cause possibly significant pain is quite
> wrong. Quite by accident I came across an example just this morning where
> rewriting as a CTE makes a big improvement.
>
> I wrote this query:
>
> select (json_populate_record(null::mytype, myjson)).*
> from mytable;
>
>
> It turned out that this was an order of magnitude faster:
>
> with r as
> (
> select json_populate_record(null::mytype, myjson) as x
> from mytable
> )
> select (x).*
> from r;
>

​Except I suspect we at least have a chance to detect the above and not
de-optimize it by evaluating "json_populate_record" once for every column
in mytype.

The now idiomatic solution​ to the above is to use LATERAL so the above CTE
is no longer actually a required workaround.

David J.


From: Andres Freund <andres(at)anarazel(dot)de>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 16:39:11
Message-ID: 20170504163911.twb2otkbwfz52onx@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-05-04 09:34:19 -0700, David G. Johnston wrote:
> On Thu, May 4, 2017 at 9:22 AM, Andrew Dunstan <
> andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>
> >
> > Yeah, the idea that this won't cause possibly significant pain is quite
> > wrong. Quite by accident I came across an example just this morning where
> > rewriting as a CTE makes a big improvement.
> >
> > I wrote this query:
> >
> > select (json_populate_record(null::mytype, myjson)).*
> > from mytable;
> >
> >
> > It turned out that this was an order of magnitude faster:
> >
> > with r as
> > (
> > select json_populate_record(null::mytype, myjson) as x
> > from mytable
> > )
> > select (x).*
> > from r;

That's a very specific issue, that you can just as well solve with CTEs
though. IIRC Tom and I were discussing changing this - independently of
anything CTE releated - so it doesn't end up calling the function
multiple times. We e.g. could now quite easily put the
json_populate_record(null::mytype, myjson) into something *like* a
nodeProjectSet node, and handle the column accesses via the nromal
projection.

> ​Except I suspect we at least have a chance to detect the above and not
> de-optimize it by evaluating "json_populate_record" once for every column
> in mytype.

That's what happens with the non CTE version - don't think it's super
likely we'd end up doing that for CTEs; I'd even consider it a bug.

Greetings,

Andres Freund


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 17:20:20
Message-ID: 60607f78-0c45-6f17-cce2-9cff6c8d8801@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/04/2017 12:34 PM, David G. Johnston wrote:
> On Thu, May 4, 2017 at 9:22 AM, Andrew Dunstan
> <andrew(dot)dunstan(at)2ndquadrant(dot)com
> <mailto:andrew(dot)dunstan(at)2ndquadrant(dot)com>>wrote:
>
>
> Yeah, the idea that this won't cause possibly significant pain is
> quite wrong. Quite by accident I came across an example just this
> morning where rewriting as a CTE makes a big improvement.
>
> I wrote this query:
>
> select (json_populate_record(null::mytype, myjson)).*
> from mytable;
>
>
> It turned out that this was an order of magnitude faster:
>
> with r as
> (
> select json_populate_record(null::mytype, myjson) as x
> from mytable
> )
> select (x).*
> from r;
>
>
> ​Except I suspect we at least have a chance to detect the above and
> not de-optimize it by evaluating "json_populate_record" once for every
> column in mytype.
>
> The now idiomatic solution​ to the above is to use LATERAL so the
> above CTE is no longer actually a required workaround.

Hadn't though about LATERAL, good point. Still, there will be other cases.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 17:33:39
Message-ID: 20170504173339.6zrrdrtklyr6d4ih@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:

> Hadn't though about LATERAL, good point. Still, there will be other cases.

I'm not sure what your point is. We know that for some cases the
optimization barrier semantics are useful, which is why the proposal is
to add a keyword to install one explicitely:

with materialized r as
(
select json_populate_record(null::mytype, myjson) as x
from mytable
)
select (x).*
from r;

this would preserve the current semantics.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 17:44:12
Message-ID: b3b9ebf1-28b6-394a-d99b-52ff11d58df3@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/04/2017 01:33 PM, Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>
>> Hadn't though about LATERAL, good point. Still, there will be other cases.
> I'm not sure what your point is. We know that for some cases the
> optimization barrier semantics are useful, which is why the proposal is
> to add a keyword to install one explicitely:
>
> with materialized r as
> (
> select json_populate_record(null::mytype, myjson) as x
> from mytable
> )
> select (x).*
> from r;
>
> this would preserve the current semantics.
>

I understand. Some people appear to think hardly anyone will actually
need this - that's what I was arguing against.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Joe Conway <mail(at)joeconway(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 17:52:01
Message-ID: 3f4835a9-9bb2-d34d-faa7-e3afc8d87614@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/04/2017 10:33 AM, Alvaro Herrera wrote:
> I'm not sure what your point is. We know that for some cases the
> optimization barrier semantics are useful, which is why the proposal is
> to add a keyword to install one explicitely:
>
> with materialized r as
> (
> select json_populate_record(null::mytype, myjson) as x
> from mytable
> )
> select (x).*
> from r;
>
> this would preserve the current semantics.

I haven't been able to follow this incredibly long thread, so please
excuse me if way off base, but are we talking about that a CTE would be
silently be rewritten as an inline expression potentially unless it is
decorated with some new syntax?

I would find that very disconcerting myself. For example, would this CTE
potentially get rewritten with multiple evaluation as follows?

DROP SEQUENCE IF EXISTS foo_seq;
CREATE SEQUENCE foo_seq;

WITH a(f1) AS (SELECT nextval('foo_seq'))
SELECT a.f1, a.f1 FROM a;
f1 | ?column?
----+----------
1 | 1
(1 row)

ALTER SEQUENCE foo_seq RESTART;
SELECT nextval('foo_seq'), nextval('foo_seq');
nextval | ?column?
---------+----------
1 | 2
(1 row)

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 17:56:03
Message-ID: 67996352-f4e7-eec6-35b5-9223d26df26b@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/04/2017 01:52 PM, Joe Conway wrote:
> On 05/04/2017 10:33 AM, Alvaro Herrera wrote:
>> I'm not sure what your point is. We know that for some cases the
>> optimization barrier semantics are useful, which is why the proposal is
>> to add a keyword to install one explicitely:
>>
>> with materialized r as
>> (
>> select json_populate_record(null::mytype, myjson) as x
>> from mytable
>> )
>> select (x).*
>> from r;
>>
>> this would preserve the current semantics.
> I haven't been able to follow this incredibly long thread, so please
> excuse me if way off base, but are we talking about that a CTE would be
> silently be rewritten as an inline expression potentially unless it is
> decorated with some new syntax?
>
> I would find that very disconcerting myself. For example, would this CTE
> potentially get rewritten with multiple evaluation as follows?
>
> DROP SEQUENCE IF EXISTS foo_seq;
> CREATE SEQUENCE foo_seq;
>
> WITH a(f1) AS (SELECT nextval('foo_seq'))
> SELECT a.f1, a.f1 FROM a;
> f1 | ?column?
> ----+----------
> 1 | 1
> (1 row)
>
> ALTER SEQUENCE foo_seq RESTART;
> SELECT nextval('foo_seq'), nextval('foo_seq');
> nextval | ?column?
> ---------+----------
> 1 | 2
> (1 row)
>

I think that would be a change in semantics, which we should definitely
not be getting. Avoiding a change in semantics might be an interesting
exercise, but we have lots of clever coders ...

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-04 18:00:11
Message-ID: 344e31f7-deab-d242-d709-2d4b5984aa50@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5/4/17 7:56 PM, Andrew Dunstan wrote:
>
>
> On 05/04/2017 01:52 PM, Joe Conway wrote:
>> On 05/04/2017 10:33 AM, Alvaro Herrera wrote:
>>> I'm not sure what your point is. We know that for some cases the
>>> optimization barrier semantics are useful, which is why the proposal is
>>> to add a keyword to install one explicitely:
>>>
>>> with materialized r as
>>> (
>>> select json_populate_record(null::mytype, myjson) as x
>>> from mytable
>>> )
>>> select (x).*
>>> from r;
>>>
>>> this would preserve the current semantics.
>> I haven't been able to follow this incredibly long thread, so please
>> excuse me if way off base, but are we talking about that a CTE would be
>> silently be rewritten as an inline expression potentially unless it is
>> decorated with some new syntax?
>>
>> I would find that very disconcerting myself. For example, would this CTE
>> potentially get rewritten with multiple evaluation as follows?
>>
>> DROP SEQUENCE IF EXISTS foo_seq;
>> CREATE SEQUENCE foo_seq;
>>
>> WITH a(f1) AS (SELECT nextval('foo_seq'))
>> SELECT a.f1, a.f1 FROM a;
>> f1 | ?column?
>> ----+----------
>> 1 | 1
>> (1 row)
>>
>> ALTER SEQUENCE foo_seq RESTART;
>> SELECT nextval('foo_seq'), nextval('foo_seq');
>> nextval | ?column?
>> ---------+----------
>> 1 | 2
>> (1 row)
>>
>
>
>
> I think that would be a change in semantics, which we should definitely
> not be getting. Avoiding a change in semantics might be an interesting
> exercise, but we have lots of clever coders ...
>

nextval is a volatile function, and in my understanding the plan was not
to inline CTEs with volatile functions (or CTEs doing writes etc.). That
is, we'd guarantee the same results as we get now.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Joe Conway <mail(at)joeconway(dot)com>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 18:03:13
Message-ID: 07133bf2-ecc4-77ea-f9c0-4b802dacce54@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/04/2017 10:56 AM, Andrew Dunstan wrote:
>
>
> On 05/04/2017 01:52 PM, Joe Conway wrote:
>> On 05/04/2017 10:33 AM, Alvaro Herrera wrote:
>>> I'm not sure what your point is. We know that for some cases the
>>> optimization barrier semantics are useful, which is why the proposal is
>>> to add a keyword to install one explicitely:
>>>
>>> with materialized r as
>>> (
>>> select json_populate_record(null::mytype, myjson) as x
>>> from mytable
>>> )
>>> select (x).*
>>> from r;
>>>
>>> this would preserve the current semantics.
>> I haven't been able to follow this incredibly long thread, so please
>> excuse me if way off base, but are we talking about that a CTE would be
>> silently be rewritten as an inline expression potentially unless it is
>> decorated with some new syntax?
>>
>> I would find that very disconcerting myself. For example, would this CTE
>> potentially get rewritten with multiple evaluation as follows?
>>
>> DROP SEQUENCE IF EXISTS foo_seq;
>> CREATE SEQUENCE foo_seq;
>>
>> WITH a(f1) AS (SELECT nextval('foo_seq'))
>> SELECT a.f1, a.f1 FROM a;
>> f1 | ?column?
>> ----+----------
>> 1 | 1
>> (1 row)
>>
>> ALTER SEQUENCE foo_seq RESTART;
>> SELECT nextval('foo_seq'), nextval('foo_seq');
>> nextval | ?column?
>> ---------+----------
>> 1 | 2
>> (1 row)
>>
>
> I think that would be a change in semantics, which we should definitely
> not be getting. Avoiding a change in semantics might be an interesting
> exercise, but we have lots of clever coders ...

Well I think my point is that I always have understood CTEs to be
executed precisely once producing a temporary result set that is then
referenced elsewhere. I don't think that property of CTEs should change.
Somewhere else in the thread someone mentioned predicate push down --
that makes sense and maybe some clever coder can come up with a patch
that does that, but I would not be in favor of CTEs being inlined and
therefore evaluated multiple times.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Serge Rielau <serge(at)rielau(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 18:37:46
Message-ID: 23652.1493923066@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> I'm not sure what your point is. We know that for some cases the
> optimization barrier semantics are useful, which is why the proposal is
> to add a keyword to install one explicitely:

> with materialized r as
> (
> select json_populate_record(null::mytype, myjson) as x
> from mytable
> )
> select (x).*
> from r;

> this would preserve the current semantics.

Uh, no, it wouldn't, until you had changed your query (and made it
non-SQL-compliant, and unable to work at all on pre-v11 PG).
This might be a good design, but surely it does not provide backwards
compatibility.

regards, tom lane


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 18:39:31
Message-ID: ffb2c8a3-7086-7f57-f9fe-19a632e4011d@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5/4/17 8:03 PM, Joe Conway wrote:
> On 05/04/2017 10:56 AM, Andrew Dunstan wrote:
>>
>>
>> On 05/04/2017 01:52 PM, Joe Conway wrote:
>>> On 05/04/2017 10:33 AM, Alvaro Herrera wrote:
>>>> I'm not sure what your point is. We know that for some cases the
>>>> optimization barrier semantics are useful, which is why the proposal is
>>>> to add a keyword to install one explicitely:
>>>>
>>>> with materialized r as
>>>> (
>>>> select json_populate_record(null::mytype, myjson) as x
>>>> from mytable
>>>> )
>>>> select (x).*
>>>> from r;
>>>>
>>>> this would preserve the current semantics.
>>> I haven't been able to follow this incredibly long thread, so please
>>> excuse me if way off base, but are we talking about that a CTE would be
>>> silently be rewritten as an inline expression potentially unless it is
>>> decorated with some new syntax?
>>>
>>> I would find that very disconcerting myself. For example, would this CTE
>>> potentially get rewritten with multiple evaluation as follows?
>>>
>>> DROP SEQUENCE IF EXISTS foo_seq;
>>> CREATE SEQUENCE foo_seq;
>>>
>>> WITH a(f1) AS (SELECT nextval('foo_seq'))
>>> SELECT a.f1, a.f1 FROM a;
>>> f1 | ?column?
>>> ----+----------
>>> 1 | 1
>>> (1 row)
>>>
>>> ALTER SEQUENCE foo_seq RESTART;
>>> SELECT nextval('foo_seq'), nextval('foo_seq');
>>> nextval | ?column?
>>> ---------+----------
>>> 1 | 2
>>> (1 row)
>>>
>>
>> I think that would be a change in semantics, which we should definitely
>> not be getting. Avoiding a change in semantics might be an interesting
>> exercise, but we have lots of clever coders ...
>
> Well I think my point is that I always have understood CTEs to be
> executed precisely once producing a temporary result set that is then
> referenced elsewhere. I don't think that property of CTEs should change.
> Somewhere else in the thread someone mentioned predicate push down --
> that makes sense and maybe some clever coder can come up with a patch
> that does that, but I would not be in favor of CTEs being inlined and
> therefore evaluated multiple times.
>

I agree with this, but there's a difference between "executed exactly
once" and "producing the same result as if executed exactly once".

I may be misunderstanding what other people proposed in this thread, but
I think the plan was to only inline CTEs where we know it won't change
the results, etc. So e.g. CTEs with volatile functions would not get
inlined, which includes nextval() for example.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Serge Rielau <serge(at)rielau(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 18:52:34
Message-ID: 24165.1493923954@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> On 5/4/17 8:03 PM, Joe Conway wrote:
>>> I haven't been able to follow this incredibly long thread, so please
>>> excuse me if way off base, but are we talking about that a CTE would be
>>> silently be rewritten as an inline expression potentially unless it is
>>> decorated with some new syntax?

> I agree with this, but there's a difference between "executed exactly
> once" and "producing the same result as if executed exactly once".

> I may be misunderstanding what other people proposed in this thread, but
> I think the plan was to only inline CTEs where we know it won't change
> the results, etc. So e.g. CTEs with volatile functions would not get
> inlined, which includes nextval() for example.

I haven't been keeping close tabs either, but surely we still have to have
the optimization fence in (at least) all these cases:

* CTE contains INSERT/UPDATE/DELETE
* CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
locked might change)
* CTE contains volatile functions

I'm willing to write off cases where, eg, a function should have been
marked volatile and was not. That's user error and there are plenty
of hazards of that kind already. But if the optimizer has reason
to know that discarding the fence might change any query side-effects,
it mustn't.

regards, tom lane


From: Serge Rielau <serge(at)rielau(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 18:59:32
Message-ID: aec1bf28-ded4-4b92-9bd4-6af1f21e394e@rielau.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I haven't been keeping close tabs either, but surely we still have to have
the optimization fence in (at least) all these cases:

* CTE contains INSERT/UPDATE/DELETE
* CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
locked might change)
* CTE contains volatile functions

I'm willing to write off cases where, eg, a function should have been
marked volatile and was not. That's user error and there are plenty
of hazards of that kind already. But if the optimizer has reason
to know that discarding the fence might change any query side-effects,
it mustn't. Yes! +100
Cheers Serge


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 21:24:10
Message-ID: 1cba6cad-4323-8e9f-563a-34d569fa3bab@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/05/17 06:39, Tomas Vondra wrote:
>
>
> On 5/4/17 8:03 PM, Joe Conway wrote:
>> On 05/04/2017 10:56 AM, Andrew Dunstan wrote:
>>>
>>>
>>> On 05/04/2017 01:52 PM, Joe Conway wrote:
>>>> On 05/04/2017 10:33 AM, Alvaro Herrera wrote:
>>>>> I'm not sure what your point is. We know that for some cases the
>>>>> optimization barrier semantics are useful, which is why the
>>>>> proposal is
>>>>> to add a keyword to install one explicitely:
>>>>>
>>>>> with materialized r as
>>>>> (
>>>>> select json_populate_record(null::mytype, myjson) as x
>>>>> from mytable
>>>>> )
>>>>> select (x).*
>>>>> from r;
>>>>>
>>>>> this would preserve the current semantics.
>>>> I haven't been able to follow this incredibly long thread, so please
>>>> excuse me if way off base, but are we talking about that a CTE
>>>> would be
>>>> silently be rewritten as an inline expression potentially unless it is
>>>> decorated with some new syntax?
>>>>
>>>> I would find that very disconcerting myself. For example, would
>>>> this CTE
>>>> potentially get rewritten with multiple evaluation as follows?
>>>>
>>>> DROP SEQUENCE IF EXISTS foo_seq;
>>>> CREATE SEQUENCE foo_seq;
>>>>
>>>> WITH a(f1) AS (SELECT nextval('foo_seq'))
>>>> SELECT a.f1, a.f1 FROM a;
>>>> f1 | ?column?
>>>> ----+----------
>>>> 1 | 1
>>>> (1 row)
>>>>
>>>> ALTER SEQUENCE foo_seq RESTART;
>>>> SELECT nextval('foo_seq'), nextval('foo_seq');
>>>> nextval | ?column?
>>>> ---------+----------
>>>> 1 | 2
>>>> (1 row)
>>>>
>>>
>>> I think that would be a change in semantics, which we should definitely
>>> not be getting. Avoiding a change in semantics might be an interesting
>>> exercise, but we have lots of clever coders ...
>>
>> Well I think my point is that I always have understood CTEs to be
>> executed precisely once producing a temporary result set that is then
>> referenced elsewhere. I don't think that property of CTEs should change.
>> Somewhere else in the thread someone mentioned predicate push down --
>> that makes sense and maybe some clever coder can come up with a patch
>> that does that, but I would not be in favor of CTEs being inlined and
>> therefore evaluated multiple times.
>>
>
> I agree with this, but there's a difference between "executed exactly
> once" and "producing the same result as if executed exactly once".
>
> I may be misunderstanding what other people proposed in this thread,
> but I think the plan was to only inline CTEs where we know it won't
> change the results, etc. So e.g. CTEs with volatile functions would
> not get inlined, which includes nextval() for example.
>
> regards
>
It was the behaviour of "producing the same result as if executed
exactly once" that I was thinking of - I think this is still valid for
triggers & volatile functions, but such behaviour should be clearly
documented. This what I implicitly thought about CTE's when I first
came across them - to me it is the intuitively obvious behaviour.
However, limiting the rows based on the body of the SELECT would often
be a very useful optimisation

Cheers,
Gavin


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>
Cc: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 22:04:46
Message-ID: 19fe51b0-1d0b-a72c-99ee-aacb5b4930f8@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/04/2017 06:22 PM, Andrew Dunstan wrote:
> I wrote this query:
>
> select (json_populate_record(null::mytype, myjson)).*
> from mytable;
>
>
> It turned out that this was an order of magnitude faster:
>
> with r as
> (
> select json_populate_record(null::mytype, myjson) as x
> from mytable
> )
> select (x).*
> from r;

I do not know the planner that well, but I imagined that when we remove
the optimization fence that one would be evaluated similar to if it had
been a lateral join, i.e. there would be no extra function calls in this
case after removing the fence.

Andreas


From: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Joe Conway <mail(at)joeconway(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-05 00:03:01
Message-ID: CAMsr+YG1O7hdbSWozO4fyTNf9pcXQEv=1Om1=THCJwQJFow97g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5 May 2017 02:52, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> On 5/4/17 8:03 PM, Joe Conway wrote:
>>> I haven't been able to follow this incredibly long thread, so please
>>> excuse me if way off base, but are we talking about that a CTE would be
>>> silently be rewritten as an inline expression potentially unless it is
>>> decorated with some new syntax?

> I agree with this, but there's a difference between "executed exactly
> once" and "producing the same result as if executed exactly once".

> I may be misunderstanding what other people proposed in this thread, but
> I think the plan was to only inline CTEs where we know it won't change
> the results, etc. So e.g. CTEs with volatile functions would not get
> inlined, which includes nextval() for example.

I haven't been keeping close tabs either, but surely we still have to have
the optimization fence in (at least) all these cases:

* CTE contains INSERT/UPDATE/DELETE
* CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
locked might change)
* CTE contains volatile functions

I'm willing to write off cases where, eg, a function should have been
marked volatile and was not. That's user error and there are plenty
of hazards of that kind already. But if the optimizer has reason
to know that discarding the fence might change any query side-effects,
it mustn't.

I think everyone is in total agreement there.


From: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-05 00:16:42
Message-ID: CAMsr+YFcfGuAdKiJ+fNV7mSKTjV-T75+Sj18wBONvT6Q6-tXKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5 May 2017 06:04, "Andreas Karlsson" <andreas(at)proxel(dot)se> wrote:

On 05/04/2017 06:22 PM, Andrew Dunstan wrote:

> I wrote this query:
>
> select (json_populate_record(null::mytype, myjson)).*
> from mytable;
>
>
> It turned out that this was an order of magnitude faster:
>
> with r as
> (
> select json_populate_record(null::mytype, myjson) as x
> from mytable
> )
> select (x).*
> from r;
>

I do not know the planner that well, but I imagined that when we remove the
optimization fence that one would be evaluated similar to if it had been a
lateral join, i.e. there would be no extra function calls in this case
after removing the fence.

Sort of. PostgreSQL has a wart around (x).* expansion where it's
essentially macro-expanded into

(x).a, (x).b, (x).c, ...

Now, if x is a function call, PG will merrily execute it n times for its n
output columns.

Andres is working on fixing this. And it's trivially worked around with a
lateral query ; the above would be better written as

select (x).*
from mytable
cross join lateral json_populate_record(null::mytype, myjson) as x;

So this example just abuses our optimiser hint behaviour for CTEs to avoid
solving a planner issue (why project policy is against hints). But there's
already a solution.

I'm finding it increasingly hilarious watching people vociferously
defending their one precious (semi-documented) query/optimiser hint in
PostgreSQL. The one we don't admit is a hint, but treat as one by avoiding
optimising across it when it's​ safe to do so.

We can't remove or change our precious hint because we need it to solve
production issues. But we don't have hints because then people wouldn't
report planner/optimiser issues, would lock in bad plans and then complain
about it, etc.

Just like what's happening here. And people are leaping to defend it, lest
we risk exposing performance issues by changing anything, even though all
we're doing is documenting what is already so.

Hey. Crazy idea for backward compat to address Tom's complaint that adding
explicit syntax would require people who wanted the old behaviour to make
their queries incompatible with pg10 and below. Add the "MATERIALIZED"
keyword or whatever. The back patch the keyword as a no-op, since that's
what we already do in back branches. I can't see anything that could
possibly break in that context so long as we only go as far back as it was
already a keyword elsewhere.

We could at least add it to pg10.


From: Joe Conway <mail(at)joeconway(dot)com>
To: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-05 00:17:05
Message-ID: 4f98ca61-798b-8751-b2a7-7b0af2bc237b@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/04/2017 05:03 PM, Craig Ringer wrote:
> On 5 May 2017 02:52, "Tom Lane" wrote:
> I haven't been keeping close tabs either, but surely we still have
> to have
> the optimization fence in (at least) all these cases:
>
> * CTE contains INSERT/UPDATE/DELETE
> * CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
> locked might change)
> * CTE contains volatile functions
>
> I'm willing to write off cases where, eg, a function should have been
> marked volatile and was not. That's user error and there are plenty
> of hazards of that kind already. But if the optimizer has reason
> to know that discarding the fence might change any query side-effects,
> it mustn't.
>
> I think everyone is in total agreement there.

That's great, but if so, why do we need any change in syntax at all?

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


From: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-05 01:30:47
Message-ID: CAMsr+YG-c+Y4NYWvPtknSpzm13o0zwxhDPkw=dZm7JBtZwg40A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5 May 2017 at 08:17, Joe Conway <mail(at)joeconway(dot)com> wrote:
> On 05/04/2017 05:03 PM, Craig Ringer wrote:
>> On 5 May 2017 02:52, "Tom Lane" wrote:
>> I haven't been keeping close tabs either, but surely we still have
>> to have
>> the optimization fence in (at least) all these cases:
>>
>> * CTE contains INSERT/UPDATE/DELETE
>> * CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
>> locked might change)
>> * CTE contains volatile functions
>>
>> I'm willing to write off cases where, eg, a function should have been
>> marked volatile and was not. That's user error and there are plenty
>> of hazards of that kind already. But if the optimizer has reason
>> to know that discarding the fence might change any query side-effects,
>> it mustn't.
>>
>> I think everyone is in total agreement there.
>
> That's great, but if so, why do we need any change in syntax at all?

Because a lot of people use it as a query hint, and don't want to let
go of the one precious query hint they rely on in PostgreSQL to solve
urgent production issues or work around planner limitations. You can
currently (ab)use a CTE with fencing behaviour to force join order,
force evaluation of expensive functions, etc, if the planner would
otherwise pick a semantically identical plan that lands up running
slower.

Many people don't want to give up their hint because it's an important
tool in their "solve user/customer problems fast" toolbox.

Project policy says we don't want query hints, don't have them, and
won't add them. I understand the reasoning for this (and mostly
agree). But we kind of added one by accident anyway, and it's proving
rather hard to take it away!

We added it by documenting an implementation detail/limitation - that
there's limited or no inlining/pullup/pushdown across CTE terms.
People took that as blessing to rely on that behaviour in future
releases, treating it as an optimisation barrier query hint. Since
then some people have been promoting it as a workaround for
performance issues faced by queries with sub-optimal plans.

People "in the know" used to use OFFSET 0 for that instead, but it was
entirely undocumented and could go away in a future release without
any warning, plus it was uglier. The CTE limitation-as-feature got a
lot more press in blogs, etc since it's documented-ish.

So now we're collectively trying to get to a compromise position where
people who rely on it can keep the functionality by declaring the
no-inline requirement explicitly in their SQL where they depend on it.
That way it won't penalise everyone else who just wants the optimiser
to do its best the rest of the time, so we can just write SQL that
says "I want this outcome" without having to worry about a
semi-documented performance booby-trap. So surprise and confuse users
who're (reasonably) expecting the optimiser to treat a CTE like a
subquery, view or SQL function, those who're writing portable SQL and
those who're porting from other DBMSes.

The sticking point is that this seems to require admitting that it
serves as a hint, of sorts, where it forces the optimiser to choose a
different plan than its cost based estimates would suggest if all
candidate plans are semantically identical.

We're carefully maintaining this bizarre cognitive dissonance where we
justify the need for using this as a planner hint at the same time as
denying that we have a hint. That makes it hard to make progress here.
I think there's fear that we're setting some kind of precedent by
admitting what we already have.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-05 02:04:29
Message-ID: 8786.1493949869@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> writes:
> We're carefully maintaining this bizarre cognitive dissonance where we
> justify the need for using this as a planner hint at the same time as
> denying that we have a hint. That makes it hard to make progress here.
> I think there's fear that we're setting some kind of precedent by
> admitting what we already have.

I think you're overstating the case. It's clear that there's a
significant subset of CTE functionality where there has to be an
optimization fence. The initial implementation basically took the
easy way out by deeming *all* CTEs to be optimization fences. Maybe
we shouldn't have documented that behavior, but we did. Now we're
arguing about how much of a compatibility break it'd be to change that
planner behavior. I don't see any particular cognitive dissonance here,
just disagreements about the extent to which backwards compatibility is
more important than better query optimization.

regards, tom lane


From: Serge Rielau <serge(at)rielau(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-05 02:10:47
Message-ID: f326163b-f2b1-489a-94d6-6935efb0e49e@rielau.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

In my past life when I was faced with such debates I argued that the number
of customers We are hoping to attract in the future is much bigger than the
ones we risk offending. Doesn't mean I wanted to piss everyone off. Just
that I didn't want to be held hostage by history.
Cheers Serge
PS: On the opposing side was typically the mainframe crowd.... hint hint


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-05 02:14:24
Message-ID: 20170505021424.GS21223@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom, all,

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> writes:
> > We're carefully maintaining this bizarre cognitive dissonance where we
> > justify the need for using this as a planner hint at the same time as
> > denying that we have a hint. That makes it hard to make progress here.
> > I think there's fear that we're setting some kind of precedent by
> > admitting what we already have.
>
> I think you're overstating the case. It's clear that there's a
> significant subset of CTE functionality where there has to be an
> optimization fence. The initial implementation basically took the
> easy way out by deeming *all* CTEs to be optimization fences. Maybe
> we shouldn't have documented that behavior, but we did. Now we're
> arguing about how much of a compatibility break it'd be to change that
> planner behavior. I don't see any particular cognitive dissonance here,
> just disagreements about the extent to which backwards compatibility is
> more important than better query optimization.

This really begs the question of if it's "better."

If it's always better, then there's no discussion to be had. If there
are cases where it's not, then that could cause problems for people and
there's a reason to question if we're ok with that.

Of course, if a user runs into such an issue, there's a few ways they
can address it without needs any additional syntax (use a temp table
instead, introduce a volative function into the CTE, etc).

I had originally been on the fence about this myself, and even advocated
the new-syntax approach at one point (as I recall), but with the new
statistics we're gaining and the improvmenets in the planner, along with
parallel query and the advantages offered by being able to pull a CTE
into the main query, I think I've come around to agree that removing the
optimization fence, when it won't change the resullts, is a good thing
to do.

Thanks!

Stephen


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
Cc: Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-05 02:57:21
Message-ID: 894e6690-a9cf-cc94-f457-94c7d9c120dd@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/04/2017 07:04 PM, Tom Lane wrote:
> Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> writes:
>> We're carefully maintaining this bizarre cognitive dissonance where we
>> justify the need for using this as a planner hint at the same time as
>> denying that we have a hint. That makes it hard to make progress here.
>> I think there's fear that we're setting some kind of precedent by
>> admitting what we already have.
>
> I think you're overstating the case. It's clear that there's a
> significant subset of CTE functionality where there has to be an
> optimization fence. The initial implementation basically took the
> easy way out by deeming *all* CTEs to be optimization fences. Maybe
> we shouldn't have documented that behavior, but we did. Now we're
> arguing about how much of a compatibility break it'd be to change that
> planner behavior. I don't see any particular cognitive dissonance here,
> just disagreements about the extent to which backwards compatibility is
> more important than better query optimization.

Exactly.

One thought, is that we treat a CTE in a similar way to foreign tables,
with the same set of push downs.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


From: Andres Freund <andres(at)anarazel(dot)de>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-05 03:00:02
Message-ID: 20170505030002.rn63hfynu25fp6v7@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-05-04 19:57:21 -0700, Joe Conway wrote:
> One thought, is that we treat a CTE in a similar way to foreign tables,
> with the same set of push downs.

A bit surprised about that suggestion - there seems to be very little
similarity between the cases. What'd be the benefit of that? With FDWs
there's not even a common set of things pushed down between different
FDW users.

Greetings,

Andres Freund


From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Thomas Kellerer *EXTERN*" <spam_eater(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-05 07:27:58
Message-ID: A737B7A37273E048B164557ADEF4A58B53A347E8@ntex2010i.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thomas Kellerer wrote:
>> 1) we switch unmarked CTEs as inlineable by default in pg11.
>
> +1 from me for option 1

+1 from me as well, FWIW.

Yours,
Laurenz Albe


From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-05 12:43:33
Message-ID: CAKJS1f-zcdda_6Rskry8cwaVmrmdPg1XBAE40-5+fWHjnwmK=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5 May 2017 at 14:04, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> writes:
>> We're carefully maintaining this bizarre cognitive dissonance where we
>> justify the need for using this as a planner hint at the same time as
>> denying that we have a hint. That makes it hard to make progress here.
>> I think there's fear that we're setting some kind of precedent by
>> admitting what we already have.
>
> I think you're overstating the case. It's clear that there's a
> significant subset of CTE functionality where there has to be an
> optimization fence. The initial implementation basically took the
> easy way out by deeming *all* CTEs to be optimization fences. Maybe
> we shouldn't have documented that behavior, but we did. Now we're
> arguing about how much of a compatibility break it'd be to change that
> planner behavior. I don't see any particular cognitive dissonance here,
> just disagreements about the extent to which backwards compatibility is
> more important than better query optimization.

How about we get the ball rolling on this in v10 and pull that part
out of the docs. If anything that'll buy us a bit more wiggle room to
change this in v11.

I've attached a proposed patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
doc_caution_about_cte_changes_in_the_future.patch application/octet-stream 2.7 KB

From: Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-06 20:38:33
Message-ID: e11d1ce2-5b62-234e-5d21-69d8c733809a@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/03/2017 07:33 PM, Alvaro Herrera wrote:
> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.

I am overwhelmingly in favor of this option.

I am in favor of an enable_inlinedcte guc in the same spirit as the
other enable_* gucs, but violently opposed to any "compatibility" guc.

--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>
To: Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-09 18:26:24
Message-ID: 8550391494354384@web15m.yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Ok, it seems that most people in discussion are agree that removing optimization
fence is a right thing to do. But so far the main topic was whether it worth to
make "inlining" by default, and how we should enable it.

Nonetheless I still hoping to discuss the algorithm and its implementation.

I suppose, in case of a single reference we can validate CTE subquery and inline it
just before SS_process_ctes() in subquery_planner() and then process remaining
CTEs as before.

The case of multiple reference is more interesting.
Ideally, we would decide whether to inline just before pull_up_sublinks(), so all
the optimizations can be applied to inlined subquery. But It is impossible as we
have no information to build subquery paths and estimate they costs at this point.
All necessary initialization is performed in query_planner(), that invoked far
later in grouping_planner(). (As far as I understand.)

The most straighforward way is to compare CTE scan cost with subquery execution
and result scan cost in set_rel_size(), just after set_cte_pathlist(), and alter
RelOptInfo, if we choose to inline.
(e.g (CTE scan) < (cheapest_path(subquery) + subquery scan))
This way we still can push down predicates as it is performed in
set_subquery_pathlist(), but we missed pull_up_subquery().
Besides, it seems like a dirty quick solution.

Maybe it possible to add subquery scan to RTE_CTE RelOptInfo, but I'm not sure.

So what is a right way to conduct comparison between CTE scan and subquery
execution with subsequent scan?

I am new to PostgreSQL development, so I need a guidance from someone who
familiar with optimizer infrastructure to ensure that I moving in a right
direction and not making something weird.

P.S. There is a paper [1] describing implementation of CTE optimization in Orca
optimizer. It may be useful, though architecture is completely different.

[1] Optimization of Common Table Expressions in MPP Database Systems
(http://www.vldb.org/pvldb/vol8/p1704-elhelw.pdf)

Ilya Shkuratov


From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-09 19:15:41
Message-ID: 20bbf396-eeea-f82c-ed5f-7dbcbbc5b76f@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5/5/17 08:43, David Rowley wrote:
> How about we get the ball rolling on this in v10 and pull that part
> out of the docs. If anything that'll buy us a bit more wiggle room to
> change this in v11.
>
> I've attached a proposed patch.

If we just tell them that the thing they might have relied on might go
away, without a replacement to suggest, then we're just confusing and
scaring them, no?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-09 19:22:54
Message-ID: CAKFQuwaOxwMMqOEgG0BxquW_nnFf=+NB-RGAFb4_YNtbFJQqMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 9, 2017 at 12:15 PM, Peter Eisentraut <
peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:

> On 5/5/17 08:43, David Rowley wrote:
> > How about we get the ball rolling on this in v10 and pull that part
> > out of the docs. If anything that'll buy us a bit more wiggle room to
> > change this in v11.
> >
> > I've attached a proposed patch.
>
> If we just tell them that the thing they might have relied on might go
> away, without a replacement to suggest, then we're just confusing and
> scaring them, no?
>

​We'd end up suggesting our OFFSET 0 hack as true protection. If they know
for a fact that their use of CTE for its barrier properties is not
supported they are also more likely to document intentional usage with
something like: "-- CHANGE THIS ONCE VERSION 11 IS RELEASED!!! --" which
would make finding the call sites that need to add the new "MATERIALIZED"
​keyword much easier.

David J.


From: Serge Rielau <serge(at)rielau(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-09 19:29:31
Message-ID: be1f7b5f-0099-4874-8237-df672046f23e@rielau.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 9, 2017 at 12:22 PM, David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
On Tue, May 9, 2017 at 12:15 PM, Peter Eisentraut < peter(dot)eisentraut(at)2ndquadrant(dot)com [peter(dot)eisentraut(at)2ndquadrant(dot)com] > wrote:
On 5/5/17 08:43, David Rowley wrote:
> How about we get the ball rolling on this in v10 and pull that part
> out of the docs. If anything that'll buy us a bit more wiggle room to
> change this in v11.
>
> I've attached a proposed patch.

If we just tell them that the thing they might have relied on might go
away, without a replacement to suggest, then we're just confusing and
scaring them, no?

We'd end up suggesting our OFFSET 0 hack as true protection. If they know for a fact that their use of CTE for its barrier properties is not supported they are also more likely to document intentional usage with something like: "-- CHANGE THIS ONCE VERSION 11 IS RELEASED!!! --" which would make finding the call sites that need to add the new "MATERIALIZED" ​keyword much easier. How about adding MATERIALIZED now (in 10) as a noise word. Give people a release to switch over before pulling the rug.. Cheers Serge


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-09 19:36:42
Message-ID: 17491.1494358602@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> On Tue, May 9, 2017 at 12:15 PM, Peter Eisentraut <
> peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
>> If we just tell them that the thing they might have relied on might go
>> away, without a replacement to suggest, then we're just confusing and
>> scaring them, no?

> ​We'd end up suggesting our OFFSET 0 hack as true protection.

Considering that many of the commenters in this thread view OFFSET 0
as a vile hack that ought to go away, I can hardly see how that's
an improvement.

I tend to agree with Peter that there's no need to do anything until
we have a committable code improvement. Documentation changes that
push people towards adding OFFSET 0, without any certainty that that
will be the long-term answer, do not seem like a net positive.

Also, considering that this behavior has been there since 8.4,
I think it's sheerest chutzpah to claim that changing the docs in
v10 would materially reduce the backward-compatibility concerns
for whatever we might do in v11.

regards, tom lane


From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-09 19:51:58
Message-ID: CAKFQuwYpQ0XoyEWaxQtDY+soaSZhPoLr_SxFizgaEakfiT_=cw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 9, 2017 at 12:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Also, considering that this behavior has been there since 8.4,
> I think it's sheerest chutzpah to claim that changing the docs in
> v10 would materially reduce the backward-compatibility concerns
> for whatever we might do in v11.
>

​No it won't, but those who are using 10 as their first version would
probably be happy if this was covered in a bit more depth. Even a comment
like "Unlike most other DBMS PostgreSQL presently executes the subquery in
the CTE​ in relative isolation. It is suggested that you document any
intentional usage of this optimization fence as a query planning hint so
that should the default change in the future you can update the query to
support whatever official syntax is implemented to retain this behavior.

We cannot really help those who have been using this since 8.4 and won't
read the relevant docs because they don't need to learn anything new; but
we can at least avoid subjecting newer customers. I'm not that strongly in
favor of it but it would be a nice touch - even if it won't change the
risk/reward calculation in any meaningful way.

David J.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-09 20:14:20
Message-ID: 18985.1494360860@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> On Tue, May 9, 2017 at 12:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Also, considering that this behavior has been there since 8.4,
>> I think it's sheerest chutzpah to claim that changing the docs in
>> v10 would materially reduce the backward-compatibility concerns
>> for whatever we might do in v11.

> ​No it won't, but those who are using 10 as their first version would
> probably be happy if this was covered in a bit more depth.

I think the existing doc text is perfectly clear (while David's proposed
replacement text is not).

> Even a comment
> like "Unlike most other DBMS PostgreSQL presently executes the subquery in
> the CTE​ in relative isolation. It is suggested that you document any
> intentional usage of this optimization fence as a query planning hint so
> that should the default change in the future you can update the query to
> support whatever official syntax is implemented to retain this behavior.

Well, TBH that is pre-judging what (if anything) is going to be changed
by a feature that we don't even have design consensus on, let alone a
patch for. I don't think that's an improvement or a good service to
our users; it's just promoting confusion. I think that until we do have
a design and a patch, we're better off leaving the docs reading the way
they have for the past eight years.

I'm also a bit mystified by the apparent urgency to change something,
anything, in this area when we're already past feature freeze. This
would be a fit subject for discussion several months from now when
v11 development opens, but right now it's just distracting us from
stabilizing v10.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>
Cc: Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-09 21:14:19
Message-ID: 21113.1494364459@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru> writes:
> Ok, it seems that most people in discussion are agree that removing optimization
> fence is a right thing to do.
> Nonetheless I still hoping to discuss the algorithm and its implementation.

Yeah, so far we've mainly discussed whether to do that and how to control
it, not what the actual results would be.

> I suppose, in case of a single reference we can validate CTE subquery and inline it
> just before SS_process_ctes() in subquery_planner() and then process remaining
> CTEs as before.

Yeah, something like that ought to be possible. I'd prefer to think of it
as related to the work prepjointree.c does, ie pull_up_sublinks, which
raises the question why we do SS_process_ctes before that not after it.

You might want to start by postponing planning of CTE subqueries as late
as possible, without any other changes. In particular, I think it'd be
a good thing to postpone Path-to-Plan conversion of the subqueries
until much later, ideally not till the final toplevel create_plan() call.
This is tied up with early Plan creation for regular subqueries too, so
that might have to be an all-or-nothing conversion. But we aren't really
going to have much flexibility of planning for subqueries until we do
that.

> The case of multiple reference is more interesting.
> Ideally, we would decide whether to inline just before pull_up_sublinks(), so all
> the optimizations can be applied to inlined subquery. But It is impossible as we
> have no information to build subquery paths and estimate they costs at this point.

TBH, I would just ignore that completely, at least until the simpler
case is done and committed. Trying to bite that off as part of the
initial patch is likely to lead to never getting anything done at all.
And I'm not exactly convinced either that it will win often enough to
be worth the trouble, or that users would thank you for rewriting their
queries that way.

When and if we get to this, you could imagine tackling it a bit like
preprocess_minmax_aggregates, or like what I did with OR clauses in
<22002(dot)1487099934(at)sss(dot)pgh(dot)pa(dot)us>, ie just repeat a lot of the work
over again to get several complete Path trees, and then pick one.
This is kind of expensive and ugly but it's hard to see how to do it
sensibly in a bottom-up fashion.

regards, tom lane


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-09 22:01:01
Message-ID: 9110f786-a412-0b07-b59c-1844c3368a4f@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/09/2017 04:14 PM, Tom Lane wrote:
>
> Well, TBH that is pre-judging what (if anything) is going to be changed
> by a feature that we don't even have design consensus on, let alone a
> patch for. I don't think that's an improvement or a good service to
> our users; it's just promoting confusion. I think that until we do have
> a design and a patch, we're better off leaving the docs reading the way
> they have for the past eight years.
>
> I'm also a bit mystified by the apparent urgency to change something,
> anything, in this area when we're already past feature freeze. This
> would be a fit subject for discussion several months from now when
> v11 development opens, but right now it's just distracting us from
> stabilizing v10.
>
>

+1. We should not be in the business of documenting or declaring our
intention of having vaporware. Say we get a patch and for some reason it
gets rejected. We've had features that took two or three releases to get
accepted. This is why we've historically shied away from roadmaps. And
we've been bitten in the past by deprecating things only to undeprecate
them later.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Yaroslav <ladayaroslav(at)yandex(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-11 21:37:27
Message-ID: 1494538647075-5961086.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ilya Shkuratov wrote
> First of all, to such replacement to be valid, the CTE must be
> 1. non-writable (e.g. be of form: SELECT ...),
> 2. do not use VOLATILE or STABLE functions,
> 3. ... (maybe there must be more restrictions?)

What about simple things like this?

CREATE OR REPLACE FUNCTION z(numeric) RETURNS boolean AS $$
BEGIN
RETURN $1 <> 0;
END;
$$ LANGUAGE plpgSQL IMMUTABLE COST 1000;

-- This one works:
WITH T AS (
SELECT 1.0 AS v1, 0.0 AS v2
UNION ALL
SELECT 3.0, 1.0
UNION ALL
SELECT 2.0, 0.0
), a AS (
SELECT *
FROM t
WHERE z(v2)
)
SELECT *
FROM a
WHERE v1/v2 > 1.5;

-- This one gives 'division by zero':
WITH T AS (
SELECT 1.0 AS v1, 0.0 AS v2
UNION ALL
SELECT 3.0, 1.0
UNION ALL
SELECT 2.0, 0.0
)
SELECT *
FROM (
SELECT *
FROM t
WHERE z(v2)
) AS a
WHERE v1/v2 > 1.5;

-----
WBR, Yaroslav Schekin.
--
View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5961086.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


From: Andreas Joseph Krogh <andreas(at)visena(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-12 06:35:48
Message-ID: VisenaEmail.2c.135a04d5e341c1ff.15bfb5abae7@tc7-visena
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

På torsdag 11. mai 2017 kl. 23:37:27, skrev Yaroslav <ladayaroslav(at)yandex(dot)ru
<mailto:ladayaroslav(at)yandex(dot)ru>>:
Ilya Shkuratov wrote
> First of all, to such replacement to be valid, the CTE must be
>     1. non-writable (e.g. be of form: SELECT ...),
>     2. do not use VOLATILE or STABLE functions,
>     3. ... (maybe there must be more restrictions?)

What about simple things like this?

CREATE OR REPLACE FUNCTION z(numeric) RETURNS boolean AS $$
BEGIN
RETURN $1 <> 0;
END;
$$ LANGUAGE plpgSQL IMMUTABLE COST 1000;

-- This one works:
WITH T AS (
SELECT 1.0 AS v1, 0.0 AS v2
UNION ALL
SELECT 3.0, 1.0
UNION ALL
SELECT 2.0, 0.0
), a AS (
SELECT *
  FROM t
 WHERE z(v2)
)
SELECT *
  FROM a
 WHERE v1/v2 > 1.5;

-- This one gives 'division by zero':
WITH T AS (
SELECT 1.0 AS v1, 0.0 AS v2
UNION ALL
SELECT 3.0, 1.0
UNION ALL
SELECT 2.0, 0.0
)
SELECT *
  FROM (
       SELECT *
         FROM t
        WHERE z(v2)
       ) AS a
 WHERE v1/v2 > 1.5;
 
 
From a non-hacker;
Just to se what other RDBMS are doing with CTEs; Look at slide
31 here: https://www.percona.com/live/17/sites/default/files/slides/Recursive%20Query%20Throwdown.pdf
 
PG is not on top wrt. CTE, but could have been if CTEs were not this
"established" fence.
 
+1 for removing this fence and get all the possible optimization we can.

--
Andreas Joseph Krogh
 


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CTE inlining
Date: 2017-05-12 08:35:20
Message-ID: 1494578120116-5961164.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Just to se what other RDBMS are doing with CTEs; Look at slide
> 31 here: https://www.percona.com/live/17/sites/default/files/slides/Recursive%20Query%20Throwdown.pdf

That is taken from Markus Winand's post:

https://twitter.com/MarkusWinand/status/852862475699707904

"Seems like MySQL is getting the best WITH support of all tested DBs
(RECURSIVE not tested!)"

--
View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5961164.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-12 20:39:46
Message-ID: 20170512203946.GA31631@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 9, 2017 at 05:14:19PM -0400, Tom Lane wrote:
> Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru> writes:
> > Ok, it seems that most people in discussion are agree that removing optimization
> > fence is a right thing to do.
> > Nonetheless I still hoping to discuss the algorithm and its implementation.
>
> Yeah, so far we've mainly discussed whether to do that and how to control
> it, not what the actual results would be.

To summarize, it seems we have two options if we want to add fence
control to CTEs:

1. add INLINE to disable the CTE fence
2. add MATERIALIZE to enable the CTE fence

or some other keywords. I think most people prefer #2 because:

* most users writing queries prefer #2
* most users assume full optimization and it seems natural to turn
_off_ an optimization via a keyword
* while some queries can be inlined, all queries can be materialized,
so doing #1 means INLINE would be only a preference, which could be
confusing

Anyway, I am very glad we are considering addressing this in PG 11.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-12 20:52:40
Message-ID: CAHyXU0yeMHBajnJr475U3fPPTihehST0MpotFRD8BoE9hNNENQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 12, 2017 at 3:39 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Tue, May 9, 2017 at 05:14:19PM -0400, Tom Lane wrote:
>> Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru> writes:
>> > Ok, it seems that most people in discussion are agree that removing optimization
>> > fence is a right thing to do.
>> > Nonetheless I still hoping to discuss the algorithm and its implementation.
>>
>> Yeah, so far we've mainly discussed whether to do that and how to control
>> it, not what the actual results would be.
>
> To summarize, it seems we have two options if we want to add fence
> control to CTEs:
>
> 1. add INLINE to disable the CTE fence
> 2. add MATERIALIZE to enable the CTE fence
>
> or some other keywords. I think most people prefer #2 because:
>
> * most users writing queries prefer #2

Yeah, I think there was rough consensus on this point. I think it's
fair to assume that most (or at least a majority) of queries will
benefit from inlining, people would want to opt out of, rather than
opt in to, generally good optimization strategies. This will hit
some in people today, but this is not a backwards compatibility issue
since performance is generally not really fairly described as
compatibility criteria. If this feature drops we ought to warn people
in the release notes though.

merlin


From: Adam Brusselback <adambrusselback(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-15 15:57:20
Message-ID: CAMjNa7dvpNKmXCmKzsvBCuw03muyE2FCTZqtj2qZBE3ynrk96w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From a user's perspective:

>I think most people prefer #2 because:

>
>* most users writing queries prefer #2

> >* most users assume full optimization and it seems natural to turn

> > _off_ an optimization via a keyword

> >* while some queries can be inlined, all queries can be materialized,

> > so doing #1 means INLINE would be only a preference, which could be

> > confusing

I completely agree with this reasoning. I have a few queries I would have
to touch to add "MATERIALIZED", but the vast majority of CTE's in my
codebase would get a speedup. It would allow usage of CTE's more freely
than now. I currently avoid them unless it really simplifies a query
because of the optimization fence.

Not that my opinion holds any weight, but the extra keyword for enabling
the optimization fence is my preference. By default trying to optimize
more is a good thing IMO.

>
>Anyway, I am very glad we are considering addressing this in PG 11.

Seconded, this is a sore spot for me when using Postgres, and i'd love to
not have it be an issue any more.

Thanks,
-Adam


From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-16 04:19:01
Message-ID: CAKJS1f8ma3icER32g8HM33Rp4mTp90Ucv8q0CkitY3s6EE20aw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 13 May 2017 at 08:39, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> To summarize, it seems we have two options if we want to add fence
> control to CTEs:
>
> 1. add INLINE to disable the CTE fence
> 2. add MATERIALIZE to enable the CTE fence

I think #1 is out of the question.

What would we do in cases like:

WITH INLINE cte AS (SELECT random() a)
SELECT * FROM cte UNION SELECT * FROM cte;

I assume we won't want to inline when the CTE query contains a
volatile function, and we certainly won't in cases like:

WITH INLINE cte AS (DELETE FROM a RETURNING *)
INSERT INTO b SELECT * from cte WHERE cte.value > 5;

We'd be certain to receive complaints from disgruntled users about
"Why is this not inlined when I specified INLINE?"

#2 does not suffer from that.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services