Re: CTE inlining

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 09:37:36
Message-ID: 070e1221-f4a6-dc84-3c72-b9ffa06f65ce@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04/05/17 05:33, Alvaro Herrera wrote:
> David Fetter wrote:
>
>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>> Either we support it forever somehow, or we put it on a deprecation
>> schedule immediately and expect to be answering questions about it for
>> years after it's been removed.
>>
>> -1 for the GUC.
>
> Absolutely.
>
> So ISTM we have three choices:
>
> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.

+1 (Mario, proxied by me)

I've been asked to pass on comments by my colleague Mario of True Group
regarding the pain of the current behaviour of CTE's being optimisation
fences (as he doesn't normally follow this email group, I'll pass on any
feedback):

"Greetings,

Jumping onto the bandwagon here. At True Group in Auckland, we use
PostgreSQL as the platform for large-scale software development, often
with extensive and complex schemas, views and queries.

We frequently encounter poor query performance due to CTEs being an
optimisation barrier. We are forced to use workarounds such as a
set-returning function (parameterised view) in place of a view, and
manually placing the needed quals (WHERE clauses) into each CTE. This
creates headaches in large systems, where writing expressive SQL is
essential, rather than employing workarounds that make code harder to
understand and reuse.

Our general assumption is that we should write SQL that describes what
we want to achieve. The planner's job is to determine how to do that
efficiently.

There is an argument that pushing quals into CTEs could reduce
performance for some queries, especially if the qual is expensive. My
answer is twofold:

a) most queries will either experience no change or benefit from qual
push-down, as usually the expensive part is the query subtree, not the
top-level qual.

b) if a small proportion of queries are negatively affected, this is
better addressed by improving the planner's cost estimation. At worst,
an explicit OPTIMIZATION BARRIER hint could be added. But I know there
is much philosophical objection in the PostgreSQL community to planner
hints. The irony is that a main argument for retaining current CTE
behaviour is that people rely on CTEs as implicit barrier hints!

As an aside, we also encounter several other instances where qual
push-down fails, including where rewording a query in a way that is
semantically identical can change whether push-down takes place. But
probably the greatest bugbear is inability to push-down a qual into
more
than one subtree, meaning if a query has two or more subtrees, each of
which is expensive, but only one row is needed from each, determined by
one qual at the top level, one subtree will get the qual pushed into it
and run fast, while the others will do a full (expensive) table
scan and
be filtered only afterwards. There are also gains to be had in
improving
qual push-down in sub-selects with aggregates, and others. But I think
these issues are only due to lack of resource to implement, rather than
philosophical/political objections.

We would like to see a Postgres version in the future that does much
better planning in the areas I've mentioned, including but not limited
to the present issue of CTEs. Our organisation may be able to
contribute
materially to this, if the political will is there, and others affected
pitch in to achieve common goals. Not being expert in Postgres
internals, I am not certain how difficult each of the problems is.

I hope we have contributed usefully to the conversation, and invite
feedback.

Sincerely,
Mario"

Cheers,
Gavin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-05-04 09:51:03 Re: delta relations in AFTER triggers
Previous Message Prabhat Sahu 2017-05-04 09:12:59 Re: delta relations in AFTER triggers