Re: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?

Lists: pgsql-performance
From: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
Date: 2011-11-02 14:22:21
Message-ID: CANSg5mqt+7j1SU2Gu2tZ_c_vdzCJPkymxEqaQ-OjRXfz=Pfimg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

As I've come up to speed on SQL and PostgreSQL with some
medium-complexity queries, I've asked a few questions about what the
optimizer will do in various situations. I'm not talking about the
seq-scan-vs-index type of optimizing; I mean "transforming within the
relational calculus (algebra?) to an equivalent but more performant
query". The same topics come up:

- Flattening. I think that means "Merge the intent of the subquery
into the various clauses of the parent query".

- Inlining. That's "Don't run this function/subquery/view as an atomic
unit; instead, push it up into the parent query so the optimizer can
see it all at once." Maybe that's the same as flattening.

- Predicate pushdown. That's "This subquery produces a lot of rows,
but the parent query has a WHERE clause that will eliminate half of
them, so don't produce the unnecessary rows."

Am I right so far? Now, the big question, which I haven't seen
documented anywhere: Under what circumstances can the optimizer do
each of these things?

For instance, I have a complex query that calculates the similarity of
one user to every other user. The output is two columns, one row per
user:

select * from similarity(my_user_id);

other_user | similarity%
-----------|-------------
123 | 99

Being a novice at SQL, I first wrote it in PL/pgSQL, so I could stay
in my imperative, iterative head. The query performed decently well
when scanning the whole table, but when I only wanted to compare
myself to a single user, I said:

select * from similarity(my_user_id) as s where s.other_user = 321;

And, of course, similarity() produced the whole table anyway, because
predicates don't get pushed down into PL/pgSQL functions.

So I went and rewrote similarity as a SQL function, but I still didn't
want one big hairy SQL query. Ah ha! CTEs let you write modular
subqueries, and you also avoid problems with lack of LATERAL. I'll use
those.

.. But of course predicates don't get pushed into CTEs, either. (Or
maybe it was that they would, but only if they were inline with the
predicate.. I forget now.)

So you can see where I'm going. I know if I break everything into
elegant, composable functions, it'll continue to perform poorly. If I
write one big hairy, it'll perform great but it will be difficult to
maintain, and it will be inelegant and a kitten will die. My tools
are CTEs, subqueries, aliases, SQL functions, PL/pgSQL functions, and
views (and other tools?) What optimizations do each of those prevent?

We're on 9.0 now but will happily upgrade to 9.1 if that matters.

Jay Levitt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
Date: 2011-11-02 14:38:39
Message-ID: 29918.1320244719@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Jay Levitt <jay(dot)levitt(at)gmail(dot)com> writes:
> So you can see where I'm going. I know if I break everything into
> elegant, composable functions, it'll continue to perform poorly. If I
> write one big hairy, it'll perform great but it will be difficult to
> maintain, and it will be inelegant and a kitten will die. My tools
> are CTEs, subqueries, aliases, SQL functions, PL/pgSQL functions, and
> views (and other tools?) What optimizations do each of those prevent?

plpgsql functions are black boxes to the optimizer. If you can express
your functions as single SQL commands, using SQL-language functions is
usually a better bet than plpgsql.

CTEs are also treated as optimization fences; this is not so much an
optimizer limitation as to keep the semantics sane when the CTE contains
a writable query.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
Date: 2011-11-02 15:13:09
Message-ID: CA+TgmoY_a34rAWCxBaA+3RtSv9tBo0PaRyxnNH4-rtJ5XkBQdA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Nov 2, 2011 at 10:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jay Levitt <jay(dot)levitt(at)gmail(dot)com> writes:
>> So you can see where I'm going.  I know if I break everything into
>> elegant, composable functions, it'll continue to perform poorly.  If I
>> write one big hairy, it'll perform great but it will be difficult to
>> maintain, and it will be inelegant and a kitten will die.  My tools
>> are CTEs, subqueries, aliases, SQL functions, PL/pgSQL functions, and
>> views (and other tools?)  What optimizations do each of those prevent?
>
> plpgsql functions are black boxes to the optimizer.  If you can express
> your functions as single SQL commands, using SQL-language functions is
> usually a better bet than plpgsql.
>
> CTEs are also treated as optimization fences; this is not so much an
> optimizer limitation as to keep the semantics sane when the CTE contains
> a writable query.

I wonder if we need to rethink, though. We've gotten a number of
reports of problems that were caused by single-use CTEs not being
equivalent - in terms of performance - to a non-CTE formulation of the
same idea. It seems necessary for CTEs to behave this way when the
subquery modifies data, and there are certainly situations where it
could be desirable otherwise, but I'm starting to think that we
shouldn't do it that way by default. Perhaps we could let people say
something like WITH x AS FENCE (...) when they want the fencing
behavior, and otherwise assume they don't (but give it to them anyway
if there's a data-modifying operation in there).

Whenever I give a talk on the query optimizer, I'm constantly telling
people to take logic out of functions and inline it, avoid CTEs, and
generally merge everything into one big query. But as the OP says,
that is decidedly less than ideal from a code-beauty-and-maintenance
point of view: people WANT to be able to use syntactic sugar and still
get good performance. Allowing for the insertion of optimization
fences is good and important but it needs to be user-controllable
behavior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Justin Pitts <justinpitts(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jay Levitt <jay(dot)levitt(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
Date: 2011-11-02 16:41:56
Message-ID: CAMYZu8UHu-zz3-2_C0vtdqwDiWMgsWQjuVPSvNYR5Sxuu-Me+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Nov 2, 2011 at 11:13 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> […] Perhaps we could let people say
> something like WITH x AS FENCE (...) when they want the fencing
> behavior, and otherwise assume they don't (but give it to them anyway
> if there's a data-modifying operation in there).
>

I would love to be able to test some of our CTE queries in such a scenario.

None of them do data modification. How hard would it be to patch my own
build to disable the fence unilaterally for testing purposes?


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
Subject: Re: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
Date: 2011-11-02 17:13:06
Message-ID: 201111021813.06401.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wednesday 02 Nov 2011 16:13:09 Robert Haas wrote:
> On Wed, Nov 2, 2011 at 10:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Jay Levitt <jay(dot)levitt(at)gmail(dot)com> writes:
> >> So you can see where I'm going. I know if I break everything into
> >> elegant, composable functions, it'll continue to perform poorly. If I
> >> write one big hairy, it'll perform great but it will be difficult to
> >> maintain, and it will be inelegant and a kitten will die. My tools
> >> are CTEs, subqueries, aliases, SQL functions, PL/pgSQL functions, and
> >> views (and other tools?) What optimizations do each of those prevent?
> >
> > plpgsql functions are black boxes to the optimizer. If you can express
> > your functions as single SQL commands, using SQL-language functions is
> > usually a better bet than plpgsql.
> >
> > CTEs are also treated as optimization fences; this is not so much an
> > optimizer limitation as to keep the semantics sane when the CTE contains
> > a writable query.
>
> I wonder if we need to rethink, though. We've gotten a number of
> reports of problems that were caused by single-use CTEs not being
> equivalent - in terms of performance - to a non-CTE formulation of the
> same idea. It seems necessary for CTEs to behave this way when the
> subquery modifies data, and there are certainly situations where it
> could be desirable otherwise, but I'm starting to think that we
> shouldn't do it that way by default. Perhaps we could let people say
> something like WITH x AS FENCE (...) when they want the fencing
> behavior, and otherwise assume they don't (but give it to them anyway
> if there's a data-modifying operation in there).
+1. I avoid writing CTEs in many cases where they would be very useful just
for that reasons.
I don't even think some future inlining necessarily has to be restricted to
one-use cases only...

+1 for making fencing behaviour as well. Currently there is no real explicit
method to specify this which is necessarily future proof (WITH, OFFSET 0)...

Andres


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jay Levitt <jay(dot)levitt(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
Date: 2011-11-02 17:22:05
Message-ID: CAGTBQpahSrrddk1T9UPG6rkDWW6mzB_Hn2gJ=fyp1gbq1Mrt_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Nov 2, 2011 at 12:13 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I wonder if we need to rethink, though.  We've gotten a number of
> reports of problems that were caused by single-use CTEs not being
> equivalent - in terms of performance - to a non-CTE formulation of the
> same idea.  It seems necessary for CTEs to behave this way when the
> subquery modifies data, and there are certainly situations where it
> could be desirable otherwise, but I'm starting to think that we
> shouldn't do it that way by default.  Perhaps we could let people say
> something like WITH x AS FENCE (...) when they want the fencing
> behavior, and otherwise assume they don't (but give it to them anyway
> if there's a data-modifying operation in there).

Well, in my case, I got performance thanks to CTEs *being*
optimization fences, letting me fiddle with query execution.

And I mean, going from half-hour queries to 1-minute queries.

It is certainly desirable to maintain the possibility to use fences when needed.


From: Craig James <craig_james(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
Date: 2011-11-02 19:39:36
Message-ID: 4EB19C78.8030903@emolecules.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 11/2/11 10:22 AM, Claudio Freire wrote:
> On Wed, Nov 2, 2011 at 12:13 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>> I wonder if we need to rethink, though. We've gotten a number of
>> reports of problems that were caused by single-use CTEs not being
>> equivalent - in terms of performance - to a non-CTE formulation of the
>> same idea. It seems necessary for CTEs to behave this way when the
>> subquery modifies data, and there are certainly situations where it
>> could be desirable otherwise, but I'm starting to think that we
>> shouldn't do it that way by default. Perhaps we could let people say
>> something like WITH x AS FENCE (...) when they want the fencing
>> behavior, and otherwise assume they don't (but give it to them anyway
>> if there's a data-modifying operation in there).
> Well, in my case, I got performance thanks to CTEs *being*
> optimization fences, letting me fiddle with query execution.
>
> And I mean, going from half-hour queries to 1-minute queries.
Same here. It was a case where I asked this group and was told that putting an "offset 0" fence in was probably the only way to solve it (once again reminding us that Postgres actually does have hints ... they're just called other things).
> It is certainly desirable to maintain the possibility to use fences when needed.
Indeed. Optimizer problems are usually fixed in due course, but these "fences" are invaluable when you have a dead web site that has to be fixed right now.

Craig


From: "Igor Neyman" <ineyman(at)perceptron(dot)com>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jay Levitt" <jay(dot)levitt(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
Date: 2011-11-02 20:22:36
Message-ID: F4C27E77F7A33E4CA98C19A9DC6722A2084DFFF7@EXCHANGE.corp.perceptron.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> Sent: Wednesday, November 02, 2011 11:13 AM
> To: Tom Lane
> Cc: Jay Levitt; pgsql-performance(at)postgresql(dot)org
> Subject: Re: Guide to PG's capabilities for inlining, predicate
> hoisting, flattening, etc?
> .......
> .......
> Perhaps we could let people say
> something like WITH x AS FENCE (...) when they want the fencing
> behavior, and otherwise assume they don't (but give it to them anyway
> if there's a data-modifying operation in there).
>
> ....
> ....
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Hints.... here we come :)


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Igor Neyman <ineyman(at)perceptron(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jay Levitt <jay(dot)levitt(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
Date: 2011-11-03 00:17:49
Message-ID: 4EB1DDAD.9000407@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 03/11/11 09:22, Igor Neyman wrote:
>
>> -----Original Message-----
>> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
>> Sent: Wednesday, November 02, 2011 11:13 AM
>> To: Tom Lane
>> Cc: Jay Levitt; pgsql-performance(at)postgresql(dot)org
>> Subject: Re: Guide to PG's capabilities for inlining, predicate
>> hoisting, flattening, etc?
>> .......
>> .......
>> Perhaps we could let people say
>> something like WITH x AS FENCE (...) when they want the fencing
>> behavior, and otherwise assume they don't (but give it to them anyway
>> if there's a data-modifying operation in there).
>>
>> ....
>> ....
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>
> Hints.... here we come :)
>
Is that a hint???

[Sorry, my perverse sense of humour kicked in]

I too would like CTE's to take part in optimisation - as I don't like
the mass slaughter of kittens, but I still want to pander to my speed
addiction.

So I think that having some sort of fence mechanism would be good.

Cheers,
Gavin


From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Igor Neyman <ineyman(at)perceptron(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jay Levitt <jay(dot)levitt(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
Date: 2011-11-03 09:07:21
Message-ID: 4EB259C9.5020102@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 11/03/2011 04:22 AM, Igor Neyman wrote:

> Hints.... here we come :)

Pfft! No more than `VOLATILE' vs `STABLE' vs `IMMUTABLE'. It's a
semantic difference, not just a performance hint.

That said, I'm not actually against performance hints if done sensibly.

--
Craig Ringer


From: "Igor Neyman" <ineyman(at)perceptron(dot)com>
To: "Craig Ringer" <ringerc(at)ringerc(dot)id(dot)au>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jay Levitt" <jay(dot)levitt(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
Date: 2011-11-03 13:16:10
Message-ID: F4C27E77F7A33E4CA98C19A9DC6722A2084E0203@EXCHANGE.corp.perceptron.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> -----Original Message-----
> From: Craig Ringer [mailto:ringerc(at)ringerc(dot)id(dot)au]
> Sent: Thursday, November 03, 2011 5:07 AM
> To: Igor Neyman
> Cc: Robert Haas; Tom Lane; Jay Levitt;
pgsql-performance(at)postgresql(dot)org
> Subject: Re: [PERFORM] Guide to PG's capabilities for inlining,
> predicate hoisting, flattening, etc?
>
> On 11/03/2011 04:22 AM, Igor Neyman wrote:
>
> That said, I'm not actually against performance hints if done
sensibly.
>
> --
> Craig Ringer
>

> ...sensibly
As it is with any other feature...

Igor Neyman