Re: Union within View vs.Union of Views

Lists: pgsql-performance
From: "Jeff Larsen" <jlar310(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Union within View vs.Union of Views
Date: 2007-11-03 20:22:18
Message-ID: d1f9b6f00711031322h1d2858d4xbfff4852941d405b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Consider:

CREATE VIEW_X AS
SELECT <query A>
UNION ALL
SELECT <query B>
UNION ALL
SELECT <query C>;

versus

CREATE VIEW_A AS
SELECT <query A>;

CREATE VIEW_B AS
SELECT <query B>;

CREATE VIEW_C AS
SELECT <query B>;

where <query A>, <query B> and <query C> are each somewhat complex
with several joins, but utilizing different tables for each of A, B
and C.

Performance on

SELECT * from VIEW_X WHERE <conditions>;

was absolutely terrible. But performance on

SELECT * from VIEW_A WHERE <conditions>
UNION ALL
SELECT * from VIEW_B WHERE <conditions>
UNION ALL
SELECT * from VIEW_C WHERE <conditions>;

was nice and speedy, perhaps 100 times faster than the first.

If it's possible to consider this abstractly, is there any particular
reason why there is such a vast difference in performance? I would
guess that is has something to do with how the WHERE conditions are
applied to a view composed of a UNION of queries. Perhaps this is an
opportunity for improvement in the code. In the first case, it's as if
the server is doing the union on all rows (over 10 million altogether
in my case) without filtering, then applying the conditions to the
result. Maybe there is no better way.

I can post query plans if anyone is interested. I haven't really
learned how to make sense out of them myself yet.

For my purposes, I'm content to use the union of separate views in my
application, so if this doesn't pique anyone's interest, feel free to
ignore it.

Jeff


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Jeff Larsen <jlar310(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Union within View vs.Union of Views
Date: 2007-11-03 21:39:21
Message-ID: 472CEA89.30808@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Jeff Larsen wrote:
> If it's possible to consider this abstractly, is there any particular
> reason why there is such a vast difference in performance? I would
> guess that is has something to do with how the WHERE conditions are
> applied to a view composed of a UNION of queries. Perhaps this is an
> opportunity for improvement in the code. In the first case, it's as if
> the server is doing the union on all rows (over 10 million altogether
> in my case) without filtering, then applying the conditions to the
> result. Maybe there is no better way.
>
> I can post query plans if anyone is interested. I haven't really
> learned how to make sense out of them myself yet.
>
> For my purposes, I'm content to use the union of separate views in my
> application, so if this doesn't pique anyone's interest, feel free to
> ignore it.
>
I hit this as well in less impacting statements. I found myself curious
that the sub-plan would have to be executed in full before it applied
the filter. Perhaps PostgreSQL has difficulty pushing WHERE conditions
through the rule system? It's an area I only barely understand, so I
never looked further...

I'm interested, but do not have anything of value to provide either. :-)

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Jeff Larsen <jlar310(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Union within View vs.Union of Views
Date: 2007-11-03 23:03:39
Message-ID: 472CFE4B.5060606@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Jeff Larsen wrote:
> Performance on
>
> SELECT * from VIEW_X WHERE <conditions>;
>
> was absolutely terrible. But performance on
>
> SELECT * from VIEW_A WHERE <conditions>
> UNION ALL
> SELECT * from VIEW_B WHERE <conditions>
> UNION ALL
> SELECT * from VIEW_C WHERE <conditions>;
>
> was nice and speedy, perhaps 100 times faster than the first.
>
> If it's possible to consider this abstractly, is there any particular
> reason why there is such a vast difference in performance? I would
> guess that is has something to do with how the WHERE conditions are
> applied to a view composed of a UNION of queries. Perhaps this is an
> opportunity for improvement in the code. In the first case, it's as if
> the server is doing the union on all rows (over 10 million altogether
> in my case) without filtering, then applying the conditions to the
> result. Maybe there is no better way.

That's surprising. The planner knows how to push down WHERE conditions
to parts of a UNION ALL, and should be able to generate the same plan in
both cases. Maybe it's just estimating the costs differently? Did you
copy-paste all the conditions in the single WHERE clause of the slow
query to all the three WHERE clauses on the separate views? Even if some
of the clauses are not applicable, they might still affect the cost
estimates and lead to a worse plan.

> I can post query plans if anyone is interested. I haven't really
> learned how to make sense out of them myself yet.

Yes, please. Please post the SQL and schema as well if possible.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Jeff Larsen <jlar310(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Union within View vs.Union of Views
Date: 2007-11-04 01:38:35
Message-ID: 3248.1194140315@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> Jeff Larsen wrote:
>> If it's possible to consider this abstractly, is there any particular
>> reason why there is such a vast difference in performance?

> That's surprising. The planner knows how to push down WHERE conditions
> to parts of a UNION ALL, and should be able to generate the same plan in
> both cases.

There are a bunch of special cases where it can't do that, though.
Look into src/backend/optimizer/path/allpaths.c, particularly
subquery_is_pushdown_safe:

* Conditions checked here:
*
* 1. If the subquery has a LIMIT clause, we must not push down any quals,
* since that could change the set of rows returned.
*
* 2. If the subquery contains EXCEPT or EXCEPT ALL set ops we cannot push
* quals into it, because that would change the results.
*
* 3. For subqueries using UNION/UNION ALL/INTERSECT/INTERSECT ALL, we can
* push quals into each component query, but the quals can only reference
* subquery columns that suffer no type coercions in the set operation.
* Otherwise there are possible semantic gotchas. So, we check the
* component queries to see if any of them have different output types;
* differentTypes[k] is set true if column k has different type in any
* component.

and qual_is_pushdown_safe:

* Conditions checked here:
*
* 1. The qual must not contain any subselects (mainly because I'm not sure
* it will work correctly: sublinks will already have been transformed into
* subplans in the qual, but not in the subquery).
*
* 2. The qual must not refer to the whole-row output of the subquery
* (since there is no easy way to name that within the subquery itself).
*
* 3. The qual must not refer to any subquery output columns that were
* found to have inconsistent types across a set operation tree by
* subquery_is_pushdown_safe().
*
* 4. If the subquery uses DISTINCT ON, we must not push down any quals that
* refer to non-DISTINCT output columns, because that could change the set
* of rows returned. This condition is vacuous for DISTINCT, because then
* there are no non-DISTINCT output columns, but unfortunately it's fairly
* expensive to tell the difference between DISTINCT and DISTINCT ON in the
* parsetree representation. It's cheaper to just make sure all the Vars
* in the qual refer to DISTINCT columns.
*
* 5. We must not push down any quals that refer to subselect outputs that
* return sets, else we'd introduce functions-returning-sets into the
* subquery's WHERE/HAVING quals.
*
* 6. We must not push down any quals that refer to subselect outputs that
* contain volatile functions, for fear of introducing strange results due
* to multiple evaluation of a volatile function.

Idly looking at this, I'm suddenly wondering whether the prohibition on
pushing into an EXCEPT is necessary. If a qual eliminates rows from the
EXCEPT's output, can't we just eliminate those same rows from the
inputs?

regards, tom lane


From: "Jeff Larsen" <jlar310(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Union within View vs.Union of Views
Date: 2007-11-04 01:57:21
Message-ID: d1f9b6f00711031857s2972e92dgcedb209233f2485a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

[[ Oops, still not used to gmail. Accidentally posted this straight to
Tom and not the list]]

> There are a bunch of special cases where it can't do that, though.
> Look into src/backend/optimizer/path/allpaths.c, particularly
> subquery_is_pushdown_safe:

My case probably fits the 'special case' description. Not all the
columns in each subquery matched up, so there were NULL::text
placeholders in some spots in the SELECT. In the case where
performance got bad, one of those columns was included in the
application's WHERE clause.

That's a good enough explanation for me. I'll spare you the gory
details of my tables, unless a developer wants to have a look at it
off-list.

Jeff


From: "Jeff Larsen" <jlar310(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Union within View vs.Union of Views
Date: 2007-11-04 10:11:02
Message-ID: d1f9b6f00711040211g752cbf2he3488cf62c622193@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 11/3/07, Tom Lane wrote:
> "Jeff Larsen" <jlar310(at)gmail(dot)com> writes:
> > My case probably fits the 'special case' description. Not all the
> > columns in each subquery matched up, so there were NULL::text
> > placeholders in some spots in the SELECT. In the case where
> > performance got bad, one of those columns was included in the
> > application's WHERE clause.
>
> Please see if explicitly casting the nulls to the same datatype as the
> other items they're unioned with makes it go fast. It sounds like you
> are hitting the "no type coercions" restriction.

Sure enough, explicitly casting to exactly the same type for each
column did the trick. In fact the union within the view now has a
slight edge over the union of views.

Thanks,

Jeff