View performance

Lists: pgsql-performance
From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: pgsql-performance(at)postgresql(dot)org
Subject: View performance
Date: 2002-12-24 20:16:38
Message-ID: 20021224201638.GA15882@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

I was looking at some queries that appeared to be slower than I remembered
them being under 7.2 (which may be a wrong perception) and noticed
that a view wasn't being handled very efficiently.

The view is security view that is used to hide some fields in some records
when displaying information on the web. The primary key is left alone
though. When this view is joined a plan is generated that applies
the field suppression for each row of the underlying table even though
only a few rows out of this view are going to be selected. It would see
that first looking for rows that will be used and only applying the
changes to rows that are going to be used would result in a significant
speed up.

The other thing that seemed odd is that the constant
(select pord from priv where pname = 'web') subqueries weren't pulled
out of the loop.

I was able to get a 20% speed up by adding an index on gameid to crate
and by disabling merge joins so that a has join was used instead.
The merge join estimate was about 20% low and the hash join estimate
was about 100% high resulting in the merge join getting picked.

View:
create view cname_web as select
areaid,
case when (select pord from priv where pname = 'web') >=
(select pord from priv where pname = privacy) then
lname else null end as lname,
case when (select pord from priv where pname = 'web') >=
(select pord from priv where pname = privacy) then
fmname else null end as fmname,
case when (select pord from priv where pname = 'web') >=
(select pord from priv where pname = privacy) then
aname else null end as aname,
case when (select pord from priv where pname = 'web') >=
(select pord from priv where pname = privacy) then
gen else null end as gen,
case when (select pord from priv where pname = 'web') >=
(select pord from priv where pname = privacy) then
genlab else null end as genlab,
case when (select pord from priv where pname = 'web') >=
(select pord from priv where pname = privacy) then
touched else null end as touched
from cname;

Query:

explain analyze select cname_web.areaid, lname, fmname, aname, coalesce(genlab, to_char(gen, 'FMRN')), rate, frq, opp, rmp, trn, to_char(crate.touched,'YYYY-MM-DD') from cname_web, crate where cname_web.areaid = crate.areaid and gameid = '776' and frq > 0 and crate.touched >= ((timestamp 'epoch' + '1040733601 second') + '2 year ago') order by rate desc, lower(lname), lower(coalesce((aname || ' ') || fmname, fmname, aname)), gen, genlab, cname_web.areaid;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1308.35..1308.44 rows=39 width=203) (actual time=1152.67..1152.68 rows=25 loops=1)
Sort Key: crate.rate, lower(cname_web.lname), lower(CASE WHEN (((cname_web.aname || ' '::text) || cname_web.fmname) IS NOT NULL) THEN ((cname_web.aname || ' '::text) || cname_web.fmname) WHEN (cname_web.fmname IS NOT NULL) THEN cname_web.fmname WHEN (cname_web.aname IS NOT NULL) THEN cname_web.aname ELSE NULL::text END), cname_web.gen, cname_web.genlab, cname_web.areaid
-> Merge Join (cost=1270.71..1307.31 rows=39 width=203) (actual time=1120.23..1152.25 rows=25 loops=1)
Merge Cond: ("outer".areaid = "inner".areaid)
-> Sort (cost=681.95..699.97 rows=7208 width=63) (actual time=1079.55..1083.66 rows=7147 loops=1)
Sort Key: cname_web.areaid
-> Subquery Scan cname_web (cost=0.00..220.08 rows=7208 width=63) (actual time=0.40..843.48 rows=7208 loops=1)
-> Seq Scan on cname (cost=0.00..220.08 rows=7208 width=63) (actual time=0.40..818.24 rows=7208 loops=1)
InitPlan
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.04..0.05 rows=1 loops=1)
Filter: (pname = 'web'::text)
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.01..0.02 rows=1 loops=1)
Filter: (pname = 'web'::text)
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.01..0.02 rows=1 loops=1)
Filter: (pname = 'web'::text)
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.01..0.02 rows=1 loops=1)
Filter: (pname = 'web'::text)
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.01..0.02 rows=1 loops=1)
Filter: (pname = 'web'::text)
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.01..0.02 rows=1 loops=1)
Filter: (pname = 'web'::text)
SubPlan
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.01..0.01 rows=1 loops=7208)
Filter: (pname = $1)
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.01..0.01 rows=1 loops=7208)
Filter: (pname = $1)
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.01..0.01 rows=1 loops=7208)
Filter: (pname = $1)
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.01..0.01 rows=1 loops=7208)
Filter: (pname = $1)
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.01..0.01 rows=1 loops=7208)
Filter: (pname = $1)
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.01..0.01 rows=1 loops=7208)
Filter: (pname = $1)
-> Sort (cost=588.76..588.80 rows=16 width=39) (actual time=39.95..39.96 rows=25 loops=1)
Sort Key: crate.areaid
-> Seq Scan on crate (cost=0.00..588.45 rows=16 width=39) (actual time=3.14..39.58 rows=25 loops=1)
Filter: ((gameid = '776'::text) AND (frq > 0) AND (touched >= '2000-12-24 12:40:01'::timestamp without time zone))
Total runtime: 1155.29 msec
(39 rows)


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: View performance
Date: 2002-12-24 21:06:37
Message-ID: 20021224210637.GA16085@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

As a followup to this I rewrote the view as:
create view cname_web as select
a.areaid, b.lname, b.fmname, b.aname, b.gen, b.genlab, b.touched
from cname a left join
(select areaid, lname, fmname, aname, gen, genlab, touched, privacy
from cname, priv
where pname = privacy and
pord <= (select pord from priv where pname = 'web')
) b
using (areaid);

And got the query down to about half the original time as shown here:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=970.71..970.74 rows=15 width=113) (actual time=550.82..550.83 rows=25 loops=1)
Sort Key: crate.rate, lower(cname.lname), lower(CASE WHEN (((cname.aname || ' '::text) || cname.fmname) IS NOT NULL) THEN ((cname.aname || ' '::text) || cname.fmname) WHEN (cname.fmname IS NOT NULL) THEN cname.fmname WHEN (cname.aname IS NOT NULL) THEN cname.aname ELSE NULL::text END), cname.gen, cname.genlab, a.areaid
InitPlan
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.02..0.03 rows=1 loops=1)
Filter: (pname = 'web'::text)
-> Merge Join (cost=484.88..970.41 rows=15 width=113) (actual time=361.92..550.53 rows=25 loops=1)
Merge Cond: ("outer".areaid = "inner".areaid)
-> Merge Join (cost=348.16..815.45 rows=7208 width=74) (actual time=358.29..520.50 rows=7147 loops=1)
Merge Cond: ("outer".areaid = "inner".areaid)
-> Index Scan using cname_pkey on cname a (cost=0.00..407.27 rows=7208 width=11) (actual time=0.03..26.59 rows=7147 loops=1)
-> Sort (cost=348.16..354.17 rows=2403 width=63) (actual time=358.20..362.38 rows=7141 loops=1)
Sort Key: cname.areaid
-> Hash Join (cost=1.09..213.25 rows=2403 width=63) (actual time=0.35..94.32 rows=7202 loops=1)
Hash Cond: ("outer".privacy = "inner".pname)
-> Seq Scan on cname (cost=0.00..146.08 rows=7208 width=55) (actual time=0.01..33.41 rows=7208 loops=1)
-> Hash (cost=1.09..1.09 rows=2 width=8) (actual time=0.07..0.07 rows=0 loops=1)
-> Seq Scan on priv (cost=0.00..1.09 rows=2 width=8) (actual time=0.06..0.07 rows=2 loops=1)
Filter: (pord <= $0)
-> Sort (cost=136.72..136.76 rows=15 width=39) (actual time=0.95..0.96 rows=25 loops=1)
Sort Key: crate.areaid
-> Index Scan using crate_game on crate (cost=0.00..136.42 rows=15 width=39) (actual time=0.10..0.67 rows=25 loops=1)
Index Cond: (gameid = '776'::text)
Filter: ((frq > 0) AND (touched >= '2000-12-24 12:40:01'::timestamp without time zone))
Total runtime: 553.17 msec
(24 rows)

On Tue, Dec 24, 2002 at 14:16:38 -0600,
Bruno Wolff III <bruno(at)wolff(dot)to> wrote:
>
> View:
> create view cname_web as select
> areaid,
> case when (select pord from priv where pname = 'web') >=
> (select pord from priv where pname = privacy) then
> lname else null end as lname,
> case when (select pord from priv where pname = 'web') >=
> (select pord from priv where pname = privacy) then
> fmname else null end as fmname,
> case when (select pord from priv where pname = 'web') >=
> (select pord from priv where pname = privacy) then
> aname else null end as aname,
> case when (select pord from priv where pname = 'web') >=
> (select pord from priv where pname = privacy) then
> gen else null end as gen,
> case when (select pord from priv where pname = 'web') >=
> (select pord from priv where pname = privacy) then
> genlab else null end as genlab,
> case when (select pord from priv where pname = 'web') >=
> (select pord from priv where pname = privacy) then
> touched else null end as touched
> from cname;
>
> Query:
>
> explain analyze select cname_web.areaid, lname, fmname, aname, coalesce(genlab, to_char(gen, 'FMRN')), rate, frq, opp, rmp, trn, to_char(crate.touched,'YYYY-MM-DD') from cname_web, crate where cname_web.areaid = crate.areaid and gameid = '776' and frq > 0 and crate.touched >= ((timestamp 'epoch' + '1040733601 second') + '2 year ago') order by rate desc, lower(lname), lower(coalesce((aname || ' ') || fmname, fmname, aname)), gen, genlab, cname_web.areaid;


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: View performance
Date: 2002-12-24 21:25:56
Message-ID: 20021224212556.GA16170@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

By disabling merge joins and using the updated view, I got the query down
to about 25% of its original runtime.
Note the query estimate is off by a factor of more than 10.
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3271.35..3271.39 rows=15 width=113) (actual time=232.25..232.27 rows=25 loops=1)
Sort Key: crate.rate, lower(cname.lname), lower(CASE WHEN (((cname.aname || ' '::text) || cname.fmname) IS NOT NULL) THEN ((cname.aname || ' '::text) || cname.fmname) WHEN (cname.fmname IS NOT NULL) THEN cname.fmname WHEN (cname.aname IS NOT NULL) THEN cname.aname ELSE NULL::text END), cname.gen, cname.genlab, a.areaid
InitPlan
-> Seq Scan on priv (cost=0.00..1.09 rows=1 width=4) (actual time=0.02..0.03 rows=1 loops=1)
Filter: (pname = 'web'::text)
-> Hash Join (cost=355.71..3271.05 rows=15 width=113) (actual time=106.82..231.97 rows=25 loops=1)
Hash Cond: ("outer".areaid = "inner".areaid)
-> Hash Join (cost=219.25..431.41 rows=7208 width=74) (actual time=103.86..222.00 rows=7208 loops=1)
Hash Cond: ("outer".areaid = "inner".areaid)
-> Seq Scan on cname a (cost=0.00..146.08 rows=7208 width=11) (actual time=0.01..16.23 rows=7208 loops=1)
-> Hash (cost=213.25..213.25 rows=2403 width=63) (actual time=103.70..103.70 rows=0 loops=1)
-> Hash Join (cost=1.09..213.25 rows=2403 width=63) (actual time=0.35..88.82 rows=7202 loops=1)
Hash Cond: ("outer".privacy = "inner".pname)
-> Seq Scan on cname (cost=0.00..146.08 rows=7208 width=55) (actual time=0.01..29.73 rows=7208 loops=1)
-> Hash (cost=1.09..1.09 rows=2 width=8) (actual time=0.07..0.07 rows=0 loops=1)
-> Seq Scan on priv (cost=0.00..1.09 rows=2 width=8) (actual time=0.06..0.07 rows=2 loops=1)
Filter: (pord <= $0)
-> Hash (cost=136.42..136.42 rows=15 width=39) (actual time=0.72..0.72 rows=0 loops=1)
-> Index Scan using crate_game on crate (cost=0.00..136.42 rows=15 width=39) (actual time=0.10..0.66 rows=25 loops=1)
Index Cond: (gameid = '776'::text)
Filter: ((frq > 0) AND (touched >= '2000-12-24 12:40:01'::timestamp without time zone))
Total runtime: 232.83 msec
(22 rows)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruno Wolff III <bruno(at)wolff(dot)to>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: View performance
Date: 2002-12-26 19:42:39
Message-ID: 2666.1040931759@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Bruno Wolff III <bruno(at)wolff(dot)to> writes:
> I was looking at some queries that appeared to be slower than I remembered
> them being under 7.2 (which may be a wrong perception) and noticed
> that a view wasn't being handled very efficiently.

The change in behavior from 7.2 is probably due to this patch:

2002-12-05 16:46 tgl

* src/backend/optimizer/plan/planner.c (REL7_3_STABLE): Avoid
pulling up sublinks from a subselect's targetlist. Works around
problems that occur if sublink is referenced via a join alias
variable. Perhaps this can be improved later, but a simple and
safe fix is needed for 7.3.1.

which means that views using subselects in their targetlists will not be
flattened into the calling query in 7.3.1. This is not real desirable,
but I see no other short-term fix.

In the particular case, your view definition seemed mighty inefficient
anyway (it must recompute the subselects for each column retrieved from
the view) so I think your rewrite is a good change.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruno Wolff III <bruno(at)wolff(dot)to>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: View performance
Date: 2002-12-26 19:50:14
Message-ID: 2713.1040932214@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Bruno Wolff III <bruno(at)wolff(dot)to> writes:
> By disabling merge joins and using the updated view, I got the query down
> to about 25% of its original runtime.
> Note the query estimate is off by a factor of more than 10.

This seems to indicate some estimation problems in cost_hashjoin; the
estimated cost for the hashjoin is evidently a lot higher than it should
be.

Are you interested in digging into this; or could you send me a dump of
the tables used in the view and query, so I could look into it?

regards, tom lane


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: View performance
Date: 2002-12-26 20:36:56
Message-ID: 20021226203656.GB22984@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, Dec 26, 2002 at 14:42:39 -0500,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> which means that views using subselects in their targetlists will not be
> flattened into the calling query in 7.3.1. This is not real desirable,
> but I see no other short-term fix.

Thanks for the explaination.

> In the particular case, your view definition seemed mighty inefficient
> anyway (it must recompute the subselects for each column retrieved from
> the view) so I think your rewrite is a good change.

I was naively expecting that the planner would notice the common subexpressions
and only compute them once.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruno Wolff III <bruno(at)wolff(dot)to>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: View performance
Date: 2002-12-26 20:45:55
Message-ID: 4244.1040935555@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Bruno Wolff III <bruno(at)wolff(dot)to> writes:
> I was naively expecting that the planner would notice the common
> subexpressions and only compute them once.

There isn't currently any code for detection of common subexpressions of
any kind.

My gut feeling is that searching for common subexpressions would be a
net waste of cycles in the vast majority of queries. It'd be fairly
expensive (a naive implementation would be roughly O(N^2) in the number
of expression nodes), with zero payback in very many cases.

It might be worth doing for very constrained classes of subexpressions.
For instance, I was just thinking about putting in some code to
recognize duplicate aggregates (eg, "sum(foo)" appearing twice in the
same query). nodeAgg.c could do this relatively cheaply, since it has
to make a list of the aggregate expressions to be computed, anyway.
I'm not sure about recognizing duplicated sub-SELECT expressions; it
could possibly be done but some thought would have to be given to
preserving semantics.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruno Wolff III <bruno(at)wolff(dot)to>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: View performance
Date: 2002-12-26 23:43:39
Message-ID: 20650.1040946219@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

I wrote:
>> This seems to indicate some estimation problems in cost_hashjoin; the
>> estimated cost for the hashjoin is evidently a lot higher than it should
>> be.

The answer is that estimate_hash_bucketsize() is producing a rather
silly result in this situation, viz. a bucketsize "fraction" that's well
above 1.0. I've applied the following band-aid patch to CVS tip, which
perhaps you might like to use locally. But probably the long-range
answer is to rethink what that routine is doing --- its adjustment for
skewed data distributions is perhaps not such a great idea.

regards, tom lane

*** src/backend/optimizer/path/costsize.c.orig Fri Dec 13 19:17:55 2002
--- src/backend/optimizer/path/costsize.c Thu Dec 26 18:34:02 2002
***************
*** 1164,1169 ****
--- 1164,1179 ----
if (avgfreq > 0.0 && mcvfreq > avgfreq)
estfract *= mcvfreq / avgfreq;

+ /*
+ * Clamp bucketsize to sane range (the above adjustment could easily
+ * produce an out-of-range result). We set the lower bound a little
+ * above zero, since zero isn't a very sane result.
+ */
+ if (estfract < 1.0e-6)
+ estfract = 1.0e-6;
+ else if (estfract > 1.0)
+ estfract = 1.0;
+
ReleaseSysCache(tuple);

return (Selectivity) estfract;