Re: Poor performance when using a window function in a view

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Chris Hanks <christopher(dot)m(dot)hanks(at)gmail(dot)com>
Cc: rod(at)iol(dot)ie, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: Poor performance when using a window function in a view
Date: 2013-03-01 14:37:19
Message-ID: CAHyXU0y_uB=CMXO1hO8t8HgJE3=s+aWtt2sD7-VNpBXF-q2EmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Mar 1, 2013 at 3:59 AM, Chris Hanks
<christopher(dot)m(dot)hanks(at)gmail(dot)com> wrote:
> On Fri, Mar 1, 2013 at 1:21 AM, Raymond O'Donnell <rod(at)iol(dot)ie> wrote:
>>
>> On 01/03/2013 00:19, Chris Hanks wrote:
>> > On Thu, Feb 28, 2013 at 6:00 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
>> > <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>> wrote:
>> >
>> > Chris Hanks <christopher(dot)m(dot)hanks(at)gmail(dot)com
>> > <mailto:christopher(dot)m(dot)hanks(at)gmail(dot)com>> writes:
>> > > create or replace view values_view as
>> > > select fkey1, fkey3,
>> > > (derived1 / max(derived1) over (partition by fkey1)) as
>> > derived1,
>> > > (derived2 / sum(derived1) over (partition by fkey1)) as derived2
>> > > from (
>> > > select fkey1, fkey3,
>> > > cast(sum((case when (value > 0.0) then 4 else 1 end)) as
>> > double
>> > > precision) as derived1,
>> > > sum((case when (value > 0.0) then (value * 4) else (value + 1)
>> > end)) as
>> > > derived2
>> > > from values
>> > > group by fkey1, fkey3
>> > > ) as t1;
>> >
>> > > -- This query requires a sequential scan on values, though all the
>> > data it
>> > > needs could be found much more efficiently with an index scan.
>> > > explain analyze select * from values_view where fkey1 = 1263;
>> >
>> > To use the outer WHERE clause as an index constraint, postgres would
>> > have to prove that scanning only the rows with fkey1 = 1263 would
>> > still
>> > find all the rows that would get examined by the window functions
>> > ---
>> > and in this case, it's not only the window functions that make that
>> > less
>> > than obvious, but the grouped aggregates in the sub-select below
>> > them.
>> > There's not nearly that amount of intelligence in the system about
>> > window functions, as yet. So you'll have to write out the query
>> > longhand and put the WHERE clause at the lower level, if you want
>> > this
>> > optimization to happen.
>> >
>> > regards, tom lane
>> >
>> >
>> > Ok, that makes sense, thanks.
>> >
>> > Can anyone point me to an example of wrapping a function in a view, like
>> > Merlin suggested? I'm not sure how that would work.
>>
>> Off the top of my head, I'd imagine it's as simple as:
>>
>> create view ... as
>> select * from my_function(...);
>>
>> :-)
>>
>> Ray.
>>
>>
>> --
>> Raymond O'Donnell :: Galway :: Ireland
>> rod(at)iol(dot)ie
>
>
>
> Sorry, I don't understand. I'm able to make a function that takes an integer
> and uses it in the subselect as "WHERE fkey1 = arg", and that works as I
> expect it to and it's plenty fast. But I don't see how to write a view to
> take advantage of this function - what arguments would go in
> my_function(...) when I'm declaring the view?

First let's clearly state the problem - create some test data:
postgres=# create table foo as select v1, v2 from (select
generate_series(1,10) v1) q1, (select generate_series(1,10000) v2) q2;
postgres=# create table bar as select v1 from generate_series(1,10) v1;

foo is data table, bar defines the key around which we have the data.

create an index:
postgres=# create index on foo(v1);

simple lookup is optimized obviously:
postgres=# explain select * from foo where v1 = 7;
QUERY PLAN
----------------------------------------------------------------------
Index Scan using foo_v1_idx on foo (cost=0.00..4.38 rows=1 width=8)
Index Cond: (v1 = 50)

get indexed lookup for simple window function:
postgres=# explain select v1, sum(v2) over(partition by v1 order by
v2) from foo where v1 = 7;
QUERY PLAN
----------------------------------------------------------------------------------
WindowAgg (cost=4.39..4.41 rows=1 width=8)
-> Sort (cost=4.39..4.39 rows=1 width=8)
Sort Key: v2
-> Index Scan using foo_v1_idx on foo (cost=0.00..4.38
rows=1 width=8)
Index Cond: (v1 = 50)

push to subquery (essentially what view does) and we lose the optimization:
postgres=# explain select * from (select v1, sum(v2) over(partition by
v1 order by v2) from foo) q where v1 = 7;
QUERY PLAN
-----------------------------------------------------------------------------
Subquery Scan on q (cost=9747.82..12997.82 rows=1 width=12)
Filter: (q.v1 = 50)
-> WindowAgg (cost=9747.82..11747.82 rows=100000 width=8)
-> Sort (cost=9747.82..9997.82 rows=100000 width=8)
Sort Key: foo.v1, foo.v2
-> Seq Scan on foo (cost=0.00..1443.00 rows=100000 width=8)

let's work around it!
postgres=# CREATE OR REPLACE FUNCTION getfoovals(
v1 INOUT int,
sumv2 OUT BIGINT) RETURNS SETOF RECORD AS
$$
select
v1,
sum(v2)
over(partition by v1 order by v2)
from foo where foo.v1 = getfoovals.v1;
$$ LANGUAGE SQL STABLE;

select * from getfoovals(7);
v1 | sumv2
----+----------
7 | 1
7 | 3
7 | 6
7 | 10
7 | 15
7 | 21
<snip>

abstract to view:
postgres=# CREATE OR REPLACE VIEW foovals AS
select
v1,
(getfoovals(v1)).sumv2
from bar;

postgres=# select * from foovals where v1 = 7;

Main problem with this technique is awkwardness around using column
lest set returning function if it returns > 1 column...if you'r not
careful you can get extra invocations of function. you work hack
around this to some degree via:
*) offset 0 hacks
*) return type coersion hacks
*) 9.3 LATERAL feature completely nails it

Can't use WITH because it fences of the optimization.

This (window function optimization fencing) is probably #1 performance
gotcha I hit in everyday coding after LATERAL and lack of better
ability to inline simple SQL functions.

merlin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message JD Wong 2013-03-01 16:50:26 Re: broke postgres, how to fix??
Previous Message Marc Mamin 2013-03-01 12:51:02 query syntax to combine 2 set returning functions ?