Re: Poor performance when using a window function in a view

From: Chris Hanks <christopher(dot)m(dot)hanks(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: rod(at)iol(dot)ie, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: Poor performance when using a window function in a view
Date: 2013-03-01 22:53:32
Message-ID: CAK7KUdDywcUv5AyGCV3bvNzeAx6SpsnJH-G66Z+==LmLTzDCcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Mar 1, 2013 at 6:37 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Fri, Mar 1, 2013 at 3:59 AM, Chris Hanks
> <christopher(dot)m(dot)hanks(at)gmail(dot)com> wrote:
> > On Fri, Mar 1, 2013 at 1:21 AM, Raymond O'Donnell <rod(at)iol(dot)ie> wrote:
> >>
> >> On 01/03/2013 00:19, Chris Hanks wrote:
> >> > On Thu, Feb 28, 2013 at 6:00 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
> >> > <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>> wrote:
> >> >
> >> > Chris Hanks <christopher(dot)m(dot)hanks(at)gmail(dot)com
> >> > <mailto:christopher(dot)m(dot)hanks(at)gmail(dot)com>> writes:
> >> > > create or replace view values_view as
> >> > > select fkey1, fkey3,
> >> > > (derived1 / max(derived1) over (partition by fkey1)) as
> >> > derived1,
> >> > > (derived2 / sum(derived1) over (partition by fkey1)) as
> derived2
> >> > > from (
> >> > > select fkey1, fkey3,
> >> > > cast(sum((case when (value > 0.0) then 4 else 1 end)) as
> >> > double
> >> > > precision) as derived1,
> >> > > sum((case when (value > 0.0) then (value * 4) else (value +
> 1)
> >> > end)) as
> >> > > derived2
> >> > > from values
> >> > > group by fkey1, fkey3
> >> > > ) as t1;
> >> >
> >> > > -- This query requires a sequential scan on values, though all
> the
> >> > data it
> >> > > needs could be found much more efficiently with an index scan.
> >> > > explain analyze select * from values_view where fkey1 = 1263;
> >> >
> >> > To use the outer WHERE clause as an index constraint, postgres
> would
> >> > have to prove that scanning only the rows with fkey1 = 1263 would
> >> > still
> >> > find all the rows that would get examined by the window functions
> >> > ---
> >> > and in this case, it's not only the window functions that make
> that
> >> > less
> >> > than obvious, but the grouped aggregates in the sub-select below
> >> > them.
> >> > There's not nearly that amount of intelligence in the system about
> >> > window functions, as yet. So you'll have to write out the query
> >> > longhand and put the WHERE clause at the lower level, if you want
> >> > this
> >> > optimization to happen.
> >> >
> >> > regards, tom lane
> >> >
> >> >
> >> > Ok, that makes sense, thanks.
> >> >
> >> > Can anyone point me to an example of wrapping a function in a view,
> like
> >> > Merlin suggested? I'm not sure how that would work.
> >>
> >> Off the top of my head, I'd imagine it's as simple as:
> >>
> >> create view ... as
> >> select * from my_function(...);
> >>
> >> :-)
> >>
> >> Ray.
> >>
> >>
> >> --
> >> Raymond O'Donnell :: Galway :: Ireland
> >> rod(at)iol(dot)ie
> >
> >
> >
> > Sorry, I don't understand. I'm able to make a function that takes an
> integer
> > and uses it in the subselect as "WHERE fkey1 = arg", and that works as I
> > expect it to and it's plenty fast. But I don't see how to write a view to
> > take advantage of this function - what arguments would go in
> > my_function(...) when I'm declaring the view?
>
> First let's clearly state the problem - create some test data:
> postgres=# create table foo as select v1, v2 from (select
> generate_series(1,10) v1) q1, (select generate_series(1,10000) v2) q2;
> postgres=# create table bar as select v1 from generate_series(1,10) v1;
>
> foo is data table, bar defines the key around which we have the data.
>
> create an index:
> postgres=# create index on foo(v1);
>
> simple lookup is optimized obviously:
> postgres=# explain select * from foo where v1 = 7;
> QUERY PLAN
> ----------------------------------------------------------------------
> Index Scan using foo_v1_idx on foo (cost=0.00..4.38 rows=1 width=8)
> Index Cond: (v1 = 50)
>
> get indexed lookup for simple window function:
> postgres=# explain select v1, sum(v2) over(partition by v1 order by
> v2) from foo where v1 = 7;
> QUERY PLAN
>
> ----------------------------------------------------------------------------------
> WindowAgg (cost=4.39..4.41 rows=1 width=8)
> -> Sort (cost=4.39..4.39 rows=1 width=8)
> Sort Key: v2
> -> Index Scan using foo_v1_idx on foo (cost=0.00..4.38
> rows=1 width=8)
> Index Cond: (v1 = 50)
>
> push to subquery (essentially what view does) and we lose the optimization:
> postgres=# explain select * from (select v1, sum(v2) over(partition by
> v1 order by v2) from foo) q where v1 = 7;
> QUERY PLAN
>
> -----------------------------------------------------------------------------
> Subquery Scan on q (cost=9747.82..12997.82 rows=1 width=12)
> Filter: (q.v1 = 50)
> -> WindowAgg (cost=9747.82..11747.82 rows=100000 width=8)
> -> Sort (cost=9747.82..9997.82 rows=100000 width=8)
> Sort Key: foo.v1, foo.v2
> -> Seq Scan on foo (cost=0.00..1443.00 rows=100000
> width=8)
>
> let's work around it!
> postgres=# CREATE OR REPLACE FUNCTION getfoovals(
> v1 INOUT int,
> sumv2 OUT BIGINT) RETURNS SETOF RECORD AS
> $$
> select
> v1,
> sum(v2)
> over(partition by v1 order by v2)
> from foo where foo.v1 = getfoovals.v1;
> $$ LANGUAGE SQL STABLE;
>
> select * from getfoovals(7);
> v1 | sumv2
> ----+----------
> 7 | 1
> 7 | 3
> 7 | 6
> 7 | 10
> 7 | 15
> 7 | 21
> <snip>
>
> abstract to view:
> postgres=# CREATE OR REPLACE VIEW foovals AS
> select
> v1,
> (getfoovals(v1)).sumv2
> from bar;
>
> postgres=# select * from foovals where v1 = 7;
>
> Main problem with this technique is awkwardness around using column
> lest set returning function if it returns > 1 column...if you'r not
> careful you can get extra invocations of function. you work hack
> around this to some degree via:
> *) offset 0 hacks
> *) return type coersion hacks
> *) 9.3 LATERAL feature completely nails it
>
> Can't use WITH because it fences of the optimization.
>
> This (window function optimization fencing) is probably #1 performance
> gotcha I hit in everyday coding after LATERAL and lack of better
> ability to inline simple SQL functions.
>
> merlin
>

Hi - thanks everyone for your input. I bit the bullet and just made a
materialized view for now. I got Merlin's suggestion working, though - I'm
going to revisit it when 9.3 comes out and I can use LATERAL.

Thanks again!
Chris

In response to

Browse pgsql-general by date

  From Date Subject
Next Message James Cloos 2013-03-02 23:11:31 Re: Floating point error
Previous Message Michael Best 2013-03-01 22:26:59 Re: broke postgres, how to fix??