slow query plans caused by under-estimation of CTE cardinality

Lists: pgsql-performance
From: John Lumby <johnlumby(at)hotmail(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: slow query plans caused by under-estimation of CTE cardinality
Date: 2013-02-18 16:45:01
Message-ID: COL116-W19C19E568788169715A9A7A3F40@phx.gbl


On 2012-10-09 23:09:21
Tom Lane wrote:
>

> re subject Why am I getting great/terrible estimates with these CTE queries?
> You're assuming the case where the estimate is better is better for a
> reason ... but it's only better as a result of blind dumb luck.  The
> outer-level query planner doesn't know anything about the CTE's output
> except the estimated number of rows --- in particular, it doesn't drill
> down to find any statistics about the join column.
>

I am also struggling with a problem involving CTEs although in my case
it is caused by huge *under*-estimation of cardinality rather than *over*-estimation.
The statement is quite complex and the problem arises because there is a chain of
RECURSIVE CTEs each defined as a query involving an earlier CTE and more tables.
Eventually there is no hope for making a good cardinality estimate.
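
The overall shape is roughly the following (schematic only - the CTE bodies and
join columns are invented for illustration; only the CTE names match the plans shown):

```sql
-- Schematic only: the SELECT bodies are invented, not from the real query.
WITH RECURSIVE
  winning_option_nums (subnet_id, option_code, option_discriminator, net_rel_level) AS (
      SELECT ...          -- anchor member over base tables
      UNION ALL
      SELECT ...          -- recursive member referencing winning_option_nums
  ),
  subnet_inhrt_options_asc AS (
      SELECT ...          -- defined in terms of the earlier CTE plus more tables
      FROM   winning_option_nums JOIN ...
  )
SELECT ...
FROM   winning_option_nums winnum
JOIN   subnet_inhrt_options_asc binoptasc
  ON   winnum.subnet_id = binoptasc.subnet_id
 AND   winnum.option_code = binoptasc.option_code;
```

By the time the planner reaches the final join, its row-count estimate for each
CTE is little more than a guess.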

One CTE in particular has a cardinality estimate of 1  (I guess the actual
estimate is nearer zero and rounded up) but the actual count is over 100000.
The planner puts this CTE on the inner side of a nested loop, accessed by a simple linear CTE scan,
and the full query then takes over 20 minutes.

   ->  Nested Loop  (cost=0.00..0.06 rows=1 width=588) (actual time=2340.421..1201593.856 rows=105984 loops=1)
          Join Filter: ((winnum.subnet_id = binoptasc.subnet_id) AND (winnum.option_code = binoptasc.option_code) AND ((winnum.option_discriminator)::text = (binoptasc.option_discriminator)::text) AND (winnum.net_rel_level = binoptasc.net_rel_level))
          Rows Removed by Join Filter: 7001612448
          Buffers: shared hit=2290941
          ->  CTE Scan on winning_option_nums winnum  (cost=0.00..0.02 rows=1 width=536) (actual time=2338.422..2543.684 rows=62904 loops=1)
                Buffers: shared hit=2290941
          ->  CTE Scan on subnet_inhrt_options_asc binoptasc  (cost=0.00..0.02 rows=1 width=584) (actual time=0.000..9.728 rows=111308 loops=62904)

Whereas, by altering various statistics to be very wrong, the entire query runs in 21 seconds.

There have been several debates about how to address situations like this where
no practical non-query-specific statistics-gathering scheme can ever hope to
gather enough statistics to model the later derived tables.     E.g. the frowned-on
SELECTIVITY clause and ideas for query-specific statistics.

Meanwhile, I have one other suggestion aimed specifically at problematic CTEs:
Would it be reasonable to provide a new planner configuration option:

  enable_nestloop_cte_inner (boolean)
  Enables or disables the query planner's use of nested-loop join plans in which a CTE is the inner.
  It is impossible to suppress such nested-loop joins entirely,
  but turning this variable off discourages the planner from using one
  if there are other methods available,  such as sorting the CTE for merge-join
  or hashing it for hash-join.
  The default is on.
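
For comparison, the closest existing knob is the generic enable_nestloop setting,
which discourages nested loops everywhere in the query, not just where a CTE is
the inner side. A minimal sketch of using it as a per-statement workaround,
scoped with SET LOCAL so only the current transaction is affected:

```sql
-- Coarse existing workaround: discourage ALL nested-loop joins,
-- not only those with a CTE on the inner side.
BEGIN;
SET LOCAL enable_nestloop = off;   -- reverts automatically at COMMIT/ROLLBACK
-- ... run the CTE-heavy query here ...
COMMIT;
```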

John


From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: John Lumby <johnlumby(at)hotmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: slow query plans caused by under-estimation of CTE cardinality
Date: 2013-02-18 20:40:37
Message-ID: CABWW-d3um0XrHEqORRSTGezV=vY3W0CBN=UJP245VHMyosrhRg@mail.gmail.com
Lists: pgsql-performance

Since a CTE is already an optimization fence, you can go further and make it a
temporary table.
CREATE TABLE; ANALYZE; SELECT should make the optimizer's work much easier.
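
A minimal sketch of that technique (the table name and SELECT bodies are
illustrative only, not from the real query):

```sql
-- Materialize the CTE's result, gather statistics, then run the main query.
CREATE TEMPORARY TABLE winnum_tmp AS
SELECT ...;                -- body of the former CTE

ANALYZE winnum_tmp;        -- now the planner has real row counts and column stats

SELECT ...
FROM   winnum_tmp w
JOIN   other_table b ON w.subnet_id = b.subnet_id;
```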


From: John Lumby <johnlumby(at)hotmail(dot)com>
To: <tivv00(at)gmail(dot)com>
Cc: pgsql-performance maillist <pgsql-performance(at)postgresql(dot)org>
Subject: Re: slow query plans caused by under-estimation of CTE cardinality
Date: 2013-02-18 23:26:23
Message-ID: COL116-W25F8931477407ED7689D69A3F40@phx.gbl
Lists: pgsql-performance


Vitalii wrote
>
> Since a CTE is already an optimization fence, you can go further and make
> it a temporary table.
> CREATE TABLE; ANALYZE; SELECT should make the optimizer's work much easier.
>
Thanks Vitalii - yes, you are right, and I have used that technique in other cases like this.

However, for this one, the entire query must be executed as a single query so that
it is based on a consistent snapshot (in the Multiversion Concurrency Control sense)
of the base table data. Using the temp-table technique would allow a commit to occur
which would be invisible to the part of the query that builds the temp table
but visible to the remaining part of the query. I know I could set Repeatable Read
for the transaction to ensure consistency, but that causes other concurrency problems,
as this query is part of a fairly long-running transaction. I really just want this one
query to avoid "dangerous" plans (meaning plans that rely too heavily on an estimated
cardinality of ~1 being really correct).
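
For completeness, the snapshot-preserving variant mentioned above - rejected here
because of its concurrency cost - would look roughly like this (table and query
bodies are illustrative only):

```sql
-- Repeatable Read pins one snapshot for the whole transaction, so the
-- temp-table build and the final SELECT see consistent base-table data.
BEGIN ISOLATION LEVEL REPEATABLE READ;
CREATE TEMPORARY TABLE t AS SELECT ...;   -- former CTE body
ANALYZE t;
SELECT ... FROM t JOIN ...;
COMMIT;
```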

I also forgot to show the fragment of the "good" plan (obtained by corrupting the statistics).
It demonstrates how much more effective the hash join is in comparison -
20 minutes reduced to 1 second for this join.

 ->  Hash Join  (cost=0.80..1.51 rows=1 width=588) (actual time=1227.517..1693.792 rows=105984 loops=1)
       Hash Cond: ((winnum.subnet_id = binoptasc.subnet_id) AND (winnum.option_code = binoptasc.option_code) AND ((winnum.option_discriminator)::text = (binoptasc.option_discriminator)::text) AND (winnum.net_rel_level = binoptasc.net_rel_level))
       Buffers: shared hit=386485 read=364
       ->  CTE Scan on winning_option_nums winnum  (cost=0.00..0.40 rows=20 width=536) (actual time=1174.558..1222.542 rows=62904 loops=1)
             Buffers: shared hit=386485 read=364
       ->  Hash  (cost=0.40..0.40 rows=20 width=584) (actual time=52.933..52.933 rows=111308 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 8644kB
             ->  CTE Scan on subnet_inhrt_options_asc binoptasc  (cost=0.00..0.40 rows=20 width=584) (actual time=0.001..21.651 rows=111308 loops=1)

John


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Lumby <johnlumby(at)hotmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: slow query plans caused by under-estimation of CTE cardinality
Date: 2013-02-19 10:09:42
Message-ID: 23125.1361268582@sss.pgh.pa.us
Lists: pgsql-performance

John Lumby <johnlumby(at)hotmail(dot)com> writes:
> Meanwhile, I have one other suggestion aimed specifically at problematic CTEs:
> Would it be reasonable to provide a new Planner Configuration option :

> enable_nestloop_cte_inner (boolean)
> Enables or disables the query planner's use of nested-loop join plans in which a CTE is the inner.

Sounds pretty badly thought out to me. There might be some cases where
this would help, but there would be many more where it would be useless
or counterproductive.

The case that was discussed in the previous thread looked like it could
be addressed by teaching the planner to drill down into CTEs to find
variable referents, as it already does for subquery RTEs (cf
examine_simple_variable in selfuncs.c). I'm not sure if your case is
similar or not --- you didn't provide any useful amount of detail.

regards, tom lane