slow query plans caused by under-estimation of CTE cardinality

From: John Lumby <johnlumby(at)hotmail(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: slow query plans caused by under-estimation of CTE cardinality
Date: 2013-02-18 16:45:01
Message-ID: COL116-W19C19E568788169715A9A7A3F40@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On 2012-10-09 23:09:21
Tom Lane wrote:
>

> re subject Why am I getting great/terrible estimates with these CTE queries?
> You're assuming the case where the estimate is better is better for a
> reason ... but it's only better as a result of blind dumb luck.  The
> outer-level query planner doesn't know anything about the CTE's output
> except the estimated number of rows --- in particular, it doesn't drill
> down to find any statistics about the join column.
>

I am also struggling with a problem involving CTEs although in my case
it is caused by huge *under*-estimation of cardinality rather then *over*-estimation.
The statement is quite complex and the problem arises because there is a chain of
RECURSIVE CTEs each defined as a query involving an earlier CTE and more tables.
Eventually there is no hope for making a good cardinality estimate.

One CTE in particular has a cardinality estimate of 1  (I guess the actual
estimate is nearer zero and rounded up) but actual count is over 100000.
The planner puts this CTE as inner of a nested loop accessed by simple linear CTE scan
and the full query then takes over 20 minutes.

   ->  Nested Loop  (cost=0.00..0.06 rows=1 width=588) (actual time=2340.421..1201593.856 rows=105984 loops=1)
          Join Filter: ((winnum.subnet_id = binoptasc.subnet_id) AND (winnum.option_code = binoptasc.option_code) AND ((winnum.option_discriminator)::text = (binoptasc.option_discriminator)::text) AND (winnum.net_rel_level = binoptasc.net_rel_level))
          Rows Removed by Join Filter: 7001612448
          Buffers: shared hit=2290941
          ->  CTE Scan on winning_option_nums winnum  (cost=0.00..0.02 rows=1 width=536) (actual time=2338.422..2543.684 rows=62904 loops=1)
                Buffers: shared hit=2290941
          ->  CTE Scan on subnet_inhrt_options_asc binoptasc  (cost=0.00..0.02 rows=1 width=584) (actual time=0.000..9.728 rows=111308 loops=62904)

Whereas,  (by altering various statistics to be very wrong) the entire query runs in 21 seconds.

There have been several debates about how to address situations like this where
no practical non-query-specific statistics-gathering scheme can ever hope to
gather enough statistics to model the later derived tables.     E.g. the frowned-on
SELECTIVITY clause and ideas for query-specific statistics.

Meanwhile,   I have one other suggestion aimed specifically at problematic CTEs:
Would it be reasonable to provide a new Planner Configuration option  :

  enable_nestloop_cte_inner (boolean)
  Enables or disables the query planner's use of nested-loop join plans in which a CTE is the inner.
  It is impossible to suppress such nested-loop joins entirely,
  but turning this variable off discourages the planner from using one
  if there are other methods available,  such as sorting the CTE for merge-join
  or hashing it for hash-join.
  The default is on.

John

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Vitalii Tymchyshyn 2013-02-18 20:40:37 Re: slow query plans caused by under-estimation of CTE cardinality
Previous Message Nicolas Charles 2013-02-16 09:25:30 Re: Surprising no use of indexes - low performance