Weird plan variation with recursive CTEs

Lists: pgsql-performance
From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Weird plan variation with recursive CTEs
Date: 2012-04-26 17:37:54
Message-ID: CAGTBQpYLZ2zHXhet+7W-gpeurf9gTUsM6Z+g1rxeRjySByJBMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Here's a strange thing.

Postgres 9.1.0 on a severely underpowered test machine

effective_cache_size = 128M
work_mem = 48M

This query:

WITH
RECURSIVE subordinates AS (
SELECT id, originator_id FROM partner_deliveries WHERE originator_id
in (225645)
UNION ALL
SELECT partner_deliveries.id, subordinates.originator_id
FROM partner_deliveries, subordinates
WHERE partner_deliveries.originator_id = subordinates.id
),
distinct_subordinates AS ( SELECT id, originator_id FROM (
SELECT DISTINCT id, originator_id FROM subordinates
UNION DISTINCT
SELECT id, id FROM partner_deliveries WHERE id in (225645)
) itab ORDER BY id )
SELECT
s.originator_id,
sum(o.opens) as opens,
sum(o.clicks) as clicks,
sum(o.questionnaire) as questionnaire,
sum(o.completes) as completes,
sum(o.quotafulls) as quotafulls,
sum(o.screenouts) as screenouts
FROM overview o
JOIN distinct_subordinates s ON s.id = o.partner_delivery_id
GROUP BY s.originator_id;

Works perfectly: http://explain.depesz.com/s/j9Q

The plan produces an index scan on overview (roughly 1.5M tuples),
which is desired.

Now, I tried to skip one hashagg to "speed it up a bit", and found
something really unexpected:

http://explain.depesz.com/s/X1c

for

WITH
RECURSIVE subordinates AS (
SELECT id, originator_id FROM partner_deliveries WHERE originator_id
in (225645)
UNION ALL
SELECT partner_deliveries.id, subordinates.originator_id
FROM partner_deliveries, subordinates
WHERE partner_deliveries.originator_id = subordinates.id
),
distinct_subordinates AS ( SELECT id, originator_id FROM (
SELECT id, originator_id FROM subordinates
UNION DISTINCT
SELECT id, id FROM partner_deliveries WHERE id in (225645)
) itab ORDER BY id )
SELECT
s.originator_id,
sum(o.opens) as opens,
sum(o.clicks) as clicks,
sum(o.questionnaire) as questionnaire,
sum(o.completes) as completes,
sum(o.quotafulls) as quotafulls,
sum(o.screenouts) as screenouts
FROM overview o
JOIN distinct_subordinates s ON s.id = o.partner_delivery_id
GROUP BY s.originator_id;

If you don't notice, the only difference is I removed the distinct
from the select against the recursive CTE for distinct_subordinates,
expecting the union distinct to take care. It did. But it took a whole
2 seconds longer! (WTF)

Fun thing is, nothing in the CTE's execution really changed. The only
change, is that now a sequential scan of overview was chosen instead
of the index.
Why could this be? The output (number of search values, even the
values themselves and their order) is the same between both plans.


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Weird plan variation with recursive CTEs
Date: 2012-04-26 18:23:09
Message-ID: CAGTBQpaNWKtgn3C6vcPrQcP2S0uH1bF23se=2HLjkc9AkjdXyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, Apr 26, 2012 at 2:37 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> Fun thing is, nothing in the CTE's execution really changed. The only
> change, is that now a sequential scan of overview was chosen instead
> of the index.
> Why could this be? The output (number of search values, even the
> values themselves and their order) is the same between both plans.

I just noticed it's misestimating the output of the union distinct,
but not of the select distinct.

One would expect the estimation procedure to be the same in both cases.
Any reason for the difference? Any way to "teach" pg about it?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Weird plan variation with recursive CTEs
Date: 2012-04-26 20:58:20
Message-ID: 14114.1335473900@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
> On Thu, Apr 26, 2012 at 2:37 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> Fun thing is, nothing in the CTE's execution really changed. The only
>> change, is that now a sequential scan of overview was chosen instead
>> of the index.
>> Why could this be? The output (number of search values, even the
>> values themselves and their order) is the same between both plans.

The estimated size of the UNION output is a lot different, thus
discouraging use of a nestloop for the outer query's join.

> I just noticed it's misestimating the output of the union distinct,
> but not of the select distinct.

> One would expect the estimation procedure to be the same in both cases.

No, I don't think that follows. The UNION code is aware that people
frequently write UNION rather than UNION ALL even when they're not
expecting any duplicates, so it shies away from assuming that UNION
will reduce the number of rows at all. On the other hand, it's not
at all common to write SELECT DISTINCT unless you actually expect
some duplicate removal to happen, so the default assumption in the
absence of any stats is different --- looks like it's assuming 10X
compression by the DISTINCT operation.

The real issue here of course is that we have no useful idea how many
rows will be produced by the recursive-union CTE, let alone what their
statistics will be like. So the planner is falling back on rules of
thumb that are pretty much guaranteed to be wrong in any particular
case.

regards, tom lane