Re: Two "equivalent" WITH RECURSIVE queries, one of them slow.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Octavio Alvarez <alvarezp(at)alvarezp(dot)ods(dot)org>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Two "equivalent" WITH RECURSIVE queries, one of them slow.
Date: 2010-07-06 19:21:49
Message-ID: AANLkTinE_BZ4QSo-uZcQBKzPTSPcYKQ0wOS81v0Jrkwd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Jul 5, 2010 at 2:07 AM, Octavio Alvarez
<alvarezp(at)alvarezp(dot)ods(dot)org> wrote:
> Hello.
>
> I have a tree-like table with a three-field PK (name, date, id) and one
> parent field.
> It has 5k to 6k records as of now, but it will hold about 1 million records.
>
> I am trying the following WITH RECURSIVE query:
>
> WITH RECURSIVE t AS (
>                 SELECT par.id AS tid, par.name, par.date, par.id, par.text,
> par.h_title, par.h_name, par.parent
>                   FROM _books.par
>        UNION
>                 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
> p.h_title, p.h_name, p.parent
>                   FROM t, _books.par p
>                  WHERE p.name = t.name AND p.date = t.date AND t.id =
> p.parent
>        )
>  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
>   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
>
> ... which takes 2547.503 ms
>
> However, if I try the same query but adding the same WHERE clause to the
> non-recursive term, I get much better results.
>
>
> WITH RECURSIVE t AS (
>                 SELECT par.id AS tid, par.name, par.date, par.id, par.text,
> par.h_title, par.h_name, par.parent
>                   FROM _books.par WHERE name = 'cfx' AND date = '2009-08-19'
> AND par.id = '28340'
>        UNION
>                 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
> p.h_title, p.h_name, p.parent
>                   FROM t, _books.par p
>                  WHERE p.name = t.name AND p.date = t.date AND t.id =
> p.parent
>        )
>  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
>   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
>
> ... which takes 0.221 ms
>
> I am being forced to use the slow query because I want to define it as a
> view, leaving the WHERE clause to the application.

I think this is just a limitation of the optimizer. Recursive queries
are a relatively new feature and the optimizer doesn't know a whole
lot about how to deal with them. That may get improved at some point,
but the optimization you're hoping for here is pretty tricky. In
order to push the WHERE clauses down into the non-recursive term, the
optimizer would need to prove that this doesn't change the final
results. I think that's possible here because it so happens that your
recursive term only generates results that have the same name, date,
and tid as some existing result, but with a slightly different
recursive query that wouldn't be true, so you'd need to make the code
pretty smart to work this one out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2010-07-06 19:30:29 Re: Question about partitioned query behavior
Previous Message Robert Haas 2010-07-06 19:01:44 Re: Highly Efficient Custom Sorting