Re: Status of Hierarchical Queries

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Status of Hierarchical Queries
Date: 2007-02-22 01:29:27
Message-ID: 87wt2bm13c.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Gavin Sherry" <swm(at)alcove(dot)com(dot)au> writes:

> Can you elaborate on the 'two different sets of parameters' bit? I'm still
> without coffee.

The spec allows for arbitrarily complex recursive query structures. Including
mutually recursive queries, and even non-linearly recursive queries. I found
grokking them requires far stronger brews than coffee :)

But in a simple recursive tree search you have a node which wants to do a join
between the output of tree level n against some table to produce tree level
n+1. It can't simply execute the plan to produce tree level n since that's the
same tree it's executing itself. If it calls the Init method on itself it'll
lose all its state.

There's another reason it can't just execute the previous node. You really
don't want to recompute all the results for level n when you go to produce
level n+1. You want to keep them around from the previous iteration. Otherwise
you have an n^2 algorithm.

Think of the fibonacci sequence algorithm: if you implement it recursively the
naive way you have to return all the way back to the beginning to calculate
each number. But if you remember the last two you can calculate it directly
without recalculating all the previous number repeatedly.

>> It is sufficient for the non-recursive case which might make it worthwhile
>> putting it in 8.3. But even there user's expectations are probably that the
>> reason they're writing it as a cte is precisely to avoid duplicate execution.
>
> I wonder if the planner should decide that?

That's one option. For the non-recursive case we could inline the cte subquery
everywhere it's referenced and then add smarts to the planner to find
identical subqueries and have a heuristic to determine when it would be
advantageous to calculate the result once.

The alternative is to retain them as references to a single plan. Then have a
heuristic for when to inline them.

In neither case is a heuristic going to be particularly good. The problem is
that for any reasonably complex plan it'll be cheaper to execute it only once
than multiple times. Unless there's an advantage to be gained by inlining it
such as being able to push conditions down into it. But the only way to find
out if that will be possible would be to try planning it and see.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gavin Sherry 2007-02-22 01:39:08 Re: Status of Hierarchical Queries
Previous Message Gavin Sherry 2007-02-22 01:08:34 Re: Status of Hierarchical Queries