Re: Recursive query syntax ambiguity

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recursive query syntax ambiguity
Date: 2007-02-05 16:18:23
Message-ID: 20070205161823.GC4811@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 29, 2007 at 01:38:02PM +0000, Gregory Stark wrote:
> Instead I suggest we create one type of reentrant node, which memoizes its
> output. It would be a kind of on-demand Materialize. It would be paired with a
> second node type which would be the only type of node which can invoke it.
> This RecursiveCall node would invoke the Memoize node using a special
> interface in which it passes enough state for the Memoize node to seek to the
> correct place in its output.

That I beleive is the right approach. I think an equivalent way of
looking at it is a "Loop" node with an InitPlan and a StepPlan.

Initially, the Loop node executes the InitPlan to get it's initial set
of tuples, storing them like a Materialize node does. After that it
keeps calling the StepNode, where somewhere inside the it has a node
that extracts a row from the aforementioned tuplestore. It stores these
returned tuples in the tuplestore also, thus giving you recursion.

<snip>

> (I've convinced myself that that's true but I should probably work out
> a good proof of it before I make all this depend on it.)

Yeah, proving it is going to be tricky, I'm not sure what the standard
says about infinite recursion.

> There are three general cases of the Memoize node. Breadth-first, Depth-first,
> and non-linearly-recursive.

I think the the only difference between depth and bredth-first searches
is (if you consider the tuplesort to be a queue) whether the new tuples
go to the front or the back of the list. But a data structures and
algorithms book will know this.

> There are a few open issues to consider. Namely, how to cost a RecursiveCall
> node.

One note: if you know that if you get p tuples out for every tuple in
(where p<1) then the asymptotic result of 1 + p + p*p+ ... is 1/(1-p)

However, I don't know it matters. You only need to cost the plan if
there are alternate paths and given the plan structure is strongly
constrained, I'm not sure how much it matters.

> Also, if a subplan has exactly one call-site we really ought to inline it as
> it will get much more reliable plans. Similarly if there are two call sites we
> should consider inlining it if the cost of materializing the results (and
> reading them back) is more than n-call-sites x the cost of executing the
> query. I would expect That would happen with plain sequential scans for
> example.

In the case where the subplan has side-effects, you can't optimise at
all. In the case of read-committed mode, will two seq-scans always
return the same result?

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-02-05 16:23:58 Re: Recursive query syntax ambiguity
Previous Message Gregory Stark 2007-02-05 16:15:21 Re: VC2005 build and pthreads