Re: Recursive Queries

Lists: pgsql-hackers
From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Recursive Queries
Date: 2007-01-25 11:08:14
Message-ID: 87hcufcqjl.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hm, having skimmed through the Evgen Potemkin's recursive queries patch I find
it quite different from what I was expecting. My own thinking was headed off
in a different direction.

Basically the existing patch reimplements a new kind of join which implements
a kind of nested loop join (with newer versions adding a kind of hash join)
which feeds a new kind of tuplestore called a tupleconn.

I was thinking to have a new node above a standard join. The new node would
repeatedly feed back down to the join the results of the previous iteration
and reexecute the join to get the next generation.

I think my approach is more in line with the DB2/ANSI "WITH" style query which
is expected to do a breadth-first search. The Oracle CONNECT BY syntax is
expected to do a depth first search.

I have two major issues with the repeated-join model though.

a) Ideally we would want to switch between nested loop, merge join, and hash
join depending on the size of the previous generation. That means the join
node wouldn't be the same type of join for all the iterations. This is
important since in most applications you're traversing either up or down a
tree and are likely starting with very few nodes but often ending up with very
broad levels with many nodes. No single type of join will be appropriate for
the whole plan execution.

b) I do want to be able to support depth-first searching too. I'm not sure how
to reconcile that with the repeated-join conceptual model. We could always
resort the entire result set after generating it but that seems like an
unsatisfactory solution.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recursive Queries
Date: 2007-01-25 11:31:53
Message-ID: 1169724713.3302.6.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ühel kenal päeval, N, 2007-01-25 kell 11:08, kirjutas Gregory Stark:
> Hm, having skimmed through the Evgen Potemkin's recursive queries patch I find
> it quite different from what I was expecting. My own thinking was headed off
> in a different direction.
>
> Basically the existing patch reimplements a new kind of join which implements
> a kind of nested loop join (with newer versions adding a kind of hash join)
> which feeds a new kind of tuplestore called a tupleconn.
>
> I was thinking to have a new node above a standard join. The new node would
> repeatedly feed back down to the join the results of the previous iteration
> and reexecute the join to get the next generation.
>
> I think my approach is more in line with the DB2/ANSI "WITH" style query which
> is expected to do a breadth-first search.

IIRC the ISO/ANSI spec has a special clause for specifying both BREADTH
FIRST and DEPTH FIRST searches

> The Oracle CONNECT BY syntax is expected to do a depth first search.
>
> I have two major issues with the repeated-join model though.
>
> a) Ideally we would want to switch between nested loop, merge join, and hash
> join depending on the size of the previous generation. That means the join
> node wouldn't be the same type of join for all the iterations. This is
> important since in most applications you're traversing either up or down a
> tree and are likely starting with very few nodes but often ending up with very
> broad levels with many nodes. No single type of join will be appropriate for
> the whole plan execution.

Then it's probably better to optimize for "very broad levels with many
nodes" case, as bad plan fro few joins will probably affect you less in
case of only a small number of nodes.

> b) I do want to be able to support depth-first searching too. I'm not sure how
> to reconcile that with the repeated-join conceptual model. We could always
> resort the entire result set after generating it but that seems like an
> unsatisfactory solution.

I guess there may be difference in breadth-first and depth-first actual
data, if you use some recursion control clauses or include level
variables.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recursive Queries
Date: 2007-01-25 12:02:57
Message-ID: 20070125120257.GA13744@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 25, 2007 at 11:08:14AM +0000, Gregory Stark wrote:
> b) I do want to be able to support depth-first searching too. I'm not sure how
> to reconcile that with the repeated-join conceptual model. We could always
> resort the entire result set after generating it but that seems like an
> unsatisfactory solution.

If you have a tuplestore storing the intermediate tuples for looping,
then surely the only difference between depth and breadth searching is
that for the former new tuples goes to the front of the tuplestore, and
the latter to the end.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc: "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recursive Queries
Date: 2007-01-25 13:23:27
Message-ID: 87tzyfb5ps.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Martijn van Oosterhout" <kleptog(at)svana(dot)org> writes:

> On Thu, Jan 25, 2007 at 11:08:14AM +0000, Gregory Stark wrote:
>> b) I do want to be able to support depth-first searching too. I'm not sure how
>> to reconcile that with the repeated-join conceptual model. We could always
>> resort the entire result set after generating it but that seems like an
>> unsatisfactory solution.
>
> If you have a tuplestore storing the intermediate tuples for looping,
> then surely the only difference between depth and breadth searching is
> that for the former new tuples goes to the front of the tuplestore, and
> the latter to the end.

That's basically how the existing patch approached the problem. It invents a
new type of join and a new type of tuplestore that behaves this way. This has
the advantage of working the way Oracle users expect and being relatively
simple conceptually. It has the disadvantage of locking us into what's
basically a nested loop join and not reusing existing join code so it's quite
a large patch.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recursive Queries
Date: 2007-01-25 15:52:01
Message-ID: 45B8D221.7050301@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark wrote:
> "Martijn van Oosterhout" <kleptog(at)svana(dot)org> writes:
>
>> On Thu, Jan 25, 2007 at 11:08:14AM +0000, Gregory Stark wrote:
>>> b) I do want to be able to support depth-first searching too. I'm not sure how
>>> to reconcile that with the repeated-join conceptual model. We could always
>>> resort the entire result set after generating it but that seems like an
>>> unsatisfactory solution.
>> If you have a tuplestore storing the intermediate tuples for looping,
>> then surely the only difference between depth and breadth searching is
>> that for the former new tuples goes to the front of the tuplestore, and
>> the latter to the end.
>
> That's basically how the existing patch approached the problem. It invents a
> new type of join and a new type of tuplestore that behaves this way. This has
> the advantage of working the way Oracle users expect and being relatively
> simple conceptually. It has the disadvantage of locking us into what's
> basically a nested loop join and not reusing existing join code so it's quite
> a large patch.

I believe our Syntax should be whatever the standard dictates,
regardless of Oracle.

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: Gregory Stark <gsstark(at)mit(dot)edu>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recursive Queries
Date: 2007-01-25 17:12:12
Message-ID: 87y7nrni8j.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:

> > That's basically how the existing patch approached the problem. It invents a
> > new type of join and a new type of tuplestore that behaves this way. This has
> > the advantage of working the way Oracle users expect and being relatively
> > simple conceptually. It has the disadvantage of locking us into what's
> > basically a nested loop join and not reusing existing join code so it's quite
> > a large patch.
>
> I believe our Syntax should be whatever the standard dictates,
> regardless of Oracle.

Well the issue here isn't one of syntax. The syntax is really an orthogonal
issue. The basic question is whether to treat this as a new type of plan node
with its behaviour hard coded or whether to try to reuse existing join types
executing them recursively on their output. I can see advantages either way.

As far as the syntax goes, now that I've actually read up on both, I have to
say: I'm not entirely sure I'm happy IBM won this battle. The Oracle syntax is
simple easy to use. The IBM/ANSI syntax is, well, baroque. There's a certain
logical beauty to it but I can't see users being happy trying to figure out
how to use it.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Hubert FONGARNAND <informatique(dot)internet(at)fiducial(dot)fr>
To: Gregory Stark <gsstark(at)mit(dot)edu>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recursive Queries
Date: 2007-01-26 09:08:11
Message-ID: 1169802491.25694.3.camel@hublinux.fidudev.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Le jeudi 25 janvier 2007 à 12:12 -0500, Gregory Stark a écrit :

> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>
> > > That's basically how the existing patch approached the problem. It invents a
> > > new type of join and a new type of tuplestore that behaves this way. This has
> > > the advantage of working the way Oracle users expect and being relatively
> > > simple conceptually. It has the disadvantage of locking us into what's
> > > basically a nested loop join and not reusing existing join code so it's quite
> > > a large patch.
> >
> > I believe our Syntax should be whatever the standard dictates,
> > regardless of Oracle.
>
> Well the issue here isn't one of syntax. The syntax is really an orthogonal
> issue. The basic question is whether to treat this as a new type of plan node
> with its behaviour hard coded or whether to try to reuse existing join types
> executing them recursively on their output. I can see advantages either way.
>
> As far as the syntax goes, now that I've actually read up on both, I have to
> say: I'm not entirely sure I'm happy IBM won this battle. The Oracle syntax is
> simple easy to use. The IBM/ANSI syntax is, well, baroque. There's a certain
> logical beauty to it but I can't see users being happy trying to figure out
> how to use it.
>

I agree with THAT, it's clear that WITH RECURSIVE is more standard...
but for the SQL developper CONNECT BY is a paradise... the syntax is
clear and powerful... That's why we've chosen to developp our queries
with that (using the connectby() function and the evgen potemkin.'s
patch (http://gppl.moonbone.ru/)

_______________________________________________
Ce message et les ventuels documents joints peuvent contenir des informations confidentielles.
Au cas o il ne vous serait pas destin, nous vous remercions de bien vouloir le supprimer et en aviser immdiatement l'expditeur. Toute utilisation de ce message non conforme sa destination, toute diffusion ou publication, totale ou partielle et quel qu'en soit le moyen est formellement interdite.
Les communications sur internet n'tant pas scurises, l'intgrit de ce message n'est pas assure et la socit mettrice ne peut tre tenue pour responsable de son contenu.


From: Hubert FONGARNAND <informatique(dot)internet(at)fiducial(dot)fr>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recursive Queries
Date: 2007-01-26 11:12:02
Message-ID: 1169809922.25694.21.camel@hublinux.fidudev.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The CONNECT BY patch from evgen potemkin has been ported to pg 8.2...
and it's now in BSD License...

I will test it on our test environement

Le jeudi 25 janvier 2007 à 11:08 +0000, Gregory Stark a écrit :

> Hm, having skimmed through the Evgen Potemkin's recursive queries patch I find
> it quite different from what I was expecting. My own thinking was headed off
> in a different direction.
>
> Basically the existing patch reimplements a new kind of join which implements
> a kind of nested loop join (with newer versions adding a kind of hash join)
> which feeds a new kind of tuplestore called a tupleconn.
>
> I was thinking to have a new node above a standard join. The new node would
> repeatedly feed back down to the join the results of the previous iteration
> and reexecute the join to get the next generation.
>
> I think my approach is more in line with the DB2/ANSI "WITH" style query which
> is expected to do a breadth-first search. The Oracle CONNECT BY syntax is
> expected to do a depth first search.
>
> I have two major issues with the repeated-join model though.
>
> a) Ideally we would want to switch between nested loop, merge join, and hash
> join depending on the size of the previous generation. That means the join
> node wouldn't be the same type of join for all the iterations. This is
> important since in most applications you're traversing either up or down a
> tree and are likely starting with very few nodes but often ending up with very
> broad levels with many nodes. No single type of join will be appropriate for
> the whole plan execution.
>
> b) I do want to be able to support depth-first searching too. I'm not sure how
> to reconcile that with the repeated-join conceptual model. We could always
> resort the entire result set after generating it but that seems like an
> unsatisfactory solution.
>
_______________________________________________
Ce message et les ventuels documents joints peuvent contenir des informations confidentielles.
Au cas o il ne vous serait pas destin, nous vous remercions de bien vouloir le supprimer et en aviser immdiatement l'expditeur. Toute utilisation de ce message non conforme sa destination, toute diffusion ou publication, totale ou partielle et quel qu'en soit le moyen est formellement interdite.
Les communications sur internet n'tant pas scurises, l'intgrit de ce message n'est pas assure et la socit mettrice ne peut tre tenue pour responsable de son contenu.