Re: recursing down a tree

Lists: pgsql-general
From: Carl Meyer <mrbz(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: recursing down a tree
Date: 2002-06-28 06:29:16
Message-ID: jb0ohu86ct8g27ovvlvsn4rvous07am9fc@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

hi,

say i have a table with something like

id,parent,description

and parent points to an id of the very same table. now i have
a specific id and i want to recurse down the tree to the root.
is it in any way possible to do that without to doing a sql query
for each level ? until today i always saved the parent and formulated
a new query using it to get the next upper level. however, it seems to
me that this might be quite some work given a tree of a larger size.

thanks for your ideas
carl


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Carl Meyer <mrbz(at)gmx(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: recursing down a tree
Date: 2002-06-28 13:40:24
Message-ID: Pine.GSO.4.44.0206281638220.13076-100000@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Carl,

you certainly need our "famous" module contrib/tree
(http://www.sai.msu.su/~megera/postgres/gist/)
Documentation is sparse and written in english but some people in
mailing list already use it.

Oleg

On Fri, 28 Jun 2002, Carl Meyer wrote:

> hi,
>
> say i have a table with something like
>
> id,parent,description
>
> and parent points to an id of the very same table. now i have
> a specific id and i want to recurse down the tree to the root.
> is it in any way possible to do that without to doing a sql query
> for each level ? until today i always saved the parent and formulated
> a new query using it to get the next upper level. however, it seems to
> me that this might be quite some work given a tree of a larger size.
>
> thanks for your ideas
> carl
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: Gunther Schadow <gunther(at)aurora(dot)regenstrief(dot)org>
To: Carl Meyer <mrbz(at)gmx(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: recursing down a tree
Date: 2002-07-01 23:04:07
Message-ID: 3D20DFE7.3070008@aurora.regenstrief.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Carl Meyer wrote:

> say i have a table with something like
>
> id,parent,description
>
> and parent points to an id of the very same table. now i have
> a specific id and i want to recurse down the tree to the root.
> is it in any way possible to do that without to doing a sql query
> for each level ? until today i always saved the parent and formulated
> a new query using it to get the next upper level. however, it seems to
> me that this might be quite some work given a tree of a larger size.
>
> thanks for your ideas
> carl

Well, this calls for recursive union support as per SQL '99. The
query would be

CREATE TABLE MyTree(id, parent, description);

WITH tmp(id) AS (
SELECT parent FROM MyTree WHERE id=$myId
UNION ALL

SELECT MyTree.parent

FROM tmp INNER JOIN MyTree ON tmp.id = MyTree.id

) SELECT *;

This is most powerful if you search for children though, since
then you get the full power of the recursive union join to find
more and more stuff:

WITH tmp(id) AS (
SELECT id FROM MyTree WHERE id=$myId
UNION ALL
SELECT MyTree.id
FROM tmp INNER JOIN MyTree ON MyTree.parent = tmp.id
) SELECT *;

Now, this is not supported in PostgreSQL, and I only know of DB2

implementing it actually. But it's way cool!

In the absence of this feature ... and even to speed some of such
queries up, you can use some tree representation in a special
field. There are many approaches. An interval method is theoretically
nice because it has a constant storage requirement regardless of
depth and fan-out of the tree. Oleg's GiST tree type is certainly
very nice here. If you don't want to use special data types, you
can use some path expression string, but the problem is you'll
have to have leading zeroes in each path step.

regards
-Gunther

--
Gunther Schadow, M.D., Ph.D. gschadow(at)regenstrief(dot)org
Medical Information Scientist Regenstrief Institute for Health Care
Adjunct Assistant Professor Indiana University School of Medicine
tel:1(317)630-7960 http://aurora.regenstrief.org


From: Fran Fabrizio <ffabrizio(at)mmrd(dot)com>
To: Carl Meyer <mrbz(at)gmx(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: recursing down a tree
Date: 2002-07-02 13:07:54
Message-ID: 3D21A5AA.80807@mmrd.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Carl Meyer wrote:

>hi,
>
>say i have a table with something like
>
>id,parent,description
>
>
>
Depending on what your most common types of queries are, you might be
better off with Joe Celko's nested set model (from the book SQL for
Smarties). It would do:

id, left, right, description

where you traverse the tree depth-first and as you pass the left side of
each node you increment a counter (left) and as you pass by it again on
the right you increment it again (right).

This model provides much easier ways of making queries such as "give me
all descendants of id=2" because there's then no recursion (WHERE
child.left between parent.left and parent.right).

I had a table set up exactly as you have described, and when the
recursion proved too costly in terms of db performance, I went to nested
set and we're pleased with the results.

Just another option,
Fran


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Carl Meyer <mrbz(at)gmx(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: recursing down a tree
Date: 2002-07-04 09:10:57
Message-ID: Pine.NEB.4.44.0207041805501.6356-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, 28 Jun 2002, Carl Meyer wrote:

> id,parent,description
>
> and parent points to an id of the very same table. now i have
> a specific id and i want to recurse down the tree to the root.
> is it in any way possible to do that without to doing a sql query
> for each level ?

There are a bunch of different ways to do this (several of which
have been mentioned already), and a lot of it depends on the type
of data you store, how big your trees are, etc. etc.

One really, really fast option, if you don't need to be moving up
and down arbitrary levels, but just need to find the root, or find
all children in a given tree, or similar operations, is to add a
column holding the ID of the root of the tree the node is in:

id, root_id, parent_id, description

I was tending to use smaller trees, though, (typically a few hundred,
almost never more than a few thousand nodes), and so for any more
complex searches manipulation I'd just load the whole tree into
memory with one query. (I had a clustered index on root_id--this
was MS SQL Server--and so this load was extremely fast; just a
matter of reading a few disk blocks.) This turned out to be far
more efficient and faster than doing three or four queries to get
up to the root.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Jeff Davis <list-pgsql-general(at)empires(dot)org>
To: Fran Fabrizio <ffabrizio(at)mmrd(dot)com>, Carl Meyer <mrbz(at)gmx(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: recursing down a tree
Date: 2002-07-04 10:22:32
Message-ID: 200207040322.32773.list-pgsql-general@empires.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I did a little research on the subject and it seemed quite interesting.
However, it seemed that insertion would require updating most of the tree for
a minor insert. Can you send along a few more details about your setup and
algorithms? I would like to find more information about this.

Regards,
Jeff

On Tuesday 02 July 2002 06:07 am, Fran Fabrizio wrote:
> Carl Meyer wrote:
> >hi,
> >
> >say i have a table with something like
> >
> >id,parent,description
>
> Depending on what your most common types of queries are, you might be
> better off with Joe Celko's nested set model (from the book SQL for
> Smarties). It would do:
>
> id, left, right, description
>
> where you traverse the tree depth-first and as you pass the left side of
> each node you increment a counter (left) and as you pass by it again on
> the right you increment it again (right).
>
> This model provides much easier ways of making queries such as "give me
> all descendants of id=2" because there's then no recursion (WHERE
> child.left between parent.left and parent.right).
>
> I had a table set up exactly as you have described, and when the
> recursion proved too costly in terms of db performance, I went to nested
> set and we're pleased with the results.
>
> Just another option,
> Fran
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Jeff Davis <list-pgsql-general(at)empires(dot)org>
Cc: Fran Fabrizio <ffabrizio(at)mmrd(dot)com>, Carl Meyer <mrbz(at)gmx(dot)net>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: recursing down a tree
Date: 2002-07-04 11:35:23
Message-ID: Pine.GSO.4.44.0207041424440.13908-100000@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Jeff, some detailed information about nested set model by Celco is in
http://www.mip.berkeley.edu/mip/related/thesaurus/thesaurus.pdf

Generally, if you want to support fast online update of tree stucture,
you have to maintain link information separately - parent, child, level
but this is very slow for select. If you code link information into
something, you will have a very fast select but slow insert.
I didn't found good compomise :( It's a nature of relational database,
when tuple know nothing except itself. We develope our module
tree ( www.sai.msu.su/~megera/postgres/gist) to support tree data type
using path information, so tuple does know hierarchical information
and we was able to provide index support. Unfortunately, it's very fast
for select. I think solution could be separating working with tree
structure (editing) and the exporting into our data type for
public usage.

Oleg
On Thu, 4 Jul 2002, Jeff Davis wrote:

> I did a little research on the subject and it seemed quite interesting.
> However, it seemed that insertion would require updating most of the tree for
> a minor insert. Can you send along a few more details about your setup and
> algorithms? I would like to find more information about this.
>
> Regards,
> Jeff
>
> On Tuesday 02 July 2002 06:07 am, Fran Fabrizio wrote:
> > Carl Meyer wrote:
> > >hi,
> > >
> > >say i have a table with something like
> > >
> > >id,parent,description
> >
> > Depending on what your most common types of queries are, you might be
> > better off with Joe Celko's nested set model (from the book SQL for
> > Smarties). It would do:
> >
> > id, left, right, description
> >
> > where you traverse the tree depth-first and as you pass the left side of
> > each node you increment a counter (left) and as you pass by it again on
> > the right you increment it again (right).
> >
> > This model provides much easier ways of making queries such as "give me
> > all descendants of id=2" because there's then no recursion (WHERE
> > child.left between parent.left and parent.right).
> >
> > I had a table set up exactly as you have described, and when the
> > recursion proved too costly in terms of db performance, I went to nested
> > set and we're pleased with the results.
> >
> > Just another option,
> > Fran
> >
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: 71062(dot)1056(at)compuserve(dot)com (--CELKO--)
To: pgsql-general(at)postgresql(dot)org
Subject: Re: recursing down a tree
Date: 2002-07-12 20:15:28
Message-ID: c0d87ec0.0207121215.63cdd047@posting.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

>> I did a little research on the subject and it seemed quite
interesting. However, it seemed that insertion would require updating
most of the tree for
a minor insert. Can you send along a few more details about your setup
and algorithms? I would like to find more information about this. <<

I am writing a separate book on "Trees in SQL" which will cover
several different models; I hope to be done by the end of the year. I
also hope to win the lottery.

Updating is not the problem people think it is. The nodes are in one
table and the structure is in another. The Tree table has (node_id,
lft, rgt) in its rows and those the rows are very short; a lot of them
fit into main storage at once. Since the newest addition (i.e.
youngest sibling) is made on the right side of the immediate
subordinates (siblings are ordered in the Nested Set model), you do
not have to update half the nodes on average. Finally, in the real
world, the tree structure tends to stay the same and the nodes change
-- even dot-com firms don't re-organize faster than their personnel
turnover <g>.

However, someone did have a situation where they used the nested set
model for the message threads in a newsgroup front end. Their answer
to this "fixed nodes and changing structure" problem was to use large
gaps in the numbering of (lft, rgt) pairs instead of incrementing
(lft, rgt) by one.

Think about it; subordination is shown with the BETWEEN predicate; any
sequence of unique, nested numbers will work. Subtree size is shown
by the formula ((rgt-lft+1)/2) which does depend on an increment and
happens to also give you subordination.


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: --CELKO-- <71062(dot)1056(at)compuserve(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: recursing down a tree
Date: 2002-07-15 13:39:08
Message-ID: Pine.GSO.4.44.0207151632170.15324-100000@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 12 Jul 2002, --CELKO-- wrote:

> >> I did a little research on the subject and it seemed quite
> interesting. However, it seemed that insertion would require updating
> most of the tree for
> a minor insert. Can you send along a few more details about your setup
> and algorithms? I would like to find more information about this. <<
>
> I am writing a separate book on "Trees in SQL" which will cover
> several different models; I hope to be done by the end of the year. I
> also hope to win the lottery.

Cool. btw, we have developed contrib/tree module which is very fast
for r/o operations. It's available on our GiST development page
http://www.sai.msu.su/~megera/postgres/gist/tree/README.tree.english
The idea is simple - to code path information to the node name,
so tuple itself does know it's place in hierarchy. It's sort of
cheating of relational data model. Also, we're developing another
module - contrib/ltree which will be much more powerful (we use
real names in path instead of digits as in contrib/tree).

>
> Updating is not the problem people think it is. The nodes are in one
> table and the structure is in another. The Tree table has (node_id,
> lft, rgt) in its rows and those the rows are very short; a lot of them
> fit into main storage at once. Since the newest addition (i.e.
> youngest sibling) is made on the right side of the immediate
> subordinates (siblings are ordered in the Nested Set model), you do
> not have to update half the nodes on average. Finally, in the real
> world, the tree structure tends to stay the same and the nodes change
> -- even dot-com firms don't re-organize faster than their personnel
> turnover <g>.
>
> However, someone did have a situation where they used the nested set
> model for the message threads in a newsgroup front end. Their answer
> to this "fixed nodes and changing structure" problem was to use large
> gaps in the numbering of (lft, rgt) pairs instead of incrementing
> (lft, rgt) by one.
>
> Think about it; subordination is shown with the BETWEEN predicate; any
> sequence of unique, nested numbers will work. Subtree size is shown
> by the formula ((rgt-lft+1)/2) which does depend on an increment and
> happens to also give you subordination.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: Curt Sampson <cjs(at)cynic(dot)net>
To: --CELKO-- <71062(dot)1056(at)compuserve(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: recursing down a tree
Date: 2002-07-15 13:53:11
Message-ID: Pine.NEB.4.44.0207152243290.492-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 12 Jul 2002, --CELKO-- wrote:

> I am writing a separate book on "Trees in SQL" which will cover
> several different models; I hope to be done by the end of the year. I
> also hope to win the lottery.

Are you *the* Joe Celko? "I'm not worthy! I'm not worthy!" :-)
Though I hope you'll let us know when your book comes out.

> Updating is not the problem people think it is. The nodes are in one
> table and the structure is in another. The Tree table has (node_id,
> lft, rgt) in its rows and those the rows are very short; a lot of them
> fit into main storage at once.

Unfortunately, this is not such a great assumption for postgres,
unless you otherwise have longish rows. The row overhead is over
40 bytes, including the row pointers in the page (giving you about
150 rows per page, when all is said and done). So with smallish
rows anyway, and depending on the application, and yada yada yada,
you might be better off not using a separate table.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC