self referencing tables/ nested sets etc...

Lists: pgsql-general
From: Rob Hoopman <rob(at)tuna(dot)nl>
To: pgsql-general(at)postgresql(dot)org
Subject: self referencing tables/ nested sets etc...
Date: 2004-03-23 21:25:17
Message-ID: 200403232225.17986.rob@tuna.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi all,
I mostly just lurk on the list, but I need to vent. And I could use any
advice/ experiences you'd be willing to share.

For those of you that don't want to read my rant:
What solutions are there to storing a wide/shallow self referencing tree in a
database?
Any pointers to previous discussions, articles or other background information
is very much apreciated. I seem to be googling in circles, the same articles
keep popping up.
If I learned one thing today it's that I need to educate myself a bit before
settling on an approach for this ;-)

Cheers,
Rob

Pre-P.S. If anyone is interested in my plpgsql version of the approach from
the dbazine.com article from my rant, you're welcome to it. I'd be happy to
post it to the list or sending it in the mail.

<RANT>
Today I was confronted with the problem of storing self referencing data, (The
popular tutorial material seems to be employees with a boss/subordinate
relationship.) I'm sure many of you have been there.
So like a good boy I went to trawl the pgsql archives and found some
references to the Celko 'nested set' model
[http://www.intelligententerprise.com/001020/celko.shtml]. After some more
googling around I found http://www.dbazine.com/tropashko4.shtml.

I'm not sure if any of you are familiar with this approach, but it's similar
to the 'nested sets' approach and somehow this approach appealed to me more
than the Celko 'nested sets'. ( And the 'Real World Performance group at
Oracle Corp.' sounds like a ringing endorsement )

I don't have any mathematical background, no formal CS education of note and
not a lot of experience with plpgsql programming. But... I'm a sucker for a
challenge.
So, equipped with my limited skillset, I have spent today wrestling with my
lack of skill, my unfamiliarity with the problem domain, inaccuracies in the
article, oracle to postgres translation quirks.
But I pressed on. Never loosing sight of my goal: the reward and pride of
overcoming this academic challenge.

It's been a long day, I managed to squeeze in a lunch break, two bathroom
visits and a few trips to the coffee machine. But my heart was light at the
prospect of a job well done.

And now,... at the end of the day I am proud to announce to the world:
"I have actually done it! Today I have acheived something I have not done
before. I am a better man tonight than I was this morning."

But,... it's a bittersweet victory because the provided solutions is
completely useless to me. It appears that when adding more than 48 sub nodes
to any node in the tree, craps out because of an INT8 column overflowing.

So after this aticlimax I've set aside my pride and turn to the list for
guidance.
</RANT>


From: "Shawn Harrison" <harrison(at)tbc(dot)net>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: self referencing tables/ nested sets etc...
Date: 2004-03-25 00:22:16
Message-ID: 002401c411ff$3a8e3f30$119de3cf@THP63412
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Rob,

I'm not sure what application you're doing this for, so I don't know how to
advise. However, I implemented my own interpretation of "nested sets"
recently, which I will describe for you.

I was inspired by reading
http://www.yafla.com/papers/sqlhierarchies/sqlhierarchies.htm by Dennis W.
Forbes. But my own approach is to let the unique id of each object serve as
its "key" in the nested set key-chain (which I call its "ancestry"), and
separate the ids with path separator, e.g.:

sah=# select id, parent, ancestry, depth from objects;
id | parent | ancestry | depth

---+--------+----------+------

2 | 1 | 1/2 | 1

1 | 1 | 1 | 0

3 | 1 | 1/3 | 1

4 | 3 | 1/3/4 | 2

5 | 4 | 1/3/4/5 | 3

The ancestry attribute (type varchar) shows the "path" to each item. Then I
can do queries like the following:

-- selects all objects where the ancestry attribute has '3' as a complete
word.

select * from objects where ancestry ~ '.*[[:<:]]3[[:>:]].*';

The ancestry and depth are created automatically by trigger. This has been
working fine for me, but I haven't tested it with a tree of any great depth.
It would seem to be perfect for wide, shallow trees. The trick is the
trigger to create ancestry and depth on insert & update to objects. I just
use a "recursive trigger" -- it updates the ancestry and depth for each
direct child of the updated object, which in turn calls the trigger on each
of those children. I'm not aware of any recursion limits on this type of
thing in pgsql -- I'd be interested to know from the list if there are any.

HTH,

Shawn Harrison

----- Original Message -----
From: "Rob Hoopman" <rob(at)tuna(dot)nl>
To: <pgsql-general(at)postgresql(dot)org>
Sent: Tuesday, March 23, 2004 3:25 PM
Subject: [GENERAL] self referencing tables/ nested sets etc...

> Hi all,
> I mostly just lurk on the list, but I need to vent. And I could use any
> advice/ experiences you'd be willing to share.
>
> For those of you that don't want to read my rant:
> What solutions are there to storing a wide/shallow self referencing tree
in a
> database?
> Any pointers to previous discussions, articles or other background
information
> is very much apreciated. I seem to be googling in circles, the same
articles
> keep popping up.
> If I learned one thing today it's that I need to educate myself a bit
before
> settling on an approach for this ;-)
>
> Cheers,
> Rob
>
> Pre-P.S. If anyone is interested in my plpgsql version of the approach
from
> the dbazine.com article from my rant, you're welcome to it. I'd be happy
to
> post it to the list or sending it in the mail.
>
>
> <RANT>
> Today I was confronted with the problem of storing self referencing data,
(The
> popular tutorial material seems to be employees with a boss/subordinate
> relationship.) I'm sure many of you have been there.
> So like a good boy I went to trawl the pgsql archives and found some
> references to the Celko 'nested set' model
> [http://www.intelligententerprise.com/001020/celko.shtml]. After some more
> googling around I found http://www.dbazine.com/tropashko4.shtml.
>
> I'm not sure if any of you are familiar with this approach, but it's
similar
> to the 'nested sets' approach and somehow this approach appealed to me
more
> than the Celko 'nested sets'. ( And the 'Real World Performance group at
> Oracle Corp.' sounds like a ringing endorsement )
>
> I don't have any mathematical background, no formal CS education of note
and
> not a lot of experience with plpgsql programming. But... I'm a sucker for
a
> challenge.
> So, equipped with my limited skillset, I have spent today wrestling with
my
> lack of skill, my unfamiliarity with the problem domain, inaccuracies in
the
> article, oracle to postgres translation quirks.
> But I pressed on. Never loosing sight of my goal: the reward and pride of
> overcoming this academic challenge.
>
> It's been a long day, I managed to squeeze in a lunch break, two bathroom
> visits and a few trips to the coffee machine. But my heart was light at
the
> prospect of a job well done.
>
> And now,... at the end of the day I am proud to announce to the world:
> "I have actually done it! Today I have acheived something I have not done
> before. I am a better man tonight than I was this morning."
>
> But,... it's a bittersweet victory because the provided solutions is
> completely useless to me. It appears that when adding more than 48 sub
nodes
> to any node in the tree, craps out because of an INT8 column overflowing.
>
> So after this aticlimax I've set aside my pride and turn to the list for
> guidance.
> </RANT>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html


From: Joe Conway <mail(at)joeconway(dot)com>
To: Shawn Harrison <harrison(at)tbc(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: self referencing tables/ nested sets etc...
Date: 2004-03-25 05:37:33
Message-ID: 4062701D.80606@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Shawn Harrison wrote:
> Forbes. But my own approach is to let the unique id of each object serve as
> its "key" in the nested set key-chain (which I call its "ancestry"), and
> separate the ids with path separator, e.g.:

See connectby() in contrib/tablefunc. It produces much the same output
dynamically.

Joe


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Rob Hoopman <rob(at)tuna(dot)nl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: self referencing tables/ nested sets etc...
Date: 2004-03-25 11:19:45
Message-ID: rha560l2v6ot23sp1e5kj24vvgs6i5bgo1@email.aon.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 23 Mar 2004 22:25:17 +0100, Rob Hoopman <rob(at)tuna(dot)nl> wrote:
>What solutions are there to storing a wide/shallow self referencing tree in a
>database?

contrib/ltree (http://www.sai.msu.su/~megera/postgres/gist/ltree/) or
Joe's connectby() might help.

>googling around I found http://www.dbazine.com/tropashko4.shtml.

Thanks for the link. I really enjoyed reading that article. It has
beautiful coloured graphics and is nicely spiced with buzz words like
*fractal* etc.

>( And the 'Real World Performance group at
>Oracle Corp.' sounds like a ringing endorsement )

Ok, that made me change my sig. <bg>

> I have spent today wrestling with [...]
> inaccuracies in the article,
^
... and bugs ...

>But,... it's a bittersweet victory because the provided solutions is
>completely useless to me.

So you had your fun. Now throw it away and start working on a real
solution.

> It appears that when adding more than 48 sub nodes
>to any node in the tree, craps out because of an INT8 column overflowing.

AFAICS it doesn't depend on the number of siblings, but it fails when
the sum of the numbers in dotted path notation exceeds 62.

<rant>
Tropashko's Nested Intervals concept as presented in dbazine is broken
by design. It is essentially a Materialized Path implementation using
bit patterns to store path information. I'll call it Obfuscated
Materialized Path Method, OMPM.

Given the definitions as found in the article the following conditions
hold for each tree node:

0 < numer < 2*denom

denom = 2^n, n >= 1

Adding the first child to a node identified by path p:

denom(p.1) = 2 * denom(p)
numer(p.1) = 2 * numer(p) + 1

Adding child c+1 to a parent node identified by path p:

denom(p.(c+1)) = 2 * denom(p.c)
numer(p.(c+1)) = 2 * numer(p.c) - 3

For the example nodes used in the article we get the following values:

path x+y bin(numer)
<root> 1/1 1
1 3/2 11
1.1 7/4 111
1.1.1 15/8 1111
1.1.1.1 31/16 11111
1.2 11/8 1011
1.2.1 23/16 10111
1.3 19/16 10011
1.3.1 39/32 100111
2 3/4 011
2.1 7/8 0111
2.1.1 15/16 01111
2.1.2 27/32 011011
2.2 11/16 01011
2.2.1 19/32 010111
3 3/8 0011
3.1 7/16 00111
4 3/16 00011

The third column in this table shows the binary representation of the
numerator, padded to the length of the denominator with leading zeros.
The last two bits are always 11, they are redundant and we subsequently
ignore them. The remaining bits can be used to find a node in this
binary tree:

0 0 0
1---------------------2------------3------------4---- - . . .
| | | |
1| 1| 1| 1|
| 0 0 | | |
1.1-----1.2-----1.3 2.1 3.1---3.2 4.1---4.2
| | | |
1| 1| 1| |
| | | |
| 1.2.1---1.2.2 | |
| | | |
| 1.2.1.1 | |
| | 0 |
1.1.1 2.1.1---2.1.2 4.1.1
| |

Horizontal edges are marked with 0, they connect each node to its
immediate siblings on the same tree level. Vertical edges are marked
with 1, they lead from a parent node to its first child. The length of
the edges doesn't have any significance.

The node corresponding to a (numer, denom) pair is found by scanning the
bits of the binary representation of numer from left (most significant)
to right (least significant) starting at the position of the single bit
which is set in denom (and ignoring the last two bits as has already
been mentioned). We locate the upper left node "1" in the tree and move
to the right for each 0 bit and down for each 1 bit.

Using dotted path notation this means that we start with path "1". For
each 1 bit we append ".1" to the path, for each 0 bit we increase the
last component of the path by 1.

Converting a path to a bit pattern and a bit pattern to numer and denom
is left for the reader.

So path strings and OMPM (numer, denom) pairs uniquely correspond to
each other, and the number of bits required for numer and denom depends
on the *sum* of the path components.

E.g. the path "4.3.24.22.15" cannot be stored if you use 64 bit integers
for numer and denom.

It seems the author is aware of this limitation because he writes:
| More thorough implementation of the method would involve a domain
| of integers with a unlimited range

Does Oracle have integers with unlimited range? AFAIK Postgres does
not. But we have strings with practically unlimited length (text).
contrib/ltree stores path information in strings. It seems to have a
limit of 256 bytes.

Storage efficiency: Even if we had variable length integers with
unlimited range, the number of bits required to store a path in OMPM
would depend on the sum of the path components. With 64 bit integers we
could store paths like "1.1.1.1. ... .1.1.1" (62 levels) or "1.61" in 16
bytes (2 * 64 bits). In 16 bytes Postgres can store a string with a
length of up to 12 bytes. For "1.1" to "9.99" and "10.1" to "99.9" even
8 bytes are sufficient.

So for wide/shallow trees dotted path strings are more storage efficient
than OMPM bitmaps with long runs of 0 bits.

Regarding performance, both queries given near the end of the article

| * All the descendants of JONES, excluding himself:

| select h1.name from hierarchy h1, hierarchy h2
| where h2.name = 'JONES'
| and distance(h1.numer, h1.denom,
| h2.numer, h2.denom)>0;

and

| * All the ancestors of FORD, excluding himself:

| select h2.name from hierarchy h1, hierarchy h2
| where h1.name = 'FORD'
| and distance(h1.numer, h1.denom,
| h2.numer, h2.denom)>0;

require a full table scan over hierarchy because the condition involving
a distance() function call cannot use an index. So with OMPM the
performance of common queries is worse than with classic materialized
path strings (e1.path LIKE e2.path || '%').

One more reason to look at contrib/ltree.
</rant>

Good luck!

Servus
Manfred, self-appointed member of the Global PG Performance Group


From: Rob Hoopman <rob(at)tuna(dot)nl>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: self referencing tables/ nested sets etc...
Date: 2004-03-25 14:38:39
Message-ID: 200403251538.39928.rob@tuna.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thank you both, I should have known to have look at the contribs first.

On Thursday 25 March 2004 06:37, Joe Conway wrote:
> Shawn Harrison wrote:
> > Forbes. But my own approach is to let the unique id of each object serve
> > as its "key" in the nested set key-chain (which I call its "ancestry"),
> > and separate the ids with path separator, e.g.:
>
> See connectby() in contrib/tablefunc. It produces much the same output
> dynamically.
>
> Joe
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend


From: "Shawn Harrison" <harrison(at)tbc(dot)net>
To: "Joe Conway" <mail(at)joeconway(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: self referencing tables/ nested sets etc...
Date: 2004-03-25 15:20:07
Message-ID: 000101c41348$e72919a0$119de3cf@THP63412
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Joe,

That's cool! Thanks for the tip. It looks like your "branch" is my
"ancestry". I'll look in contrib more often. I could use connectby() instead
of my homebrew plpgsql functions.

Shawn

----- Original Message -----
From: "Joe Conway" <mail(at)joeconway(dot)com>
To: "Shawn Harrison" <harrison(at)tbc(dot)net>
Cc: <pgsql-general(at)postgresql(dot)org>
Sent: Wednesday, March 24, 2004 11:37 PM
Subject: Re: [GENERAL] self referencing tables/ nested sets etc...

> Shawn Harrison wrote:
> > Forbes. But my own approach is to let the unique id of each object serve
as
> > its "key" in the nested set key-chain (which I call its "ancestry"), and
> > separate the ids with path separator, e.g.:
>
> See connectby() in contrib/tablefunc. It produces much the same output
> dynamically.
>
> Joe


From: Rob Hoopman <rob(at)tuna(dot)nl>
To: pgsql-general(at)postgresql(dot)org
Cc: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Subject: Re: self referencing tables/ nested sets etc...
Date: 2004-03-25 19:56:35
Message-ID: 200403252056.35035.rob@tuna.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thursday 25 March 2004 12:19, Manfred Koizar wrote:
> On Tue, 23 Mar 2004 22:25:17 +0100, Rob Hoopman <rob(at)tuna(dot)nl> wrote:
> >What solutions are there to storing a wide/shallow self referencing tree
> > in a database?
>
> contrib/ltree (http://www.sai.msu.su/~megera/postgres/gist/ltree/) or
> Joe's connectby() might help.
That's just what I need, thanks!

>
> >( And the 'Real World Performance group at
> >Oracle Corp.' sounds like a ringing endorsement )
>
> Ok, that made me change my sig. <bg>
:-)
I should have know the Oracle Corp's 'Real World' has little or nothing to do
with my real world.

>
> > I have spent today wrestling with [...]
> > inaccuracies in the article,
>
> ^
> ... and bugs ...
That's what I was thinking, but I thought surely this Oracle dude knows better
than me.

> > It appears that when adding more than 48 sub nodes
> >to any node in the tree, craps out because of an INT8 column overflowing.
>
> AFAICS it doesn't depend on the number of siblings, but it fails when
> the sum of the numbers in dotted path notation exceeds 62.
>
Maybe, but some of the intermediate steps are larger than the number that gets
stored in the end. Actually that's where this implementation broke for me.

> <rant>
SNIP
> </rant>
Thanks, that was educational.

Cheers,
Rob


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Rob Hoopman <rob(at)tuna(dot)nl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: self referencing tables/ nested sets etc...
Date: 2004-03-25 21:01:30
Message-ID: 5ih660tq1i1ovtsiacg40kus9gu77ud3rb@email.aon.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, 25 Mar 2004 20:56:35 +0100, Rob Hoopman <rob(at)tuna(dot)nl> wrote:
>> > It appears that when adding more than 48 sub nodes
>> >to any node in the tree, craps out because of an INT8 column overflowing.
>>
>> AFAICS it doesn't depend on the number of siblings, but it fails when
>> the sum of the numbers in dotted path notation exceeds 62.
>>
>Maybe, but some of the intermediate steps are larger than the number that gets
>stored in the end. Actually that's where this implementation broke for me.

Rob, do you still have the functions and the data that led to the
overflow? If so, would you care to locate the parent of the node you
failed to insert and this parent's last child. Then please post the
output of

SELECT pk, numer, denom, path(numer, denom)
FROM yourtable
WHERE pk = 'parentpk' OR pk = 'childpk';

I'd like to find out whether OMPM is flawed or my theory about it.

Thanks.
Servus
Manfred


From: Rob Hoopman <rob(at)tuna(dot)nl>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: self referencing tables/ nested sets etc...
Date: 2004-03-26 00:03:38
Message-ID: 200403260103.39009.rob@tuna.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thursday 25 March 2004 22:01, Manfred Koizar wrote:
> On Thu, 25 Mar 2004 20:56:35 +0100, Rob Hoopman <rob(at)tuna(dot)nl> wrote:
> >> > It appears that when adding more than 48 sub nodes
> >> >to any node in the tree, craps out because of an INT8 column
> >> > overflowing.
> >>
> >> AFAICS it doesn't depend on the number of siblings, but it fails when
> >> the sum of the numbers in dotted path notation exceeds 62.
> >
> >Maybe, but some of the intermediate steps are larger than the number that
> > gets stored in the end. Actually that's where this implementation broke
> > for me.
>
> Rob, do you still have the functions
Yes

> and the data
No, but I can reproduce. I wrote a function that let's me insert a number of
child nodes in the tree.

> that led to the
> overflow? If so, would you care to locate the parent of the node you
> failed to insert and this parent's last child. Then please post the
> output of
>
> SELECT pk, numer, denom, path(numer, denom)
> FROM yourtable
> WHERE pk = 'parentpk' OR pk = 'childpk';

Here you go:
test=> SELECT name, numer, denom, path(numer,denom)
test-> FROM emps
test-> WHERE name = 'Drone 1.1.10.8' OR name = 'Drone 1.1.10.8.29';
name | numer | denom | path
-------------------+-----------------+-----------------+--------------
Drone 1.1.10.8 | 1573379 | 1048576 | .1.1.10.8
Drone 1.1.10.8.29 | 844700881780739 | 562949953421312 | .1.1.10.8.29
(2 rows)

Have another:
test=> SELECT name, numer, denom, path(numer,denom)
test-> FROM emps
test-> WHERE name = 'KING' OR name = 'Drone 1.48';
name | numer | denom | path
------------+-----------------+-----------------+-------
KING | 3 | 2 | .1
Drone 1.48 | 562949953421315 | 562949953421312 | .1.48

test=>

So it seems that you are right, but the magic number seems to be 49

>
> I'd like to find out whether OMPM is flawed or my theory about it.
I wouldn't want to rule out the possibility of me screwing up somewhere in the
code. I had never done any plpgsl before yesterday.

Cheers,
Rob

Some more info:

The insert fails like this after enabling some debugging ( ni_insert_nodes is
the function that autogenerates childnodes ):
test2=> SELECT ni_insert_nodes( '1', 1 );
NOTICE: Current Child: 49
NOTICE: Last Child: 49
NOTICE: Inserting Drone 1.49
NOTICE: >>>> start child_number
NOTICE: num is: 1
NOTICE: den is: 1
NOTICE: child is: 1
NOTICE: <<<< end child_number
NOTICE: >>>> start child_number
NOTICE: num is: 3
NOTICE: den is: 2
NOTICE: child is: 49
NOTICE: <<<< end child_number
WARNING: Error occurred while executing PL/pgSQL function child_numer
WARNING: while casting return value to function's return type
ERROR: Bad int8 external representation "1.12589990684263e+15"

The function that fails:
CREATE FUNCTION child_numer( INT8, INT8, INT8) RETURNS INT8 AS'
DECLARE
num ALIAS FOR $1;
den ALIAS FOR $2;
child ALIAS FOR $3;
BEGIN
RAISE NOTICE '' >>>> start child_number'';
RAISE NOTICE ''num is: %'', num;
RAISE NOTICE ''den is: %'', den;
RAISE NOTICE ''child is: %'', child;

RAISE NOTICE '' <<<< end child_number'';
RETURN num*pow(2, child)+3-pow(2, child);
END
' LANGUAGE plpgsql;


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Rob Hoopman <rob(at)tuna(dot)nl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: self referencing tables/ nested sets etc...
Date: 2004-03-26 09:15:32
Message-ID: lao7605t695dguk7ehql7ikcm56rufhc0s@email.aon.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, 26 Mar 2004 01:03:38 +0100, Rob Hoopman <rob(at)tuna(dot)nl> wrote:
>So it seems that you are right, but the magic number seems to be 49

>NOTICE: num is: 3
>NOTICE: den is: 2
>NOTICE: child is: 49
>WARNING: Error occurred while executing PL/pgSQL function child_numer
>WARNING: while casting return value to function's return type
>ERROR: Bad int8 external representation "1.12589990684263e+15"

> RETURN num*pow(2, child)+3-pow(2, child);

pow() is a floating point function. Double has a precision of 48 bits,
so 2 ^ 48 is the largest number that can reliably be converted from
double to bigint. Better use integer arithmetic:

CREATE OR REPLACE FUNCTION pow2(bigint) RETURNS bigint AS '
DECLARE
e bigint;
ret bigint;
BEGIN
e = $1;
IF (e < 0) THEN
RAISE EXCEPTION ''2 ^ % does not fit into a bigint'', e;
END IF;
IF (e > 62) THEN
RAISE EXCEPTION ''2 ^ % does not fit into a bigint'', e;
END IF;
ret = 1;
WHILE (e > 0) LOOP
ret = 2 * ret;
e = e - 1;
END LOOP;
RETURN ret;
END
' LANGUAGE plpgsql;

Thus you could use OMPM to store up to 61 child nodes, but only for the
very first parent node ("1.1", ..., "1.61").

BTW, I've read Tropashko's follow-up articles
http://arxiv.org/html/cs.DB/0401014, http://arxiv.org/pdf/cs.DB/0402051,
as well as various discussions on comp.databases.theory. My conclusion
is that OMPM is irreparably broken. With every kludge added to it
(Farey Intervals, Continued Fractions) the correspondence to
materialized path strings gets even more obfuscated, but the main
shortcoming remains: If you try to store materialized path information
in a fixed number of integers you run out of bits very soon.

And if you use floating point you lose accuracy.
Fuzzy trees: "My boss is JONES. Or is it CLARK?"

Servus
Manfred


From: vadimtro_invalid(at)yahoo(dot)com (Vadim Tropashko)
To: pgsql-general(at)postgresql(dot)org
Subject: Re: self referencing tables/ nested sets etc...
Date: 2004-03-30 02:24:29
Message-ID: c7ec22df.0403291824.142627ab@posting.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

mkoi-pg(at)aon(dot)at (Manfred Koizar) wrote in message news:<lao7605t695dguk7ehql7ikcm56rufhc0s(at)email(dot)aon(dot)at>...
> BTW, I've read Tropashko's follow-up articles
> http://arxiv.org/html/cs.DB/0401014, http://arxiv.org/pdf/cs.DB/0402051,
> as well as various discussions on comp.databases.theory. My conclusion
> is that OMPM is irreparably broken. With every kludge added to it
> (Farey Intervals, Continued Fractions) the correspondence to
> materialized path strings gets even more obfuscated, but the main
> shortcoming remains: If you try to store materialized path information
> in a fixed number of integers you run out of bits very soon.

The article series
Binary Fractions (dbazine) -> Farey Fractions -> Continued Fractions

might give you wrong impression that it's one kludge upon another. In
reality it's just a trail of discovery process. Let me summarize what
I think withstanded the test of time:

http://www.dbazine.com/tropashko4.shtml
The idea of generalizing nested integers to nested fractions is still
valid, although the particular encoding schema with Binary Rationals
proved to be not practical.

http://www.dbazine.com/tropashko5.shtml
Uses the same Binary Rationals encoding schema solving tree relocation
problem.

Farey Fractions
Different encoding schema, that still is waiting somebody to refute
it. Unlike previous articles I did some volume testing there.

Continued Fractions
That is just a different perspective into essentially the same
encoding. Now the connection to Materialized Path is transparent:
whenever you concatenate paths in path encoding, for example
11.2 + 7.3.5 = 11.2.7.3.5
you nest continued fractions
x=7+1/(3+...) inside 11+1/(2+1/x) to get 11+1/(2+1/(7+...))
Technically, this could be done much easier than nesting continued
fractions. The encoding is just four integer numbers <a,b,c,d> that
can be arranged either into Moebius function
(ax+c)/(bx+d)
so that concatenation of paths is substitution of functions, or
alternatively, into 2x2 matrix:
Matrix([a,c],[b,d])
so that concatenation of paths is Matrix multiplication. Example:
path
3.1.1.9.9.9.12.31.24.500.17.4.39
corresponds to continued fraction
3+1/(1+1/(1+1/(9+1/(9+1/(9+1/(12+1/(31+1/(24+1/(500+1/(17+1/(4+1/39)))))))))));
which corresponds to
Matrix([68075613118554,1734570625891],[19306670376781,491935096655])
All matrices have additional property that absolute value of the
determinant is 1 (Proposition 1). In our case
68075613118554*491935096655-1734570625891*19306670376781=1

In short, I'm taking back my previous accessment that for all
practical purposes Materialized Path is the best encoding. Continued
Fractions/Moebius transformations/2x2 Matrices are much nicer IMO. But
this totally depends upon how comfortable are you with math.