Re: Making a tree with "millions and millions" of dynamic nodes

From: "Mikito Harakiri" <mikharakiri(at)iahu(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Making a tree with "millions and millions" of dynamic nodes
Date: 2003-12-03 02:51:07
Message-ID: 3kczb.23$2A3.117@news.oracle.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


"Christian Fowler" <google(at)NOSPAM(dot)gravesweeper(dot)com> wrote in message
news:6b-dnQJnDI6YvlCiXTWc-g(at)speakeasy(dot)net(dot)(dot)(dot)
> For selection, it is crucial for me to get:
>
> 1. path generation speed

Path to the root? It is coded explicitly in the materialized path. All you
need to do is parsing the path and generating a list of prefixes that you
use within your query like this:

select * from hierarchy where path in ('1','1.2','1.2.1', '1.2.1.1')

If you have an index, and your rdbms optimiser is smart enough, the query
should execute fast.

> 2. immediate sibling speed

Here is the kludge:

select * from hierarchy where path in ('1.2.1.1','1.2.1.2','1.2.1.3',
'1.2.1.4','1.2.1.5')

Again, I assume that your query would use an index here.

If you get exactly 5 records, then you can't be sure that your node has that
many children. You have to repeat query like this:
select * from hierarchy where path in (
'1.2.1.1','1.2.1.2','1.2.1.3', '1.2.1.4','1.2.1.5'
,'1.2.1.6','1.2.1.7','1.2.1.8', '1.2.1.9','1.2.1.10')
,'1.2.1.11','1.2.1.12','1.2.1.13', '1.2.1.4','1.2.1.15')
,'1.2.1.16','1.2.1.17','1.2.1.18', '1.2.1.19','1.2.1.20')
,'1.2.1.21','1.2.1.22','1.2.1.23', '1.2.1.4','1.2.1.25')

Yet again, there might be more than 25 children, so you'll have to run yet
again more "generos" query.

The pitfall here is that at some point the optimiser may get tired to do
"or" expansion, so that at some point you might end up with full table scan.
But, this is implementation dependent, so that you might be able to
influence optimizer decision.

As you see, I'm not worried how many bytes materialized path has, or if
parsing it takes more time than multiplying 2 integers. My concern is if
your query can leverage index or not.

> 3. children count speed

Children, or descendants? I guess no method can beat Celko's descendant's
formula
#descendants=(rgt-lft+1)/2
The count is implicitly stored at the root node, so that even for hierarchy
1M nodes we answer how many nodes are there instantly. This is also an
Achiles' heel of the method: any update to the hierarchy triggers refresh
of the "counter". One aggregate is never enough, though: it's often useful
to know total salary too:-)

If you meant not descendants, but children, then use a method similar to
bullet #2.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Lamar Owen 2003-12-03 04:09:14 Re: perl(Pg) (S)RPM
Previous Message jini us 2003-12-03 01:23:11 Re: embedded postgresql + C++ IDE