Making a tree with "millions and millions" of dynamic nodes

From: Christian Fowler <google(at)NOSPAM(dot)gravesweeper(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Making a tree with "millions and millions" of dynamic nodes
Date: 2003-12-02 23:40:53
Message-ID: 6b-dnQJnDI6YvlCiXTWc-g@speakeasy.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have a VERY LARGE pile of geographic data that I am importing into a database (db of choice is postgres,
though may hop to oracle if necessary). The data is strictly hierarchical - each node has one, and only one
parent. The depth should not exceed 6 or 7 levels. The initial import will have about 6 million leaves, and
3 million branches. I would expect the leaves to grow significantly, in number easily tripling. However, the
branches will likely stay very constant in number, but I expect there locations to shift around somewhat
(affecting up to thousand of children).

For selection, it is crucial for me to get:

1. path generation speed
2. immediate sibling speed
3. children count speed

I have implemented a first run using Joe Celko's Nested Sets (w/ a mod to store tree level for speed). The
update propigation issue is the achilles heel of this approach for me. I have read Vadim Tropashko Nested
Interval concept ( http://www.dbazine.com/tropashko4.html ) , and my brain is painfully stretched enough to
get the general idea. I have a feeling i will hit the arithmetic issues others of reported.

So it seems Materialized Path is my only option, however I am concerned about LIKE performance for the right
hand side of the tree, where the path is 8digits x 6 levels = 48 chars. Should I be concerned? I need
split-second real-time performance, and can't imagine it will be as fast the Nested Set arithmatic approach.
I can flatten out the import to insure the upper tree has the smallest numbers, however, it will save at
most 8 chars on the path.

I've been googling through USENET archives watching the big debates of years gone by. I've grazed much
knowledge, but now am curious if anyone has any thoughts/advice/war stories about a data set this large.

(and if any fellow postgres fans have some reassuring words about real-time performance of a 20 million row
tree, i'd love to hear ;-)

--
[ \ /
[ >X< Christian Fowler | http://www.steelsun.com/
[ / \

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2003-12-02 23:41:19 Re: Error in select
Previous Message Jiexin Luo 2003-12-02 23:21:01 How to download pgsql