Re: Table Design for Hierarchical Data

From: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Table Design for Hierarchical Data
Date: 2010-04-07 07:00:14
Message-ID: 201004071000.15529.achill@matrix.gatewaynet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

You could also consider the genealogical approach, e.g.

postgres(at)dynacom=# \d paintgentypes
Table "public.paintgentypes"
Column | Type | Modifiers
---------+-----------+---------------------------------------------------------------------------
id | integer | not null default nextval(('public.paintgentypes_id_seq'::text)::regclass)
name | text | not null
parents | integer[] |
Indexes:
"paintgentypes_pkey" PRIMARY KEY, btree (id)
"paintgentypes_name2" UNIQUE, btree (name) WHERE parents IS NULL
"paintgentypes_uk" UNIQUE, btree (name, parents)
"paintgentypes_first" btree (first(parents))
"paintgentypes_last" btree (last(parents))
"paintgentypes_level" btree (level(parents))
"paintgentypes_name" btree (name)
"paintgentypes_parents" gin (parents gin__int_ops)

The indexes are based on the contrib/intarray package.
It is very fast to do any operation on this tree.
Also it is very fast to search for the parent of any node, or the
children of any node, or the whole subtree of any node, or the depth of any node in the tree.

The parents of any node to the root, i.e. the path of any node to the root are depicted as
parents[0] : immediate parent
parents[1] : immediate parent of the above parent
.....
parents[n] : root of the tree

Στις Tuesday 06 April 2010 20:33:18 ο/η Lee Hachadoorian έγραψε:
> Please point me to another listserv or forum if this question is more
> appropriately addressed elsewhere.
>
> I am trying to come up with a structure to store employment data by NAICS
> (North American Industrial Classification System). The data uses a
> hierarchical encoding scheme ranging between 2 and 5 digits. That is, each
> 2-digit code includes all industries beginning with the same two digits. 61
> includes 611 which includes 6111, 6112, 6113, etc. A portion of the
> hierarchy is shown after the sig.
>
> A standard way to store hierarchical data is the adjacency list model, where
> each node's parent appears as an attribute (table column). So 6111 would
> list 611 as its parent. Since NAICS uses a hierarchical encoding scheme, the
> node's name is the same as the node's id, and the parent can always be
> derived from the node's id. Storing the parent id separately would seem to
> violate a normal form (because of the redundancy).
>
> One way to store this data would be to store at the most granular level
> (5-digit NAICS) and then aggregate up if I wanted employment at the 4-, 3-,
> or 2-digit level. The problem is that because of nondisclosure rules, the
> data is sometimes censored at the more specific level. I might, for example,
> have data for 6114, but not 61141, 61142, 61143. For a different branch of
> the tree, I might have data at the 5-digit level while for yet another
> branch I might have data only to the 3-digit level (not 4 or 5). I think
> that means I have to store all data at multiple levels, even if some of the
> higher-level data could be reconstructed from other, lower-level data.
>
> Specifically I'd like to know if this should be a single table or should
> there be a separate table for each level of the hierarchy (four in all)? If
> one table, should the digits be broken into separate columns? Should parent
> ids be stored in each node?
>
> More generally, what questions should I be asking to help decide what
> structure makes the most sense? Are there any websites, forums, or books
> that cover this kind of problem?
>
> Regards,
> --Lee
>

--
Achilleas Mantzios

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message silly sad 2010-04-07 07:53:00 Re: Table Design for Hierarchical Data
Previous Message Scott Marlowe 2010-04-07 06:17:55 Re: Table Design for Hierarchical Data