Distributed keys

From: Michael Glaesemann <grzm(at)myrealbox(dot)com>
To: 'pgsql-sql(at)postgresql(dot)org' <pgsql-sql(at)postgresql(dot)org>
Subject: Distributed keys
Date: 2003-12-23 21:42:24
Message-ID: E537DC48-3590-11D8-97F0-000A95C88220@myrealbox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hello all!

An area of the relational database model that is not included in the
SQL standard are distributed keys and distributed foreign keys. A quick
example (Case 1):

employees (id, name);
schools (name, location);
teachers (employee_id, school_name);
subs (employee_id);
managers (employee_id, school_name);

with constraints
unique employees(id)
teachers(employee_id) references employees(id)
teachers(school_name) references schools(name)
subs(employee_id) references employees(id)
managers(employee_id) references employees(id)

The idea is that employees must be in one (and only one) of either
teachers, subs, or managers. Currently, this might be represented in
something like (Case 2)

employees (id, name, employee_type, school_name);
schools (name, location);
employee_types (type);

with constraints
employees(employee_type) references employee_types(type)
employees(school_name) references schools(name)

where employee_types includes "teacher", "sub", and "manager"

Or it might be represented with a number of rules or triggers that
perform all of the necessary checking.

employees(school_name) can't have a not null constraint because if the
employee is a sub, they aren't associated with a school.

Using the terms "distributed key" and "foreign distributed key", in the
first case employee_id is a "distributed key" in that it must occur in
only one of the tables teachers, subs, or managers. Distributed keys
are similar in concept to primary keys—they must be unique. This
guarantees an employee_id in teachers is not found in subs or managers,
an employee_id in subs is not found in managers or teachers, and an
employee_id in managers is not found in subs or teachers.

employees(id) is a foreign distributed key in teachers, subs, and
managers (as employee_id). Foreign distributed keys are similar in
concept to foreign keys in that employees(id) must be referenced by a
single tuple in one of teachers, subs, or managers.

Another use would be in this situation (something I'm working on right
now): I want to link comments by employees by employee_id, but comments
from non-employees by name (as they don't have an id).

comments(id, comment);
comments_nonemployees(comment_id, name);
comments_employees(comment_id, employee_id);

with constraints
comments_nonemployees(comment_id) references comments(id)
comments_employees(comment_id) references comments(id)
and comments(id) must be listed in either
comments_nonemployees(comment_id) or comments_employees(comment_id)

I haven't looked very far into how to implement distributed keys and
foreign distributed keys in PostgreSQL beyond briefly looking at the
pg_constraint system table, thinking a distributed key would be
something making employee_id unique in teachers(employee_id) UNION
subs(employee_id) UNION managers(employee_id). A distributed key is
distributed over a number of tables, rather than a single one, so
there'd have to be a list of relid-attnum pairs, rather than a single
relid-attnum pair, such as conrelid and conkey in pg_constraint. Here's
a brief sketch of the idea:

pg_distributed
distname name the name of the distributed key constraint
distrelid oid the relid of one of the tables involved in the
distributed keys
distkey int2[] a list of the attnum of the columns of the table with
oid distrelid involved in the distributed key
distforkey bool true if foreign distributed key
distfrelid oid if a foreign distributed key, the relid of the the
referenced table, else 0
distfkey int2[] if a foreign distributed key, a list of the attnum of
the columns of the table with oid distfrelid referenced by the foreign
distributed key, else 0

In pg_distributed, distname, distfrelid, distfkey would be the same in
every tuple involved in the distributed key, while distrelid and
distkey would vary. Basically a one-to-many relation. These are just
some thoughts, and the first time I've looked at the PostgreSQL
internals, so I wouldn't be suprised if this approach is wrongheaded.

I know PostgreSQL endeavors to adhere closely to the SQL standard, and
I think this is important as SQL *is* a standard. One of the things
that makes PostgreSQL great in my opinion is that it goes beyond the
standard in some areas where users and developers have found useful,
such as the PostgreSQL rule system. I think distributed key support
would extend the usefulness of PostgreSQL without hindering SQL
conformance.

I'm interested in hearing what others have to say, especially along the
lines of implementation. I haven't seen much discussion of distributed
keys on the lists (other than Josh Berkus), so perhaps there isn't much
interest. Would there be any foreseeable opposition if I (and/or
others) worked on this?

Regards,

Michael Glaesemann
grzm myrealbox com

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2003-12-23 23:39:27 Re: Distributed keys
Previous Message Chris Gamache 2003-12-23 16:46:31 Historic Query using a view/function ?