Re: complex referential integrity constraints

From: "Robert Haas" <Robert(dot)Haas(at)dyntek(dot)com>
To: "Joris Dobbelsteen" <Joris(at)familiedobbelsteen(dot)nl>, "elein" <elein(at)varlena(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: complex referential integrity constraints
Date: 2007-02-22 14:57:58
Message-ID: 57653AD4C1743546B3EE80B21262E5CB11A856@EXCH01.ds.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

The ability to make a foreign key reference a specific partial unique
index (rather than just a set of columns that have a unique index) would
solve many problems of this type. As another example, you might have a
table where one of the columns is "is_deleted boolean not null". By
creating a partial unique index on the primary key of that table "WHERE
NOT is_deleted" and then pointing a foreign key at it, you could enforce
that each row in the child table references a parent who isn't deleted.

However, this would break down when there's more one than intermediate
step involved. For example, if you have:

CREATE TABLE animal_type (
id serial,
name varchar(60) not null,
is_attacker boolean not null,
primary key (id)
);

CREATE TABLE animal (
id serial,
type_id integer not null references animal_type (id),
name varchar(60) not null,
primary key (id)
);

CREATE TABLE mauling (
id serial,
attacker_id integer not null references animal (id),
victim_id integer not null references animal (id),
attack_time timestamp with time zone not null,
primary key (id)
);

It would be easy to enforce the constraint that the attacker must be an
animal of some specific type, but difficult to enforce the constraint
that the attacker must be an animal whose type, in turn, has a true
value for is_attacker.

The best idea that I can think of right now to handle multiple levels of
tables is to allow FOREIGN KEY constraints to references a VIEW, rather
than a table. Then you could say:

CREATE VIEW attackers AS
SELECT a.id FROM animal a, animal_type t WHERE a.type_id = t.id AND
t.attacker;

...and then FOREIGN KEY (attacker_id) REFERENCES attackers (id).

This syntax would solve a number of other problems as well, such as
requiring that some record in table A has a parent either in table P or
in table Q. However, I think this would probably require implementing
some kind of materialized view so that you could actually build an index
on the view, and that opens up a whole new can of worms, because it's
not very difficult to define a view that is costly to update
incrementally.

The problem is really that there is a pretty large gap between writing a
foreign key constraint, which is trivial, and enforcing a constraint
using triggers, which is quite a bit more complex (and therefore, easy
to screw up), because the foreign key automatically handles all the
cases (insert into child table, update of child table, update of parent
table, delete from parent table) whereas with triggers you have to
address each of those cases individually. Unfortunately, something
tells me that implementing a more powerful system for foreign key
constraints is a non-trivial project, however useful it would be.
Still, I'd love to see it in the TODO file, too.

...Robert

-----Original Message-----
From: Joris Dobbelsteen [mailto:Joris(at)familiedobbelsteen(dot)nl]
Sent: Thursday, February 22, 2007 8:03 AM
To: Robert Haas; elein
Cc: pgsql-general(at)postgresql(dot)org
Subject: RE: [GENERAL] complex referential integrity constraints

I partially agree:
If people CAN do stupid things, they are 'clever' enough to find a way
to actually do it. I've seen them destroy things, by just using a system
in a way it was not intended. They effectively found a way to blow away
the very thing that part was designed for.
But indeed, it's a lot of work, especially if the number of tables that
must be referenced increases. I'm a strong supporter for ensuring
consistency. Postgres has what it takes to do the job, but it doesn't
make my life a lot easier. But it seems to be as good as it gets
today...

Perhaps we should rather define a 'database' constraint in the order of:
"For every mauling, the attacking animal must be of the attacker type"
(in a computer understandable manner). From the set theory this should
be possible without too much problems, However doing so efficiently
might be slightly harder.
This might be a fun project and useful for the TODO list. At least it
makes it a lot easier (and maintanable) to enforce database-wide
constraints.

- Joris

>-----Original Message-----
>From: Robert Haas [mailto:Robert(dot)Haas(at)dyntek(dot)com]
>Sent: woensdag 21 februari 2007 3:37
>To: Joris Dobbelsteen; elein
>Cc: pgsql-general(at)postgresql(dot)org
>Subject: RE: [GENERAL] complex referential integrity constraints
>
>Yes, exactly. And while you might not care about all of those
>(e.g. I care about the first two but am not worried about the
>third one because I'm the only one who will ever update that
>table), writing multiple triggers to enforce each constraint
>of this type quickly gets old if there are even a few of them.
> It is exponentially harder to write a constraint of this type
>than it is to write a simple foreign key constraint.
>
>...Robert
>
>-----Original Message-----
>From: Joris Dobbelsteen [mailto:Joris(at)familiedobbelsteen(dot)nl]
>Sent: Monday, February 19, 2007 5:59 AM
>To: elein; Robert Haas
>Cc: pgsql-general(at)postgresql(dot)org
>Subject: RE: [GENERAL] complex referential integrity constraints
>
>>Why don't you add a field in animal_types that is boolean mauler.
>>Then you can add a trigger on the mauling table to raise an
>error when
>>the attacker_id is an animal type mauler.
>
>This is only partial. You need a lot more triggers to
>guarentee the constraints are enforced.
>Precisely you need to validate:
>* mauling on insert/update of attacker_id
>* animal on update of type_id
>* animal_type on update of your property
>
>Of course you need to think about the MVCC model, such that:
>Transaction 1 executes
>INSERT INTO mauling VALUES ('someattacker'), Transaction 2
>executes UPDATE animal_type SET mauler = false WHERE name =
>'someattacker', such that both transaction happen in parallel.
>
>This is perfectly possible and will make it possible to
>violate the constraint, UNLESS locking of the tuples is done correctly.
>
>These contraints are not trivial to implement (unfortunally).
>It would be great if they where.
>
>- Joris
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Chris Coleman 2007-02-22 15:01:59 Slony subscription problem
Previous Message Teodor Sigaev 2007-02-22 14:56:58 Re: tsearch2: word position