Re: Multi-table-unique-constraint

Lists: pgsql-hackerspgsql-patches
From: Matt Newell <newellm(at)blur(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: foreign key constraints referencing inherited relations - WIP
Date: 2005-11-10 21:45:00
Message-ID: 200511101345.00513.newellm@blur.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Warning: This patch is only a work in work in progress.

Current Status:

This patch modifies postgres's current behavior by creating foreign-key
triggers on not only the source and referenced tables, but also the children
of the referenced table. It modifies the source triggers, removing the ONLY
keyword when looking up the referenced key.

Shortcomings:

The triggers are only created when the foreign key is created, so any new
tables that inherit from the referenced table will not have the triggers.
This can be fixed fairly easily.

When removing the ONLY clause on the foreign key checks, I also had to remove
the FOR SHARE OF x part. This was in the following functions: RI_FKey_check,
ri_Check_Pk_Match, RI_FKey_noaction_del, RI_FKey_noaction_upd,
RI_FKey_restrict_del. I don't really understand the significance of this.

There is no way to turn off the new behavior, which will likely cause
breakages in existing applications. If this behavior was made optional on a
per foreign key basis, with the current syntax being backwards compatible,
then this would not be an issue.

Inherited relations don't share primary key indexes, so there is the
possibility of duplicates. Can this be fixed by putting triggers on the
inherited tables to ensure they aren't duplicating a primary key?

Can/should indexes, constraints, triggers be shared between inherited tables?
Would this be a lot of work, and is it desired? Does it need to be
configurable per index, or per inheritance?

Matt Newell,
Blur Studio

Attachment Content-Type Size
inherit_fkey.diff text/x-diff 21.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matt Newell <newellm(at)blur(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: foreign key constraints referencing inherited relations - WIP
Date: 2005-11-10 23:44:19
Message-ID: 12952.1131666259@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Matt Newell <newellm(at)blur(dot)com> writes:
> When removing the ONLY clause on the foreign key checks, I also had to remove
> the FOR SHARE OF x part. This was in the following functions: RI_FKey_check,
> ri_Check_Pk_Match, RI_FKey_noaction_del, RI_FKey_noaction_upd,
> RI_FKey_restrict_del. I don't really understand the significance of this.

It means you broke it for concurrent-update cases; there's no interlock
preventing someone from dropping the PK row after you look at it
and before you commit the referencing row.

> Inherited relations don't share primary key indexes, so there is the
> possibility of duplicates.

Which means the whole thing doesn't work, really. Without a uniqueness
constraint across the set of PK rows, the semantics of FK constraints
are not well-defined. The multi-table-unique-constraint problem has to
be solved before we can even think much about multi-table FKs :-(

Without having read the patch in detail, I'm guessing that what you've
done might allow the referencing side of an FK constraint to be
inherited (since all this takes is copying down the triggers to the
child tables), but the referenced side still has to be a single table.

regards, tom lane


From: Matt Newell <newellm(at)blur(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Multi-table-unique-constraint
Date: 2005-11-11 18:23:01
Message-ID: 200511111023.01265.newellm@blur.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thursday 10 November 2005 15:58, you wrote:
> >> The multi-table-unique-constraint problem has to
> >> be solved before we can even think much about multi-table FKs :-(
> >
> > Do you have ideas about how this should be solved?
>
> There's some discussions in the pghackers archives --- look for
> "multi-table index" and similar phrases. But if anyone had had
> a really decent plan, it would have got done by now :-(
>

Are multi-table indexes really required? After reading the code some more, I
came across this comment in nbtinsert.c:_bt_doinsert

* NOTE: obviously, _bt_check_unique can only detect keys that are already in
* the index; so it cannot defend against concurrent insertions of the
* same key. We protect against that by means of holding a write lock on
* the target page. Any other would-be inserter of the same key must
* acquire a write lock on the same target page, so only one would-be
* inserter can be making the check at one time. Furthermore, once we are
* past the check we hold write locks continuously until we have performed
* our insertion, so no later inserter can fail to see our insertion.
* (This requires some care in _bt_insertonpg.)

Would it be possible to make another routine that locates and aquires a write
lock on the page where the key would be inserted in each index(for each table
in the inheritance), and holds all these locks until the key is inserted into
the correct index. It seems this would solve the unique problem without
changing much else.

Matt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matt Newell <newellm(at)blur(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Multi-table-unique-constraint
Date: 2005-11-11 19:07:43
Message-ID: 20767.1131736063@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Matt Newell <newellm(at)blur(dot)com> writes:
> Would it be possible to make another routine that locates and aquires
> a write lock on the page where the key would be inserted in each
> index(for each table in the inheritance), and holds all these locks
> until the key is inserted into the correct index. It seems this would
> solve the unique problem without changing much else.

It's an idea, but you are now staring directly into the hornet's nest:

1. How do you avoid deadlock among multiple processes all doing the
above for similar (same page anyway) keys? It's difficult if not
impossible to ensure that they'll try to take the page locks in
the same order.

2. What happens when another process is adding/dropping indexes that
should be in the index set? In the normal scenario you don't have
any sort of lock on any of the other tables, only the one you are
trying to insert into; and so you have no defense against somebody
changing their schemas, up to and including dropping the index you
are fooling with. Adding such locks would increase the deadlock
hazard.

Also, for many scenarios (including FKs) it's important to be able to
*look up* a particular key, not only to prevent insertion of duplicates.
The above approach would require searching multiple indexes.

Most of the people who have thought about this have figured that the
right solution involves a single index spanning multiple tables (hence,
adding a table ID to the index entry headers in such indexes). This
fixes the lookup and entry problems, but it's not any help for the
lock-against-schema-mods problem, and it leaves you with a real headache
if you want to drop just one of the tables.

'Tis a hard problem :-(

regards, tom lane


From: Matt Newell <newellm(at)blur(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Multi-table-unique-constraint
Date: 2005-11-11 20:10:40
Message-ID: 200511111210.40495.newellm@blur.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Friday 11 November 2005 11:07, you wrote:

> It's an idea, but you are now staring directly into the hornet's nest:
>
> 1. How do you avoid deadlock among multiple processes all doing the
> above for similar (same page anyway) keys? It's difficult if not
> impossible to ensure that they'll try to take the page locks in
> the same order.
>
Isn't all that is required is that they iterate through the indexes in the
same order. This shouldn't be hard to do, because the set of indexes is the
same no matter what table you are inserting into, because the unique
constraint will apply to all tables both up and down the inheritance tree.
That order just needs to be stored somewhere.

What if there was a new system relation(pg_indexset) that stores an array of
index oids. Each index that is part of an index set has an fkey into this
table.

When aquiring the locks on the index pages, you must
a) have a ROW SHARE lock on the pg_indexset row for this set, this
ensures that the schema won't change from under us.

b) do so in the order that the index oids are in.

This solves the problem below also, because you would hold a row exclusive
lock on the row in this table whenever adding or removing indexes from the
set.

Now that i think about it some more, i realize that you only need to hold read
locks on the index pages that you don't plan on actually inserting a new key
into, which shouldn't cause near as much lock contention as holding write
locks on multiple indexes' pages.

> 2. What happens when another process is adding/dropping indexes that
> should be in the index set? In the normal scenario you don't have
> any sort of lock on any of the other tables, only the one you are
> trying to insert into; and so you have no defense against somebody
> changing their schemas, up to and including dropping the index you
> are fooling with. Adding such locks would increase the deadlock
> hazard.

> Also, for many scenarios (including FKs) it's important to be able to
> *look up* a particular key, not only to prevent insertion of duplicates.
> The above approach would require searching multiple indexes.
>
Why would this be required, if it currently isn't? I mean you can already do
Select from parent where key=X; and the planner takes care of scanning
multiple indexes(or sequence scans).

If it is required though, it should be no more difficult that doing what i
described above, right?

> Most of the people who have thought about this have figured that the
> right solution involves a single index spanning multiple tables (hence,
> adding a table ID to the index entry headers in such indexes). This
> fixes the lookup and entry problems, but it's not any help for the
> lock-against-schema-mods problem, and it leaves you with a real headache
> if you want to drop just one of the tables.
>
It seems that the above solution would be less work, and would keep the data
separate, which seems to be one of the biggest advantages of the current
inheritance design.

> 'Tis a hard problem :-(
I think that's why i'm interested:) I hope that I can succeed so as to not
have wasted your valuable time.

BTW, i'm on the list now, so no need to cc me.

Matt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matt Newell <newellm(at)blur(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Multi-table-unique-constraint
Date: 2005-11-11 21:10:29
Message-ID: 21554.1131743429@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Matt Newell <newellm(at)blur(dot)com> writes:
> On Friday 11 November 2005 11:07, you wrote:
>> 1. How do you avoid deadlock among multiple processes all doing the
>> above for similar (same page anyway) keys?

> Isn't all that is required is that they iterate through the indexes in the
> same order.

Yeah, I was thinking along the same lines. As long as any one index is
a member of at most one index set, this'd probably work. (Maybe you
wouldn't even need that restriction if you used a globally defined
ordering, such as always processing the indexes in order by their
pg_class OIDs.) Some concept of shared and exclusive locks on index
sets (extending only to the membership of the set, not to operations on
the individual member indexes) might fix the schema-change problem, too,
although you still need to think about whether there's a risk of
deadlocks for that. In the past we've figured that exclusively locking
a table is necessary and sufficient for schema alterations on that
table, but I'm not sure what to do for cross-table index sets.

> What if there was a new system relation(pg_indexset) that stores an array of
> index oids. Each index that is part of an index set has an fkey into this
> table.

I'd be inclined to think about using pg_inherits instead, ie, pretend
that the child table indexes are inheritance children of the parent
table index. If this is too inefficient, it suggests that we need to
fix pg_inherits anyway.

>> Also, for many scenarios (including FKs) it's important to be able to
>> *look up* a particular key, not only to prevent insertion of duplicates.
>> The above approach would require searching multiple indexes.
>>
> Why would this be required, if it currently isn't?

Well, because we're trying to do something that currently isn't possible?
It might not matter that we don't have a single instant at which we can
swear that the key is not present anywhere in the hierarchy, but I'm not
convinced that this is obviously true.

Your thought about leaving read locks on index pages while searching
other indexes might fix that, though, if it needs fixed at all.

>> Most of the people who have thought about this have figured that the
>> right solution involves a single index spanning multiple tables (hence,
>> adding a table ID to the index entry headers in such indexes).

> It seems that the above solution would be less work, and would keep the data
> separate, which seems to be one of the biggest advantages of the current
> inheritance design.

Yeah, I'm getting more attracted to the idea as I think about it. Not
so much because it keeps the data separate, as that it avoids needing to
store a table OID in index headers, which has been a principal objection
to cross-table indexes all along because of the space cost.

Probably the next thing to think about is how this would impact the
index AM API. I'm disinclined to want to put all of this logic inside
the index AMs, so somehow the "find and leave page write locked" behavior
would need to be exposed in the AM API. That ties into a larger goal of
not wanting the unique-index behavior to be totally the AM's
responsibility as it is right now --- I dislike the fact that nbtree is
responsible for reaching into the heap to test rows for liveness, for
instance. If we could separate that out a bit, it might make it easier
to support unique-index behavior in the other AMs.

> BTW, i'm on the list now, so no need to cc me.

Common practice around here is to cc people anyway --- this has grown
out of a history of occasionally-slow list mail delivery. If you don't
want it, best to fix it in your mail filters rather than expecting
people to change habits for you.

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matt Newell <newellm(at)blur(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Multi-table-unique-constraint
Date: 2005-11-13 06:15:14
Message-ID: 4376D9F2.4050807@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> Most of the people who have thought about this have figured that the
> right solution involves a single index spanning multiple tables (hence,
> adding a table ID to the index entry headers in such indexes). This
> fixes the lookup and entry problems, but it's not any help for the
> lock-against-schema-mods problem, and it leaves you with a real headache
> if you want to drop just one of the tables.
>
> 'Tis a hard problem :-(

Maybe the solution is to make inherited tables actually the same table,
and jank it with an extra per-row attribute to differentiate them or
something :)

Might make constraint_exclusion less useful then.

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Matt Newell <newellm(at)blur(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Multi-table-unique-constraint
Date: 2005-11-13 15:28:28
Message-ID: 4185.1131895708@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> Maybe the solution is to make inherited tables actually the same table,
> and jank it with an extra per-row attribute to differentiate them or
> something :)

Aside from destroying the inheritance-for-partitioning stuff, this
wouldn't work for multiple inheritance, so I'm afraid it's not a very
attractive alternative.

Matt's idea about keeping the indexes separate seems that it probably
*would* work, modulo some lingering worries about when to take what kind
of lock on the index-set-as-a-whole. It seems worth pursuing, anyway.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matt Newell <newellm(at)blur(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Multi-table-unique-constraint
Date: 2005-11-13 16:12:19
Message-ID: 437765E3.10808@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:

>Matt Newell <newellm(at)blur(dot)com> writes:
>
>
>>BTW, i'm on the list now, so no need to cc me.
>>
>>
>
>Common practice around here is to cc people anyway --- this has grown
>out of a history of occasionally-slow list mail delivery. If you don't
>want it, best to fix it in your mail filters rather than expecting
>people to change habits for you.
>
>
>
>

You can also change your subscriptions so that you don't get a copy from
the list if you are in the To: or Cc: lists. I find this is better then
having to set up filters. Visit
http://mail.postgresql.org/mj/mj_wwwusr/domain=postgresql.org

cheers

andrew