Re: foreign keys for array/period contains relationships

Lists: pgsql-hackers
From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: foreign keys for array/period contains relationships
Date: 2010-10-25 19:11:16
Message-ID: 1288033876.6278.6.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Currently, foreign keys only work with the = operator (the name might be
different, but it needs to behave like equality). I'm thinking there
are other scenarios that could be useful, for example with arrays and
range types.

Example #1: Foreign key side is an array, every member must match some
PK.

CREATE TABLE pk (a int PRIMARKY KEY, ...);

CREATE TABLE fk (x int[] REFERENCES pk (a), ...);

Example #2: Foreign key side as a (hypothetical) period type, PK is a
timestamp. Every FK period must contain a PK timestamp.

CREATE TABLE pk (a timestamp PRIMARY KEY, ...);

CREATE TABLE fk (x period/range of timestamp REFERENCES pk (a), ...);

Implementing the foreign key side of this merely requires the system to
have some knowledge of the required "contains" operator, which it does
in the array case, and something can surely be arranged for the range
case. The problem is you can't do cascading updates or deletes, but you
could do on update/delete restrict, which is still useful.

It get's more interesting when the "container" type is the primary key:

Example #3: PK is array, FK is element type. FK must be element of some
PK array.

CREATE TABLE pk (a int[] PRIMARY KEY, ...);

CREATE TABLE fk (x int REFERENCES pk (a), ...);

Example #4: PK is period, FK is timestamp. FK must be contained in some
PK period.

CREATE TABLE pk (a period PRIMARY KEY, ...);

CREATE TABLE fk (x timestamp REFERENCES pk (a), ...);

As above, we can probably arrange the operator knowledge to make these
checks. But I think additionally, you'd need an exclusion constraint on
the PK side to ensure nonoverlapping arrays/periods so that on
update/delete restrict as well as cascading deletes work.

Additional interesting examples involve IP network containment using
inet/cidr or ip4/ip4r. There, you'd probably need additional syntax to
tell the system explicitly which operators to use.

Now I originally arrived at this issue via Example #1, but it appeared
to me that with the ongoing work on range types, Example #4 would be a
very eminent use case.

Is this sort of thing feasible? Has anyone done more research into the
necessary details?


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-25 19:34:45
Message-ID: AANLkTimcs8eViVSkMqmGQXSr1j1=pbsRYnZ1aNsFyCFZ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 25, 2010 at 12:11 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> Is this sort of thing feasible?  Has anyone done more research into the
> necessary details?

I think the problems arise when you try to figure out what records you
need to lock to prevent someone from deleting the referenced rows
before you commit.

--
greg


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-25 19:43:05
Message-ID: AANLkTimcWAesj+cDh62qxXKuzrvA--+ULJa8i-kSr=Pn@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 25, 2010 at 3:11 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> Currently, foreign keys only work with the = operator (the name might be
> different, but it needs to behave like equality).  I'm thinking there
> are other scenarios that could be useful, for example with arrays and
> range types.
>
> Example #1: Foreign key side is an array, every member must match some
> PK.
>
> CREATE TABLE pk (a int PRIMARKY KEY, ...);
>
> CREATE TABLE fk (x int[] REFERENCES pk (a), ...);

I've wished for this before when doing app dev with PG. I think it
would be pretty neat. The other cases seem potentially useful, too,
but especially this one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-25 20:10:21
Message-ID: AANLkTimm9BSP-4Qb3HYtGVm8npM311YVd_Rg21dHumWf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/10/25 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Oct 25, 2010 at 3:11 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> Currently, foreign keys only work with the = operator (the name might be
>> different, but it needs to behave like equality).  I'm thinking there
>> are other scenarios that could be useful, for example with arrays and
>> range types.
>>
>> Example #1: Foreign key side is an array, every member must match some
>> PK.
>>
>> CREATE TABLE pk (a int PRIMARKY KEY, ...);
>>
>> CREATE TABLE fk (x int[] REFERENCES pk (a), ...);

What about optimalizations and planning? This is classic sample how
don't use a arrays?

Regards

Pavel

>
> I've wished for this before when doing app dev with PG.  I think it
> would be pretty neat.  The other cases seem potentially useful, too,
> but especially this one.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-26 00:24:47
Message-ID: 1288052687.11160.2.camel@jdavis-ux.asterdata.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2010-10-25 at 12:34 -0700, Greg Stark wrote:
> On Mon, Oct 25, 2010 at 12:11 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> > Is this sort of thing feasible? Has anyone done more research into the
> > necessary details?
>
> I think the problems arise when you try to figure out what records you
> need to lock to prevent someone from deleting the referenced rows
> before you commit.

I think that's easier when the PK must contain the FK, because then you
only need to lock one record. Even when you need to lock multiple
records, it seems feasible, and is just an index lookup, right? Do you
see a particular problem?

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-26 00:38:02
Message-ID: 1288053482.11412.13.camel@jdavis-ux.asterdata.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2010-10-25 at 22:11 +0300, Peter Eisentraut wrote:
> Currently, foreign keys only work with the = operator (the name might be
> different, but it needs to behave like equality). I'm thinking there
> are other scenarios that could be useful, for example with arrays and
> range types.

I agree completely. I had not previously considered that arrays could
benefit from this idea as well as range types. Mentally, I had already
been calling them "foreign range keys" ;)

> Implementing the foreign key side of this merely requires the system to
> have some knowledge of the required "contains" operator, which it does
> in the array case, and something can surely be arranged for the range
> case. The problem is you can't do cascading updates or deletes, but you
> could do on update/delete restrict, which is still useful.

Why can't you do cascading updates/deletes?

> Is this sort of thing feasible? Has anyone done more research into the
> necessary details?

Yes, I think so. #3 and #4 are very feasible. #1 and #2 are, as well,
unless I'm missing something.

Regards,
Jeff Davis


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-26 00:57:54
Message-ID: AANLkTi=1h3=pSB8JCKpsNaT5xt2Kq_QvFEnCrku1yhUT@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 25, 2010 at 5:24 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> I think that's easier when the PK must contain the FK, because then you
> only need to lock one record. Even when you need to lock multiple
> records, it seems feasible, and is just an index lookup, right? Do you
> see a particular problem?

Well if you lock multiple records then it's not clear what operations
you should conflict with. Removing any one of them wouldn't actually
invalidate the foreign key reference unless you remove the last one.

I always assumed this was why we require the unique constraint at all.
Otherwise we could just do a sequential scan and lock all the matching
records. but we would be preventing someone from removing those
records even though removing just one wouldn't be breaking the
constraint.

--
greg


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-26 17:22:17
Message-ID: 1288113737.22800.1.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On mån, 2010-10-25 at 22:10 +0200, Pavel Stehule wrote:
> 2010/10/25 Robert Haas <robertmhaas(at)gmail(dot)com>:
> >> Example #1: Foreign key side is an array, every member must match some
> >> PK.
> >>
> >> CREATE TABLE pk (a int PRIMARKY KEY, ...);
> >>
> >> CREATE TABLE fk (x int[] REFERENCES pk (a), ...);
>
> What about optimalizations and planning?

Some work will be required to get the respective checking queries to do
the right think fast, but it's solvable in principle.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-26 17:25:24
Message-ID: 1288113924.22800.4.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On mån, 2010-10-25 at 17:38 -0700, Jeff Davis wrote:
> > Implementing the foreign key side of this merely requires the system
> to
> > have some knowledge of the required "contains" operator, which it
> does
> > in the array case, and something can surely be arranged for the
> range
> > case. The problem is you can't do cascading updates or deletes, but
> you
> > could do on update/delete restrict, which is still useful.
>
> Why can't you do cascading updates/deletes?

Let's say you have

PK

1
2
3
4
5

FK

[1,2,3]
[3,4,5]
[4,4,4]

When you delete PK = 3, what do you expect to happen? OK, you might
decide to delete the first two rows from the FK table. This might or
might not make sense in a particular case, but on delete cascade is an
option the user has to choose explicitly. But I don't see what to do
about cascading an update when you, say, update PK 1 => 6.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-26 17:27:34
Message-ID: 1288114054.22800.5.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On mån, 2010-10-25 at 17:57 -0700, Greg Stark wrote:
> Well if you lock multiple records then it's not clear what operations
> you should conflict with. Removing any one of them wouldn't actually
> invalidate the foreign key reference unless you remove the last one.
>
> I always assumed this was why we require the unique constraint at all.

I did mention that you would need an exclusion constraint in some of the
cases, to get an effect analogous to unique constraints.


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-26 18:53:27
Message-ID: 1288119207.15279.24.camel@jdavis-ux.asterdata.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2010-10-26 at 20:25 +0300, Peter Eisentraut wrote:
> Let's say you have
>
> PK
>
> 1
> 2
> 3
> 4
> 5
>
> FK
>
> [1,2,3]
> [3,4,5]
> [4,4,4]
>
> When you delete PK = 3, what do you expect to happen? OK, you might
> decide to delete the first two rows from the FK table. This might or
> might not make sense in a particular case, but on delete cascade is an
> option the user has to choose explicitly.

That's what I would expect.

> But I don't see what to do
> about cascading an update when you, say, update PK 1 => 6.

Intuitively, I would expect all 1's to be replaced by 6's in all arrays.
But I can now see why you would be hesitant to try to support that.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-26 18:53:42
Message-ID: 1288119222.15279.26.camel@jdavis-ux.asterdata.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2010-10-25 at 17:57 -0700, Greg Stark wrote:
> On Mon, Oct 25, 2010 at 5:24 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> > I think that's easier when the PK must contain the FK, because then you
> > only need to lock one record. Even when you need to lock multiple
> > records, it seems feasible, and is just an index lookup, right? Do you
> > see a particular problem?
>
> Well if you lock multiple records then it's not clear what operations
> you should conflict with. Removing any one of them wouldn't actually
> invalidate the foreign key reference unless you remove the last one.

I didn't word my statement clearly. If the PK contains the FK, and you
have an Exclusion Constraint on the PK (as Peter suggested), then you
only need to lock one record. I think that logic would be pretty much
the same as a normal FK.

The case where you might need to lock multiple records is when the FK
contains the PK (case #1 in Peter's original email). But in that case,
you would still have a UNIQUE constraint on the PK (right?) and removing
any referenced element should cause a conflict.

Case #2 is the strange one, I think. It's not actually just an
adaptation of #1. #1 requires that all elements of the array have a
corresponding PK value; but #2 just requires that one of them does.
Peter, can you clarify case #2? Did you have a use case in mind?

Regards,
Jeff Davis


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-26 19:09:01
Message-ID: 1288120141.22800.8.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tis, 2010-10-26 at 11:53 -0700, Jeff Davis wrote:
> Case #2 is the strange one, I think. It's not actually just an
> adaptation of #1. #1 requires that all elements of the array have a
> corresponding PK value; but #2 just requires that one of them does.
> Peter, can you clarify case #2? Did you have a use case in mind?

[ That's the period references timestamp case. ]

You're right, it's probably not useful.


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2010-10-27 20:53:37
Message-ID: 4CC89151.6040204@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/26/10 11:53 AM, Jeff Davis wrote:
> Intuitively, I would expect all 1's to be replaced by 6's in all arrays.
> But I can now see why you would be hesitant to try to support that.

If people want cascading update to work, they shouldn't be denormalizing.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2011-03-20 15:18:06
Message-ID: AANLkTikeVH3UFDSbdWA7R0_Oor3ssL_s8424_TbQxry3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 25, 2010 at 15:11, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

> Example #4: PK is period, FK is timestamp. FK must be contained in some
> PK period.
>
> CREATE TABLE pk (a period PRIMARY KEY, ...);
>
> CREATE TABLE fk (x timestamp REFERENCES pk (a), ...);
>
> As above, we can probably arrange the operator knowledge to make these
> checks. But I think additionally, you'd need an exclusion constraint on
> the PK side to ensure nonoverlapping arrays/periods so that on
> update/delete restrict as well as cascading deletes work.
>

Additional interesting examples involve IP network containment using
> inet/cidr or ip4/ip4r. There, you'd probably need additional syntax to
> tell the system explicitly which operators to use.
>

There are a large number of use-cases for this type of foreign key with
geometry ( PostGIS ) types as well. Point references Area or Line, Area
references Area, etc.


From: Andrew Tipton <andrew(dot)t(dot)tipton(at)gmail(dot)com>
To: Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign keys for array/period contains relationships
Date: 2011-03-20 20:17:27
Message-ID: AANLkTikTgCRKMDr+EuHDGYqE=CubRvc43nJ7HvbdpFMk@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> On Mon, Oct 25, 2010 at 15:11, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>
>> Example #4: PK is period, FK is timestamp. FK must be contained in some
>> PK period.
>>
>> CREATE TABLE pk (a period PRIMARY KEY, ...);
>>
>> CREATE TABLE fk (x timestamp REFERENCES pk (a), ...);
>>
>> As above, we can probably arrange the operator knowledge to make these
>> checks. But I think additionally, you'd need an exclusion constraint on
>> the PK side to ensure nonoverlapping arrays/periods so that on
>> update/delete restrict as well as cascading deletes work.
>>
>
> Additional interesting examples involve IP network containment using
>> inet/cidr or ip4/ip4r. There, you'd probably need additional syntax to
>> tell the system explicitly which operators to use.
>>
>
> There are a large number of use-cases for this type of foreign key with
> geometry ( PostGIS ) types as well. Point references Area or Line, Area
> references Area, etc.
>

You may be interested in an experiment I did late last year, where I did a
bit of playing around in the system catalogs to create this kind of
relationship. Turns out that the RI infrastructure stores the oid of the
equality operators it uses in pg_constraint, and after creating a normal
foreign key constraint it can be updated to change these operators. Of
course, Bad Things can probably happen if you violate assumptions that the
RI code depends on. I suspect that the most important one is that a child
row must reference exactly one parent row.

For establishing a point-contained-in-box relationship, the parent-eq-parent
operator is BOX && BOX and the child-eq-parent operator is POINT <@ BOX.
The parent-eq-child operator is BOX @> POINT, which is the commutator of
child-eq-parent. Declaring an exclusion constraint on the parent column
using the && operator guarantees that the <@ operator can only match a
single parent row.

If we're able to teach Postgres about these operator relationships -- that
is, && combined with <@ satisfies the restriction that a child row can only
reference one parent row -- extending the RI creation code to support this
kind of a relationship looks to be fairly straightforward.

(I've attached a small .sql script which demonstrates this for the POINT <@
BOX case.)

-Andrew

Attachment Content-Type Size
rihack.sql text/x-sql 2.0 KB