Re: [PATCH] Support for Array ELEMENT Foreign Keys

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Subject: Re: [PATCH] Support for Array ELEMENT Foreign Keys
Date: 2012-10-24 04:36:31
Message-ID: 16787.1351053391@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Now that it seems like we've got consensus on syntax, let's talk about
some implementation issues.

Ordinarily, the query executed during an insert or update on the
referencing table looks like, for example,

SELECT 1 FROM ONLY "public"."pk" x
WHERE "f1" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x

where $1 is a parameter representing the referencing column's value.
This will find and lock the referenced row if there is one. (There
can't be more than one, because the equality constraint corresponds
to the unique index on the referenced column pk.f1.)

The proposed patch uses this if the referencing column is an array:

SELECT 1 WHERE
(SELECT pg_catalog.count(DISTINCT y) FROM pg_catalog.unnest($1) y)
OPERATOR(pg_catalog.=)
(SELECT pg_catalog.count(*) FROM
(SELECT 1 FROM ONLY "public"."pk" x
WHERE "f1" OPERATOR(pg_catalog.=) ANY ($1) FOR SHARE OF x) z)

In English, the idea is to find and lock all PK rows matching any
element of the array referencing value, and count them, and then see if
that's equal to the number of distinct non-null elements in the array
value. I find this pretty grotty. Quite aside from any aesthetic
concerns, it's broken because it presupposes that count(distinct y)
has exactly the same notion of equality that the PK unique index has.
In reality, count(distinct) will fall back to the default btree opclass
for the array element type. There might not be such an opclass, or it
might not be compatible with the PK unique index. This is not just an
academic concern: for instance, there are distinct values of the numeric
type that will compare equal to the same float8 PK value, because of the
limited precision of float comparison.

In my working copy of the patch, I've dealt with this by inserting a
creation-time restriction that the array element type has to have a
default btree opclass that is part of the PK index's opfamily. This
is not very desirable, because it means that some cases that are allowed
in plain FKs are disallowed in array FKs. Example:

regression=# create table ff (f1 float8 primary key);
CREATE TABLE
regression=# create table cc (f1 numeric references ff);
CREATE TABLE
regression=# create table cc2 (f1 numeric[], foreign key(each element of f1) references ff);
ERROR: foreign key constraint "cc2_f1_fkey" cannot be implemented
DETAIL: Key column "f1" has element type numeric which does not have a default btree operator class that's compatible with class "float8_ops".

So I'm looking for a better answer. One somewhat brute-force answer
is to not try to use = ANY at all in the RI test query, but instead
deconstruct the array value within the RI trigger and execute the
standard scalar locking query for each array element. One attraction
that would have is making it easier to produce a decent error message.
Right now, if you insert an array value that has an invalid element,
you get something like this:

regression=# create table pk (f1 int primary key);
CREATE TABLE
regression=# create table ref1 (f1 int[], foreign key(each element of f1) references pk);
CREATE TABLE
regression=# insert into pk values (1),(2);
INSERT 0 2
regression=# insert into ref1 values(array[1,2,5]);
ERROR: insert or update on table "ref1" violates foreign key constraint "ref1_f1_fkey"
DETAIL: Key (f1)=({1,2,5}) is not present in table "pk".

I don't find that too helpful even with three elements, and it would be
very much not helpful with hundreds. I would rather it told me that
"5" is the problem.

So that's the direction I was thinking of going in, but I wonder if
anyone has a better idea.

BTW, there is a second undesirable dependency on default opclass
semantics in the patch, which is that it supposes it can use array_eq()
to detect whether or not the referencing column has changed. But I
think that can be fixed without undue pain by providing a refactored
version of array_eq() that can be told which element-comparison function
to use.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Xiong He 2012-10-24 04:55:09 About the translation of po files into zh_CN (Simplified Chinese) in PG 9.2 release (branch)
Previous Message Alvaro Herrera 2012-10-24 01:07:15 proposed community service: make coverage