Re: [PATCH] Support for foreign keys with arrays

From: Noah Misch <noah(at)leadboat(dot)com>
To: Marco Nenciarini <marco(dot)nenciarini(at)2ndQuadrant(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org, gabriele(dot)bartolini(at)2ndQuadrant(dot)it
Subject: Re: [PATCH] Support for foreign keys with arrays
Date: 2012-03-17 03:33:12
Message-ID: 20120317033312.GB19556@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

[used followup EACH-foreign-key.v4b.patch.bz2 for review]

On Wed, Mar 14, 2012 at 07:03:08PM +0100, Marco Nenciarini wrote:
> please find attached v4 of the EACH Foreign Key patch (formerly known
> also as Foreign Key Array).

> On Fri, Feb 24, 2012 at 09:01:35PM -0500, Noah Misch wrote:

> > > We can use multi-dimensional arrays as well as referencing columns. In
> > > that case though, ON DELETE EACH CASCADE will behave like ON DELETE EACH
> > > SET NULL. This is a safe way of implementing the action.
> > > We have some ideas on how to implement this, but we feel it is better to
> > > limit the behaviour for now.
> >
> > This still feels awfully unprincipled to me. How about just throwing an error
> > when we need to remove an element from a multidimensional array? Essentially,
> > have ON DELETE EACH CASCADE downgrade to ON DELETE RESTRICT when it encounters
> > a multidimensional array. That's strictly less code than what you have now,
> > and it keeps our options open. We can always change from error to set-null
> > later, but we can't so readily change from set-null to anything else.
>
> That seems reasonable to me. Implemented and documented.

I like the semantics now. You implemented this by doing two scans per PK
delete, the first to detect multidimensional dependants of the PK row and the
second to fix 1-dimensional dependants. We don't require an index to support
the scan, so this can mean two seq scans. Currently, the only benefit to
doing it this way is a change of error message. Here is the current error
message when we would need to remove a multidimensional array element:

ERROR: update or delete on table "pktableforarray" violates foreign key constraint "fktableforarray_ftest1_fkey" on table "fktableforarray"
DETAIL: Key (ptest1)=(2) is still referenced from table "fktableforarray".

If I just rip out the first scan, we get the same outcome with a different
error message:

ERROR: removing elements from multidimensional arrays is not supported
CONTEXT: SQL statement "UPDATE ONLY "public"."fktableforarray" SET "ftest1" = array_remove("ftest1", $1) WHERE $1 OPERATOR(pg_catalog.=) ANY ("ftest1")"

That has less polish, but I think it's actually more useful. The first
message gives no indication that a multidimensional array foiled your ON
DELETE EACH CASCADE. The second message hints at that cause.

I recommend removing the new block of code in RI_FKey_eachcascade_del() and
letting array_remove() throw the error. If you find a way to throw a nicer
error without an extra scan, by all means submit that to a future CF as an
improvement. I don't think it's important enough to justify cycles at this
late hour of the current CF.

> > As I pondered this patch again, I came upon formal hazards around non-default
> > operator classes. Today, ATAddForeignKeyConstraint() always chooses pfeqop,
> > ffeqop and ppeqop in the same operator family. If it neglected this, we would
> > get wrong behavior when one of the operators is sensitive to a change to which
> > another is insensitive. For EACH FK constraints, this patch sets ffeqop =
> > ARRAY_EQ_OP unconditionally. array_eq() uses the TYPECACHE_EQ_OPR (usually
> > from the default B-tree operator class). That operator may not exist at all,
> > let alone share an operator family with pfeqop. Shall we deal with this by
> > retrieving TYPECACHE_EQ_OPR in ATAddForeignKeyConstraint() and rejecting the
> > constraint creation if it does not share an operator family with pfeqop? The
> > same hazard affects ri_triggers.c use of array_remove() and array_replace(),
> > and the same change mitigates it.
>
> Check added. As a consequence of this stricter policy, certain
> previously allowed cases are now unacceptable (e.g pk FLOAT and fk
> INT[]).
> Regression tests have been added.

Ah, good. Not so formal after all.

> > pg_constraint.confeach needs documentation.
>
> Most of pg_constraint columns, including all the boolean ones, are
> documented only in the "description" column of
>
> http://www.postgresql.org/docs/9.1/static/catalog-pg-constraint.html#AEN85760
>
> it seems that confiseach should not be an exception, since it just
> indicates whether the constraint is of a certain kind or not.

Your patch adds two columns to pg_constraint, confiseach and confeach, but it
only mentions confiseach in documentation. Just add a similar minimal mention
of confeach.

> > > ***************
> > > *** 5736,5741 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
> > > --- 5759,5807 ----
> > > (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > > errmsg("number of referencing and referenced columns for foreign key disagree")));
> > >
> > > + /* Enforce each foreign key restrictions */
> > > + if (fkconstraint->fk_is_each)
> > > + {
> > > + /*
> > > + * ON UPDATE CASCADE action is not supported on EACH foreign keys
> > > + */
> > > + if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_CASCADE)
> > > + ereport(ERROR,
> > > + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > > + errmsg("ON UPDATE CASCADE action is not supported on "
> > > + "EACH foreign keys"),
> > > + errhint("Use ON UPDATE EACH CASCADE, instead")));
> >
> > Note from the Error Message Style Guide that errhint and errdetail messages
> > shall be complete sentences.
>
> Fixed.

That is to say, they start with a capital letter and end with a period. Your
old text was fine apart from the lack of a period. (Your new text also lacks
a period.)

> > This processing should happen in parse_utilcmd.c rather than gram.y. Magic
> > values "t" and "f" won't do. Make fk_attrs a list of ForeignKeyColumnElem or
> > else at least use an int list of true/false. Either way, you would no longer
> > need the decode loop in tablecmds.c.
>
> Fixed. We have removed the magic values and moved decoding of the
> ForeignKeyColumnElem list inside parse_utilcmd.c.

When I said "int list", I meant a T_IntList (via lappend_int(), lfirst_int(),
etc.). You used a T_List of T_Integer.

> > > ***************
> > > *** 2678,2695 **** RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
> > > * The query string built is:
> > > * SELECT fk.keycols FROM ONLY relname fk
> > > * LEFT OUTER JOIN ONLY pkrelname pk
> > > ! * ON (pk.pkkeycol1=fk.keycol1 [AND ...])
> > > * WHERE pk.pkkeycol1 IS NULL AND
> > > * For MATCH unspecified:
> > > * (fk.keycol1 IS NOT NULL [AND ...])
> > > * For MATCH FULL:
> > > * (fk.keycol1 IS NOT NULL [OR ...])
> > > *
> > > * We attach COLLATE clauses to the operators when comparing columns
> > > * that have different collations.
> > > *----------
> > > */
> > > initStringInfo(&querybuf);
> > > appendStringInfo(&querybuf, "SELECT ");
> > > sep = "";
> > > for (i = 0; i < riinfo.nkeys; i++)
> > > --- 3527,3557 ----
> > > * The query string built is:
> > > * SELECT fk.keycols FROM ONLY relname fk
> > > * LEFT OUTER JOIN ONLY pkrelname pk
> > > ! * ON (pk.pkkeycol1=fk.keycol1 [AND ...]
> > > ! * [AND NOT EXISTS(<RECHECK_SUBQUERY>)])
> > > * WHERE pk.pkkeycol1 IS NULL AND
> > > * For MATCH unspecified:
> > > * (fk.keycol1 IS NOT NULL [AND ...])
> > > * For MATCH FULL:
> > > * (fk.keycol1 IS NOT NULL [OR ...])
> > > *
> > > + * In case of an EACH foreign key, a recheck subquery is added to
> > > + * the join condition in order to check that every combination of keys
> > > + * is actually referenced.
> > > + * The RECHECK_SUBQUERY is
> > > + * SELECT 1 FROM
> > > + * unnest(fk.keycol1) x1(x1) [CROSS JOIN ...]
> > > + * LEFT OUTER JOIN ONLY pkrelname pk
> > > + * ON (pk.pkkeycol1=x1.x1 [AND ...])
> > > + * WHERE pk.pkkeycol1 IS NULL AND
> > > + * (fk.keycol1 IS NOT NULL [AND ...])
> >
> > What is a worst-case query plan for a constraint with n ordinary keys and m
> > EACH keys?
>
> The subquery complexity can be compared with the one of the main query.
> Anyway, it is a one-off cost, paid at constraint creation, which I
> believe is acceptable. The fact that we now implement only single-column
> EACH foreign keys might suggest that we postpone this discussion at a
> later time.

If the cost doesn't exceed O(F log P), where F is the size of the FK table and
P is the size of the PK table, I'm not worried. If it can be O(F^2), we would
have a problem to be documented, if not fixed.

Your change to array_replace() made me look at array_remove() again and
realize that it needs the same treatment. This command yields a segfault:
SELECT array_remove('{a,null}'::text[], null);

This latest version introduces two calls to get_element_type(), both of which
should be get_base_element_type().

Patch "Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE", committed
between v3b and v4 of this patch, added code to ATAddForeignKeyConstraint()
requiring an update in this patch. Run this in the regression DB:
[local] regression=# alter table fktableforeachfk alter ftest1 type int[];
ERROR: could not find cast from 1007 to 23

RI_PLAN_EACHCASCADE_DEL_DOUPDATE should be RI_PLAN_EACHCASCADE_DEL_DODELETE.

I think the patch has reached the stage where a committer can review it
without wasting much time on things that might change radically. So, I'm
marking it Ready for Committer. Please still submit an update correcting the
above; I'm sure your committer will appreciate it.

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2012-03-17 04:07:14 Re: EquivalenceClasses and subqueries and PlaceHolderVars, oh my
Previous Message Greg Stark 2012-03-17 00:34:02 Re: Storage Manager crash at mdwrite()