Re: [PATCH] Support for foreign keys with arrays

From: Marco Nenciarini <marco(dot)nenciarini(at)2ndQuadrant(dot)it>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Support for foreign keys with arrays
Date: 2012-03-14 18:03:08
Message-ID: 1331748188.6744.15.camel@greygoo.devise-it.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

please find attached v4 of the EACH Foreign Key patch (formerly known
also as Foreign Key Array).

All the open issues have been fixed, including the crucial ones; see
below for details.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

On Fri, Feb 24, 2012 at 09:01:35PM -0500, Noah Misch wrote:
> You support, and have test cases for, constraints with multiple EACH columns.
> The documentation and comments do not explain their semantics. On reviewing
> the behavior and code, you have implemented it in terms of a Cartesian
> product. That's logical, but when is such a constraint actually useful? If
> someone can think of an application, great; let's document his example.
> Otherwise, let's reject such constraints at DDL time.

Now multiple EACH column are rejected.

>
> > We can use multi-dimensional arrays as well as referencing columns. In
> > that case though, ON DELETE EACH CASCADE will behave like ON DELETE EACH
> > SET NULL. This is a safe way of implementing the action.
> > We have some ideas on how to implement this, but we feel it is better to
> > limit the behaviour for now.
>
> This still feels awfully unprincipled to me. How about just throwing an error
> when we need to remove an element from a multidimensional array? Essentially,
> have ON DELETE EACH CASCADE downgrade to ON DELETE RESTRICT when it encounters
> a multidimensional array. That's strictly less code than what you have now,
> and it keeps our options open. We can always change from error to set-null
> later, but we can't so readily change from set-null to anything else.

That seems reasonable to me. Implemented and documented.

> As I pondered this patch again, I came upon formal hazards around non-default
> operator classes. Today, ATAddForeignKeyConstraint() always chooses pfeqop,
> ffeqop and ppeqop in the same operator family. If it neglected this, we would
> get wrong behavior when one of the operators is sensitive to a change to which
> another is insensitive. For EACH FK constraints, this patch sets ffeqop =
> ARRAY_EQ_OP unconditionally. array_eq() uses the TYPECACHE_EQ_OPR (usually
> from the default B-tree operator class). That operator may not exist at all,
> let alone share an operator family with pfeqop. Shall we deal with this by
> retrieving TYPECACHE_EQ_OPR in ATAddForeignKeyConstraint() and rejecting the
> constraint creation if it does not share an operator family with pfeqop? The
> same hazard affects ri_triggers.c use of array_remove() and array_replace(),
> and the same change mitigates it.

Check added. As a consequence of this stricter policy, certain
previously allowed cases are now unacceptable (e.g pk FLOAT and fk
INT[]).
Regression tests have been added.

> Let me say again how much I like the test coverage from this patch. It's
> great to think of something worth verifying and find existing coverage for it
> in the submitted test cases.
>
>
> > *** a/doc/src/sgml/catalogs.sgml
> > --- b/doc/src/sgml/catalogs.sgml
>
> > ***************
> > *** 2102,2107 ****
> > --- 2106,2118 ----
> > <entry></entry>
> > <entry>If a check constraint, a human-readable representation of the expression</entry>
> > </row>
> > +
> > + <row>
> > + <entry><structfield>confisarray</structfield></entry>
>
> Name is now confiseach.

Fixed.

> pg_constraint.confeach needs documentation.

Most of pg_constraint columns, including all the boolean ones, are
documented only in the "description" column of

http://www.postgresql.org/docs/9.1/static/catalog-pg-constraint.html#AEN85760

it seems that confiseach should not be an exception, since it just
indicates whether the constraint is of a certain kind or not.

>
> > + <entry><type>bool</type></entry>
> > + <entry></entry>
> > + <entry>If a foreign key, true if a EACH REFERENCE foreign key</entry>
> > + </row>
> > </tbody>
> > </tgroup>
> > </table>
> > *** a/doc/src/sgml/ddl.sgml
> > --- b/doc/src/sgml/ddl.sgml
>
> > + <para>
> > + Another option you have with foreign keys is to use a
> > + referencing column which is an array of elements with
> > + the same type (or a compatible one) as the referenced
> > + column in the related table. This feature, commonly known
> > + as <firstterm>foreign key arrays</firstterm>, is implemented
>
> Is it indeed "commonly known" by that name? My web searching did not turn up
> any independent use of the term.
>
> In any event, this should be the only place where we mention multiple names
> for the feature. Pick one preferred term and use it for all other mentions.
> The documentation, comments and messages currently have a mix of "each foreign
> key", "EACH foreign key" and "foreign key array".

In the end we uniformly adopted "EACH foreign key" as the unique name
of this feature; documentation, comments and messages have been
updated accordingly.

> > + <para>
> > + Even though the most common use case for foreign key arrays
> > + is on a single column key, you can define an <quote>each foreign
> > + key constraint</quote> on a group of columns. As the following
> > + contrived example shows, it needs to be written in table constraint form:
> > + <programlisting>
> > + CREATE TABLE t1 (
> > + a integer PRIMARY KEY,
> > + b integer,
> > + c integer[],
> > + <emphasis>FOREIGN KEY (b, EACH c) REFERENCES other_table (c1, c2)</emphasis>
> > + );
> > + </programlisting>
>
> Rather than merely showing an abstract syntax example, some words about the
> semantics are in order. You might have a parent table with primary key
> (category, value). Then your child table(s) have a category column and an
> array of values. The constraint then ensures that all the entries in the
> array exist in the parent table *and* belong to the proper category.

That example has been replaced by a concrete one.

> > *** a/src/backend/commands/tablecmds.c
> > --- b/src/backend/commands/tablecmds.c
>
> > ***************
> > *** 5736,5741 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
> > --- 5759,5807 ----
> > (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > errmsg("number of referencing and referenced columns for foreign key disagree")));
> >
> > + /* Enforce each foreign key restrictions */
> > + if (fkconstraint->fk_is_each)
> > + {
> > + /*
> > + * ON UPDATE CASCADE action is not supported on EACH foreign keys
> > + */
> > + if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_CASCADE)
> > + ereport(ERROR,
> > + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > + errmsg("ON UPDATE CASCADE action is not supported on "
> > + "EACH foreign keys"),
> > + errhint("Use ON UPDATE EACH CASCADE, instead")));
>
> Note from the Error Message Style Guide that errhint and errdetail messages
> shall be complete sentences.

Fixed.

> > +
> > + /*
> > + * EACH CASCADE and EACH SET NULL actions are only available
> > + * in single-column EACH foreign keys
> > + */
> > + if (numpks > 1 &&
> > + (fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRCASCADE
> > + || fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRSETNULL
> > + || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRCASCADE
> > + || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRSETNULL))
> > + ereport(ERROR,
> > + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > + errmsg("EACH CASCADE and EACH SET NULL actions are only "
> > + "available on single-column EACH foreign keys")));
> > + }
> > + else
> > + {
> > + /*
> > + * EACH CASCADE and EACH SET NULL actions are only available
> > + * in EACH foreign keys
> > + */
> > + if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRCASCADE
> > + || fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRSETNULL
> > + || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRCASCADE
> > + || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRSETNULL)
> > + ereport(ERROR,
> > + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > + errmsg("EACH CASCADE and EACH SET NULL actions are only "
> > + "available on EACH foreign keys")));
>
> I would just reuse the previous error message. It will be fine, maybe even
> better, for everyone to see the phrase "single-column".

Fixed.

> > ***************
> > *** 5812,5823 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
> > Oid target_typeids[2];
> >
> > input_typeids[0] = pktype;
> > ! input_typeids[1] = fktype;
> > target_typeids[0] = opcintype;
> > target_typeids[1] = opcintype;
> > if (can_coerce_type(2, input_typeids, target_typeids,
> > COERCION_IMPLICIT))
> > ! pfeqop = ffeqop = ppeqop;
> > }
> >
> > if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop)))
> > --- 5905,5925 ----
> > Oid target_typeids[2];
> >
> > input_typeids[0] = pktype;
> > ! input_typeids[1] = fktyped;
>
> By sending the base type instead of the domain type to can_coerce_type(), this
> changes behavior ever so slightly for ordinary foreign keys. It would take
> something fairly baroque to get a different result. Perhaps a polymorphic
> opcintype and an FK column of a domain over an enum. Still, please don't
> change that part of the logic.

Fixed.

> > target_typeids[0] = opcintype;
> > target_typeids[1] = opcintype;
> > if (can_coerce_type(2, input_typeids, target_typeids,
> > COERCION_IMPLICIT))
> > ! {
> > ! pfeqop = ppeqop;
> > !
> > ! /*
> > ! * In caso of an EACH FK the ffeqop must be left untouched
>
> Typo.

Fixed.

>
> > *** a/src/backend/parser/gram.y
> > --- b/src/backend/parser/gram.y
>
> > ***************
> > *** 2900,2917 **** ConstraintElem:
> > yyscanner);
> > $$ = (Node *)n;
> > }
> > ! | FOREIGN KEY '(' columnList ')' REFERENCES qualified_name
> > opt_column_list key_match key_actions ConstraintAttributeSpec
> > {
> > Constraint *n = makeNode(Constraint);
> > n->contype = CONSTR_FOREIGN;
> > n->location = @1;
> > n->pktable = $7;
> > ! n->fk_attrs = $4;
> > n->pk_attrs = $8;
> > n->fk_matchtype = $9;
> > n->fk_upd_action = (char) ($10 >> 8);
> > n->fk_del_action = (char) ($10 & 0xFF);
> > processCASbits($11, @11, "FOREIGN KEY",
> > &n->deferrable, &n->initdeferred,
> > &n->skip_validation,
> > --- 2918,2959 ----
> > yyscanner);
> > $$ = (Node *)n;
> > }
> > ! | FOREIGN KEY '(' foreignKeyColumnList ')' REFERENCES qualified_name
> > opt_column_list key_match key_actions ConstraintAttributeSpec
> > {
> > Constraint *n = makeNode(Constraint);
> > n->contype = CONSTR_FOREIGN;
> > n->location = @1;
> > n->pktable = $7;
> > ! n->fk_attrs = $4;
> > n->pk_attrs = $8;
> > n->fk_matchtype = $9;
> > n->fk_upd_action = (char) ($10 >> 8);
> > n->fk_del_action = (char) ($10 & 0xFF);
> > +
> > + /*
> > + * Split the content of foreignKeyColumnList
> > + * in two separate list. One lis of fileds
>
> Two typos.

Fixed.

> > + * and one list of boolean values.
> > + */
> > + {
> > + ListCell *i;
> > +
> > + n->fk_attrs = NIL;
> > + n->fk_is_each = false;
> > + n->fk_each_attrs = NIL;
> > + foreach (i, $4)
> > + {
> > + ForeignKeyColumnElem *elem =
> > + (ForeignKeyColumnElem *)lfirst(i);
> > +
> > + n->fk_attrs = lappend(n->fk_attrs, elem->name);
> > + n->fk_is_each |= elem->each;
> > + n->fk_each_attrs = lappend(n->fk_each_attrs,
> > + makeString(elem->each?"t":"f"));
> > + }
> > + }
>
> This processing should happen in parse_utilcmd.c rather than gram.y. Magic
> values "t" and "f" won't do. Make fk_attrs a list of ForeignKeyColumnElem or
> else at least use an int list of true/false. Either way, you would no longer
> need the decode loop in tablecmds.c.

Fixed. We have removed the magic values and moved decoding of the
ForeignKeyColumnElem list inside parse_utilcmd.c.

> > *** a/src/backend/utils/adt/arrayfuncs.c
> > --- b/src/backend/utils/adt/arrayfuncs.c
>
> > + Datum
> > + array_replace(PG_FUNCTION_ARGS)
> > + {
>
> > + for (i = 0; i < nitems; i++)
> > + {
> > + bool isNull;
> > + bool oprresult;
> > +
> > + /* Get source element, checking for NULL */
> > + if (bitmap && (*bitmap & bitmask) == 0)
> > + {
> > + if (old_value_isnull)
> > + {
> > + values[i] = new_value;
> > + isNull = false;
> > + changed = true;
> > + }
> > + else
> > + isNull = true;
> > + }
>
> This does the wrong thing if old_value_isnull == new_value_isnull == true:
>
> [local] regression=# select array_replace('{4,null,5}'::int[], null, null);
> array_replace
> ---------------
> {4,0,5}
>
> (With text[], that yields a crash.)

Fixed.

> > *** a/src/backend/utils/adt/ri_triggers.c
> > --- b/src/backend/utils/adt/ri_triggers.c
>
> > ! Datum
> > ! RI_FKey_eachcascade_upd(PG_FUNCTION_ARGS)
> > {
>
> > ! /* ----------
> > ! * The query string built is
> > ! * UPDATE ONLY <fktable>
> > ! * SET fkatt1 = array_replace(fkatt1, $n, $1) [, ...]
> > ! * WHERE $n = fkatt1 [AND ...]
> > ! * The type id's for the $ parameters are those of the
> > ! * corresponding PK attributes.
> > ! * ----------
> > ! */
> > ! initStringInfo(&querybuf);
> > ! initStringInfo(&qualbuf);
> > ! quoteRelationName(fkrelname, fk_rel);
> > ! appendStringInfo(&querybuf, "UPDATE ONLY %s SET", fkrelname);
> > ! querysep = "";
> > ! qualsep = "WHERE";
>
> An elog(ERROR) when riinfo.nkeys != 1 would make sense here and for other
> functions where we forbid multiple columns at creation time. If nothing else,
> this needs a comment that the looping is vestigial and the function only
> supports single-column constraints.

Logic implemented and comment added.

> > ***************
> > *** 2678,2695 **** RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
> > * The query string built is:
> > * SELECT fk.keycols FROM ONLY relname fk
> > * LEFT OUTER JOIN ONLY pkrelname pk
> > ! * ON (pk.pkkeycol1=fk.keycol1 [AND ...])
> > * WHERE pk.pkkeycol1 IS NULL AND
> > * For MATCH unspecified:
> > * (fk.keycol1 IS NOT NULL [AND ...])
> > * For MATCH FULL:
> > * (fk.keycol1 IS NOT NULL [OR ...])
> > *
> > * We attach COLLATE clauses to the operators when comparing columns
> > * that have different collations.
> > *----------
> > */
> > initStringInfo(&querybuf);
> > appendStringInfo(&querybuf, "SELECT ");
> > sep = "";
> > for (i = 0; i < riinfo.nkeys; i++)
> > --- 3527,3557 ----
> > * The query string built is:
> > * SELECT fk.keycols FROM ONLY relname fk
> > * LEFT OUTER JOIN ONLY pkrelname pk
> > ! * ON (pk.pkkeycol1=fk.keycol1 [AND ...]
> > ! * [AND NOT EXISTS(<RECHECK_SUBQUERY>)])
> > * WHERE pk.pkkeycol1 IS NULL AND
> > * For MATCH unspecified:
> > * (fk.keycol1 IS NOT NULL [AND ...])
> > * For MATCH FULL:
> > * (fk.keycol1 IS NOT NULL [OR ...])
> > *
> > + * In case of an EACH foreign key, a recheck subquery is added to
> > + * the join condition in order to check that every combination of keys
> > + * is actually referenced.
> > + * The RECHECK_SUBQUERY is
> > + * SELECT 1 FROM
> > + * unnest(fk.keycol1) x1(x1) [CROSS JOIN ...]
> > + * LEFT OUTER JOIN ONLY pkrelname pk
> > + * ON (pk.pkkeycol1=x1.x1 [AND ...])
> > + * WHERE pk.pkkeycol1 IS NULL AND
> > + * (fk.keycol1 IS NOT NULL [AND ...])
>
> What is a worst-case query plan for a constraint with n ordinary keys and m
> EACH keys?

The subquery complexity can be compared with the one of the main query.
Anyway, it is a one-off cost, paid at constraint creation, which I
believe is acceptable. The fact that we now implement only single-column
EACH foreign keys might suggest that we postpone this discussion at a
later time.

>
> > + *
> > * We attach COLLATE clauses to the operators when comparing columns
> > * that have different collations.
> > *----------
> > */
>
> > *** a/src/include/nodes/parsenodes.h
> > --- b/src/include/nodes/parsenodes.h
> > ***************
> > *** 570,575 **** typedef struct DefElem
> > --- 570,589 ----
> > } DefElem;
> >
> > /*
> > + * ForeignKeyColumnElem - foreign key column (used in foreign key constaint)
>
> Typo.

Fixed.

>
> > + *
> > + * For a foreign key attribute, 'name' is the name of the table column to
> > + * index, and each is true if it is an EACH fk.
> > + */
> > + typedef struct ForeignKeyColumnElem
> > + {
> > + NodeTag type;
> > + Node *name; /* name of the column, or NULL */
> > + bool each; /* true if an EACH foreign key */
> > +
> > + } ForeignKeyColumnElem;
> > +
> > + /*
> > * LockingClause - raw representation of FOR UPDATE/SHARE options
> > *
> > * Note: lockedRels == NIL means "all relations in query". Otherwise it
> > ***************
> > *** 1510,1515 **** typedef enum ConstrType /* types of constraints */
> > --- 1524,1531 ----
> > #define FKCONSTR_ACTION_CASCADE 'c'
> > #define FKCONSTR_ACTION_SETNULL 'n'
> > #define FKCONSTR_ACTION_SETDEFAULT 'd'
> > + #define FKCONSTR_ACTION_ARRCASCADE 'C'
> > + #define FKCONSTR_ACTION_ARRSETNULL 'N'
>
> These names need update for the array -> each terminology change.

Fixed.

> I consider these the core changes needed to reach Ready for Committer:
>
> - Fix crash in array_replace(arr, null, null).
> - Don't look through the domain before calling can_coerce_type().
> - Compare operator family of pfeqop and TYPECACHE_EQ_OPR at creation.
> - Move post-processing from gram.y and remove "t"/"f" magic values.

All done.

Attachment Content-Type Size
EACH-foreign-key.v4.patch.bz2 application/x-bzip 28.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Farina 2012-03-14 18:06:16 Faster compression, again
Previous Message Robert Haas 2012-03-14 18:02:03 Re: patch for parallel pg_dump