Re: Triggers on foreign tables

From: Noah Misch <noah(at)leadboat(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Subject: Re: Triggers on foreign tables
Date: 2014-01-30 19:05:08
Message-ID: 20140130190508.GA2198486@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 23, 2014 at 03:17:35PM +0100, Ronan Dunklau wrote:
> > > - for after triggers, the whole queuing mechanism is bypassed for foreign
> > > tables. This is IMO acceptable, since foreign tables cannot have
> > > constraints or constraints triggers, and thus have not need for
> > > deferrable execution. This design avoids the need for storing and
> > > retrieving/identifying remote tuples until the query or transaction end.
> >
> > Whether an AFTER ROW trigger is deferred determines whether it runs at the
> > end of the firing query or at the end of the firing query's transaction.
> > In all cases, every BEFORE ROW trigger of a given query fires before any
> > AFTER ROW trigger of the same query. SQL requires that. This proposal
> > would give foreign table AFTER ROW triggers a novel firing time; let's not
> > do that.
> >
> > I think the options going forward are either (a) design a way to queue
> > foreign table AFTER ROW triggers such that we can get the old and/or new
> > rows at the end of the query or (b) not support AFTER ROW triggers on
> > foreign tables for the time being.
> >
>
> I did not know this was mandated by the standard.
>
> The attached patch tries to solve this problem by allocating a tuplestore
> in the global afterTriggers structure. This tuplestore is used for the whole
> transaction, and uses only work_mem per transaction.
>
> Both old and new tuples are stored in this tuplestore. Some additional
> bookkeeping is done on the afterTriggers global structure, to keep track of
> the number of inserted tuples, and the current read pointer position. The
> tuples are identified by their order of insertion during the transaction.
> I think this could benefit from some support in the tuplestore API, by
> allowing arbitrary seek without the need to store more ReadPointers.
>
> I initially tried to keep track of them by allocating read pointers on the
> tuple store, but it turned out to be so expensive that I had to find another
> way (24bytes per stored tuple, which are not reclaimable until the end of the
> transaction).
>
> What do you think about this approach ? Is there something I missed which
> would make it not sustainable ?

Seems basically reasonable. I foresee multiple advantages from having one
tuplestore per query level as opposed to one for the entire transaction. You
would remove the performance trap of backing up the tuplestore by rescanning.
It permits reclaiming memory and disk space in AfterTriggerEndQuery() rather
than at end of transaction. You could remove ate_ptr1 and ate_ptr2 from
AfterTriggerEventDataFDW and just store the flags word: depending on
AFTER_TRIGGER_2CTIDS, grab either the next one or the next two tuples from the
tuplestore. Using work_mem per AfterTriggerBeginQuery() instead of per
transaction is no problem. What do you think of that design change?

If you do pursue that change, make sure the code still does the right thing
when it drops queued entries during subxact abort.

> I do not have access to the standard specification, any advice regarding
> specs compliance would be welcomed.

http://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F

> > > There is still one small issue with the attached patch: modifications to
> > > the tuple performed by the foreign data wrapper (via the returned
> > > TupleTableSlot in ExecForeignUpdate and ExecForeignInsert hooks) are not
> > > visible to the AFTER trigger. This could be fixed by merging the planslot
> > > containing the resjunk columns with the returned slot before calling the
> > > trigger, but I'm not really sure how to safely perform that. Any advice ?
> >
> > Currently, FDWs are permitted to skip returning columns not actually
> > referenced by any RETURNING clause. I would change that part of the API
> > contract to require returning all columns when an AFTER ROW trigger is
> > involved. You can't get around doing that by merging old column values,
> > because, among other reasons, an INSERT does not have those values at all.
>
> I'm not sure this should be part of the API contract: it would make
> implementing a FDW more complicated than it is now. The attached patch hooks
> on rewriteTargetListIU to add the missing targets to the returning clause,
> when needed.

You're effectively faking the presence of a RETURNING list so today's
conforming FDWs will already do the right thing? Clever.

> > Please don't change unrelated whitespace.
>
> > Please use pgindent to fix the formatting of your new code. It's fine to
> > introduce occasional whitespace errors, but they're unusually-plentiful
> > here.
>
> I think its done now. One problem I have with running pgindent is that I
> accidentally add chunks that were modified only by pgindent.

Yep; that is a pain.

Note that pgindent can't fix many unrelated whitespace changes. For example,
if you add or remove a blank line, pgindent won't interfere. (We would not
want it to interfere, because the use of blank lines is up to the code
author.) You will still need to read through your diff for such things.

On Wed, Jan 29, 2014 at 12:44:16PM +0100, Ronan Dunklau wrote:
> Le mercredi 29 janvier 2014 09:13:36 Kouhei Kaigai a écrit :
> > It may make sense to put a check fdw_nextwrite is less than INT_MAX. :-)
>
> The attached patch checks this, and add documentation for this limitation.
> I'm not really sure about how to phrase that correctly in the error message
> and the documentation. One can store at most INT_MAX foreign tuples, which
> means that at most INT_MAX insert or delete or "half-updates" can occur. By
> half-updates, I mean that for update two tuples are stored whereas in contrast
> to only one for insert and delete trigger.
>
> Besides, I don't know where this disclaimer should be in the documentation.
> Any advice here ?

I wouldn't mention that limitation.

> @@ -3390,34 +3497,72 @@ AfterTriggerExecute(AfterTriggerEvent event,
> /*
> * Fetch the required tuple(s).
> */
> - if (ItemPointerIsValid(&(event->ate_ctid1)))
> + if (event->ate_flags & AFTER_TRIGGER_FDW)
> {
> - ItemPointerCopy(&(event->ate_ctid1), &(tuple1.t_self));
> - if (!heap_fetch(rel, SnapshotAny, &tuple1, &buffer1, false, NULL))
> + AfterTriggerEventDataFDW *fdwevent = (AfterTriggerEventDataFDW *) event;
> +
> + if (afterTriggers->fdw_lastread > fdwevent->ate_ptr1)
> + {
> + tuplestore_rescan(afterTriggers->fdwtuplestore);
> + afterTriggers->fdw_lastread = 0;
> + }
> + while (afterTriggers->fdw_lastread < fdwevent->ate_ptr1)
> + {
> + tuplestore_advance(afterTriggers->fdwtuplestore, true);
> + afterTriggers->fdw_lastread++;
> + }

This is the performance trap I mentioned above; it brings potential O(n^2)
complexity to certain AFTER trigger execution scenarios.

> @@ -3709,6 +3862,9 @@ AfterTriggerBeginXact(void)
> afterTriggers->depth_stack = NULL;
> afterTriggers->firing_stack = NULL;
> afterTriggers->maxtransdepth = 0;
> + afterTriggers->fdwtuplestore = tuplestore_begin_heap(true, false, work_mem);

Probably best to create the tuplestore lazily, similar to how we initialize
afterTriggers->event_cxt. tuplestore_begin_heap() is almost cheap enough to
call unconditionally, but relatively few queries will use this.

> @@ -986,7 +986,18 @@ ExecModifyTable(ModifyTableState *node)
> }
> else if (relkind == RELKIND_FOREIGN_TABLE)
> {
> - /* do nothing; FDW must fetch any junk attrs it wants */
> + /*
> + * If the junkAttNo is valid, then it identifies the
> + * wholerow attribute. This is the case when there is an
> + * UPDATE or DELETE trigger.
> + */
> + if (AttributeNumberIsValid(junkfilter->jf_junkAttNo))
> + {
> + datum = ExecGetJunkAttribute(slot,
> + junkfilter->jf_junkAttNo,
> + &isNull);
> + oldtuple = DatumGetHeapTupleHeader(datum);

Check "isNull", just in case. See similar code elsewhere in this file.

> + }
> }
> else
> {
> @@ -1334,7 +1345,15 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
> }
> else if (relkind == RELKIND_FOREIGN_TABLE)
> {
> - /* FDW must fetch any junk attrs it wants */
> + /*
> + * FDW must fetch any junk attrs it wants When there
> + * is an AFTER trigger, there should be a wholerow
> + * attribute.

This comment edit looks half-done.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2014-01-30 19:07:37 Re: Re: Patch to add support of "IF NOT EXISTS" to others "CREATE" statements
Previous Message Amit Kapila 2014-01-30 19:03:21 Re: Performance Improvement by reducing WAL for Update Operation