Re: Triggers on foreign tables

From: Noah Misch <noah(at)leadboat(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Subject: Re: Triggers on foreign tables
Date: 2014-03-17 18:17:01
Message-ID: 20140317181701.GA3824224@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I hacked on this for awhile, but there remain two matters on which I'm
uncertain about the right way forward.

(1) To acquire the old tuple for UPDATE/DELETE operations, the patch closely
parallels our handling for INSTEAD OF triggers on views. It adds a wholerow
resjunk attribute, from which it constructs a HeapTuple before calling a
trigger function. This loses the system columns, an irrelevant consideration
for views but meaningful for foreign tables. postgres_fdw maintains the
"ctid" system column (t_self), but triggers will always see an invalid t_self
for the old tuple. One way to fix this is to pass around the old tuple data
as ROW(ctid, oid, xmin, cmin, xmax, cmax, tableoid, wholerow). That's fairly
close to sufficient, but it doesn't cover t_ctid. Frankly, I would like to
find a principled excuse to not worry about making foreign table system
columns accessible from triggers. Supporting system columns dramatically
affects the mechanism, and what trigger is likely to care? Unfortunately,
that argument seems too weak. Does anyone have a cleaner idea for keeping
track of the system column values or a stronger argument for not bothering?

(2) When a foreign table has an AFTER ROW trigger, the FDW's
ExecForeign{Insert,Update,Delete} callbacks must return a slot covering all
columns. Current FDW API documentation says things like this:

The data in the returned slot is used only if the INSERT query has a
RETURNING clause. Hence, the FDW could choose to optimize away returning
some or all columns depending on the contents of the RETURNING clause.

Consistent with that, postgres_fdw inspects the returningLists of the
ModifyTable node to decide which columns are necessary. This patch has
rewriteTargetListIU() add a resjunk wholerow var to the returningList of any
query that will fire an AFTER ROW trigger on a foreign table. That avoids the
need to change the FDW API contract or any postgres_fdw code. I was pleased
about that for a time, but on further review, I'm doubting that the benefit
for writable FDWs justifies muddying the definition of returningList; until
now, returningList has been free from resjunk TLEs. For example, callers of
FetchStatementTargetList() may be unprepared to see an all-resjunk list,
instead of NIL, for a data modification statement that returns nothing.

If we do keep the patch's approach, I'm inclined to rename returningList.
However, I more lean toward adding a separate flag to indicate the need to
return a complete tuple regardless of the RETURNING list. The benefits of
overloading returningList are all short-term benefits. We know that the FDW
API is still converging, so changing it seems eventually-preferable to, and
safer than, changing the name or meaning of returningList. Thoughts?

On Thu, Mar 06, 2014 at 09:11:19AM +0100, Ronan Dunklau wrote:
> Le mercredi 5 mars 2014 22:36:44 Noah Misch a écrit :
> > Agreed. More specifically, I see only two scenarios for retrieving tuples
> > from the tuplestore. Scenario one is that we need the next tuple (or pair
> > of tuples, depending on the TriggerEvent). Scenario two is that we need
> > the tuple(s) most recently retrieved. If that's correct, I'm inclined to
> > rearrange afterTriggerInvokeEvents() and AfterTriggerExecute() to remember
> > the tuple or pair of tuples most recently retrieved. They'll then never
> > call tuplestore_advance() just to reposition. Do you see a problem with
> > that?
>
> I don't see any problem with that. I don't know how this would be implemented,
> but it would make sense to avoid those scans, as long as a fresh copy is
> passed to the trigger: modifications to a tuple performed in an after trigger
> should not be visible to the next one.

Trigger functions are not permitted to modify tg_trigtuple or tg_newtuple;
notice that, for non-foreign triggers, we pass shared_buffers-backed tuples in
those fields. Therefore, no copy is necessary.

> > I was again somewhat tempted to remove ate_tupleindex, perhaps by defining
> > the four flag bits this way:
> >
> > #define AFTER_TRIGGER_DONE 0x10000000
> > #define AFTER_TRIGGER_IN_PROGRESS 0x20000000
> > /* two bits describing the size of and tuple sources for this event */
> > #define AFTER_TRIGGER_TUP_BITS 0xC0000000
> > #define AFTER_TRIGGER_FDW_REUSE 0x00000000
> > #define AFTER_TRIGGER_FDW_FETCH 0x40000000
> > #define AFTER_TRIGGER_1CTID 0x80000000
> > #define AFTER_TRIGGER_2CTID 0xC0000000
> >
> > AFTER_TRIGGER_FDW_FETCH and AFTER_TRIGGER_FDW_REUSE correspond to the
> > aforementioned scenarios one and two, respectively. I think, though, I'll
> > rate this as needless micro-optimization and not bother; opinions welcome.
> > (The savings is four bytes per foreign table trigger event.)
>
> I was already happy with having a lower footprint for foreign table trigger
> events than for regular trigger events, but if we remove the need for seeking
> in the tuplestore entirely, it would make sense to get rid of the index.

I'm pleased with how this turned out. Besides the memory savings in question,
this removed the INT_MAX limitation and simplified the code overall. I did
not observe a notable runtime improvement, though that's unsurprising.

Other notable changes in the attached revision:

1. UPDATE/DELETE row-level triggers on foreign tables and INSTEAD OF triggers
on views have a similar requirement to generate a HeapTuple representing the
old row. View triggers did so in nodeModifyTable.c, while foreign table
triggers did so in trigger.c. Both were valid choices, but the code siting
should not be relkind-dependent without good reason. I centralized this in
ExecModifyTable().

2. Made CREATE TRIGGER forbid INSTEAD OF and TRUNCATE triggers on foreign
tables. The documentation already claimed they were unavailable.

3. Fixed pointer arithmetic in AfterTriggerBeginQuery()'s MemSet() call.

4. Modified GetCurrentFDWTuplestore() to allocate the tuplestore in
TopTransactionContext. We explicitly put the events list there (specifically,
in a documentation-only child of that context), so it seemed more consistent
to do the same for the associated foreign tuples. I did not find any live bug
from the previous coding, because CurrentMemoryContext always happened to be
one that survived past AfterTriggerEndQuery().

5. Updated comments and documentation that still reflected earlier versions of
the patch, as well as older comments obsoleted by the patch.

6. Reverted cosmetic changes, like addition of braces and blank lines, to
passages of code not otherwise changing. Please see:
https://wiki.postgresql.org/wiki/Creating_Clean_Patches

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
foreign_trigger_v9.patch text/plain 68.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Bartunov 2014-03-17 18:20:30 Re: jsonb status
Previous Message Tom Lane 2014-03-17 18:16:41 Re: First-draft release notes for next week's releases