Re: Event Triggers: adding information

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 01:13:16
Message-ID: CA+TgmoYxRtC-w1R1HTLqE3s0CB6NtRY04JK+TSM1cCLbtrQAZg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 17, 2013 at 4:43 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Goal: Every time an ALTER command is used on object *that actually
>> exists*, we will call some user-defined function and pass the object
>> type, the OID of the object, and some details about what sort of
>> alteration the user has requested.
>
> Ok, in current terms of the proposed patch, it's a ddl_command_end event
> trigger, and you can choose when to fire it in utility.c. The current
> place is chosen to be just after the AlterTable() call,

/me scratches my head.

Depends. What I described would fire before the command was executed,
not after, so it's not quite the same thing, but it might be just as
good for your purposes. It wasn't intended so much as "we should do
this exact thing" as "this is the kind of thought process you need to
go through if you want to safely run an event trigger in the middle of
another command".

Now, maybe we don't actually need to do that to serve your purposes.
As already discussed, it seems like passing information to
ddl_command_end is for most purposes sufficient for what you need -
and we can do that without any special gymnastics, because that way it
runs completely after the command, not in the middle. If I understand
correctly, and I may not, there are basically two problems with that
approach:

(1) It's not exactly clear how to pass the information about the
statement through to the ddl_command_end trigger. I know you've
proposed a couple of things, but none of them seem completely
satisfying. At least not to me.

(2) If the method of transmission is "the OIDs of the affected
objects", which I believe is your current proposal, the
ddl_command_end trigger can't go look those objects up in the catalog,
because the catalog entries are already gone by that point.

It's possible we could solve both of those problems without running
event triggers in the middle of commands at all. In which case, my
whole example is moot for now. But I think it would be smart to get
ddl_command_end (without any additional information) committed first,
and then argue about these details, so that we at least make some
definable progress here.
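For reference, the user-facing shape of such a trigger, using the
CREATE EVENT TRIGGER syntax from the patch series under discussion
(the function body is just an illustrative sketch):

```sql
-- An event trigger function: fired once per qualifying DDL command.
CREATE FUNCTION log_ddl_end() RETURNS event_trigger AS $$
BEGIN
    -- TG_TAG carries the command tag, e.g. 'ALTER TABLE'.
    RAISE NOTICE 'finished executing: %', tg_tag;
END;
$$ LANGUAGE plpgsql;

-- Fire it at ddl_command_end, i.e. completely after the command.
CREATE EVENT TRIGGER log_alters ON ddl_command_end
    WHEN TAG IN ('ALTER TABLE')
    EXECUTE PROCEDURE log_ddl_end();
```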

> because that
> sounded logical if you believe in CommandCounterIncrement.

I'm somewhat bemused by this comment. I don't think
CommandCounterIncrement() is an article of faith. If you execute a
command to completion, and do a CommandCounterIncrement(), then
whatever you do next will look just like a new command, so it should
be safe to run user-provided code there. But if you stop in the
middle of a command, do a CommandCounterIncrement(), run some
user-provided code, and then continue on, the
CommandCounterIncrement() by itself protects you from nothing. If the
code is not expecting arbitrary operations to be executed at that
point, and you execute them, you get to keep both pieces. Indeed,
there are some places in the code where inserting a
CommandCounterIncrement() *by itself* could be unsafe. I don't
believe that's a risk in ProcessUtility(), but that doesn't mean there
aren't any risks in ProcessUtility().
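In internals terms, the two situations can be sketched as C-like
pseudocode; CommandCounterIncrement() and ProcessUtility() are real
PostgreSQL functions, but the surrounding control flow here is
illustrative only, and RunEventTrigger() is a hypothetical stand-in:

```c
/* Safe: the command has run to completion before user code executes. */
ProcessUtility(parsetree, ...);   /* whole command finishes              */
CommandCounterIncrement();        /* its catalog effects become visible  */
RunEventTrigger(...);             /* hypothetical: user code now sees a
                                   * state equivalent to a fresh command */

/* Unsafe: pausing mid-command.  CommandCounterIncrement() only bumps
 * snapshot visibility; it does nothing to protect the suspended
 * command's own state (caches, partially built catalog entries,
 * assumptions about locks held) from arbitrary user-provided code. */
```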

>> So, instead, what we need to do is go into each function that
>> implements ALTER, and modify it so that after it does the dance where
>> we check permissions, take locks, and look up names, we call the
>> trigger function. That's a bit of a nuisance, because we probably
>> have a bunch of call sites to go fix, but in theory it should be
>> doable. The problem of course, is that it's not intrinsically safe to
>> call user-defined code there. If it drops the object and creates a
>> new one, say, hilarity will probably ensue.
>
> You're trying to find a dangerous point at which to fire the trigger here,

Yes, I am. I'm not asking you to implement what I proposed there -
I'm just giving an example of how to make a dangerous place safe.
You've ALSO found a dangerous point at which to fire the trigger. The
difference is that my example dangerous point is probably fixable to
be safe, whereas your actual dangerous point is probably unfixable,
because there are many current paths of execution that go through that
function in an extremely ad-hoc fashion, and there may be more in the
future. ddl_command_start and ddl_command_end are safe enough *when
restricted to toplevel commands*, but that's about as far as it goes.
Perhaps there are other special cases that are also safe, but just
throwing everything into the pot with no analysis and no
future-proofing certainly isn't.

>> Now, there is a further problem: all of the information about the
>> ALTER statement that we want to provide to the trigger may not be
>> available at that point. Simple cases like ALTER .. RENAME won't
>> require anything more than that, but things like ALTER TABLE .. ADD
>> FOREIGN KEY get tricky, because while at this point we have a solid
>> handle on the identity of the relation to which we're adding the
>> constraint, we do NOT yet have knowledge of or a lock on the TARGET
>> relation, whose OID could still change on us up to much later in the
>> computation. To get reliable information about *that*, we'll need to
>> refactor the sequence of events inside ALTER TABLE.
>
> The only valid answer here has already been given by Tom. You can only
> provide the information if it's available in the catalogs. So again,
> it's a ddl_command_end event. It cannot happen before.

I don't see why you say that. It's perfectly possible to have that
information available at the right time; the work just hasn't been
done yet.
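As a concrete illustration of the quoted ALTER TABLE case (table and
column names invented for the example):

```sql
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (id);
```

At the point where "orders" has been locked and its name resolved,
"customers" has not yet been looked up or locked, so its OID can still
change out from under us (say, via a concurrent DROP and re-CREATE)
until much later in ALTER TABLE's processing.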

>> Hopefully you can see what I'm driving toward here. In a design like
>> this, we can pretty much prove - after returning from the event
>> trigger - that we're in a state no different from what would have been
>> created by executing a series of commands - lock table, then SELECT
>> event_trigger_func(), then the actual ALTER in sequence. Maybe
>> there's a flaw in that analysis - that's what patch review is for -
>> but it sounds pretty darn tight to me.
>
> The only case when we need to do a lookup BEFORE actually running the
> command is when that command is a DROP, because that's the only timing
> when the information we want is still in the catalogs.
>
> So that's the only case where we do a double object lookup, and we take
> a ShareUpdateExclusiveLock lock on the object when doing so, so that we
> can't lose it from another concurrent activity.

As noted, I don't think that's sufficient to really guarantee no shenanigans.

One idea would be to save off all the names of the objects actually
dropped at the time the drops are done, and then pass it to the event
trigger afterward. But frankly I think that's way beyond the scope of
what we should try to get done for 9.3.
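Sketched in trigger form, the idea would look something like the
following; the sql_drop event and the set-returning
pg_event_trigger_dropped_objects() function shown here are the shape
this eventually took in 9.3, but at the time of this mail neither
existed:

```sql
CREATE FUNCTION report_drops() RETURNS event_trigger AS $$
DECLARE
    obj record;
BEGIN
    -- The names were saved off when the drops actually happened; the
    -- catalog rows themselves are already gone by the time this runs.
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
    LOOP
        RAISE NOTICE 'dropped % %', obj.object_type, obj.object_identity;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER track_drops ON sql_drop
    EXECUTE PROCEDURE report_drops();
```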

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
