Event Triggers: adding information

From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Event Triggers: adding information
Date: 2012-12-11 21:47:13
Message-ID: m2txrsdzxa.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Hi,

Please find attached v3 of the patch to add information to event
triggers, including command string. This work has mostly been part of
the original goal and its design is the result of much reviewing here,
in particular from Robert and Tom (rewriting a command before it is done
is called magic, not code, and that's why we have ddl_command_trace now).
Many thanks.

I hope that Robert will be in a position to continue reviewing and
taking care of that patch series. He's clearly interested but really
busy these days, so if another committer wants to see for himself, please
don't be shy!

The current patch provides the following features:

- New events: ddl_command_end and ddl_command_trace

- New information in the TG_VARIABLE:
object name, schema name, object id, object kind,
operation (CREATE|ALTER|DROP), context (GENERATED|TOPLEVEL|…)
normalized command string

- New event filtering on CONTEXT (see after sig for an example)

+ create event trigger regress_event_trigger_end on ddl_command_end
+ when context in ('toplevel', 'generated', 'query', 'subcommand')
+ execute procedure test_event_trigger();

- Documentation of listed features

- pg_dump support of the new features (context support)
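
For illustration, the kind of "snitch" trigger that produces the trace
shown after my signature looks like this (the TG_* variable names are
those proposed in the current patch and may still change):

    create or replace function snitch() returns event_trigger as $$
    begin
      raise notice 'snitch event: %, context: %, tag: %, operation: %, type: %',
                   tg_event, tg_context, tg_tag, tg_operation, tg_objecttype;
      raise notice 'oid: %, schema: %, name: %',
                   tg_objectid, tg_schemaname, tg_objectname;
      raise notice 'command: %', tg_command;
    end;
    $$ language plpgsql;

    create event trigger snitch on ddl_command_end
      execute procedure snitch();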

The command string normalization is a necessary feature and accounts for
most of this patch's footprint. The main use case of this feature is
in-core logical replication support; the main non-core users will be the
usual replication suspects (Slony, Londiste, Bucardo) and the audit
systems.
We might be able to also support some advanced Extension capabilities if
we get "command diverting" support in the next commitfest, and we could
already deprecate the extension whitelisting code I had to do for 9.1
and 9.2.

As we need to support all of CREATE, ALTER and DROP cases, I can't see
another way than to work from a very loose notion of the parse tree. Up
to now I've seen 3 cases worth reporting:

- simple commands, where you can rely on the parsetree as is, and that
need no transform stage (create extension, drop object, alter
sequence, text search *, most commands really)

- medium complex commands, which transform the parsetree in some ways
but for which it still is easy to use it directly if you take care
of only using the transformed parse tree (create table, create
domain, create view — oh, the work was already done)

- complex commands where you need to hack them to return the actual
parse-tree-like structure they prepared out of the transformed parse
tree so that you can get to know what they've just done; that's
only ALTER TABLE really, up to now

So my guess is that it will get much easier from now on, and I will add
some more deparsing support, with statistics, to give some idea of that.

I've been told that the maintenance cost is going to be so great as to
limit our abilities to continue adding features to existing DDLs and
coming up with new ones. I have two answers: first, I don't suppose
we're going to include logical replication without DDL support in core,
and I don't see another realistic way to implement it; second, let's
have some data to think about the case at hand.

For example, I added CREATE SCHEMA objectid and command normalisation
and here's the footprint:

git diff --stat
src/backend/commands/schemacmds.c | 6 ++++--
src/backend/tcop/utility.c | 4 ++--
src/backend/utils/adt/ddl_rewrite.c | 31 +++++++++++++++++++++++++++++++
src/include/commands/schemacmds.h | 2 +-
4 files changed, 38 insertions(+), 5 deletions(-)

For the whole DefineStmt command set (CREATE AGGREGATE, OPERATOR, TYPE,
TEXT SEARCH *, COLLATION), we're talking about:

11 files changed, 186 insertions(+), 70 deletions(-)
95 lines of those in ddl_rewrite.c

For the normalisation of the CREATE CONVERSION command string, this
time it's

4 files changed, 40 insertions(+), 5 deletions(-)

And CREATE DOMAIN, which is more involved as you need to take care of
rewriting DEFAULT and CHECK constraints, we have

4 files changed, 163 insertions(+), 3 deletions(-)

The other commands I need to be working on, if we want more data, are
the ALTER kind of commands, and I've already worked on the most complex one
of those, namely ALTER TABLE. I don't expect most DROP commands to offer
any difficulty here, as I already have support for DropStmt.

The question for DropStmt is what to do about object-specific
information when a single command can have arbitrarily many targets.
The current patch fills in information for the "main" target, which
would be the first object in the sequence, but that's a very weak
position and I would easily agree to just leave the fields NULL in that
case.
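
To make the question concrete, a single DropStmt can look like this:

    drop table foo, bar, baz cascade;

With the current patch the event trigger only gets the object
information for foo; the alternative is to leave name, schema and oid
NULL as soon as more than one object is targeted.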

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

A mildly interesting example that you can have now:

create schema baz
       authorization dim
       create table distributors
          (did serial primary key,
           name varchar(40),
           f2 text check (upper(f2) = f2),
           unique(name) with (fillfactor=70)
          )
          with (fillfactor=70);

I added the blank lines to help the reader in that console paste:

NOTICE: snitch event: ddl_command_end, context: GENERATED, tag: CREATE SEQUENCE, operation: CREATE, type: SEQUENCE
NOTICE: oid: 41633, schema: baz, name: distributors_did_seq
NOTICE: command: CREATE SEQUENCE baz.distributors_did_seq;

NOTICE: snitch event: ddl_command_end, context: SUBCOMMAND, tag: CREATE TABLE, operation: CREATE, type: TABLE
NOTICE: oid: 41635, schema: baz, name: distributors
NOTICE: command: CREATE TABLE baz.distributors (did integer, name pg_catalog.varchar, f2 text, CHECK ((upper(f2) = f2))) WITH (fillfactor=70);

NOTICE: snitch event: ddl_command_end, context: GENERATED, tag: CREATE INDEX, operation: CREATE, type: INDEX
NOTICE: oid: 41643, schema: baz, name: distributors_pkey
NOTICE: command: CREATE UNIQUE INDEX distributors_pkey ON baz.distributors USING btree (did);

NOTICE: snitch event: ddl_command_end, context: GENERATED, tag: CREATE INDEX, operation: CREATE, type: INDEX
NOTICE: oid: 41645, schema: baz, name: distributors_name_key
NOTICE: command: CREATE UNIQUE INDEX distributors_name_key ON baz.distributors USING btree (name) WITH (fillfactor=70);

NOTICE: snitch event: ddl_command_end, context: GENERATED, tag: ALTER SEQUENCE, operation: ALTER, type: SEQUENCE
NOTICE: oid: 41633, schema: baz, name: distributors_did_seq
NOTICE: command: ALTER SEQUENCE baz.distributors_did_seq OWNED BY baz.distributors.did;

NOTICE: snitch event: ddl_command_end, context: TOPLEVEL, tag: CREATE SCHEMA, operation: CREATE, type: SCHEMA
NOTICE: oid: 41632, schema: <NULL>, name: baz
NOTICE: command: CREATE SCHEMA baz AUTHORIZATION dim;
CREATE SCHEMA

Attachment Content-Type Size
event_trigger_infos.3.patch.gz application/octet-stream 39.1 KB

From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-12 21:47:23
Message-ID: m2zk1j9c44.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Hi again,

The previously attached patch already needs a rebase since Tom fixed the
single user mode where you're not supposed to access potentially
corrupted system indexes. Please find attached a new version of the
patch that should apply cleanly to master (I came to trust git).

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> The current patch provides the following features:
>
> - New events: ddl_command_end and ddl_command_trace
>
> - New information in the TG_VARIABLE:
> object name, schema name, object id, object kind,
> operation (CREATE|ALTER|DROP), context (GENERATED|TOPLEVEL|…)
> normalized command string
>
> - New event filtering on CONTEXT (see after sig for an example)
>
> + create event trigger regress_event_trigger_end on ddl_command_end
> + when context in ('toplevel', 'generated', 'query', 'subcommand')
> + execute procedure test_event_trigger();
>
> - Documentation of listed features
>
> - pg_dump support of the new features (context support)

Still the same, plus some more command rewriting support so that I have
some more data points to offer about the code maintenance burden we all
want to limit on that feature.

> For example, I added CREATE SCHEMA objectid and command normalisation
> and here's the footprint:
> 4 files changed, 38 insertions(+), 5 deletions(-)
>
> For the whole DefineStmt command set (CREATE AGGREGATE, OPERATOR, TYPE,
> TEXT SEARCH *, COLLATION), we're talking about:
> 11 files changed, 186 insertions(+), 70 deletions(-)
> 95 lines of those in ddl_rewrite.c
>
> For the normalisation of the CREATE CONVERSION command string, this time
> 4 files changed, 40 insertions(+), 5 deletions(-)
>
> And CREATE DOMAIN, which is more involved as you need to take care of
> rewriting DEFAULT and CHECK constraints, we have
> 4 files changed, 163 insertions(+), 3 deletions(-)

The whole set of RenameStmt commands:
32 files changed, 382 insertions(+), 119 deletions(-)

Then after that, sharing a lot of the previous code, support for both
AlterOwnerStmt and AlterObjectSchemaStmt rewriting:
1 file changed, 102 insertions(+), 17 deletions(-)

As you can guess I separated away the EventTriggerTargetOid tracking:
20 files changed, 96 insertions(+), 57 deletions(-)

Again, those data points are offered without comment here; they are only
meant to give a better idea of what we are talking about as the cost of
maintaining the new command normalisation feature that's brewing in this
patch.

I would like us to commit to that idea and its current implementation,
and include the set of commands I already had time to prepare in the
current submission, knowing that I will continue adding command support
up to the next commitfest with the goal of full coverage.

I will also be working on more features (complete PL support, some kind
of command diverting, a "table_rewrite" event, some ready-to-use event
trigger functions).

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Attachment Content-Type Size
event_trigger_infos.4.patch.gz application/octet-stream 48.1 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-21 14:48:22
Message-ID: CA+Tgmob6kmAySE5yXvd3uE9LifHunufMg9iU3GmbJ5bpGPiHjQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 12, 2012 at 4:47 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> The previously attached patch already needs a rebase since Tom fixed the
> single user mode where you're not supposed to access potentially
> corrupted system indexes. Please find attached a new version of the
> patch that should apply cleanly to master (I came to trust git).

Sorry for the delay - I'm looking at this now.

My first observation (a.k.a. gripe) is that this patch touches an awful lot
of code all over the backend. We just did a big old refactoring to
try to get all the rename and alter owner commands to go through the
same code path, but if this patch is any indication it has not done us
a whole lot of good. The whole idea of all that work is that when
people wanted to make systematic changes to those operations (like
involving sepgsql, or returning the OID) there would not be 47 places
to change. Apparently, we aren't there yet. Maybe we need some more
refactoring of that stuff before tackling this?

Does anyone object to the idea of making every command that creates a
new SQL object return the OID of an object, and similarly for RENAME /
ALTER TO? If so, maybe we ought to go through and do those things
first, as separate patches, and then rebase this over those changes
for simplicity. I can probably do that real soon now, based on this
patch, if there are no objections, but I don't want to do the work and
then have someone say, ack, what have you done?

I'm basically implacably opposed to the ddl_rewrite.c part of this
patch. I think that the burden of maintaining reverse-parsing code
for all the DDL we support is an unacceptable level of maintenance
overhead for this feature. It essentially means that every future
change to gram.y that affects any of the supported commands will
require a compensating change to ddl_rewrite.c. That is a whole lot
of work and it is almost guaranteed that future patch authors will
sometimes fail to do it correctly. At an absolute bare minimum I
think we would need some kind of comprehensive test suite for this
functionality, as opposed to the less than 60 lines of new test cases
this patch adds which completely fail to test this code in any way at
all, but really I think we should just not have it. It WILL get
broken (if it isn't already) and it WILL slow down future development
of SQL features. It also WILL have edge cases where it doesn't work
properly, such as where the actual index name to use is only decided
at execution time and cannot be inferred from the parse tree. I know
that you feel that all of these are tolerable costs, but I 100%
disagree. I predict that if we commit this, it will become a
never-ending and irrecoverable font of trouble.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-21 16:35:52
Message-ID: m2k3sbe51z.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Hi,

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Sorry for the delay - I'm looking at this now.

Thanks very much!

> My first observation (a.k.a. gripe) is that this patch touches an awful lot
> of code all over the backend. We just did a big old refactoring to
> try to get all the rename and alter owner commands to go through the
> same code path, but if this patch is any indication it has not done us
> a whole lot of good. The whole idea of all that work is that when
> people wanted to make systematic changes to those operations (like
> involving sepgsql, or returning the OID) there would not be 47 places
> to change. Apparently, we aren't there yet. Maybe we need some more
> refactoring of that stuff before tackling this?

It's hard to devise exactly what kind of refactoring we need to do
before trying to write a patch that benefits from it, in my experience.
In some cases the refactoring will make things more complex, not less.

> Does anyone object to the idea of making every command that creates a
> new SQL object return the OID of an object, and similarly for RENAME /
> ALTER TO? If so, maybe we ought to go through and do those things
> first, as separate patches, and then rebase this over those changes
> for simplicity. I can probably do that real soon now, based on this
> patch, if there are no objections, but I don't want to do the work and
> then have someone say, ack, what have you done?

That's a good idea. I only started quite late in that patch to work on
the ObjectID piece of information, that's why I didn't split it before
doing so.

We might want to commit the other parts of the new infrastructure and
information first, then get back to the ObjectID and command string; what
do you think? Do you want me to prepare a patch that way, or would you
rather hack your way around the ObjectID APIs, after which I would rebase
the current patch?

Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP
statements, so my current patch is still some bricks shy of a load… (I
added ObjectID only in the commands I added rewrite support for, apart
from DROP).

> I'm basically implacably opposed to the ddl_rewrite.c part of this
> patch. I think that the burden of maintaining reverse-parsing code
> for all the DDL we support is an unacceptable level of maintenance
> overhead for this feature. It essentially means that every future

I hear you. That feature is still required for any automatic consumption
of the command string because you want to resolve schema names, index
and constraint names, type name lookups and some more.

I also think that this feature is not optional if in-core logical
replication is to support DDL at all. What alternative do you propose?

> change to gram.y that affects any of the supported commands will
> require a compensating change to ddl_rewrite.c. That is a whole lot
> of work and it is almost guaranteed that future patch authors will
> sometimes fail to do it correctly. At an absolute bare minimum I
> think we would need some kind of comprehensive test suite for this
> functionality, as opposed to the less than 60 lines of new test cases
> this patch adds which completely fail to test this code in any way at
> all, but really I think we should just not have it. It WILL get

I had a more complete test suite in earlier rounds and you opposed it
for good reasons. I'm OK with revisiting that now, and I think the test
case should look like this:

- install an auditing command trigger that will insert any DDL command
string into a table with a sequence ordering

- select * from audit

- \d… of all objects created

- drop schema cascade

- DO $$ loop for sql in select command from audit do execute sql …

- \d… of all objects created again
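
A minimal sketch of the audit part, assuming the normalized command
string is exposed to PL/pgSQL under a name like TG_COMMAND (the exact
spelling may change):

    create table audit(id serial primary key, command text);

    create function log_command() returns event_trigger as $$
    begin
      insert into audit(command) values (tg_command);
    end;
    $$ language plpgsql;

    create event trigger audit_ddl on ddl_command_trace
      execute procedure log_command();

and the replay step would be something like:

    do $$
    declare sql text;
    begin
      for sql in select command from audit order by id loop
        execute sql;
      end loop;
    end;
    $$;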

Then any whacking of the grammar should alter the output here and any
case that's not supported will fail too.

We might be able to have a better way of testing the feature here, but I
tried to stay within the realm of what I know pg_regress is able to do.

> broken (if it isn't already) and it WILL slow down future development
> of SQL features. It also WILL have edge cases where it doesn't work
> properly, such as where the decision of the actual index name to use
> is only decided at execution time and cannot be inferred from the
> parse tree. I know that you feel that all of these are tolerable

The way to solve that, I think, as Tom said, is to only rewrite the
command string when the objects exist in the catalogs. That's why we now
have ddl_command_trace which is an alias to either start or end
depending on whether we're doing a DROP or a CREATE or ALTER operation.

> costs, but I 100% disagree. I predict that if we commit this, it will
> become a never-ending and irrecoverable font of trouble.

I'm not saying the costs of doing things that way are easy to swallow;
I'm saying that I don't see any other way to get that feature reliable
enough for in-core logical replication to just work with DDLs. Do you?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-21 21:30:46
Message-ID: 20121221213046.GB26107@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2012-12-21 09:48:22 -0500, Robert Haas wrote:
> On Wed, Dec 12, 2012 at 4:47 PM, Dimitri Fontaine
> <dimitri(at)2ndquadrant(dot)fr> wrote:
> > The previously attached patch already needs a rebase since Tom fixed the
> > single user mode where you're not supposed to access potentially
> > corrupted system indexes. Please find attached a new version of the
> > patch that should apply cleanly to master (I came to trust git).
>
> Sorry for the delay - I'm looking at this now.
>
> My first observation (a.k.a. gripe) is that this patch touches an awful lot
> of code all over the backend. We just did a big old refactoring to
> try to get all the rename and alter owner commands to go through the
> same code path, but if this patch is any indication it has not done us
> a whole lot of good. The whole idea of all that work is that when
> people wanted to make systematic changes to those operations (like
> involving sepgsql, or returning the OID) there would not be 47 places
> to change. Apparently, we aren't there yet. Maybe we need some more
> refactoring of that stuff before tackling this?

It's hard to do such refactorings without very concrete use-cases, so I
think it's not unlikely that this patch points out some things that can
be committed independently.
ISTM that a good part of the "return oid;" changes are exactly that
kind of thing.

> Does anyone object to the idea of making every command that creates a
> new SQL object return the OID of an object, and similarly for RENAME /
> ALTER TO? If so, maybe we ought to go through and do those things
> first, as separate patches, and then rebase this over those changes
> for simplicity. I can probably do that real soon now, based on this
> patch, if there are no objections, but I don't want to do the work and
> then have someone say, ack, what have you done?

FWIW I am all for it.

> I'm basically implacably opposed to the ddl_rewrite.c part of this
> patch. I think that the burden of maintaining reverse-parsing code
> for all the DDL we support is an unacceptable level of maintenance
> overhead for this feature. It essentially means that every future
> change to gram.y that affects any of the supported commands will
> require a compensating change to ddl_rewrite.c. That is a whole lot
> of work and it is almost guaranteed that future patch authors will
> sometimes fail to do it correctly.

That's - unsurprisingly - where I have a different opinion. ruleutils.c
does the same for normal SQL, and while DDL is a bit bigger in the
grammar than the DQL support, I would say the data structures of the
latter are far more complex than those for DDL.
If you look at the changes made to ruleutils.c over the last few years,
the rate of bugs isn't, as far as I can see, higher than in other areas
of the code.
The ruleutils.c precedent also means that patch authors *have* to be
aware of this kind of work already. Not too many features get introduced
that only change DDL.

> At an absolute bare minimum I
> think we would need some kind of comprehensive test suite for this
> functionality, as opposed to the less than 60 lines of new test cases
> this patch adds which completely fail to test this code in any way at
> all, but really I think we should just not have it.

Absolutely agreed.

> It WILL get
> broken (if it isn't already) and it WILL slow down future development
> of SQL features. It also WILL have edge cases where it doesn't work
> properly, such as where the decision of the actual index name to use
> is only decided at execution time and cannot be inferred from the
> parse tree.

That obviously needs to be solved, but I think the idea of invoking this
command trigger wherever it's appropriate (which I think was actually
your idea?) should take care of most of the problems around this.

> I know that you feel that all of these are tolerable costs, but I 100%
> disagree.

I think that missing DDL support is one of the major weaknesses of all
potential "logical" replication solutions, be it Slony, Londiste, BDR,
or even something that may some day be in core.
I just don't see a more realistic way to fix that than supporting
something like this.

> I predict that if we commit this, it will
> become a never-ending and irrecoverable font of trouble.

Why? It's really not that different from what's done currently.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-23 22:09:14
Message-ID: CA+TgmoZjOACnHOKrz-eucj3FJXrYJzA-_yjSqajOQR8qkjCi=w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 21, 2012 at 11:35 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> It's hard to devise exactly what kind of refactoring we need to do
> before trying to write a patch that benefits from it, in my experience.
> In some cases the refactoring will make things more complex, not less.

Perhaps. I have a feeling there's some more simplification that can
be done here, somehow.

> That's a good idea. I only started quite late in that patch to work on
> the ObjectID piece of information, that's why I didn't split it before
> doing so.

That's fine. I've committed that part. I think the remainder of the
patch is really several different features that ought to be split
apart and considered independently. We may want some but not others,
or some may be ready to go in but not others.

> Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP
> statements, so my current patch is still some bricks shy of a load… (I
> added ObjectID only in the commands I added rewrite support for, apart
> from DROP).

I shall rely on you to provide those bricks which are still missing.

>> change to gram.y that affects any of the supported commands will
>> require a compensating change to ddl_rewrite.c. That is a whole lot
>> of work and it is almost guaranteed that future patch authors will
>> sometimes fail to do it correctly. At an absolute bare minimum I
>> think we would need some kind of comprehensive test suite for this
>> functionality, as opposed to the less than 60 lines of new test cases
>> this patch adds which completely fail to test this code in any way at
>> all, but really I think we should just not have it. It WILL get
>
> I had a more complete test suite in earlier rounds and you opposed it
> for good reasons. I'm ok to revisit that now, and I think the test case
> should look like this:
>
> - install an auditing command trigger that will insert any DDL command
> string into a table with a sequence ordering
>
> - select * from audit
>
> - \d… of all objects created
>
> - drop schema cascade
>
> - DO $$ loop for sql in select command from audit do execute sql …
>
> - \d… of all objects created again
>
> Then any whacking of the grammar should alter the output here and any
> case that's not supported will fail too.
>
> We might be able to have a better way of testing the feature here, but I
> tried to stay within the realm of what I know pg_regress is able to do.

I was thinking that we might need to go beyond what pg_regress can do
in this case. For example, one idea would be to install an audit
trigger that records all the DDL that happens during the regression
tests. Then, afterwards, replay that DDL into a new database. Then
do a schema-only dump of the old and new databases and diff the dump
files. That's a little wacky by current community standards but FWIW
EDB has a bunch of internal tests that we run to check our proprietary
stuff; some of them work along these lines and it's pretty effective
at shaking out bugs.

In this particular case, it would significantly reduce the maintenance
burden of putting a proper test framework in place, because we can
hope that the regression tests create (and leave around) at least one
object of any given type, and that any new DDL features will be
accompanied by similar regression tests. We already rely on the
regression database for pg_upgrade testing, so if it's not complete,
it impacts pg_upgrade test coverage as well.
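
Something along those lines, as a sketch (database and file names
invented for the example):

    pg_dump --schema-only regression > before.sql
    createdb replay
    psql -d replay -f ddl-captured-by-audit-trigger.sql
    pg_dump --schema-only replay > after.sql
    diff before.sql after.sql   # any difference points at a deparsing bug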

>> broken (if it isn't already) and it WILL slow down future development
>> of SQL features. It also WILL have edge cases where it doesn't work
>> properly, such as where the decision of the actual index name to use
>> is only decided at execution time and cannot be inferred from the
>> parse tree. I know that you feel that all of these are tolerable
>
> The way to solve that, I think, as Tom said, is to only rewrite the
> command string when the objects exist in the catalogs. That's why we now
> have ddl_command_trace which is an alias to either start or end
> depending on whether we're doing a DROP or a CREATE or ALTER operation.

Hmm. I have to study that more, maybe. I certainly agree that if you
can look at the catalogs, you should be able to reliably reconstruct
what happened - which isn't possible just looking at the parse tree.
However, it feels weird to me to do something that's partly based on
the parse tree and partly based on the catalog contents. Moreover,
the current pre-create hook isn't the right place to gather
information from the catalogs anyway as it happens before locks have
been taken.

Now, there's another thing that is bothering me here, too. How
complete is the support you've implemented in this patch? Does it
cover ALL CREATE/ALTER/DROP commands, including all options to and all
variants of those commands? Because while I'm not very sanguine about
doing this at all, it somehow feels like doing it only halfway would
be even worse.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-25 15:34:09
Message-ID: m21uee6t8u.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> That's a good idea. I only started quite late in that patch to work on
>> the ObjectID piece of information, that's why I didn't split it before
>> doing so.
>
> That's fine. I've committed that part. I think the remainder of the
> patch is really several different features that ought to be split
> apart and considered independently. We may want some but not others,
> or some may be ready to go in but not others.

Thank you for this partial commit, and to Simon and Andres for filling
in the gaps. I should mention that the missing header parts were all in
my patch, and that header hacking is proving surprisingly tricky.

I've been having several rounds of problems with the gcc tool chain when
modifying headers. The worst case has been a successful compile followed
by random runtime errors, where gdb gave no clue (botched memory when
returning from a function, palloc'ed memory not freed yet). After
spending more time on it than I care to admit, I did a `make
maintainer-clean`, built again, and the problem disappeared all by
itself.

The gcc tool chain is not my favorite developer environment.

>> Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP
>> statements, so my current patch is still some bricks shy of a load… (I
>> added ObjectID only in the commands I added rewrite support for, apart
>> from DROP).
>
> I shall rely on you to provide those bricks which are still missing.

Cool, I will prepare a separate patch implementing that API now, and
base the following patches on top of it. My thinking is to proceed as
follows:

- API patch to get Oid from all CREATE/ALTER commands
- API patch for getting the Oid(s) in (before) DROP commands
- Event Trigger Infos patch, depending on the previous ones
- Command String Rewrite and its publishing in Event Triggers

Of course, for the DROP parts we'd better capture the Oid and use it
before the command runs to completion, or it will not be usable from the
event trigger (the target is ddl_command_start in that case).

>> We might be able to have a better way of testing the feature here, but I
> tried to stay within the realm of what I know pg_regress is able to do.
>
> I was thinking that we might need to go beyond what pg_regress can do
> in this case. For example, one idea would be to install an audit
> trigger that records all the DDL that happens during the regression
> tests. Then, afterwards, replay that DDL into a new database. Then
> do a schema-only dump of the old and new databases and diff the dump
> files. That's a little wacky by current community standards but FWIW
> EDB has a bunch of internal tests that we run to check our proprietary
> stuff; some of them work along these lines and it's pretty effective
> at shaking out bugs.

Are you in a position to contribute those parts to the community?
Implementing them again and then having to support two different
variants of them does not look like the best use of our time.

> Hmm. I have to study that more, maybe. I certainly agree that if you
> can look at the catalogs, you should be able to reliably reconstruct
> what happened - which isn't possible just looking at the parse tree.

Well, in fact we're looking at the parse tree once edited to host the
exact data that goes into the catalogs. In most cases that data is
easily available for further consumption, or it's possible to use the
transform API without touching the catalogs again.

In some places I'm doing the same processing twice, which I don't like
very much, but it only happens if an Event Trigger is going to fire, so
I thought we could optimize that later. The aim here has been to limit
the changes to review; it looks like we have enough of them already.

The concerned parts deal with raw and cooked expressions for CHECK,
DEFAULT and WHERE clauses (create table, create index, alter table,
create constraint, create domain).

> However, it feels weird to me to do something that's partly based on
> the parse tree and partly based on the catalog contents. Moreover,
> the current pre-create hook isn't the right place to gather
> information from the catalogs anyway as it happens before locks have
> been taken.

So all the information is derived from the parse tree. The trick is
that this information is only valid once it has hit the catalogs (think
of a CHECK constraint in a CREATE TABLE statement referencing some
column attribute; the same goes for a DEFAULT expression depending on
another column).
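
A trivial example of the kind of statement being discussed, where the
cooked constraint expression can only be resolved against catalog data:

```sql
create table accounts (
    balance      numeric default 0,
    credit_limit numeric not null,
    -- the cooked form of this expression references both columns by
    -- attribute number, which is only assigned once the table has
    -- reached the catalogs
    check (balance >= -credit_limit)
);
```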

The case where we should certainly prefer looking at the catalog
caches is when we want to know the actual schema where the object has
been created. The current patch derives that information mostly from
the parse tree, using the first entry of the search_path when the
schema is not given in the command. That's ok because we are still in
the same transaction and no command has been able to run between the
user command and our lookup, but I could easily be convinced to look up
the catalogs instead, even more so once we have the OID of the object
easily available in all places.
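
To illustrate the search_path derivation (assuming a schema named app
exists):

```sql
set search_path = app, public;
-- no schema given in the command: the object is created in the first
-- usable schema of search_path, so this becomes app.audit_log, which
-- is also what the patch reports by reading search_path directly
create table audit_log (entry text, at timestamptz default now());
```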

> Now, there's another thing that is bothering me here, too. How
> complete is the support you've implemented in this patch? Does it
> cover ALL CREATE/ALTER/DROP commands, including all options to and all
> variants of those commands? Because while I'm not very sanguine about
> doing this at all, it somehow feels like doing it only halfway would
> be even worse.

This patch implements complete support for all commands for most of the
information, and partial support for some information that comes from
parse tree analysis. The goal for this commit fest is to agree on how
to do it, then finish the whole command set for the next one.

As decided at the hackers meeting, now is the time to address anything
controversial, once done we can "return with feedback". So that's my
goal for this commit fest, concerning the Normalized Command String.

Given your review, I think the way forward is pretty clear now, and my
plan is as follows:

- commit Oid related API changes in this commit fest
- commit Event Trigger Information changes in this commit fest
- agree on how to tackle exposing the normalized command string

Merry Christmas!
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-25 18:42:24
Message-ID: m2bodi55yn.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP
>> statements, so my current patch is still some bricks shy of a load… (I
>> added ObjectID only in the commands I added rewrite support for, apart
>> from DROP).
>
> I shall rely on you to provide those bricks which are still missing.

Please find attached a patch changing most functions called from
standard_ProcessUtility() to return an Oid (it passes `make
maintainer-clean; configure; make install check`). Most of them only,
because it only makes sense for functions touching an object that
exists in the catalogs and has a distinct Oid.

That completes ALTER and CREATE ObjectID support; I did nothing about
the DROP case in the attached. The way I intend to solve that problem
is to use get_object_address() and do an extra lookup from within the
Event Trigger code path.

Yes, that's an extra lookup. The only alternative I see would be
another giant refactoring so that each and every DROP support function
is split into a lookup-and-lock part first, then the actual DROP. I
don't see that as a particular win either, as the advantage of doing a
separate (extra) lookup in the ddl_command_start Event Trigger is that
the actual DROP code will then check that the target still exists.

The attached patch is known to miss support for ExecCreateTableAs,
because I wanted to keep the ball rolling and couldn't figure out in
time where to find the target relation id. If you feel like filling
that gap, that would be awesome; if not, I will take care of it later,
either in a v1 of that patch, in a follow-up patch, or embedded into
the next patch to come, the cleaned-up Event Trigger Info patch without
Normalized Command String support. As you see fit.

Merry Christmas All,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Attachment Content-Type Size
oid-api.0.patch.gz application/octet-stream 10.7 KB

From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-27 16:22:36
Message-ID: m2obhfwjlf.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> Cool, will prepare a separate patch implementing that API now, and base
> the following patches on top of that. My thinking is to proceed as follows:
>
> - API patch to get Oid from all CREATE/ALTER commands
> - API patch for getting the Oid(s) in (before) DROP commands
> - Event Trigger Infos patch, depending on the previous ones
> - Command String Rewrite and its publishing in Event Triggers
>
> Of course, for the DROP parts we'd better have the Oid and use it
> before the command runs through completion, or it will not be usable
> from the event trigger (the target is ddl_command_start in that case).

Please find attached two versions of the same patch, implementing the
added information for Event Triggers without any support for command
string normalisation.

- event_trigger_infos.5.patch.gz
  applies on top of the current master branch

- event_trigger_infos_partial.5.patch.gz
  applies on top of oid-api.0.patch.gz applied to the current master
  branch

I guess you're going to use the partial one, but if you want to run a
quick test the aggregated patch might make sense.

The way I've been adding schema and object names is to do extra lookups
in the system caches (or the catalogs when we don't have caches), so
that the information is known to be correct in all cases. That means I
had to add a CommandCounterIncrement(), and that the information is
only available when the object actually exists (use ddl_command_trace).
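
Under that rule, a trigger that wants the object information for every
operation would use the proposed ddl_command_trace event, which fires
at start for DROP and at end for CREATE and ALTER. A sketch, where
tg_objectname and tg_schemaname stand in for whatever spelling the
patch finally gives the new plpgsql variables (only tg_event and
tg_tag exist without the patch):

```sql
create function trace_ddl() returns event_trigger
language plpgsql as $$
begin
    -- tg_objectname / tg_schemaname are the proposed additions; the
    -- object is guaranteed to exist in the catalogs when this fires
    raise notice '% (%) on %.%',
        tg_tag, tg_event, tg_schemaname, tg_objectname;
end;
$$;

create event trigger trace_everything
    on ddl_command_trace
    execute procedure trace_ddl();
```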

Also, in the DROP case, the extra lookup that happens before the DROP
command is done with a ShareUpdateExclusiveLock, so that the object
doesn't disappear due to concurrent activity while we run the User
Defined Trigger. Should we be concerned about deadlock possibilities
here, given that the transaction will later need to upgrade its lock?

Given lock upgrade concerns, I guess there's not much we can do apart
from taking an AccessExclusiveLock from the get-go in the case of a
DROP Event Trigger at ddl_command_start, right?

> Are you in a position to contribute those parts to the community?
> Implementing them again then having to support two different variants of
> them does not look like the best use of both our times.

I didn't include new testing in that patch version, waiting for your
answer.

> As decided at the hackers meeting, now is the time to address anything
> controversial, once done we can "return with feedback". So that's my
> goal for this commit fest, concerning the Normalized Command String.
>
> Given your review, I think the way forward is pretty clear now, and my
> plan is as follows:
>
> - commit Oid related API changes in this commit fest
> - commit Event Trigger Information changes in this commit fest
> - agree on how to tackle exposing the normalized command string

With this email, the first two items are now being addressed (pending
review then commit; at least that's the goal here). Let's not forget
the third point: we need to reach an agreement on how to address
Command String Normalisation, too.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Attachment Content-Type Size
event_trigger_infos.5.patch.gz application/octet-stream 30.4 KB
event_trigger_infos_partial.5.patch.gz application/octet-stream 20.0 KB

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-27 18:42:12
Message-ID: CA+U5nMJ3Vn0weepNRvEyLVi8W6NdcapTE62nd+CW9dXxtAgrpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 21 December 2012 14:48, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I predict that if we commit this, it will
> becomes a never-ending and irrecoverable font of trouble.

Event triggers are a new feature with many powerful uses: replication,
online upgrade, data transport.

New features require extra code. To decide how big a maintenance
burden that is, we need to look at whether we want the feature and, if
we have it, how much code it takes to maintain.

pg_dump needs to dump what is there, so dumping "current state"
information is required. That can be accessed from the catalogs, but it
requires significant extra code to bring it all back together into DDL
text. pg_dump is useful for schema backups and pg_upgrade. This code
already exists in core.

Replication needs to see "delta" information, so we can track changes
as they happen. Trying to deduce exact deltas by diffing two "current
state" dumps is close to impossible. The only way is a "delta" stream.
This would be used by replication and on-line upgrade. This code can
be added to core by the patches under discussion.

The code is broadly comparable between "current state" and "delta":
it's basically just assembling the text of DDL statements from
available information, in one case from the catalog, in the other from
the command parse/execution trees. It's not especially hard to code,
nor will it be fiddly or difficult to understand; it's just developing
normalised text versions of the change in progress. Normalisation is
important and useful; we spent time in 9.2 doing things the right way
for pg_stat_statements.

In both cases, the code really should live in core, so we can tie it
directly to each release. Realistically, doing things outside core
means there are often holes or problems that cannot be easily solved.
That seems almost certain to be the case here.

pg_dump maintenance is required, but it's not a huge burden. pg_dump
hasn't slowed down the addition of new DDL features; it just requires a
little extra code, mostly very simple.

The PostgreSQL project needs rewrite support for DDL, so it will come
to exist either inside or outside core. Knowing it is there, and being
able to rely on it, means we'll make faster progress towards the new
features we are building. The approach proposed here has been discussed
over many years and minuted in detail at cluster hackers meetings: "DDL
Triggers".

My thoughts are that the approach here has been well thought through
and is exactly what we need. It *is* extra code, but not complex code,
and it is code designed to bring us closer to our goals, not to hamper
the development of new DDL statements. With the right tests, the
required changes would be easy to spot and add code for. It looks to me
like we already have a good suggestion on exactly how to do that (from
Robert).

So overall, it certainly is worth thinking hard about whether we
should include the rewrite code. Balancing the costs and the benefits,
I think we should include it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-29 12:48:36
Message-ID: CA+TgmobrEg1jWH6K0tHpb0O1=NvTV-+g9g=MCeLF03w5M39x9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 25, 2012 at 10:34 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Thank you for this partial commit, and to Simon and Andres for filling
> in the gaps. I should mention that the missing header parts were all in
> my patch, and that hacking on headers is proving surprisingly difficult.

Apologies to all for the missed bits. I believe I had them in my
version of the patch as well, but I think the changes ended up unstaged
in my repository rather than folded into the patch. I'm usually smarter
than that, but, obviously, not this time.

>> I was thinking that we might need to go beyond what pg_regress can do
>> in this case. For example, one idea would be to install an audit
>> trigger that records all the DDL that happens during the regression
>> tests. Then, afterwards, replay that DDL into a new database. Then
>> do a schema-only dump of the old and new databases and diff the dump
>> files. That's a little wacky by current community standards but FWIW
>> EDB has a bunch of internal tests that we run to check our proprietary
>> stuff; some of them work along these lines and it's pretty effective
>> at shaking out bugs.
>
> Are you in a position to contribute those parts to the community?
> Implementing them again then having to support two different variants of
> them does not look like the best use of both our times.

If I thought there were some useful code, I would try to see if we
could contribute it, but I'm pretty sure there isn't. We have a bunch
of pg_dump tests, but their intended purpose is to verify that our
proprietary stuff dumps and reloads properly, which is not what we
need here. The reason I mentioned it was just to make the point that
it is possible to have a large suite of tests that goes beyond what is
possible with pg_regress, and that this can be a useful way of finding
out whether you've broken things. In our case, we actually cheat
pretty hard by using things like \! pg_dump in the .sql files, so it
actually ends up going through pg_regress after all, technically
anyway. I'm not quite sure what to think about that approach from a
community point of view. It certainly ends up reducing the number of
new things that you need to invent in order to do new kinds of
testing, but it's still not as flexible as I would like - for example,
I don't see a way to do the exact kind of testing we need here in a
cross-platform way using that infrastructure.

I'm tempted to propose that we define a Perl-based testing API for
"additional regression tests" that can't be handled using pg_regress.
Like: we put a bunch of Perl modules in a certain directory, and then
write a script that goes through and does

require $module;
$module->regress;

...on each one, based on some schedule file. If a test passes or
fails, it calls some API that we provide to report the result. That
way, whatever tests we write for this effort can potentially become
the core of a larger test suite, and by basing it on Perl we add no
new tool dependencies and can (hopefully) run on both Linux and
Windows. I'm open to other ideas, of course.

> The case where we should certainly prefer looking at the catalog caches
> are when we want to know the actual schema where the object has been
> created. The current patch is deriving that information mostly from the
> parse tree, using the first entry of the search_path when the schema is
> not given in the command. It's ok because we are still in the same
> transaction and no command has been able to run in between the user
> command and our lookup, but I could easily get convinced to look up the
> catalogs instead, even more so once we have the OID of the object easily
> available in all places.

I haven't looked at the code, but it seems to me that there have got
to be edge cases where that will fail. For one thing, we only do
pretty minimal locking on namespaces, so I'm not at all sure that
someone couldn't (for example) rename the old namespace out of the
way, thus causing one search_path lookup to find the first schema in
the search_path and the other lookup to find the second schema in the
search_path. Frankly, I think we have some hazards of that type in
the DDL code as it stands, so it's not like nobody's ever made that
mistake before, but it would be nice not to propagate it.

Really, the only safe way to do these things is to have any given name
lookup done in one and only one place. We're some distance from there,
and I don't know that this patch should be required to carry
the burden of nailing down all the loose ends in that area, but I
think it's reasonable to insist that it shouldn't do anything which
makes it impossible for someone else to nail down those loose ends,
which is still fairly high on my personal to-do list.

The other danger here is - what exactly do you mean by "no command has
been able to run between the user command and our lookup"? Because
you can do stupid things with functions like set_config(). This would
only be safe if no user-provided expressions can possibly get
evaluated between point A and point B, and that strikes me as the sort
of thing that could easily be false unless this is all done VERY close
to the start of processing.
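
A contrived illustration of the hazard: a command that evaluates user
expressions can move search_path underneath any later lookup done by
the same utility statement.

```sql
-- CREATE TABLE AS runs the query, so set_config() executes in the
-- middle of the utility command; with is_local = true the new
-- search_path stays in effect for the rest of the transaction
create table t2 as
    select length(set_config('search_path', 'other, public', true)) as n;
```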

A belated Merry Christmas to you as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-29 13:11:00
Message-ID: CA+TgmoZTxXGAkoyQD0aNOyju1X19SnVfk7KV9fsJCv63bS-C4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 25, 2012 at 1:42 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP
>>> statements, so my current patch is still some bricks shy of a load… (I
>>> added ObjectID only in the commands I added rewrite support for, apart
>>> from DROP).
>>
>> I shall rely on you to provide those bricks which are still missing.
>
> Please find attached a patch to change most functions called from
> standard_ProcessUtility() to return an Oid (passes `make
> maintainer-clean; configure; make install check`). Most of them only,
> because it only make sense for functions touching an object that exists
> in the catalogs and have a distinct Oid.

OK, I committed this.

> That completes ALTER and CREATE ObjectID support, I did nothing about
> the DROP case in the attached. The way I intend to solve that problem is
> using get_object_address() and do an extra lookup from within the Event
> Trigger code path.

An extra lookup that occurs always, or only when event triggers are in
use? A re-resolution of the name, or some other kind of lookup?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-29 13:41:48
Message-ID: m2r4m9ou03.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> If I thought there were some useful code, I would try to see if we
> could contribute it, but I'm pretty sure there isn't. We have a bunch

Oh. Too bad.

[... description of the tool ...]
> I don't see a way to do the exact kind of testing we need here in a
> cross-platform way using that infrastructure.

Well, do we have \! in psql on Windows at all? Does it work in a
similar way to the Unix one? I just don't know.

> I'm tempted to propose that we define a Perl-based testing API for
> "additional regression tests" that can't be handled using pg_regress.

I know that our build requirements are pretty stable for very good
reasons, and I appreciate that. Now, I don't do Perl. I'll happily use
whatever you come up with, though.

>> command and our lookup, but I could easily get convinced to look up the
>> catalogs instead, even more so once we have the OID of the object easily
>> available in all places.
>
> I haven't looked at the code, but it seems to me that there have got
> to be edge cases where that will fail. For one thing, we only do

In between that email and the latest patch I've removed any processing
of the parse tree, because without the ddl_rewrite module it didn't make
much sense anyway, and I still want to be sure we have ObjectID, Name
and Schema as a base information set.

> Really, the only safe way to do these things is to have any given name
> lookup done in one and only one place. We're some distance from there,
> and I don't know that this patch should be required to carry
> the burden of nailing down all the loose ends in that area, but I
> think it's reasonable to insist that it shouldn't do anything which
> makes it impossible for someone else to nail down those loose ends,
> which is still fairly high on my personal to-do list.

The current version of the patch now does its own lookup, and only does
so at a time when the object exists in the catalogs. So I think we're
now good on that front in the current Event Triggers patch, even if it
doesn't refactor each and every DDL command to add hooks between
lookups and locks, for example.

Also, several cases appear to need an extra lookup anyway, such as ALTER
OBJECT … RENAME TO … or ALTER OBJECT … SET SCHEMA … when you want to
feed information to a ddl_command_end Event Trigger…

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-29 14:02:41
Message-ID: m27go1ot1a.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> OK, I committed this.

Thanks. Please find attached a rebased patch, v6. I think it's looking
more like what you would expect now:

COLUMNS=70 git diff --stat postgres/master..
doc/src/sgml/event-trigger.sgml | 196 +++-
doc/src/sgml/plpgsql.sgml | 95 +-
doc/src/sgml/ref/create_event_trigger.sgml | 20 +-
src/backend/catalog/objectaddress.c | 22 +-
src/backend/commands/event_trigger.c | 870 ++++++++++++++++-
src/backend/commands/trigger.c | 39 +
src/backend/commands/typecmds.c | 2 +-
src/backend/tcop/utility.c | 385 +++++---
src/backend/utils/cache/evtcache.c | 15 +
src/bin/pg_dump/pg_dump.c | 21 +-
src/bin/pg_dump/pg_dump.h | 1 +
src/bin/psql/describe.c | 9 +-
src/include/catalog/objectaddress.h | 21 +
src/include/catalog/pg_event_trigger.h | 2 +
src/include/commands/event_trigger.h | 47 +-
src/include/commands/trigger.h | 1 +
src/include/utils/evtcache.h | 6 +-
src/pl/plpgsql/src/pl_comp.c | 48 +
src/pl/plpgsql/src/pl_exec.c | 57 +-
src/pl/plpgsql/src/plpgsql.h | 6 +
src/test/regress/expected/event_trigger.out | 53 +-
src/test/regress/sql/event_trigger.sql | 51 +-
22 files changed, 1700 insertions(+), 267 deletions(-)

>> That completes ALTER and CREATE ObjectID support, I did nothing about
>> the DROP case in the attached. The way I intend to solve that problem is
>> using get_object_address() and do an extra lookup from within the Event
>> Trigger code path.
>
> An extra lookup that occurs always, or only when event triggers are in
> use? A re-resolution of the name, or some other kind of lookup?

Only when event triggers are in use, and when the object is known to
exist in the catalogs (ddl_command_start for a DROP operation, or
ddl_command_end for a CREATE or ALTER operation). And it's only a name
resolution using the catcache when it exists, or a catalog index scan
when we have to. The key we have is the OID.

The OID is filled in from utility.c in all CREATE and ALTER operations
that we are interested in, and in the case of a DROP operation we do
either a RangeVarGetRelid() for relations or a get_object_address() for
the other object kinds.

Given the OID, we use the ObjectProperty[] structure to know which cache
or catalog to scan in order to get the name and namespace of the object.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Attachment Content-Type Size
event_trigger_infos.6.patch.gz application/octet-stream 20.0 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-29 14:47:26
Message-ID: 50DF027E.10303@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 12/29/2012 08:41 AM, Dimitri Fontaine wrote:
> Well, do we have \! in psql on Windows at all? Does it work in a
> similar way to the Unix one? I just don't know.
>
>

Of course we do. Why ever would you think we don't? The command must be
written in the Windows shell language, but the behaviour is the same.
If system() and popen() didn't work on Windows, we'd be in serious
trouble having a Windows port at all.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-29 16:29:49
Message-ID: 11134.1356798589@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> The other danger here is - what exactly do you mean by "no command has
> been able to run between the user command and our lookup"? Because
> you can do stupid things with functions like set_config(). This would
> only be safe if no user-provided expressions can possibly get
> evaluated between point A and point B, and that strikes me as the sort
> of thing that could easily be false unless this is all done VERY close
> to the start of processing.

To me, the largest single risk of the whole event triggers feature is
precisely that it will result in random user-provided code getting
executed in fairly random places, thus breaking assumptions of this type
that may be hard or impossible to fix. But given that allowing that
is more or less exactly the point of the feature, I'm not sure why
you're blaming the patch for it. It should have been rejected on day
one if you're not willing to have that type of risk.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-30 04:13:47
Message-ID: 1356840827.20017.15.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2012-12-29 at 08:11 -0500, Robert Haas wrote:
> On Tue, Dec 25, 2012 at 1:42 PM, Dimitri Fontaine
> <dimitri(at)2ndquadrant(dot)fr> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >>> Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP
> >>> statements, so my current patch is still some bricks shy of a load… (I
> >>> added ObjectID only in the commands I added rewrite support for, apart
> >>> from DROP).
> >>
> >> I shall rely on you to provide those bricks which are still missing.
> >
> > Please find attached a patch to change most functions called from
> > standard_ProcessUtility() to return an Oid (passes `make
> > maintainer-clean; configure; make install check`). Most of them only,
> > because it only make sense for functions touching an object that exists
> > in the catalogs and have a distinct Oid.
>
> OK, I committed this.

There is a new compiler warning coming from this, I believe:

copy.c: In function ‘DoCopy’:
copy.c:835:2: error: ‘relid’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
return relid;
^
copy.c:753:6: note: ‘relid’ was declared here
Oid relid;
^


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-30 12:18:32
Message-ID: m28v8fn36v.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> There is a new compiler warning coming from this, I believe:

I don't get this warning here, at least not with -O2, and I did not
forget to make maintainer-clean and then rebuild with -O0 just in case
gcc was now complaining about something else. I wish this situation
could be fixed.

Now, this warning needs to be fixed, too, and here's a patch for that.

Thanks, regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Attachment Content-Type Size
copy-warning-fix.patch text/x-patch 487 bytes

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2012-12-31 05:15:38
Message-ID: 1356930938.4258.0.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2012-12-30 at 13:18 +0100, Dimitri Fontaine wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > There is a new compiler warning coming from this, I believe:
>
> I don't get this warning here, at least not with -O2, and I didn't
> forget to run make maintainer-clean and then rebuild with -O0, just in
> case gcc was complaining about something else. I wish this situation
> could be fixed.

I get this warning in Debian squeeze and wheezy with fairly standard
build options. I don't think -O0 will give you any more warnings.

> Now, this warning needs to be fixed, too, and here's a patch for that.

Thanks, I fixed it slightly differently.


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-04 20:52:51
Message-ID: 20130104205251.GB9191@alvh.no-ip.org
Lists: pgsql-hackers

Dimitri Fontaine writes:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > OK, I committed this.
>
> Thanks. Please find attached a rebased patch, v6. I think it's looking
> more like what you would expect now:

I gave this patch a look. I'm not done with it; I mostly gave the
utility.c changes some love so that the end result is not as cluttered.
I did this by creating a couple of macros that call the Start() thing,
then set the OID, then call the End() thing. This made pretty obvious
the places that were missing to set the object OID; I changed the CREATE
TABLE AS code to return it, for instance. The patch I attach applies on
top of Dimitri's v6 patch. With these changes, it's much easier to
review the big standard_ProcessUtility() switch and verify cases that
are being handled correctly. (A few places cannot use the macros I
defined because they use more complex arrangements. This is fine, I
think, because they are few enough that they can be verified manually.)

I also noticed that ALTER DEFAULT PRIVILEGES was trying to fire event
triggers, even though we don't support GRANT and REVOKE; it seems to me
that the three things ought to be supported together. I'm told David
Fetter would call this "DCL support", and we're not supporting DCL at
this stage. So DEFAULT PRIVILEGES ought not to be supported.

I also observed that the SECURITY LABEL command does not fire event
triggers; I think it should. But then maybe this can be rightly called
DCL as well, so perhaps it's all right to leave it out.

DROP OWNED is not firing event triggers. There is a comment therein
that says "we don't fire them for shared objects", but this is wrong:
this command is dropping local objects, not shared, so it seems
necessary to me that triggers are fired.

I also noticed that some cases such as DROP <whatever> do not report the
OIDs of all the objects that are being dropped, only one of them. This
seems bogus to me; if you do "DROP TABLE foo,bar" then both tables
should be reported.

> Given the OID, we use the ObjectProperty[] structure to know which cache
> or catalog to scan in order to get the name and namespace of the object.

The changes to make ObjectPropertyType visible to code outside
objectaddress.c look bogus to me. I think we should make this new code
use the interfaces we already have.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-04 20:57:16
Message-ID: 20130104205716.GA14134@perhan.postgresql.org
Lists: pgsql-hackers

Alvaro Herrera writes:

> I gave this patch a look. I'm not done with it; I mostly gave the
> utility.c changes some love so that the end result is not as cluttered.
> I did this by creating a couple of macros that call the Start() thing,
> then set the OID, then call the End() thing. This made pretty obvious
> the places that were missing to set the object OID; I changed the CREATE
> TABLE AS code to return it, for instance. The patch I attach applies on
> top of Dimitri's v6 patch.

... and here's the attachment.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
event_trigger_infos.6.alvherre1.patch text/plain 27.4 KB

From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-04 21:49:52
Message-ID: m28v88pqin.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> I gave this patch a look. I'm not done with it; I mostly gave the
> utility.c changes some love so that the end result is not as cluttered.

Thanks Álvaro!

> I did this by creating a couple of macros that call the Start() thing,
> then set the OID, then call the End() thing. This made pretty obvious
> the places that were missing to set the object OID; I changed the CREATE
> TABLE AS code to return it, for instance. The patch I attach applies on
> top of Dimitri's v6 patch. With these changes, it's much easier to
> review the big standard_ProcessUtility() switch and verify cases that
> are being handled correctly. (A few places cannot use the macros I
> defined because they use more complex arrangements. This is fine, I
> think, because they are few enough that they can be verified manually.)

That sounds great. I'm not at ease with creating advanced C macros and
I'm very happy you did it :)

> I also noticed that ALTER DEFAULT PRIVILEGES was trying to fire event
> triggers, even though we don't support GRANT and REVOKE; it seems to me
> that the three things ought to be supported together. I'm told David
> Fetter would call this "DCL support", and we're not supporting DCL at
> this stage. So DEFAULT PRIVILEGES ought not to be supported.

Agreed. Do you want me to edit my patch to remove support for that
command and the other arrangement we are talking about, or do you want
to continue having at it?

> I also observed that the SECURITY LABEL command does not fire event
> triggers; I think it should. But then maybe this can be rightly called
> DCL as well, so perhaps it's all right to leave it out.

I would agree with calling that a DCL.

> DROP OWNED is not firing event triggers. There is a comment therein
> that says "we don't fire them for shared objects", but this is wrong:
> this command is dropping local objects, not shared, so it seems
> necessary to me that triggers are fired.

I think the best way to address DROP OWNED is the same as to address
DROP CASCADE: have the inner code (doDeletion, I think) prepare another
DropStmt and give control back to ProcessUtility() with a context of
PROCESS_UTILITY_CASCADE.

The good news is that the unified DropStmt support allows us to think
about that development, even if it still is a non-trivial refactoring.

I would like to hear that we can consider the current patch first then
refactor the drop cascade operations so that we automatically have full
support for DROP OWNED.

> I also noticed that some cases such as DROP <whatever> do not report the
> OIDs of all the objects that are being dropped, only one of them. This
> seems bogus to me; if you do "DROP TABLE foo,bar" then both tables
> should be reported.

When we have the DROP CASCADE refactoring, we could maybe implement drop
on a list of objects as getting back to ProcessUtility() also, with yet
another context (SUBCOMMAND or maybe GENERATED would do here).

The other way to address the information problem would be to expose it
as a relation (resultset, setof record, cursor, you name it) variable to
PLpgSQL.

Again, I hope we can decide on the approach now, commit what we have and
expand on it on the next commit fest.

>> Given the OID, we use the ObjectProperty[] structure to know which cache
>> or catalog to scan in order to get the name and namespace of the object.
>
> The changes to make ObjectPropertyType visible to code outside
> objectaddress.c look bogus to me. I think we should make this new code
> use the interfaces we already have.

We need to move the new function get_object_name_and_namespace() into
the objectaddress.c "module" then, and decide between returning yet
another structure (because we have two values to return) and passing
that function a couple of char **. Which devil do you prefer?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-09 16:58:29
Message-ID: 20130109165829.GB4490@alvh.no-ip.org
Lists: pgsql-hackers

Dimitri Fontaine writes:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:

> >> Given the OID, we use the ObjectProperty[] structure to know which cache
> >> or catalog to scan in order to get the name and namespace of the object.
> >
> > The changes to make ObjectPropertyType visible to code outside
> > objectaddress.c look bogus to me. I think we should make this new code
> > use the interfaces we already have.
>
> We need to move the new function get_object_name_and_namespace() into
> the objectaddress.c "module" then, and decide between returning yet
> another structure (because we have two values to return) and passing
> that function a couple of char **. Which devil do you prefer?

I made some changes to this, and I think the result (attached) is
cleaner overall.

Now, this review is pretty much unfinished as far as I am concerned;
mainly I've been trying to figure out how it all works and improving
some stuff as I go, and I haven't looked at the complete patch yet. We
might very well end up doing some things quite differently; for example I'm not
really sure of the way we handle CommandTags, or the way we do object
lookups at the event trigger stage, only to have it repeated later when
the actual deletion takes place. In particular, since event_trigger.c
does not know the lock strength that's going to be taken at deletion, it
only grabs ShareUpdateExclusiveLock; so there is lock escalation here
which could lead to deadlocks. This might need to be rethought. I
added a comment somewhere, noting that perhaps it's ProcessUtility's job
to set the object classId (or at least the ObjectType) at the same time
the TargetOid is being set, rather than have event_trigger.c figure it
out from the parse node. And so on. I haven't looked at plpgsql or
pg_dump yet, either.

We've been discussing on IM the handling of DROP in general. The
current way it works is that the object OID is reported only if the
toplevel command specifies to drop a single object (so no OID if you do
DROP TABLE foo,bar); and this is all done by standard_ProcessUtility.
This seems bogus to me, and it's also problematic for things like DROP
SCHEMA, as well as DROP OWNED BY (which, as noted upthread, is not
handled at all but should of course be). I think the way to do it is
have performDeletion and performMultipleDeletions (dependency.c) call
into the event trigger code, by doing something like this:

1. look up all objects to drop (i.e. resolve the list of objects
specified by the user, and complete with objects dependent on those)

2. iterate over this list, firing DROP at-start event triggers

3. do the actual deletion of objects (i.e. deleteOneObject, I think)

4. iterate again over the list firing DROP at-end event triggers

We would have one or more event triggers being called with a context of
TOPLEVEL (for objects directly mentioned in the command), and then some
more calls with a context of CASCADE. Exactly how DROP OWNED BY should
be handled is not clear; perhaps we should raise one TOPLEVEL event for
the objects directly owned by the role? Maybe we should just do CASCADE
for all objects? This is not clear yet.

Thoughts?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
event_trigger_infos.7.alvherre1.patch.gz application/octet-stream 20.8 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-16 13:51:42
Message-ID: CA+TgmobykpymdGyfbCDEfS6+-50X-AKYTdK88fjMS4-mY5SBdw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 9, 2013 at 11:58 AM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> I made some changes to this, and I think the result (attached) is
> cleaner overall.
>
> Now, this review is pretty much unfinished as far as I am concerned;
> mainly I've been trying to figure out how it all works and improving
> some stuff as I go, and I haven't looked at the complete patch yet. We
> might very well end up doing some things quite differently; for example I'm not
> really sure of the way we handle CommandTags, or the way we do object
> lookups at the event trigger stage, only to have it repeated later when
> the actual deletion takes place. In particular, since event_trigger.c
> does not know the lock strength that's going to be taken at deletion, it
> only grabs ShareUpdateExclusiveLock; so there is lock escalation here
> which could lead to deadlocks. This might need to be rethought. I
> added a comment somewhere, noting that perhaps it's ProcessUtility's job
> to set the object classId (or at least the ObjectType) at the same time
> the TargetOid is being set, rather than have event_trigger.c figure it
> out from the parse node. And so on. I haven't looked at plpgsql or
> pg_dump yet, either.

I think this points to a couple of problems: this patch isn't
well-enough thought out, and it's got several features jammed into a
single patch. This should really be split up into several patches and
each one submitted separately.

I think that ddl_command_trace is an unnecessary frammish that should
be ripped out entirely. It is easy enough to accomplish the same
thing with ddl_command_start and ddl_command_end, and so we're just
increasing the number of cases we have to support in the future for no
real benefit.

> We've been discussing on IM the handling of DROP in general. The
> current way it works is that the object OID is reported only if the
> toplevel command specifies to drop a single object (so no OID if you do
> DROP TABLE foo,bar); and this is all done by standard_ProcessUtility.
> This seems bogus to me, and it's also problematic for things like DROP
> SCHEMA, as well as DROP OWNED BY (which, as noted upthread, is not
> handled at all but should of course be). I think the way to do it is
> have performDeletion and performMultipleDeletions (dependency.c) call
> into the event trigger code, by doing something like this:
>
> 1. look up all objects to drop (i.e. resolve the list of objects
> specified by the user, and complete with objects dependent on those)
>
> 2. iterate over this list, firing DROP at-start event triggers
>
> 3. do the actual deletion of objects (i.e. deleteOneObject, I think)
>
> 4. iterate again over the list firing DROP at-end event triggers

This gets right back to an argument Dimitri and I have been having
since v1 of this patch, which is whether these are command triggers or
event triggers. I think we ought to support both, but they are not
the same thing. I think the design you are proposing is just fine for
an event called sql_drop, but it makes no sense for an event called
ddl_command_end, which needs to be called once per DDL statement - at
the end. Not once per object dropped.

> We would have one or more event triggers being called with a context of
> TOPLEVEL (for objects directly mentioned in the command), and then some
> more calls with a context of CASCADE. Exactly how DROP OWNED BY should
> be handled is not clear; perhaps we should raise one TOPLEVEL event for
> the objects directly owned by the role? Maybe we should just do CASCADE
> for all objects? This is not clear yet.

TOPLEVEL is supposed to correspond to a complete SQL statement entered
by the user:

typedef enum
{
	PROCESS_UTILITY_TOPLEVEL,	/* toplevel interactive command */
	PROCESS_UTILITY_QUERY,		/* a complete query, but not toplevel */
	PROCESS_UTILITY_SUBCOMMAND,	/* a piece of a query */
	PROCESS_UTILITY_GENERATED	/* internally generated query node */
} ProcessUtilityContext;

IMHO, "DROP OWNED BY" probably shouldn't fire ddl_command_start/end.
I excluded it from ddl_command_start in the initial commit because
it's such a weird case, and it didn't seem to cleanly fit in the
framework. Now, that could be changed, but surely that should be a
separate patch if we're going to do it rather than lumped in with a
bunch of other dubious changes, and more than that, it just
underscores, again, the fact that we're pouring a lot of effort into
fleshing out ddl_command_start/end instead of doing what we probably
should be doing, which is adding events that are actually targeting
what we want to do, like sql_drop.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-16 21:16:03
Message-ID: m2ehhkrfqk.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I think this points to a couple of problems: this patch isn't
> well-enough thought out, and it's got several features jammed into a
> single patch. This should really be split up into several patches and
> each one submitted separately.

Ok. Now I want to talk about our process a little. That's a
two-paragraph diversion; after that, back to technical matters.

There's a difference between "it's not the way I would have done it" and
"the author didn't think about what he's doing". That's also the reason
why it's very hard to justify sending a polished-enough patch as a
non-committer.

And then this patch is the next one in a long series that has been going
on for about two years now; splitting it is just more work for
everybody, and you take the risk that the next committer who looks at
the patch prefers to see a complete enough view of the goal you're
trying to reach.

> I think that ddl_command_trace is an unnecessary frammish that should
> be ripped out entirely. It is easy enough to accomplish the same
> thing with ddl_command_start and ddl_command_end, and so we're just
> increasing the number of cases we have to support in the future for no
> real benefit.

I think you might want to review the use case behind ddl_command_trace,
which was asked for by users who wanted to see their use case supported
in some easier way than just what you're talking about here.

> This gets right back to an argument Dimitri and I have been having
> since v1 of this patch, which is whether these are command triggers or
> event triggers. I think we ought to support both, but they are not
> the same thing. I think the design you are proposing is just fine for
> an event called sql_drop, but it makes no sense for an event called
> ddl_command_end, which needs to be called once per DDL statement - at
> the end. Not once per object dropped.

What I think you're missing here is the proposal flying around to have
drop operation get back to ProcessUtility so that C hooks and event
triggers both can have at it. It's been debated on the list earlier, way
before we talked about event triggers, also as a way to clean up things,
IIRC.

I agree that if we keep the code as it is to implement cascading
behavior, a sql_drop event makes more sense.

> TOPLEVEL is supposed to correspond to a complete SQL statement entered
> by the user:

Having the DROP command get back to ProcessUtility would allow us to
have the nested events as GENERATED, but CASCADE would give users an
easier way to match just that, if that's what they are interested in.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-16 21:59:47
Message-ID: CA+TgmobfeB_YpgMv5NjW52tG36Hg_DY=jq0XDkE5zwCstPQ4pg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 16, 2013 at 4:16 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Ok. Now I want to talk about our process a little. That's a
> two-paragraph diversion; after that, back to technical matters.
>
> There's a difference between "it's not the way I would have done it" and
> "the author didn't think about what he's doing". That's also the reason
> why it's very hard to justify sending a polished-enough patch as a
> non-committer.

There's no rule that anyone's got to agree with my opinion on
anything, but the idea that a patch should do one thing and do it well
is not a novel concept on this mailing list, however much you may feel
otherwise. This patch:

- adds ddl_command_trace and ddl_command_end events
- causes those events to be called not only for actual SQL commands
but also for things that happen, at present, to go through the same
code path
- adds additional magic variables to PL/pgsql to expose new
information not previously exposed
- adds CONTEXT as an additional event-trigger-filter variable, along with TAG

In my mind, that's four or five separate patches. You don't have to
agree, but that's what I think, and I'm not going to apologize for
thinking it. Moreover, I think you will struggle to find examples of
patches committed in the last three years that made as many different,
only loosely-related changes as you're proposing for this one.

As for the patch not being well-thought-out, I'm sticking by that,
too. Alvaro's comments about lock escalations should be enough to
tickle your alarm bells, but there are plenty of other problems here,
too.

> I think you might want to review the use case behind ddl_command_trace,
> which was asked for by users who wanted to see their use case supported
> in some easier way than just what you're talking about here.

That argument carries no water with me. You're asking every
PostgreSQL user in the universe to carry the overhead of a feature
that 90% of them will not use. That is mighty expensive syntactic
sugar. I am not talking about code-complexity, either. It is pretty
obvious that there is run-time overhead of having this feature. I am
not aware of any case where we have accepted run-time overhead for a
feature not mandated by the SQL standard. Given the rest of what you
have here, it is quite simple to arrange for the same function to be
called after a create or alter and before a drop. Being able to do
that in one command instead of two is not a sufficient reason to add
another event type.

>> This gets right back to an argument Dimitri and I have been having
>> since v1 of this patch, which is whether these are command triggers or
>> event triggers. I think we ought to support both, but they are not
>> the same thing. I think the design you are proposing is just fine for
>> an event called sql_drop, but it makes no sense for an event called
>> ddl_command_end, which needs to be called once per DDL statement - at
>> the end. Not once per object dropped.
>
> What I think you're missing here is the proposal flying around to have
> drop operation get back to ProcessUtility so that C hooks and event
> triggers both can have at it. It's been debated on the list earlier, way
> before we talked about event triggers, also as a way to clean up things,
> IIRC.

I'm very much opposed to that proposal, as I am to your proposal to
expose internal and generated events to users. Recursing into
ProcessUtility is a nasty, messy hook that is responsible for subtle
bugs and locking problems in our current DDL implementation. We
should be trying to refactor that to clean it up, not exposing it as a
user-visible detail. I do NOT want a future refactoring effort to
clean up race conditions and duplicate name lookups in the DDL code to
be blocked because someone is relying on the details of the existing
implementation in their event trigger implementation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-16 23:14:41
Message-ID: 4717.1358378081@sss.pgh.pa.us
Lists: pgsql-hackers

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> There's a difference between "it's not the way I would have done it" and
> "the author didn't think about what he's doing". That's also the reason
> why it's very hard to justify sending a polished-enough patch as a
> non-committer.

> And then this patch is the next one in a long series that has been going
> on for about two years now; splitting it is just more work for
> everybody, and you take the risk that the next committer who looks at
> the patch prefers to see a complete enough view of the goal you're
> trying to reach.

What was discussed at the last dev meeting was assigning a committer to
each large patch to start with, which would reduce the risk of the
goalposts moving that way. It seems to me that Robert's at least
unofficially taken that role for event triggers. You should be happy,
because if I were reviewing it I'd likely bounce the whole thing.
I'm not convinced this will *ever* be a stable feature that doesn't
create more problems than it fixes.

> What I think you're missing here is the proposal flying around to have
> drop operation get back to ProcessUtility so that C hooks and event
> triggers both can have at it.

I've seen no such proposal, and it seems like a nonstarter just from the
idea. dependency.c doesn't have a syntax tree to describe each object
that it finds to drop; creating one, and then doing a lookup to re-find
the object, is just going to make drops hugely slower and buggier. Not
to mention the large amount of code that would have to be added and
maintained. Not to mention that the objects dependency.c works with
aren't necessarily all that interesting from the user's level --- for
instance, do you really want to see each column default expression
dropped individually? Not to mention that the permissions
considerations are different from a standalone DROP.

The bigger picture there is that it's taken us years, and multiple major
iterations, to get cascaded drops to work properly and reliably. I'm
disinclined to rip that code open and rewrite it completely; let alone
add hooks that might inject random operations at any point in the
process.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-16 23:42:33
Message-ID: CA+TgmoZd+ggy4GmWC4k7TwwnfWT3cD2k4sKyv_516ohodZ2HVQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 16, 2013 at 6:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> What was discussed at the last dev meeting was assigning a committer to
> each large patch to start with, which would reduce the risk of the
> goalposts moving that way. It seems to me that Robert's at least
> unofficially taken that role for event triggers. You should be happy,
> because if I were reviewing it I'd likely bounce the whole thing.
> I'm not convinced this will *ever* be a stable feature that doesn't
> create more problems than it fixes.

And speaking of the goalposts moving...

I don't think that's the problem, here. Rather, I think the problem
is that the design is ardently refusing to move. It might be a slight
overstatement to say that every review I've ever posted for this patch
has complained about design decisions that expose implementation
details to the user that we might want to change later, but not by
much. And yet, two years on, we've got proposals on the table to
artificially force *more* things through ProcessUtility(). There's no
particular consistency to which things do and don't go through that
function today, and no reason whatsoever to try to force everything to
go through there. I agree with everything you say in the portion of
the email I didn't quote, and I'm pretty sure I've made similar points
more than once in the past.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 10:18:38
Message-ID: m2pq14m7sx.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> - adds ddl_command_trace and ddl_command_end events
> - causes those events to be called not only for actual SQL commands
> but also for things that happen, at present, to go through the same
> code path
> - adds additional magic variables to PL/pgsql to expose new
> information not previously exposed
> - adds CONTEXT as an additional event-trigger-filter variable, along with TAG
>
> In my mind, that's four or five separate patches. You don't have to
> agree, but that's what I think, and I'm not going to apologize for
> thinking it. Moreover, I think you will struggle to find examples of
> patches committed in the last three years that made as many different,
> only loosely-related changes as you're proposing for this one.

So, say I do that. Now we have 5 patches to review. They are all against
the same set of files, so there's a single possible order in which to
apply them, and hence to review them. When we reach an agreement out of
order (we will), it takes a considerable amount of time to rebuild the
patch set in a different order. I don't think we have the time to do
that.

Now, if that's what it takes, I'll spend time on it. In which exact
order do you want to be reviewing and applying that series of patches?

> As for the patch not being well-thought-out, I'm sticking by that,
> too. Alvaro's comments about lock escalations should be enough to
> tickle your alarm bells, but there are plenty of other problems here,
> too.

Here's the comment in the code where the lock problems are discussed:

/*
* Fill in the EventTriggerTargetOid for a DROP command and a
* "ddl_command_start" Event Trigger: the lookup didn't happen yet and we
* still want to provide the user with that information when possible.
*
* We only do the lookup if the command contains a single element, which is
* often forced to be the case as per the grammar (see the drop_type:
* production for a list of cases where more than one object per command is
* allowed).
*
* We take no exclusive lock here, the main command will have to do the
* name lookup all over again and take appropriate locks down after the
* User Defined Code runs anyway, and the commands executed by the User
* Defined Code will take their own locks. We only lock the objects here so
* that they don't disappear under us in another concurrent transaction,
* hence using ShareUpdateExclusiveLock. XXX this seems prone to lock
* escalation troubles.
*/

My thinking is that the user code can do anything, even run a DROP
command against the object that we are preparing ourselves to be
dropping. I don't think we can bypass doing the lookup twice.

The only way to avoid any lookup in our code is not to report the OID of
the object being dropped. Then the event trigger code will certainly do
that lookup itself, as its first action. The lookup will happen either way.

RAISE NOTICE 'drop table % (%)', tg_objectname::regclass,
tg_objectname::regclass::oid;

That is, at least, the angle I considered when crafting this patch. I'm
happy to change that angle if needed and adjust to things I missed and
you will tell me about.

>> I think you might want to review the use case behind ddl_command_trace,
>> which has been asked for by users who wanted to see their use case supported
>> in some easier way than just what you're talking about here.
>
> […]
> Being able to do
> that in one command instead of two is not a sufficient reason to add
> another event type.

That feature was proposed last October (with a patch) and received no
comment, so I went ahead with it. The performance cost of this patch is
another couple of cache lookups when running a DDL command, which I saw
as acceptable.

http://www.postgresql.org/message-id/m28vbgcc30.fsf@2ndQuadrant.fr

I'm not so attached to it that I will put the rest of the patch in
danger with it, I'm fine about removing that part if we have a good
reason to. And yes, you (politely) saying "I don't like it" is a good
enough reason to.

Our process for a non-committer proposal, as I understand it, is that you
have to send a patch showing what you're talking about, so that a thread
forms and the idea gets some comments, either rejecting it or reaching a
consensus about it. So the opening of that thread was in October and the
discussion happens now: that's fine, and I could be happy and thankful
about this fact with some subtle changes in the tone of the messages.

> I'm very much opposed to that proposal, as I am to your proposal to
> expose internal and generated events to users. Recursing into
> ProcessUtility is a nasty, messy hook that is responsible for subtle
> bugs and locking problems in our current DDL implementation. We

We are publishing two pieces of information tied to the implementation in
the current patch:

- the parse tree, only available to C code, and without any API
stability promises

- the fact that some commands run nested commands: a serial column issues
a CREATE SEQUENCE, and CREATE TABLE inside CREATE SCHEMA issues a CREATE TABLE.

My understanding of your proposal not to call them ddl_command_* events
is that you are exposing exactly the same set of implementation
dependent information to the users, just under another name.

What I mean is that if a user wants to log every sequence creation, he
will have to either know that some of them happen within the CONTEXT
named 'GENERATED', or know that some of them happen as ddl_command_*
events and some as sequence_for_serial_* events, say.

In other words, while I can follow your general approach to things, I
don't see any practical impact that makes me think it's a much cleaner
proposal in terms of not publishing the implementation details, because
I don't think that it is possible to both give the users the information
they need and hide those implementation details.

> should be trying to refactor that to clean it up, not exposing it as a
> user-visible detail. I do NOT want a future refactoring effort to
> clean up race conditions and duplicate name lookups in the DDL code to
> be blocked because someone is relying on the details of the existing
> implementation in their event trigger implementation.

That problem is going to happen in some way or another. It's not about
avoiding the problem completely, but about choosing a set of compromises.
I have yet to understand what your set of compromises is in detailed
enough practical implementation terms (not just "it's another set of
events") and how it's a better compromise overall (I trust your judgement
on that, I still want to understand).

After all, the whole point of the Event Trigger patch is to expose some
level of implementation details.

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> What was discussed at the last dev meeting was assigning a committer to
> each large patch to start with, which would reduce the risk of the
> goalposts moving that way. It seems to me that Robert's at least
> unofficially taken that role for event triggers. You should be happy,
> because if I were reviewing it I'd likely bounce the whole thing.

Like Robert, I don't think the problem is about moving goalposts here,
it's more about how we want to spend our time. I don't want to be
refactoring 5 branches each time we reach a new consensus, only to
realise that of course the inter-dependent set of patches needs to be
redone from scratch because we will be applying another one first.

As for the nested commands design, see above for why I'm not buying into
his counter-proposal yet. Hint: I don't think it solves the problem
Robert is complaining about. Details: I certainly don't really understand
what problem he is complaining about, and I know I've been trying to get
that information before. It must be something obvious.

> I'm not convinced this will *ever* be a stable feature that doesn't
> create more problems than it fixes.

I'm still open to hearing about another way to implement

- ddl support for logical replication

- extensions to ddl behaviour

I want to be able to install an extension that will change the way some
DDL are run, and I know that what I want to do here will never get
accepted for the core version of PostgreSQL. That's a very fine position
that I understand well enough to work on a patch series where the only
goal is that I can do my hacking outside of PostgreSQL core.

> I've seen no such proposal, and it seems like a nonstarter just from the
> idea. dependency.c doesn't have a syntax tree to describe each object
> that it finds to drop; creating one, and then doing a lookup to re-find
> the object, is just going to make drops hugely slower and buggier. Not

We wouldn't want to be doing it that way, of course. One idea was to
create a new parse node (CascadeDropStmt, say) that gets the OID of the
object to drop, and only does another schema/name lookup when there's an
event trigger to fire.

So the only obvious part of that proposal is that the problem is complex
enough and the solution not ready yet. That's also why that problem is
not addressed in any patch I've sent to this day. What remains for that
part of the problem is to find a good design together before implementing it.

That part of the discussion really should live in a separate thread.

> to mention the large amount of code that would have to be added and
> maintained. Not to mention that the objects dependency.c works with
> aren't necessarily all that interesting from the user's level --- for
> instance, do you really want to see each column default expression
> dropped individually? Not to mention that the permissions
> considerations are different from a standalone DROP.

No, we don't want to see all the details. Yes, we will need to come up
with a list of the ones we want to see. As I'm currently working on the
DDL part of the Event Triggers, my current thinking is to only include
objects that could have been dropped with a TOPLEVEL command.

> The bigger picture there is that it's taken us years, and multiple major
> iterations, to get cascaded drops to work properly and reliably. I'm
> disinclined to rip that code open and rewrite it completely; let alone
> add hooks that might inject random operations at any point in the
> process.

Fair enough. If that's the consensus, there's nothing to change in my
current patch, or maybe a paragraph or two in the docs to set users
expectations.

Maybe we should ask the logical replication guys their opinion about how
to solve DROP CASCADE support for them, so that we all agree that not
doing anything is a good enough answer.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 16:15:24
Message-ID: CA+TgmoYEW583z+oisToCYQMz2grzpavdXrCi3-W2D_+nucVhzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 17, 2013 at 5:18 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Now, if that's what it takes, I'll spend time on it. In which exact
> order do you want to be reviewing and applying that series of patches?

Let's agree on which things we even want to do first. Here's my take:

- adds ddl_command_trace and ddl_command_end events

I think ddl_command_end is OK and I'm willing to commit that if
extracted as its own patch. I think ddl_command_trace is unnecessary
syntactic sugar.
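
For reference, a minimal ddl_command_end trigger under the patch's syntax
might look like this (a sketch; the function name is illustrative):

```sql
-- Sketch: fire once at the end of every DDL command and log its tag.
CREATE FUNCTION log_ddl_end() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'ddl_command_end fired for %', tg_tag;
END;
$$;

CREATE EVENT TRIGGER log_ddl_end ON ddl_command_end
    EXECUTE PROCEDURE log_ddl_end();
```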

- causes those events to be called not only for actual SQL commands
but also for things that happen, at present, to go through the same
code path

I think this is a bad idea, not only because, as I said before, it
exposes internal implementation details, but also because it's
probably not safe. As Noah (I believe, or maybe Tom) observed in a
previous post, we've gone to a lot of work to make sure that nothing
you do inside a trigger can crash the server, corrupt the catalogs,
trigger elog() messages, etc. That applies in spades to DDL.

Despite Tom's skepticism, I'm pretty sure that it's safe for us to
execute trigger function calls either completely before or completely
after we execute the DDL command. It should be no different than
executing them as separate commands. But doing something in the
middle of a command is something else altogether. Consider, for
example, that ALTER TABLE does a bunch of crap internally to itself,
and then recurses into ProcessUtility to do some more stuff. This is
a terrible design, but it's what we've got. Are you absolutely sure
that the intermediate state that exists after that first bunch of
stuff is done and before those other things are done is a safe place
to execute arbitrary user-provided code? I'm not. And the more
places we add that recurse into ProcessUtility, or indeed, add event
triggers in general, the more places we're going to add new failure
modes.

Fixing that is part of the price of adding a new event trigger and I'm
OK with that. But I'm not OK with baking into the code the assumption
that any current or future callers of ProcessUtility() should be
prepared for arbitrary code execution. The code wasn't written with
that in mind, and it will not end well.

- adds additional magic variables to PL/pgsql to expose new
information not previously exposed

I'm not sure that I agree with the implementation of this, but I think
I'm OK with the concept. I will review it when it is submitted in a
form not intertwined with other things.

- adds CONTEXT as an additional event-trigger-filter variable, along with TAG

I'm OK with adding new event-trigger-filter variables, provided that
they are done without additional run-time overhead for people who are
not using event triggers; I think it's nice that we can support that.
But I think this particular proposal is DOA because nothing except
TOPLEVEL is actually safe.
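
For comparison, the existing TAG filter already takes the same
WHEN ... IN (...) shape that the CONTEXT proposal uses (a sketch; the
function name is illustrative):

```sql
-- Sketch: restrict an event trigger to a subset of top-level commands
-- using the TAG filter variable.
CREATE EVENT TRIGGER table_ddl_only ON ddl_command_start
    WHEN TAG IN ('CREATE TABLE', 'DROP TABLE')
    EXECUTE PROCEDURE log_table_ddl();
```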

>> As for the patch not being well-thought-out, I'm sticking by that,
>> too. Alvaro's comments about lock escalations should be enough to
>> tickle your alarm bells, but there are plenty of other problems here,
>> too.
>
> Here's the comment in the code where the lock problems are discussed:
>
> /*
> * Fill in the EventTriggerTargetOid for a DROP command and a
> * "ddl_command_start" Event Trigger: the lookup didn't happen yet and we
> * still want to provide the user with that information when possible.
> *
> * We only do the lookup if the command contains a single element, which is
> * often forced to be the case as per the grammar (see the drop_type:
> * production for a list of cases where more than one object per command is
> * allowed).
> *
> * We take no exclusive lock here, the main command will have to do the
> * name lookup all over again and take appropriate locks down after the
> * User Defined Code runs anyway, and the commands executed by the User
> * Defined Code will take their own locks. We only lock the objects here so
> * that they don't disappear under us in another concurrent transaction,
> * hence using ShareUpdateExclusiveLock. XXX this seems prone to lock
> * escalation troubles.
> */
>
> My thinking is that the user code can do anything, even run a DROP
> command against the object that we are preparing ourselves to be
> dropping. I don't think we can bypass doing the lookup twice.

Sure, but there's also the issue of getting the right answer. For
example, suppose you are planning to drop a function. You do a name
lookup on the name and call the event trigger. Meanwhile, someone
else renames the function and creates a new one with the same name.
After the event trigger returns, the actual drop code latches onto the
one with the new name. Now, the event trigger ran with OID A and the
actual object dropped was one with OID B. It's possible that KaiGai's
RENAME refactoring in 9.3 has strengthened the locking enough that
this particular failure mode is no longer possible in that specific
scenario, but I am fairly confident it would have been an issue in
9.2, and it may still be. In general, the only way to be sure that
things like that can't happen is to only do the name lookup once. Our
locking on non-table objects is still pretty flim-flammy, and there
are a lot of concurrent scenarios where things that seem like they
shouldn't change under you actually can.

There's another issue here, too, which is that this change
reintroduces a failure mode that I spent a lot of time cleaning up
during the 9.2 cycle - namely, taking locks on a table before
permissions have been checked, thus providing new methods for
unprivileged users to block operations on tables they don't have
rights to.

>> I'm very much opposed to that proposal, as I am to your proposal to
>> expose internal and generated events to users. Recursing into
>> ProcessUtility is a nasty, messy hook that is responsible for subtle
>> bugs and locking problems in our current DDL implementation. We
>
> We are publishing 2 pieces of information tied to the implementation in
> the current patch:
>
> - the parse tree, only available to C code, and without any API
> stability promises

I'm OK with that. People who program in C are taking their life in
their hands anyway as far as new versions are concerned.

> - the fact that some commands run nested commands: a serial column issues
> a CREATE SEQUENCE, and CREATE TABLE inside CREATE SCHEMA issues a CREATE TABLE.
>
> My understanding of your proposal not to call them ddl_command_* events
> is that you are exposing exactly the same set of implementation
> dependent information to the users, just under another name.

What I'm proposing is that we have a separate set of events that get
invoked every time an object is created, altered, or dropped,
regardless of the cause. Yes, the name will be different, but the
point is not to only change the name, but rather to also change the
definition. There are two separate things you might want here:

- to get a callback every time a command is executed (e.g. so you can
log it, check permissions, or reject the entire statement)
- to get a callback every time a certain kind of SQL object is
created, altered, or destroyed (e.g. so you can replicate those
operations on another server)

For example, suppose a user executes the command "DROP OWNED BY fred".
For a logging hook, you want the first thing: one callback, saying,
hey, the user is trying to run DROP OWNED BY fred. You log it once,
and go home. For a replication hook, you want the second thing: one
callback per object, with details about each one, so you can drop them
on the remote side. Both things are useful, but they are different.
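
The two kinds of callback could be sketched as follows, assuming a
per-object event and a set-returning function of the kind described (the
names sql_drop and pg_event_trigger_dropped_objects() are illustrative
here, not part of the patch under review):

```sql
-- Logging hook: one callback per command, regardless of how many
-- objects it ends up touching.
CREATE FUNCTION audit_command() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'user ran: %', tg_tag;   -- fires once for DROP OWNED
END;
$$;
CREATE EVENT TRIGGER audit_command ON ddl_command_start
    EXECUTE PROCEDURE audit_command();

-- Replication hook: one callback per object actually dropped, with
-- enough detail to replay the drop on a remote server.
CREATE FUNCTION replicate_drops() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects() LOOP
        RAISE NOTICE 'dropped % %', obj.object_type, obj.object_identity;
    END LOOP;
END;
$$;
CREATE EVENT TRIGGER replicate_drops ON sql_drop
    EXECUTE PROCEDURE replicate_drops();
```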

As a further example, suppose that in 9.4 (or 9.17) we add a command
"DROP TABLES IN SCHEMA fred WHERE name LIKE 'bob%'". Well, the
logging trigger still just works (because it's only writing the
statement, without caring about its contents). And the replication
trigger still just works too (because it will still get a callback for
every table that gets dropped, even though it knows nothing about the
new syntax). That's very powerful. Of course, there will still be
cases where things have to change, but you can minimize it by clean
definitions.

It pains me that I've evidently failed to communicate this concept
clearly despite a year or more of trying. Does that make sense? Is
there some way I can make this more clear? The difference seems very
clear-cut to me, but evidently I'm not conveying it well.

> That problem is going to happen in some way or another. It's not about
> avoiding the problem completely, but choosing a set of compromises. I'm
> yet to understand what your set of compromises is in detailed enough
> practical implementation terms (not just "it's another set of events")
> and how it's a better compromise over all (I trust your judgement on
> that, I still want to understand).
>
> After all, the whole point of the Event Trigger patch is to expose some
> level of implementation details.

I don't really agree with that. I think the point is to expose what
the system is doing to the DBA. I'm OK with exposing the fact that
creating a table with a serial column also creates a sequence. There
is no problem with that - it's all good. What we DON'T want to do is
set up a system where the fact that it creates a sequence is exposed
because that happens to go through ProcessUtility() and the fact that
it creates a constraint is not because that happens not to go through
ProcessUtility(). That is not the sort of quality that our users have
come to expect from PostgreSQL.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 16:31:51
Message-ID: 20130117163151.GB22844@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-01-17 11:15:24 -0500, Robert Haas wrote:
> As a further example, suppose that in 9.4 (or 9.17) we add a command
> "DROP TABLES IN SCHEMA fred WHERE name LIKE 'bob%'". Well, the
> logging trigger still just works (because it's only writing the
> statement, without caring about its contents). And the replication
> trigger still just works too (because it will still get a callback for
> every table that gets dropped, even though it knows nothing about the
> new syntax). That's very powerful. Of course, there will still be
> cases where things have to change, but you can minimize it by clean
> definitions.
>
> It pains me that I've evidently failed to communicate this concept
> clearly despite a year or more of trying. Does that make sense? Is
> there some way I can make this more clear? The difference seems very
> clear-cut to me, but evidently I'm not conveying it well.

Well, I didn't manage to follow the topic with the level of detail I
would have liked to/should have, so it's very well possible that I missed
a proposal of how to implement that concept in a way you like?
With some squinting you can actually see the proposed ddl_trace call site
as exactly that kind of replication trigger, but with a worse name...

Without piggy-backing on the internally generated commands - which
currently seems to be achieved by passing everything through
ProcessUtility, but could work in quite a different way - and
reconstructing SQL from that, I don't immediately see how you want to do
this?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 17:06:50
Message-ID: m2k3rbhh79.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> - adds ddl_command_trace and ddl_command_end events
>
> I think ddl_command_end is OK and I'm willing to commit that if
> extracted as its own patch. I think ddl_command_trace is unnecessary
> syntactic sugar.

Ok. Will prepare a non-controversial patch for ddl_command_end. After
having played with the patch, I happen to quite like ddl_command_trace,
but I won't try to protect it from its death any more than that.

> - causes those events to be called not only for actual SQL commands
> but also for things that happen, at present, to go through the same
> code path
>
> I think this is a bad idea, not only because, as I said before, it
> exposes internal implementation details, but also because it's
> probably not safe. As Noah (I believe, or maybe Tom) observed in a
> previous post, we've gone to a lot of work to make sure that nothing
> you do inside a trigger can crash the server, corrupt the catalogs,
> trigger elog() messages, etc. That applies in spades to DDL.

I naively thought that doing CommandCounterIncrement() before running the
event trigger would go a long way towards making the operation safe when
the current code calls back into ProcessUtility().

> Despite Tom's skepticism, I'm pretty sure that it's safe for us to
> execute trigger function calls either completely before or completely
> after we execute the DDL command. It should be no different than
> executing them as separate commands. But doing something in the
> middle of a command is something else altogether. Consider, for
> example, that ALTER TABLE does a bunch of crap internally to itself,
> and then recurses into ProcessUtility to do some more stuff. This is
> a terrible design, but it's what we've got. Are you absolutely sure
> that the intermediate state that exists after that first bunch of
> stuff is done and before those other things are done is a safe place
> to execute arbitrary user-provided code? I'm not. And the more
> places we add that recurse into ProcessUtility, or indeed, add event
> triggers in general, the more places we're going to add new failure
> modes.

While I hear you, I completely fail to understand which magic is going
to make your other idea safer than this one. Namely, calling an event
trigger named sql_drop rather than ddl_command_start from the same place
in the code is certainly not helping at all.

If your answer to that is how to implement it (e.g. keeping a queue of
events for which to fire Event Triggers once we reach a safe point in
the code), then why couldn't we do exactly the same thing under the
other name?

> Fixing that is part of the price of adding a new event trigger and I'm
> OK with that. But I'm not OK with baking into the code the assumption
> that any current or future callers of ProcessUtility() should be
> prepared for arbitrary code execution. The code wasn't written with
> that in mind, and it will not end well.

Again, while I agree with the general principle, it happens that the
calls to the arbitrary code are explicit in ProcessUtility(), so that if
the place is known to be unsafe it's easy enough to refrain from calling
the user code. We do that already for CREATE INDEX CONCURRENTLY at
ddl_command_end.

> - adds additional magic variables to PL/pgsql to expose new
> information not previously exposed
>
> I'm not sure that I agree with the implementation of this, but I think
> I'm OK with the concept. I will review it when it is submitted in a
> form not intertwined with other things.

Fair enough.

> - adds CONTEXT as an additional event-trigger-filter variable, along with TAG
>
> I'm OK with adding new event-trigger-filter variables, provided that
> they are done without additional run-time overhead for people who are
> not using event triggers; I think it's nice that we can support that.
> But I think this particular proposal is DOA because nothing except
> TOPLEVEL is actually safe.

Yeah, let's first see about that.

>> My thinking is that the user code can do anything, even run a DROP
>> command against the object that we are preparing ourselves to be
>> dropping. I don't think we can bypass doing the lookup twice.
>
> Sure, but there's also the issue of getting the right answer. For
> example, suppose you are planning to drop a function. You do a name
> lookup on the name and call the event trigger. Meanwhile, someone
> else renames the function and creates a new one with the same name.
> After the event trigger returns, the actual drop code latches onto the
> one with the new name. Now, the event trigger ran with OID A and the
> actual object dropped was one with OID B. It's possible that KaiGai's

Is your scenario even possible when we take a ShareUpdateExclusiveLock
on the object before running the Event Trigger, in order to protect
against exactly that scenario, as said in the comment? Maybe another
lock should be taken. Which one?

> There's another issue here, too, which is that this change
> reintroduces a failure mode that I spent a lot of time cleaning up
> during the 9.2 cycle - namely, taking locks on a table before
> permissions have been checked, thus providing new methods for
> unprivileged users to block operations on tables they don't have
> rights to.

Aren't Event Triggers superuser-only? Does your concern mean that we
should force Event Trigger functions to be SECURITY DEFINER?

>> - the parse tree, only available to C code, and without any API
>> stability promises
>
> I'm OK with that. People who program in C are taking their life in
> their hands anyway as far as new versions are concerned.

Cool.

> What I'm proposing is that we have a separate set of events that get
> invoked every time an object is created, altered, or dropped,
> regardless of the cause. Yes, the name will be different, but the
> point is not to only change the name, but rather to also change the
> definition. There are two separate things you might want here:

In the rest of your email, I can't decide what kind of information you
want those new events to give to the Event Trigger User Code. The other
thing is, as I said before, that I can't see why those new events would
suddenly make the call point safer.

Other than that, I must say that your proposal is sound to me and that I
like it too. I just fail to understand how it makes any difference other
than in naming things, really.

Why would a command that the system executes for you have to be
considered such a different beast from the same command when you happen
to have sent it explicitly? A GENERATED context sounds better to me than a
"sql_create" event as a user interface, obviously, and implementing the
latter has the *exact* same set of challenges to address, as far as I
understand.

> For example, suppose a user executes the command "DROP OWNED BY fred".
> For a logging hook, you want the first thing: one callback, saying,
> hey, the user is trying to run DROP OWNED BY fred. You log it once,
> and go home. For a replication hook, you want the second thing: one

Maybe it's only me, but I'm really annoyed that there's currently no way
to get the list of objects actually dropped other than by parsing the
logs. So I know that I would want my audit trigger to see them all.

> callback per object, with details about each one, so you can drop them
> on the remote side. Both things are useful, but they are different.

Sure. And you have exactly that feature in my patch, where by default
CONTEXT is TOPLEVEL and you see only the main command, but you can also
register against both TOPLEVEL and GENERATED, say CASCADE in the future,
and have all the details. Or only register against CASCADE, too.

> As a further example, suppose that in 9.4 (or 9.17) we add a command
> "DROP TABLES IN SCHEMA fred WHERE name LIKE 'bob%'". Well, the
> logging trigger still just works (because it's only writing the
> statement, without caring about its contents). And the replication
> trigger still just works too (because it will still get a callback for
> every table that gets dropped, even though it knows nothing about the
> new syntax). That's very powerful. Of course, there will still be
> cases where things have to change, but you can minimize it by clean
> definitions.

Your use case here is exactly the reason why we proposed to have the
main DROP command implemented via ProcessUtility, so that an Event
Trigger called with a CASCADE context works the same way whatever the
TOPLEVEL command looks like, and always against a single object.

> It pains me that I've evidently failed to communicate this concept
> clearly despite a year or more of trying. Does that make sense? Is
> there some way I can make this more clear? The difference seems very
> clear-cut to me, but evidently I'm not conveying it well.

Communication is hard, and apparently even more so for computer people,
or they wouldn't prefer spending their working days talking to a computer
rather than to people. I've heard that so many times that it almost
becomes true, sometimes…

>> After all, the whole point of the Event Trigger patch is to expose some
>> level of implementation details.
>
> I don't really agree with that. I think the point is to expose what
> the system is doing to the DBA. I'm OK with exposing the fact that
> creating a table with a serial column also creates a sequence. There
> is no problem with that - it's all good. What we DON'T want to do is
> set up a system where the fact that it creates a sequence is exposed
> because that happens to go through ProcessUtility() and the fact that
> it creates a constraint is not because that happens not to go through
> ProcessUtility(). That is not the sort of quality that our users have
> come to expect from PostgreSQL.

I see what you mean. I don't see any CREATE CONSTRAINT command in the
following reference, though.

http://www.postgresql.org/docs/9.2/static/sql-commands.html

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 19:44:12
Message-ID: CA+U5nM+Uuh7cyHSJ5wN54twBUGSncJzqpZB83xuZw4VP7UCziQ@mail.gmail.com

On 17 January 2013 16:15, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> As a further example, suppose that in 9.4 (or 9.17) we add a command
> "DROP TABLES IN SCHEMA fred WHERE name LIKE 'bob%'". Well, the
> logging trigger still just works (because it's only writing the
> statement, without caring about its contents). And the replication
> trigger still just works too (because it will still get a callback for
> every table that gets dropped, even though it knows nothing about the
> new syntax). That's very powerful. Of course, there will still be
> cases where things have to change, but you can minimize it by clean
> definitions.
>
> It pains me that I've evidently failed to communicate this concept
> clearly despite a year or more of trying. Does that make sense? Is
> there some way I can make this more clear? The difference seems very
> clear-cut to me, but evidently I'm not conveying it well.
>
>> That problem is going to happen in some way or another. It's not about
>> avoiding the problem completely, but choosing a set of compromises. I'm
>> yet to understand what your set of compromises is in detailed enough
>> practical implementation terms (not just "it's another set of events")
>> and how it's a better compromise over all (I trust your judgement on
>> that, I still want to understand).
>>
>> After all, the whole point of the Event Trigger patch is to expose some
>> level of implementation details.
>
> I don't really agree with that. I think the point is to expose what
> the system is doing to the DBA. I'm OK with exposing the fact that
> creating a table with a serial column also creates a sequence. There
> is no problem with that - it's all good. What we DON'T want to do is
> set up a system where the fact that it creates a sequence is exposed
> because that happens to go through ProcessUtility() and the fact that
> it creates a constraint is not because that happens not to go through
> ProcessUtility(). That is not the sort of quality that our users have
> come to expect from PostgreSQL.

"The point", i.e. the main use case, is to replicate the DDL in a useful form.

Discussions and reasoning need to focus on the main use case, not on
weird futures or qualitative points. Yes, the discussion has gone on
for years and it really should come to a useful conclusion sometime
soon.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 20:24:38
Message-ID: 16675.1358454278@sss.pgh.pa.us

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On 17 January 2013 16:15, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> It pains me that I've evidently failed to communicate this concept
>> clearly despite a year or more of trying. Does that make sense? Is
>> there some way I can make this more clear? The difference seems very
>> clear-cut to me, but evidently I'm not conveying it well.

> "The point", i.e. the main use case, is to replicate the DDL in a useful form.

> Discussions and reasoning need to focus on the main use case, not on
> weird futures or qualitative points.

I have to completely disagree with that. If we don't want PostgreSQL
to soon subside into an unfixable morass, as I think Brooks puts it,
we must *not* simply patch things in a way that expediently provides
an approximation of some desired feature. We have to consider, and put
substantial weight on, having a clean and maintainable system design.

This is particularly the case if we're talking about a feature that
is going to expose backend internals to users to any degree. We're
either going to be constrained to not change those internals any more,
or expect to break users' applications when we do change them, and
neither result is very appealing. Especially not when we're talking
about the structure of Postgres' DDL code, which can most charitably
be described as "it just grew", not as something that has any clear
architecture to it.

Now if we could quantify these things, that would be great, but
AFAICS "qualitative points" is all we've got to go on.

I think that we're not realistically going to be able to introduce
event triggers in very many of the places we'd like to have them
without first doing a lot of fundamental refactoring. And that is
hard, dangerous, slow work that tends to break things in itself, as
we've been repeatedly reminded in connection with KaiGai-san's
refactoring patches.

So my opinion is that most of what we'd like to have here is material
for 9.4 or 9.5 or even further out. If we can get the event trigger
catalog infrastructure and some *very* basic events, like
end-of-toplevel-command, into place for 9.3, we'll have done well.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 20:56:24
Message-ID: CA+TgmoZYFhn=DyenBLr8KUZXYHtgjo-O8YE-7y61WKR6F05UQA@mail.gmail.com

On Thu, Jan 17, 2013 at 12:06 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Ok. Will prepare a non controversial patch for ddl_command_end.

Thanks. I will make a forceful effort to review that in a timely
fashion when it's posted.

>> I think this is a bad idea, not only because, as I said before, it
>> exposes internal implementation details, but also because it's
>> probably not safe. As Noah (I believe, or maybe Tom), observed in a
>> previous post, we've gone to a lot of work to make sure that nothing
>> you do inside a trigger can crash the server, corrupt the catalogs,
>> trigger elog() messages, etc. That applies in spades to DDL.
>
> I naively thought that doing CommandCounterIncrement() before running the
> event trigger would go a long enough way towards making the operation
> safe when the current code is calling back into ProcessUtility().

It's something, but I'd be shocked if it covers all the bases.

Let me try to give a concrete example of how I think another firing
point could be made to work along the lines I'm suggesting. I'm not
going to use DROP as an example because I think upon reflection that
will be a particularly challenging case to make work correctly.
Instead, let me use the example of ALTER.

Goal: Every time an ALTER command is used on an object *that actually
exists*, we will call some user-defined function and pass the object
type, the OID of the object, and some details about what sort of
alteration the user has requested.
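In the event trigger syntax under discussion, the callback Robert describes might look roughly like the sketch below. Note the hedge: TG_TAG is part of the committed PL/pgSQL event trigger interface, but passing the object's OID and alteration details is exactly the *proposed* behavior being debated here, so the comment marks it as such.

```sql
-- Hypothetical sketch only: a user-defined function fired for ALTER.
-- TG_TAG exists in the PL/pgSQL event trigger API; receiving the object
-- OID and alteration details is the proposal under discussion, not
-- committed behavior.
CREATE OR REPLACE FUNCTION on_alter() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  RAISE NOTICE 'command % fired on an existing object', TG_TAG;
END;
$$;
```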

Where shall we put the code to do this? Clearly, ProcessUtility is
too early, because at that point we have not done the name lookup,
locked the object, or checked permissions. So the OID of the target
object is not and cannot be known at that point. We cannot do an
extra name lookup at that point without taking a lock, because the
answer might change, or, due to non-MVCC catalog access, the lookup
might fail to find an object altogether. And we can't take a lock
without checking permissions first. And the permissions-checking
logic can't be done inside ProcessUtility(), because it's object-type
specific and we don't want to duplicate it.

So, instead, what we need to do is go into each function that
implements ALTER, and modify it so that after it does the dance where
we check permissions, take locks, and look up names, we call the
trigger function. That's a bit of a nuisance, because we probably
have a bunch of call sites to go fix, but in theory it should be
doable. The problem, of course, is that it's not intrinsically safe to
call user-defined code there. If it drops the object and creates a
new one, say, hilarity will probably ensue.

But that should be a fixable problem. The only things we've done at
this point are check permissions, lock, and look up names. I think we
can assume that if the event trigger changes permissions, it's safe
for the event trigger to ignore that. The trigger cannot have
released the lock, so we're OK there. The only thing it can really
have done that's problematic is change the results of the name lookup,
say by dropping or renaming the object. But that's quite simple to
protect against: after firing the trigger, we do a
CommandCounterIncrement() and repeat the name lookup, and if we get a
different answer, then we bail out with an error and tell the user
they're getting far too cute. Then we're safe.

Now, there is a further problem: all of the information about the
ALTER statement that we want to provide to the trigger may not be
available at that point. Simple cases like ALTER .. RENAME won't
require anything more than that, but things like ALTER TABLE .. ADD
FOREIGN KEY get tricky, because while at this point we have a solid
handle on the identity of the relation to which we're adding the
constraint, we do NOT yet have knowledge of or a lock on the TARGET
relation, whose OID could still change on us up to much later in the
computation. To get reliable information about *that*, we'll need to
refactor the sequence of events inside ALTER TABLE.

Hopefully you can see what I'm driving toward here. In a design like
this, we can pretty much prove - after returning from the event
trigger - that we're in a state no different from what would have been
created by executing a series of commands - lock table, then SELECT
event_trigger_func(), then the actual ALTER in sequence. Maybe
there's a flaw in that analysis - that's what patch review is for -
but it sounds pretty darn tight to me.
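The equivalence Robert argues for can be sketched as a plain command sequence; the table name, lock statement, and event_trigger_func() below are made-up illustrations, not anything from the patch:

```sql
-- Hypothetical sketch of the equivalent command sequence: firing the
-- event trigger after permissions/lock/lookup behaves as if the user
-- had typed these commands in order.
BEGIN;
LOCK TABLE foo;                      -- the lock the ALTER takes at name lookup
SELECT event_trigger_func();         -- the user-defined trigger code runs
ALTER TABLE foo RENAME TO foo_new;   -- the actual ALTER proceeds
COMMIT;
```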

I could write more, but I have to take the kids to dance class, and
this email may be too long already. Does that help at all to clarify
the lines along which I am thinking? Note that the above is clearly
not ddl_command_start but something else, and the different firing
point and additional safety cross-checks are what enable us to pass
additional information to the trigger without fear of breaking either
the trigger or the world.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 21:43:14
Message-ID: m24nifeb9p.fsf@2ndQuadrant.fr

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Goal: Every time an ALTER command is used on an object *that actually
> exists*, we will call some user-defined function and pass the object
> type, the OID of the object, and some details about what sort of
> alteration the user has requested.

Ok, in current terms of the proposed patch, it's a ddl_command_end event
trigger, and you can choose when to fire it in utility.c. The current
place is chosen to be just after the AlterTable() call, because that
sounded logical if you believe in CommandCounterIncrement.

> So, instead, what we need to do is go into each function that
> implements ALTER, and modify it so that after it does the dance where
> we check permissions, take locks, and look up names, we call the
> trigger function. That's a bit of a nuisance, because we probably
> have a bunch of call sites to go fix, but in theory it should be
> doable. The problem of course, is that it's not intrinsically safe to
> call user-defined code there. If it drops the object and creates a
> new one, say, hilarity will probably ensue.

You're trying to find a dangerous point at which to fire the trigger
here, rather than solving the problem you're describing. For that
problem, either you want the Object Specific Information and the trigger
fires at ddl_command_end, or you don't and you can register your trigger
at ddl_command_start.

If you want something else, well, it won't ship in 9.3.
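In the syntax of the proposed patch (the same shape as the regression-test example quoted at the top of the thread), the two registrations Dimitri contrasts would look like this; log_ddl() is a placeholder function, not something from the patch:

```sql
-- Sketch in the patch's syntax; log_ddl() is a placeholder.

-- Fires before the command executes: no object-specific information
-- is available yet.
CREATE EVENT TRIGGER before_ddl ON ddl_command_start
  EXECUTE PROCEDURE log_ddl();

-- Fires after the command executes: object name, schema, and OID can
-- be found in the catalogs.
CREATE EVENT TRIGGER after_ddl ON ddl_command_end
  EXECUTE PROCEDURE log_ddl();
```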

> Now, there is a further problem: all of the information about the
> ALTER statement that we want to provide to the trigger may not be
> available at that point. Simple cases like ALTER .. RENAME won't
> require anything more than that, but things like ALTER TABLE .. ADD
> FOREIGN KEY get tricky, because while at this point we have a solid
> handle on the identity of the relation to which we're adding the
> constraint, we do NOT yet have knowledge of or a lock on the TARGET
> relation, whose OID could still change on us up to much later in the
> computation. To get reliable information about *that*, we'll need to
> refactor the sequence of events inside ALTER TABLE.

The only valid answer here has already been given by Tom. You can only
provide the information if it's available in the catalogs. So again,
it's a ddl_command_end event. It cannot happen before.

> Hopefully you can see what I'm driving toward here. In a design like
> this, we can pretty much prove - after returning from the event
> trigger - that we're in a state no different from what would have been
> created by executing a series of commands - lock table, then SELECT
> event_trigger_func(), then the actual ALTER in sequence. Maybe
> there's a flaw in that analysis - that's what patch review is for -
> but it sounds pretty darn tight to me.

The only case where we need to do a lookup BEFORE actually running the
command is when that command is a DROP, because that's the only time at
which the information we want is still in the catalogs.

So that's the only case where we do a double object lookup, and we take
a ShareUpdateExclusiveLock on the object when doing so, so that we can't
lose it to concurrent activity.

> Note that the above is clearly
> not ddl_command_start but something else, and the different firing
> point and additional safety cross-checks are what enable us to pass
> additional information to the trigger without fear of breaking either
> the trigger or the world.

That's the reason why we are only going to have ddl_command_start and
ddl_command_end in 9.3, and also the reason why the extra information is
only provided when this very information still exists in the catalogs at
the moment when we want to expose it.

And that's the reason why ddl_command_trace is attractive too: you won't
ever get the object OID, name, and schema from an event where the object
does not exist, and maybe you want an easy way to tell the system you're
interested in an event *with* the information, and be done, without
having to think about it.
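As proposed, ddl_command_trace spares the user that reasoning; a sketch in the patch's syntax (ddl_command_trace is from the proposed patch, not committed, and log_ddl() is a placeholder):

```sql
-- Sketch, per the proposed patch: ddl_command_trace resolves to the
-- firing point where the object is still in the catalogs, i.e.
-- ddl_command_end for CREATE/ALTER and ddl_command_start for DROP.
CREATE EVENT TRIGGER trace_ddl ON ddl_command_trace
  EXECUTE PROCEDURE log_ddl();
```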

Again, I don't care much about keeping ddl_command_trace, because it's
not of much interest for implementing DDL support in logical replication
and my other use cases, but still, I thought I would explain it now that
we're talking about it.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 21:51:03
Message-ID: m2hamfcwc8.fsf@2ndQuadrant.fr

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> I have to completely disagree with that. If we don't want PostgreSQL
> to soon subside into an unfixable morass, as I think Brooks puts it,
> we must *not* simply patch things in a way that expediently provides
> an approximation of some desired feature. We have to consider, and put
> substantial weight on, having a clean and maintainable system design.

When in Ottawa next May, I will have to buy you a glass of your favorite
wine and explain my main use case and vision. You will then realize that
all this time I've been spending on Event Triggers is a side road toward
building something else entirely, because I know that it's the only way
to get the feature I want into core PostgreSQL.

So yes, I know about the clean and maintainable system design
constraint. I've a lot to learn still, and I appreciate your help very
much. Rest assured that the path you describe is the one I'm taking.

> I think that we're not realistically going to be able to introduce
> event triggers in very many of the places we'd like to have them
> without first doing a lot of fundamental refactoring.

We're only talking about ddl_command_start and ddl_command_end now. The
table_rewrite event is still on the horizon, but it is no longer
realistic to get into 9.3, I think :(

So we're talking about adding some call points only in utility.c, only
before or after a DDL command is run, including nested subcommands.

> So my opinion is that most of what we'd like to have here is material
> for 9.4 or 9.5 or even further out. If we can get the event trigger
> catalog infrastructure and some *very* basic events, like
> end-of-toplevel-command, into place for 9.3, we'll have done well.

My feeling is that I'm sending patches that only implement the *very*
basic events here, and nothing more. Most of the heat of this thread
came from a discussion about a feature that's very hard to design and
that is not in my patch series to review.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 22:08:30
Message-ID: CA+U5nMLxO9gi862cToRTPGeyASchzi1855=x-NGjBfwZBaKNzw@mail.gmail.com

On 17 January 2013 20:24, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On 17 January 2013 16:15, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> It pains me that I've evidently failed to communicate this concept
>>> clearly despite a year or more of trying. Does that make sense? Is
>>> there some way I can make this more clear? The difference seems very
>>> clear-cut to me, but evidently I'm not conveying it well.
>
>> "The point", i.e. the main use case, is to replicate the DDL in a useful form.
>
>> Discussions and reasoning need to focus on the main use case, not on
>> weird futures or qualitative points.
>
> I have to completely disagree with that. If we don't want PostgreSQL
> to soon subside into an unfixable morass, as I think Brooks puts it,
> we must *not* simply patch things in a way that expediently provides
> an approximation of some desired feature. We have to consider, and put
> substantial weight on, having a clean and maintainable system design.

So why does focusing on a use case make this turn into an unfixable
morass? The two things are completely unrelated.

If the patch is anywhere close to an unfixable morass, let's just
reject it now. But Robert's arguments were much more tenuous than
that.

My comments were in response to this

> I don't really agree with that. I think the point is to expose what
> the system is doing to the DBA. I'm OK with exposing the fact that
> creating a table with a serial column also creates a sequence. There
> is no problem with that - it's all good. What we DON'T want to do is
> set up a system where the fact that it creates a sequence is exposed
> because that happens to go through ProcessUtility() and the fact that
> it creates a constraint is not because that happens not to go through
> ProcessUtility(). That is not the sort of quality that our users have
> come to expect from PostgreSQL.

The above functionality is sufficient to allow DDL replication. What
else do we want to do that requires some other capability?

Discussing things in terms of what we will actually use a feature for
is how we know we've got it right. And if it does that, we don't need
anything else.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 22:09:36
Message-ID: 18499.1358460576@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Let me try to give a concrete example of how I think another firing
> point could be made to work along the lines I'm suggesting.
> [ snip description of how an event trigger might safely be fired just
> after identification and locking of the target object for an ALTER ]

Reading that and reflecting on my gripe about the lack of any clear
architecture for our DDL code, I had the beginnings of an idea.

Perhaps it would improve matters if we refactored DDL processing so that
there were separate "parse analysis" and "execution" phases, where parse
analysis is (perhaps among other responsibilities) responsible for
identifying and locking all objects to be used in the command. I note
that locking the referenced tables is the responsibility of the parse
analysis step in DML processing, so there's solid precedent for this.
Also, we have some of this approach already for certain commands such
as CREATE TABLE, cf parse_utilcmd.c.

If we did that, then it'd be feasible to fire event triggers after the
parse analysis step, and the rechecking that Robert describes could be
encapsulated as "redo the parse analysis and see if the result changed".

It's not clear to me just how this ought to extend to the cascaded-DROP
or inherited-table-ALTER cases, but hey, it's only the beginnings of
an idea.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 22:35:51
Message-ID: 18941.1358462151@sss.pgh.pa.us

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> I think that we're not realistically going to be able to introduce
>> event triggers in very many of the places we'd like to have them
>> without first doing a lot of fundamental refactoring.

> We're only talking about ddl_command_start and ddl_command_end now. The
> table_rewrite event is still on the horizon, but is not realistic to get
> in 9.3 anymore, I think :(

> So we're talking about adding some call points only in utility.c, only
> before or after a ddl command is run, including nested sub-commands.

Well, that's already a problem, because as Robert keeps saying, what
goes through utility.c and what doesn't is pretty random right at the
moment, and we shouldn't expose that behavior to users for fear of not
being able to change it later.

I think we could possibly ship event triggers defined as start and end
of a *top level* command and have them working reliably in 9.3. If you
want users to be looking at subcommands, there is a lot more work to do
there than we have any chance of getting done for 9.3 (if it is to ship
in 2013, that is).

Alternatively, if you want to get something into 9.3 that has not
necessarily got a long-term-stable API, I'd be inclined to suggest that
we forget about a SQL-level event trigger facility for now, and just put
in some hook call points. It's pretty well established that we're
willing to change the API seen by hook functions across major releases.

TBH this might be the best thing anyway if your long-term goals have to
do with replication, because it'd be a lot cheaper to get to the point
where you could write the replication code and see if it all actually
works. I'm fairly concerned that we might spend many man-months getting
event triggers with definitions A,B,C into the system, only to find out
later that what is really needed for logical replication is definitions
D,E,F.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 22:47:27
Message-ID: 19133.1358462847@sss.pgh.pa.us

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> On 17 January 2013 20:24, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> My comments were in response to this

>> I don't really agree with that. I think the point is to expose what
>> the system is doing to the DBA. I'm OK with exposing the fact that
>> creating a table with a serial column also creates a sequence. There
>> is no problem with that - it's all good. What we DON'T want to do is
>> set up a system where the fact that it creates a sequence is exposed
>> because that happens to go through ProcessUtility() and the fact that
>> it creates a constraint is not because that happens not to go through
>> ProcessUtility(). That is not the sort of quality that our users have
>> come to expect from PostgreSQL.

> The above functionality is sufficient to allow DDL replication. What
> else do we want to do that requires some other capability?

I think you and Robert are talking past each other. Whether it is or
is not sufficient for DDL replication is not what he is concerned about;
what he's worried about is whether we can refactor the backend in future
without causing a user-visible change in the events that event triggers
see. I think that that is an entirely valid concern, and it's not going
to go away on the strength of "but this is enough to let us do
replication".

The other point I'd make here is what I just said to Dimitri: it's far
from proven, AFAIK, that this actually *is* sufficient to allow DDL
replication. I'd be a lot happier if we had a working proof of that
before we lock down a SQL-level facility that we're going to have a hard
time changing the details of. Maybe we should just introduce some
C-level hooks and let you guys go off and code replication using the
hooks, before we put in a lot of expensive infrastructure that might
turn out to be Not Quite The Right Thing. After we *know* the hooks
are in the right places and supply the right information, we can look at
replacing them with some more formalized facility.

regards, tom lane


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 22:48:23
Message-ID: m2ehhjbf48.fsf@2ndQuadrant.fr

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Well, that's already a problem, because as Robert keeps saying, what
> goes through utility.c and what doesn't is pretty random right at the
> moment, and we shouldn't expose that behavior to users for fear of not
> being able to change it later.

It didn't feel that random to me. In most cases, the rule is that if
the system wants to execute a command that you could have typed yourself
at the prompt, then it hacks together a parse node and calls
ProcessUtility.

The main exception to that rule is the CASCADE implementation, for
reasons that you detailed in that thread earlier.

> I think we could possibly ship event triggers defined as start and end
> of a *top level* command and have them working reliably in 9.3. If you
> want users to be looking at subcommands, there is a lot more work to do
> there than we have any chance of getting done for 9.3 (if it is to ship
> in 2013, that is).

By default, you only get top-level commands in what I've been cooking. I
think the other contexts are needed for logical replication, and maybe
it would be good enough if they were restricted to internal code or
event triggers written in C. What do you think?

> Alternatively, if you want to get something into 9.3 that has not
> necessarily got a long-term-stable API, I'd be inclined to suggest that
> we forget about a SQL-level event trigger facility for now, and just put
> in some hook call points. It's pretty well established that we're
> willing to change the API seen by hook functions across major releases.

Or just a hook. That would want about the same information as the Event
Trigger anyway, so I'm thinking we could maybe do that the same way as
the parsetree exposure?

Another idea would be yet another GUC, à la allow_system_table_mods, so
that only intrepid users can have at it. Well, as long as the logical
replication use case is covered, I'm done here, so I want to hear from
Simon and Andres on that (and maybe the Slony and Londiste guys, etc),
and from you to triage what is possible and what is crazy.

> TBH this might be the best thing anyway if your long-term goals have to
> do with replication, because it'd be a lot cheaper to get to the point
> where you could write the replication code and see if it all actually
> works. I'm fairly concerned that we might spend many man-months getting
> event triggers with definitions A,B,C into the system, only to find out
> later that what is really needed for logical replication is definitions
> D,E,F.

I'm worried about that too, and that's why I'm trying to remain general
and quite transparent about the way the system actually works.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-17 23:22:10
Message-ID: 19773.1358464930@sss.pgh.pa.us

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Well, that's already a problem, because as Robert keeps saying, what
>> goes through utility.c and what doesn't is pretty random right at the
>> moment, and we shouldn't expose that behavior to users for fear of not
>> being able to change it later.

> It didn't feel that random to me. In most cases, the rule is that if the
> system wants to execute a command that you could have typed as a user at
> the prompt, then it hacks together a parsenode and calls ProcessUtility.

Uh, no, there's no such "rule". In some places it was convenient to do
that so people did it --- but often it's easier to just call the
appropriate function directly, especially if you have any special
locking or permissions requirements. I don't think there's any great
consistency to it.

To be a little more concrete, I looked through backend/catalog and
backend/commands, which ought to pretty much cover all the places where
commands do things that could be considered subcommands. I find three
uses of ProcessUtility:

extension.c: uses ProcessUtility to handle non-DML commands read from an
extension script.

schemacmds.c: CreateSchemaCommand uses ProcessUtility for any subcommand
of a CREATE SCHEMA statement. This is really just direct execution of
something the user typed, it's not the system "creating" a subcommand.

trigger.c: ConvertTriggerToFK uses ProcessUtility to execute a consed-up
ALTER TABLE ADD FOREIGN KEY command in place of a series of legacy CREATE
CONSTRAINT TRIGGER commands. Now this is the very definition of ugly,
and shouldn't drive our design decisions --- but I think that any user
event trigger would be darn surprised to find this being emitted,
especially when the two preceding CREATE CONSTRAINT TRIGGERs hadn't done
any such thing, and indeed hadn't created any constraint trigger.

AFAICT, everything else in those directories is using direct calls and
not going through utility.c. So the only other cases where a trigger in
ProcessUtility would trap subcommands are where those subcommands are
things that parse_utilcmd.c generated during expansion of a CREATE TABLE
or such. And that whole area is something that I feel strongly is
implementation detail we don't particularly want to expose to users.
For instance, the fact that a UNIQUE clause in a CREATE TABLE works that
way and not at some lower level is nothing but implementation artifact.

[ pokes around a bit more... ] Having now actually looked at every call
point of ProcessUtility in the current code, I find myself agreeing very
much with Robert: we do *not* want to expose that exact set of events to
users. Except for the original top-level commands, it's almost entirely
implementation artifacts that can and will change from release to
release.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 01:13:16
Message-ID: CA+TgmoYxRtC-w1R1HTLqE3s0CB6NtRY04JK+TSM1cCLbtrQAZg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 17, 2013 at 4:43 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Goal: Every time an ALTER command is used on object *that actually
>> exists*, we will call some user-defined function and pass the object
>> type, the OID of the object, and some details about what sort of
>> alteration the user has requested.
>
> Ok, in current terms of the proposed patch, it's a ddl_command_end event
> trigger, and you can choose when to fire it in utility.c. The current
> place is chosen to be just after the AlterTable() call,

/me scratches my head.

Depends. What I described would fire before the command was executed,
not after, so it's not quite the same thing, but it might be just as
good for your purposes. It wasn't so much intended as "we should do
this exact thing" as much as "this is the kind of thought process that
you need to go through if you want to safely run an event trigger in
the middle of another command".

Now, maybe we don't actually need to do that to serve your purposes.
As already discussed, it seems like passing information to
ddl_command_end is for most purposes sufficient for what you need -
and we can do that without any special gymnastics, because that way it
runs completely after the command, not in the middle. If I understand
correctly, and I may not, there are basically two problems with that
approach:

(1) It's not exactly clear how to pass the information about the
statement through to the ddl_command_end trigger. I know you've
proposed a couple of things, but none of them seem completely
satisfying. At least not to me.

(2) If the method of transmission is "the OIDs of the affected
objects", which I believe is your current proposal, the
ddl_command_end trigger can't go look those objects up in the catalog,
because the catalog entries are already gone by that point.

It's possible we could solve both of those problems without running
event triggers in the middle of commands at all. In which case, my
whole example is moot for now. But I think it would be smart to get
ddl_command_end (without any additional information) committed first,
and then argue about these details, so that we at least make some
definable progress here.

> because that
> sounded logical if you believe in CommandCounterIncrement.

I'm somewhat bemused by this comment. I don't think
CommandCounterIncrement() is an article of faith. If you execute a
command to completion, and do a CommandCounterIncrement(), then
whatever you do next will look just like a new command, so it should
be safe to run user-provided code there. But if you stop in the
middle of a command, do a CommandCounterIncrement(), run some
user-provided code, and then continue on, the
CommandCounterIncrement() by itself protects you from nothing. If the
code is not expecting arbitrary operations to be executed at that
point, and you execute them, you get to keep both pieces. Indeed,
there are some places in the code where inserting a
CommandCounterIncrement() *by itself* could be unsafe. I don't
believe that's a risk in ProcessUtility(), but that doesn't mean there
aren't any risks in ProcessUtility().

>> So, instead, what we need to do is go into each function that
>> implements ALTER, and modify it so that after it does the dance where
>> we check permissions, take locks, and look up names, we call the
>> trigger function. That's a bit of a nuisance, because we probably
>> have a bunch of call sites to go fix, but in theory it should be
>> doable. The problem of course, is that it's not intrinsically safe to
>> call user-defined code there. If it drops the object and creates a
>> new one, say, hilarity will probably ensue.
>
> You're trying to find a dangerous point at which to fire the trigger here,

Yes, I am. I'm not asking you to implement what I proposed there -
I'm just giving an example of how to make a dangerous place safe.
You've ALSO found a dangerous point at which to fire the trigger. The
difference is that my example dangerous point is probably fixable to
be safe, whereas your actual dangerous point is probably unfixable,
because there are many current paths of execution that go through that
function in an extremely ad-hoc fashion, and there may be more in the
future. ddl_command_start and ddl_command_end are safe enough *when
restricted to toplevel commands*, but that's about as far as it goes.
Perhaps there are other special cases that are also safe, but just
throwing everything into the pot with no analysis and no
future-proofing certainly isn't.

>> Now, there is a further problem: all of the information about the
>> ALTER statement that we want to provide to the trigger may not be
>> available at that point. Simple cases like ALTER .. RENAME won't
>> require anything more than that, but things like ALTER TABLE .. ADD
>> FOREIGN KEY get tricky, because while at this point we have a solid
>> handle on the identity of the relation to which we're adding the
>> constraint, we do NOT yet have knowledge of or a lock on the TARGET
>> relation, whose OID could still change on us up to much later in the
>> computation. To get reliable information about *that*, we'll need to
>> refactor the sequence of events inside ALTER TABLE.
>
> The only valid answer here has already been given by Tom. You can only
> provide the information if it's available in the catalogs. So again,
> it's a ddl_command_end event. It can not happen before.

I don't see why you say that. It's perfectly possible to have that
information available at the right time; the work just hasn't been
done yet.

>> Hopefully you can see what I'm driving toward here. In a design like
>> this, we can pretty much prove - after returning from the event
>> trigger - that we're in a state no different from what would have been
>> created by executing a series of commands - lock table, then SELECT
>> event_trigger_func(), then the actual ALTER in sequence. Maybe
>> there's a flaw in that analysis - that's what patch review is for -
>> but it sounds pretty darn tight to me.
>
> The only case when we need to do a lookup BEFORE actually running the
> command is when that command is a DROP, because that's the only timing
> when the information we want is still in the catalogs.
>
> So that's the only case where we do a double object lookup, and we take
> a ShareUpdateExclusiveLock lock on the object when doing so, so that we
> can't lose it from another concurrent activity.

As noted, I don't think that's sufficient to really guarantee no shenanigans.

One idea would be to save off all the names of the objects actually
dropped at the time the drops are done, and then pass it to the event
trigger afterward. But frankly I think that's way beyond the scope of
what we should try to get done for 9.3.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 01:23:51
Message-ID: CA+TgmoZJ0DUK1fN+6QtW7hVgYNa8MK=t8DUtixv+xjqaK_9mCA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 17, 2013 at 5:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Perhaps it would improve matters if we refactored DDL processing so that
> there were separate "parse analysis" and "execution" phases, where parse
> analysis is (perhaps among other responsibilities) responsible for
> identifying and locking all objects to be used in the command. I note
> that locking the referenced tables is the responsibility of the parse
> analysis step in DML processing, so there's solid precedent for this.
> Also, we have some of this approach already for certain commands such
> as CREATE TABLE, cf parse_utilcmd.c.
>
> If we did that, then it'd be feasible to fire event triggers after the
> parse analysis step, and the rechecking that Robert describes could be
> encapsulated as "redo the parse analysis and see if the result changed".
>
> It's not clear to me just how this ought to extend to the cascaded-DROP
> or inherited-table-ALTER cases, but hey, it's only the beginnings of
> an idea.

This is pretty much where I think we should be headed, long term. I
believe it would help with the previously-stated desire to have
EXPLAIN ALTER TABLE and similar, too. I am not sure it's necessary to
go this far to get some of what Dimitri wants, but it may be a
necessary prerequisite to getting all of it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 01:33:34
Message-ID: 20130118013334.GE3074@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-17 23:48:23 +0100, Dimitri Fontaine wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> > Alternatively, if you want to get something into 9.3 that has not
> > necessarily got a long-term-stable API, I'd be inclined to suggest that
> > we forget about a SQL-level event trigger facility for now, and just put
> > in some hook call points. It's pretty well established that we're
> > willing to change the API seen by hook functions across major releases.
>
> Or just a hook. That would want to have about the exact same amount of
> information as the Event Trigger anyway, so I'm thinking we could maybe
> do that the same way as the parsetree exposure?
>
> Another idea would be yet another GUC, a la allow_system_table_mods, so
> that only intrepid users can have at it. Well, as long as the logical
> replication use case is covered, I'm done here, so I want to hear from
> Simon and Andres on that (and maybe the Slony and Londiste guys, etc),
> and from you to triage what is possible and what is crazy.

> > TBH this might be the best thing anyway if your long-term goals have to
> > do with replication, because it'd be a lot cheaper to get to the point
> > where you could write the replication code and see if it all actually
> > works. I'm fairly concerned that we might spend many man-months getting
> > event triggers with definitions A,B,C into the system, only to find out
> > later that what is really needed for logical replication is definitions
> > D,E,F.

I have no problem requiring C code to use the event data, be it via hooks
or via C functions called from event triggers. The problem I have with
putting in some hooks is that I doubt that you can find sensible spots
with enough information to actually recreate the DDL for a remote system
without doing most of the work for command triggers.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Craig Ringer <craig(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 02:27:36
Message-ID: 50F8B318.1060407@2ndQuadrant.com
Lists: pgsql-hackers

On 01/18/2013 04:24 AM, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On 17 January 2013 16:15, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> It pains me that I've evidently failed to communicate this concept
>>> clearly despite a year or more of trying. Does that make sense? Is
>>> there some way I can make this more clear? The difference seems very
>>> clear-cut to me, but evidently I'm not conveying it well.
>> "The point", i.e. the main use case, is to replicate the DDL in a useful form.
>> Discussions and reasoning need to focus on the main use case, not on
>> weird futures or qualitative points.
> I have to completely disagree with that. If we don't want PostgreSQL
> to soon subside into an unfixable morass, as I think Brooks puts it,
> we must *not* simply patch things in a way that expediently provides
> an approximation of some desired feature. We have to consider, and put
> substantial weight on, having a clean and maintainable system design.
>
> This is particularly the case if we're talking about a feature that
> is going to expose backend internals to users to any degree. We're
> either going to be constrained to not change those internals any more,
> or expect to break users' applications when we do change them, and
> neither result is very appealing.
I actually wonder if - for now - breaking applications that use certain
limited well defined features is an acceptable outcome in exchange for
getting a usable first version in place. I'm approaching this from more
of an end user PoV: meeting user needs with the least possible pain to
users and to the Pg team.

A non-forward-compatible feature wouldn't be OK to rely on in end-user
applications, but it'd potentially be usable (if not ideal) for things
like replication tools, where practically anything is better than the
current situation with DDL. They could special case their integration
code by server version if they had to, adding version-specific triggers
and handling code. The user application wouldn't care about the changes,
they'd just need to make sure they upgraded Slony (or whatever) whenever
they upgraded to a new Pg major version.

I'm not suggesting that as a long term measure, but as a possible
stop-gap that'd:

- Allow replication to start using these features now, so they'd be
ready and more mature when the final version is available; and
- Permit prototyping of applications to use the new replication features

What I'm thinking about is packaging the client interface in an
extension that's explicitly documented as not cross-version compatible
and that must be explicitly loaded by named version in order to get the
feature. This would require a function-like interface for setup; if
that's not possible, an alternative might be a GUC that must be
explicitly set in postgresql.conf (only, not via SET) like
"enable_non_forward_compatible_features" or "enable_experimental_features".

I suspect that if this feature goes in, even in an ugly form, it'll help
guide the development of a final version concurrently with the cleanups
and refactoring that appear to be necessary.

Note that non-forward-compatible changes have been made repeatedly in
the system catalogs, despite the fact that end user applications are
known to use them via JDBC drivers and libpq, as well as directly. I
wonder if we can give this the same kind of compatibility requirements,
where it must not break within a given major version but may break
between them.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Craig Ringer <craig(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 02:31:28
Message-ID: 50F8B400.1090701@2ndQuadrant.com
Lists: pgsql-hackers

On 01/18/2013 09:33 AM, Andres Freund wrote:
> I have no problem requiring C code to use the event data, be it via
> hooks or via C functions called from event triggers. The problem I
> have with putting in some hooks is that I doubt that you can find
> sensible spots with enough information to actually recreate the DDL
> for a remote system without doing most of the work for command triggers.

Is that so much of a problem if it's OK to break it between major
versions? Maybe require compilation with -DPG_NON_FORWARD_COMPATIBLE to
make the hooks visible in the headers, so nobody can claim they didn't know?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Craig Ringer <craig(at)2ndQuadrant(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 02:34:37
Message-ID: 20130118023437.GH3074@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-18 10:31:28 +0800, Craig Ringer wrote:
> On 01/18/2013 09:33 AM, Andres Freund wrote:
> > I have no problem requiring C code to use the event data, be it via
> > hooks or via C functions called from event triggers. The problem I
> > have with putting in some hooks is that I doubt that you can find
> > sensible spots with enough information to actually recreate the DDL
> > for a remote system without doing most of the work for command triggers.
>
> Is that so much of a problem if it's OK to break it between major
> versions? Maybe require compilation with -DPG_NON_FORWARD_COMPATIBLE to
> make the hooks visible in the headers, so nobody can claim they didn't know?

I can't follow the connection between what you quote from my mail and
what you are saying ;)

The problem I see is providing enough information to the hooks, not
cross version compatibility. If we go the hook route I don't have the
slightest problem of breaking them in the next releases.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 02:35:18
Message-ID: 23345.1358476518@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jan 17, 2013 at 4:43 PM, Dimitri Fontaine
> <dimitri(at)2ndquadrant(dot)fr> wrote:
>> because that
>> sounded logical if you believe in CommandCounterIncrement.

> I'm somewhat bemused by this comment. I don't think
> CommandCounterIncrement() is an article of faith. If you execute a
> command to completion, and do a CommandCounterIncrement(), then
> whatever you do next will look just like a new command, so it should
> be safe to run user-provided code there. But if you stop in the
> middle of a command, do a CommandCounterIncrement(), run some
> user-provided code, and then continue on, the
> CommandCounterIncrement() by itself protects you from nothing. If the
> code is not expecting arbitrary operations to be executed at that
> point, and you execute them, you get to keep both pieces. Indeed,
> there are some places in the code where inserting a
> CommandCounterIncrement() *by itself* could be unsafe.

I think it's quite likely that we should *not* expect that we can safely
do CommandCounterIncrement at any random place. That isn't just "x++".
It causes relcache and catcache flushes for anything affected by the
command-so-far, since it's making the command's results visible --- and
the code may not be expecting the results to become visible yet, and
almost certainly isn't expecting cache entries to disappear under it.
So a CCI could break things whether or not the event trigger itself did
anything at all.

But this is just one part of the general problem that injecting random
code mid-command is risky as hell, and isn't likely to become less so.
Frankly I don't really wish to see that happen, and don't immediately
see good end-user reasons (as opposed to structure-of-the-current-code
reasons) to need it. I'd really like to get to a point where we can
define things as happening like this:

* collect information needed to interpret the DDL command
(lookup and lock objects, etc)
* fire "before" event triggers, if any (and if so, recheck info)
* do the command
* fire "after" event triggers, if any

This might require that the command execution save aside information
that would help us fire appropriate event triggers later. But I'd much
rather pay that overhead than fire triggers mid-command just because
that's where the info is currently available.

BTW, how badly do we need the "before" case at all? If we were to
restrict our ambitions to "after" triggers, it would perhaps be
relatively simple for DDL execution to create data structures noting
what it does as it goes along, and then use those to fire "after"
triggers at the end (and provide desired info to them too). Still
a lot of new code of course, but not very much risk involved.

[ reads further... ] Robert seems to have pretty much that same idea
down at the end:

> One idea would be to save off all the names of the objects actually
> dropped at the time the drops are done, and then pass it to the event
> trigger afterward.

regards, tom lane


From: Craig Ringer <craig(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 02:36:04
Message-ID: 50F8B514.2050105@2ndQuadrant.com
Lists: pgsql-hackers

On 01/18/2013 10:34 AM, Andres Freund wrote:
> On 2013-01-18 10:31:28 +0800, Craig Ringer wrote:
>> On 01/18/2013 09:33 AM, Andres Freund wrote:
>>> I have no problem requiring C code to use the event data, be it via
>>> hooks or via C functions called from event triggers. The problem I
>>> have with putting in some hooks is that I doubt that you can find
>>> sensible spots with enough information to actually recreate the DDL
>>> for a remote system without doing most of the work for command triggers.
>> Is that so much of a problem if it's OK to break it between major
>> versions? Maybe require compilation with -DPG_NON_FORWARD_COMPATIBLE to
>> make the hooks visible in the headers, so nobody can claim they didn't know?
> I can't follow the connection between what you quote from my mail and
> what you are saying ;)
... in which case I probably misunderstood part of the discussion. I'm
trying to skim things fairly quickly in order to keep up and continue
working through the CF backlog.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 02:48:01
Message-ID: 23579.1358477281@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> I have no problem requiring C code to use the event data, be it via hooks
> or via C functions called from event triggers. The problem I have with
> putting in some hooks is that I doubt that you can find sensible spots
> with enough information to actually recreate the DDL for a remote system
> without doing most of the work for command triggers.

"most of the work"? No, I don't think so. Here's the thing that's
bothering me about that: if the use-case that everybody is worried about
is replication, then triggers of any sort are the Wrong Thing, IMO.

The difference between a replication hook and a trigger is that a
replication hook has no business changing any state of the local system,
whereas a trigger *has to be expected to change the state of the local
system* because if it has no side-effects you might as well not bother
firing it. And the fear of having to cope with arbitrary side-effects
occurring in the midst of DDL is about 80% of my angst with this whole
concept.

If we're only interested in replication, let's put in some hooks whose
contract does not allow for side-effects on the local catalogs, and be
done. Otherwise we'll be putting in man-years of unnecessary (or at
least unnecessary for this use-case) work.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 03:01:25
Message-ID: 20130118030124.GI3074@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-17 21:48:01 -0500, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > I have no problem requiring C code to use the event data, be it via hooks
> > or via C functions called from event triggers. The problem I have with
> > putting in some hooks is that I doubt that you can find sensible spots
> > with enough information to actually recreate the DDL for a remote system
> > without doing most of the work for command triggers.

> "most of the work"? No, I don't think so. Here's the thing that's
> bothering me about that: if the use-case that everybody is worried about
> is replication, then triggers of any sort are the Wrong Thing, IMO.

> The difference between a replication hook and a trigger is that a
> replication hook has no business changing any state of the local system,
> whereas a trigger *has to be expected to change the state of the local
> system* because if it has no side-effects you might as well not bother
> firing it. And the fear of having to cope with arbitrary side-effects
> occurring in the midst of DDL is about 80% of my angst with this whole
> concept.

I *personally* would be *very* happy to get the capability to do two
things:
a) get enough information to replicate DDL to another system
b) raise an error that aborts the current action

Imo b) fits well enough with not allowing arbitrary side effects.

> If we're only interested in replication, let's put in some hooks whose
> contract does not allow for side-effects on the local catalogs, and be
> done. Otherwise we'll be putting in man-years of unnecessary (or at
> least unnecessary for this use-case) work.

It's a thing of perspective, I guess. I can't imagine a hook-ey solution,
without quite a bit of work, that gets enough information to regenerate
SQL that performs the same action on another system. ISTM that the
refactoring to make that consistently "easy" is the hard part, but I
hope I am wrong.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 03:14:04
Message-ID: 24062.1358478844@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2013-01-17 21:48:01 -0500, Tom Lane wrote:
>> If we're only interested in replication, let's put in some hooks whose
>> contract does not allow for side-effects on the local catalogs, and be
>> done. Otherwise we'll be putting in man-years of unnecessary (or at
>> least unnecessary for this use-case) work.

> It's a matter of perspective, I guess. I can't imagine a hook-based
> solution that, without quite a bit of work, gets enough information to
> regenerate SQL that performs the same action on another system. ISTM
> that the refactoring to make that consistently "easy" is the hard part,
> but I hope I am wrong.

The problem of how to get the information needed is real and large,
I agree. But that's not any easier for a trigger --- if anything,
it's probably harder, because then you not only need to *get* the
information but you have to figure out how to provide it in a way
that makes sense at SQL level.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 03:39:18
Message-ID: CA+TgmobU2xZ7hPh24HYmvSxO7kbXUtY57bePX+mpB5aywTz2HA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 17, 2013 at 8:33 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> I have no problem requiring C code to use the event data, be it via hooks
> or via C functions called from event triggers. The problem I have with
> putting in some hooks is that I doubt that you can find sensible spots
> with enough information to actually recreate the DDL for a remote system
> without doing most of the work for command triggers.

It should be noted that the point of KaiGai's work over the last three
years has been to solve exactly this problem. Well, KaiGai wants to
check security rather than do replication, but he wants to be able to
grovel through the entire node tree and make security decisions based
on stuff core PG doesn't care about, so in effect the requirements are
identical. Calling the facility "event triggers" rather than "object
access hooks" doesn't make the underlying problem any easier to solve.
I actually believe that the object access hook stuff is getting
pretty close to a usable solution if you don't mind coding in C, but
I've had trouble convincing anyone else of that.

I find it a shame that it hasn't been taken more seriously, because it
really does solve the same problem. sepgsql, for example, has no
trouble at all checking permissions for dropped objects. You can't
call procedural code from the spot where we've got that hook, but you
sure can call C code, with the usual contract that if it breaks you
get to keep both pieces. The CREATE stuff works fine too. Support
for ALTER is not all there yet, but that's because it's a hard
problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 09:18:56
Message-ID: m2622u97cv.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I'm somewhat bemused by this comment. I don't think
>> CommandCounterIncrement() is an article of faith. If you execute a
>> command to completion, and do a CommandCounterIncrement(), then
>> whatever you do next will look just like a new command, so it should
>> be safe to run user-provided code there. But if you stop in the

Exactly my thinking.

>> middle of a command, do a CommandCounterIncrement(), run some
>> user-provided code, and then continue on, the
>> CommandCounterIncrement() by itself protects you from nothing. If the

Yes. That's why I've been careful not to add CommandCounterIncrement()
when adding the Event Trigger facilities. The only one we added is so
that the next trigger sees what the previous one just did. That's all.

So the patch is not adding CCI calls to pretend that an existing place
in the code is now safe; it's relying on CCI, as an existing protocol in
the implementation, to denote the places in the code where calling user
code should be safe, because a logical block of the command has just
been completed.
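The one CCI the patch does add, between consecutive triggers on the same
event, is easy to demonstrate at the SQL level. A minimal sketch
(function and table names are illustrative, and it assumes the patch's
ddl_command_end event; triggers on the same event fire in alphabetical
order by name):

```sql
CREATE TABLE ddl_audit (logged_at timestamptz DEFAULT now(), note text);

CREATE FUNCTION log_ddl() RETURNS event_trigger LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO ddl_audit(note) VALUES ('command finished: ' || TG_TAG);
END;
$$;

CREATE FUNCTION count_ddl() RETURNS event_trigger LANGUAGE plpgsql AS $$
DECLARE n bigint;
BEGIN
  -- The CommandCounterIncrement between trigger invocations is what
  -- makes the row just inserted by log_ddl visible here.
  SELECT count(*) INTO n FROM ddl_audit;
  RAISE NOTICE 'audit rows so far: %', n;
END;
$$;

-- "a_log" sorts before "b_count", so log_ddl runs first.
CREATE EVENT TRIGGER a_log   ON ddl_command_end EXECUTE PROCEDURE log_ddl();
CREATE EVENT TRIGGER b_count ON ddl_command_end EXECUTE PROCEDURE count_ddl();
```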

>> code is not expecting arbitrary operations to be executed at that
>> point, and you execute them, you get to keep both pieces. Indeed,
>> there are some places in the code where inserting a
>> CommandCounterIncrement() *by itself* could be unsafe.

I agree, it's pretty obvious. And that's why I'm not adding any.

> I think it's quite likely that we should *not* expect that we can safely
> do CommandCounterIncrement at any random place. That isn't just "x++".

Agreed. That's what the patch is doing, relying on the ones that are
already in the code, and being very careful not to add any new one.

> It causes relcache and catcache flushes for anything affected by the
> command-so-far, since it's making the command's results visible --- and
> the code may not be expecting the results to become visible yet, and
> almost certainly isn't expecting cache entries to disappear under it.
> So a CCI could break things whether or not the event trigger itself did
> anything at all.

Yes.

> But this is just one part of the general problem that injecting random
> code mid-command is risky as hell, and isn't likely to become less so.

Yes.

That's why there's nothing in the proposed patch that does that.

The user code is run either before the command starts, in which case the
event is called ddl_command_start; for a DROP command this has to do an
extra lookup, but only to guarantee that we're not injecting user code
at a random point in the execution of the existing code.

Or the user code is run after the command is done, when the command has
itself called CommandCounterIncrement() and is ready to return to its
caller; that event is called ddl_command_end.
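In SQL terms, the two attachment points described above look like this
(a sketch against the patch's proposed events; TG_EVENT and TG_TAG are
the usual event-trigger variables, and the function body is a
placeholder):

```sql
CREATE FUNCTION on_ddl() RETURNS event_trigger LANGUAGE plpgsql AS $$
BEGIN
  -- e.g. 'ddl_command_start' / 'DROP TABLE'
  RAISE NOTICE 'event: %, tag: %', TG_EVENT, TG_TAG;
END;
$$;

-- Fires before the command starts; for DROP, the patch performs one
-- extra lookup (taking ShareUpdateExclusiveLock) rather than calling
-- user code at a random point inside the command.
CREATE EVENT TRIGGER before_ddl ON ddl_command_start
  EXECUTE PROCEDURE on_ddl();

-- Fires after the command has completed and done its own
-- CommandCounterIncrement().
CREATE EVENT TRIGGER after_ddl ON ddl_command_end
  EXECUTE PROCEDURE on_ddl();
```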

> Frankly I don't really wish to see that happen, and don't immediately
> see good end-user reasons (as opposed to structure-of-the-current-code
> reasons) to need it. I'd really like to get to a point where we can
> define things as happening like this:
>
> * collect information needed to interpret the DDL command
> (lookup and lock objects, etc)
> * fire "before" event triggers, if any (and if so, recheck info)
> * do the command
> * fire "after" event triggers, if any

I think that if you do the lookup and lock of objects before running the
command, then you also want to do it again after the event trigger has
run, because you just don't know what it did.

That's why I implemented it that way:

* no information is given to ddl_command_start event triggers

* information is collected while the code runs normally

* the ddl_command_end event has the information and only uses it if the
object still exists

Now, the extra complexity is that we also want the information at
ddl_command_start for a DROP command, so there's an explicit extra
lookup here in *only that case* and *only when some event triggers are
registered* (which is checked from a cache lookup), and only when
there's a single object targeted by the command.

And when doing that extra lookup we take a ShareUpdateExclusiveLock on
the object so that someone else won't drop it before we get to run the
Event Trigger User Function.

> This might require that the command execution save aside information
> that would help us fire appropriate event triggers later. But I'd much
> rather pay that overhead than fire triggers mid-command just because
> that's where the info is currently available.

And that's what is implemented in my patch. The information that's kept
aside is the OID of the object that is the target of the command.

Last year, in the patch that didn't make it into 9.2, I used to keep the
whole EventTriggerData structure around and fill it in (it had another
name in the 9.2-era patch) from all the commands/* implementations.
Robert didn't like that at all, because of the code footprint, IIRC.

So I've been very careful in that version of the patch not to implement
things exactly as you're saying, because what you're saying looks very
much like my first approach from last year, the one that crashed and
burnt.

For full disclosure though, IIRC, the patch from that era also did try
to call Event Trigger User Code at dangerous places in the middle of the
command implementations. That was removed before the end of the commit
fests, but we still had to say no for 9.2.

> BTW, how badly do we need the "before" case at all? If we were to
> restrict our ambitions to "after" triggers, it would perhaps be
> relatively simple for DDL execution to create data structures noting
> what it does as it goes along, and then use those to fire "after"
> triggers at the end (and provide desired info to them too). Still
> a lot of new code of course, but not very much risk involved.

I personally need the ddl_command_start case for implementing some
CREATE EXTENSION features in user code. And I would need to also have
the extension's name from the parsetree at that point. In the current
state of the patch, that will mean that I need a little new function
coded in C and loaded in the backend to be able to get at it.

> [ reads further... ] Robert seems to have pretty much that same idea
> down at the end:
>
>> One idea would be to save off all the names of the objects actually
>> dropped at the time the drops are done, and then pass it to the event
>> trigger afterward.

We can go even further in simplicity.

Document that you only get information about objects that exist at the
time the event trigger user function is run. Then you don't have any
specific information for a DROP command at ddl_command_end, and you
don't need any code to make it happen. You also don't have the
information at ddl_command_start for a CREATE operation.

That happens to be what my patch implements, too.

I think that the refactoring of the DDL commands into several explicit
steps of parsing, analysis, then execution is the way forward, but I
didn't see any way to provide that in one cycle (that was my first
target, remember), and even if I had known then that it would take two
cycles for the limited version of the feature, I don't think I would
have called that path realistic. As you said, this refactoring looks
more like a three-cycle effort in itself, let alone building anything on
top of it.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 14:07:21
Message-ID: 20130118140721.GA29501@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-01-17 22:39:18 -0500, Robert Haas wrote:
> On Thu, Jan 17, 2013 at 8:33 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I have no problem requiring C code to use the event data, be it via hooks
> > or via C functions called from event triggers. The problem I have with
> > putting in some hooks is that I doubt that you can find sensible spots
> > with enough information to actually recreate the DDL for a remote system
> > without doing most of the work for command triggers.
>
> It should be noted that the point of KaiGai's work over the last three
> years has been to solve exactly this problem. Well, KaiGai wants to
> check security rather than do replication, but he wants to be able to
> grovel through the entire node tree and make security decisions based
> on stuff core PG doesn't care about, so in effect the requirements are
> identical. Calling the facility "event triggers" rather than "object
> access hooks" doesn't make the underlying problem any easier to solve.
> I actually believe that the object access hook stuff is getting
> pretty close to a usable solution if you don't mind coding in C, but
> I've had trouble convincing anyone else of that.
>
> I find it a shame that it hasn't been taken more seriously, because it
> really does solve the same problem. sepgsql, for example, has no
> trouble at all checking permissions for dropped objects. You can't
> call procedural code from the spot where we've got that hook, but you
> sure can call C code, with the usual contract that if it breaks you
> get to keep both pieces. The CREATE stuff works fine too. Support
> for ALTER is not all there yet, but that's because it's a hard
> problem.

I don't have a problem reusing the object access infrastructure at all. I just
don't think it's providing even remotely enough. You have (co-)written that
stuff, so you probably know more than I do, but could you explain to me how it
could be reused to replicate a CREATE TABLE?

Problems I see:
- afaics for CREATE TABLE the only hook is in ATExecAddColumn
- no access to the CreateStmt, making it impossible to decipher whether e.g. a
sequence was created as part of this or not
- No way to regenerate the table definition for execution on the remote system
without creating libpqdump.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 14:58:53
Message-ID: CA+TgmoYL9o2MBjWdaUcRqj_B5J=k63rq8vKkcszDyH4CDT1AKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 18, 2013 at 9:07 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-01-17 22:39:18 -0500, Robert Haas wrote:
>> On Thu, Jan 17, 2013 at 8:33 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > I have no problem requiring C code to use the event data, be it via hooks
>> > or via C functions called from event triggers. The problem I have with
>> > putting in some hooks is that I doubt that you can find sensible spots
>> > with enough information to actually recreate the DDL for a remote system
>> > without doing most of the work for command triggers.
>>
>> It should be noted that the point of KaiGai's work over the last three
>> years has been to solve exactly this problem. Well, KaiGai wants to
>> check security rather than do replication, but he wants to be able to
>> grovel through the entire node tree and make security decisions based
>> on stuff core PG doesn't care about, so in effect the requirements are
>> identical. Calling the facility "event triggers" rather than "object
>> access hooks" doesn't make the underlying problem any easier to solve.
>> I actually believe that the object access hook stuff is getting
>> pretty close to a usable solution if you don't mind coding in C, but
>> I've had trouble convincing anyone else of that.
>>
>> I find it a shame that it hasn't been taken more seriously, because it
>> really does solve the same problem. sepgsql, for example, has no
>> trouble at all checking permissions for dropped objects. You can't
>> call procedural code from the spot where we've got that hook, but you
>> sure can call C code, with the usual contract that if it breaks you
>> get to keep both pieces. The CREATE stuff works fine too. Support
>> for ALTER is not all there yet, but that's because it's a hard
>> problem.
>
> I don't have a problem reusing the object access infrastructure at all. I just
> don't think it's providing even remotely enough. You have (co-)written that
> stuff, so you probably know more than I do, but could you explain to me how it
> could be reused to replicate a CREATE TABLE?
>
> Problems I see:
> - afaics for CREATE TABLE the only hook is in ATExecAddColumn

No, there's one also in heap_create_with_catalog. Took me a minute to
find it, as it does not use InvokeObjectAccessHook. The idea is that
OAT_POST_CREATE fires once per object creation, regardless of the
object type - table, column, whatever.

> - no access to the CreateStmt, making it impossible to decipher whether e.g. a
> sequence was created as part of this or not

Yep, that's a problem. We could of course add additional hook sites
with relevant context information - that's what this infrastructure is
supposed to allow for.

> - No way to regenerate the table definition for execution on the remote system
> without creating libpqdump.

IMHO, that is one of the really ugly problems that we haven't come up
with a good solution for yet. If you want to replicate DDL, you have
basically three choices:

1. copy over the statement text that was used on the origin server and
hope none of the corner cases bite you
2. come up with some way of reconstituting a DDL statement based on
(a) the parse tree or (b) what the server actually decided to do
3. reconstitute the state of the object from the catalogs after the
command has run

(2a) differs from (2b) for things like CREATE INDEX, where the index
name might be left for the server to determine, but when replicating
you'd like to get the same name out. (3) is workable for CREATE but
not ALTER or DROP.
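Option (3) can be approximated from plain SQL today for the simplest
cases, which also makes its CREATE-only limitation obvious: the sketch
below (ignoring constraints, defaults, storage options, and most type
subtleties; 'some_table' is a placeholder) rebuilds a CREATE TABLE from
the catalogs, but for a dropped object there is nothing left to read.

```sql
SELECT format('CREATE TABLE %I.%I (%s);',
              n.nspname, c.relname,
              string_agg(format('%I %s', a.attname,
                                format_type(a.atttypid, a.atttypmod)),
                         ', ' ORDER BY a.attnum))
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = c.oid
WHERE c.relkind = 'r' AND c.relname = 'some_table'
  AND a.attnum > 0 AND NOT a.attisdropped
GROUP BY n.nspname, c.relname;
```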

The basic problem here is that (1) and (3) are not very
reliable/complete and (2) is a lot of work and introduces a huge code
maintenance burden. But it's unfair to pin that on the object-access
hook mechanism - any reverse-parsing or catalog-deconstruction
solution for DDL is going to have that problem. The decision we have
to make as a community is whether we're prepared to support and
maintain that code for the indefinite future. Although I think it's
easy to say "yes, because DDL replication would be really cool" - and
I sure agree with that - I think it needs to be thought through a bit
more deeply than that.

I have been involved in PostgreSQL development for about 4.5 years
now. This is less time than many people here, but it's still long
enough to have seen a whole lot of people ask for some variant of this idea,
and yet I have yet to see anybody produce a complete, working version
of this functionality and maintain it outside of the PostgreSQL tree
for one release cycle (did I miss something?). If we pull that
functionality into core, that's just a way of taking work that
nobody's been willing to do and forcing the responsibility to be spread
across the whole development group. Now, sometimes it is OK to do
that, but sometimes it means that we're just bolting more things onto
the already-long list of reasons for which patches can be rejected.
Anybody who is now feeling uncomfortable at the prospect of not having
that facility in core ought to think about how they'll feel about,
say, one additional patch per year not getting committed in every
future release because of this requirement. Does that feel OK? What
if the number is two, or three? What's the tipping point where you'd
say the cost is too high? (Note: If anyone reading this is tempted to
answer that there is no such tipping point, then you have got a bad
case of patch myopia.)

There's a totally legitimate debate to be had here, but my feeling
about what you're calling libpqdump is:

- It's completely separate from event triggers.
- It's completely separate from object access hooks.
- The fact that no one to my knowledge has maintained such a thing
outside of core, and that it seems like a hard project with lots of
required maintenance, makes me very wary about pushing it into core.

[ BTW: Sorry our IM session got cut off. It started erroring out
every time I messaged you. But no problem, anyway. ]

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 15:01:48
Message-ID: CA+U5nMJHQ+ow8ySiBmJ=61ed7o_4DB-QW8tQCLKaPi_kzcms9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18 January 2013 02:48, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> I have no problem requiring C code to use the event data, be it via hooks
>> or via C functions called from event triggers. The problem I have with
>> putting in some hooks is that I doubt that you can find sensible spots
>> with enough information to actually recreate the DDL for a remote system
>> without doing most of the work for command triggers.
>
> "most of the work"? No, I don't think so. Here's the thing that's
> bothering me about that: if the use-case that everybody is worried about
> is replication, then triggers of any sort are the Wrong Thing, IMO.
>
> The difference between a replication hook and a trigger is that a
> replication hook has no business changing any state of the local system,
> whereas a trigger *has to be expected to change the state of the local
> system* because if it has no side-effects you might as well not bother
> firing it. And the fear of having to cope with arbitrary side-effects
> occurring in the midst of DDL is about 80% of my angst with this whole
> concept.
>
> If we're only interested in replication, let's put in some hooks whose
> contract does not allow for side-effects on the local catalogs, and be
> done. Otherwise we'll be putting in man-years of unnecessary (or at
> least unnecessary for this use-case) work.

I would be happy to see something called "replication triggers" or
"replication hooks" committed.

The things I want to be able to do are:
* Replicate DDL for use on other systems, in multiple ways selected by
the user, possibly more than one at the same time
* Put in a block to prevent all DDL, for use during upgrades or just a
general lockdown, then remove it again when complete.
* Perform auditing of DDL statements (which requires more info than
simply the text submitted)
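The lockdown item in particular needs nothing beyond the events already
in the patch. A minimal sketch (names are illustrative; note that
commands on event triggers themselves are not intercepted, so the block
can be removed again afterwards):

```sql
CREATE FUNCTION forbid_ddl() RETURNS event_trigger LANGUAGE plpgsql AS $$
BEGIN
  RAISE EXCEPTION 'DDL is locked down during the upgrade (command: %)', TG_TAG;
END;
$$;

CREATE EVENT TRIGGER ddl_lockdown ON ddl_command_start
  EXECUTE PROCEDURE forbid_ddl();

-- When the upgrade is complete:
DROP EVENT TRIGGER ddl_lockdown;
```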

If there are questions about wider functionality, or risks with that,
then reducing the functionality is OK, for me. The last thing we need
is a security hole introduced by this, or weird behaviour from people
interfering with things. (If you want that, write an executor plugin
and cook your own spaghetti).

We can do this by declaring clearly that there are limits on what can
be achieved with event triggers, perhaps even testing to check bad
things haven't been allowed to occur. I don't see that as translating
into massive change in the patch, just focusing on the main cases,
documenting that and perhaps introducing some restrictions on wider
usage. We will likely slowly expand the functionality of event
triggers over time, so keeping things general still works as a long
range goal.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 15:47:32
Message-ID: 20130118154732.GE29501@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-01-18 09:58:53 -0500, Robert Haas wrote:
> On Fri, Jan 18, 2013 at 9:07 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I don't have a problem reusing the object access infrastructure at all. I just
> > don't think it's providing even remotely enough. You have (co-)written that
> > stuff, so you probably know more than I do, but could you explain to me how it
> > could be reused to replicate a CREATE TABLE?
> >
> > Problems I see:
> > - afaics for CREATE TABLE the only hook is in ATExecAddColumn
>
> No, there's one also in heap_create_with_catalog. Took me a minute to
> find it, as it does not use InvokeObjectAccessHook. The idea is that
> OAT_POST_CREATE fires once per object creation, regardless of the
> object type - table, column, whatever.

Ah. Chose the wrong term to grep for. I am tempted to suggest adding a
comment explaining why we InvokeObjectAccessHook isn't used just for
enhanced grepability.

> > - No way to regenerate the table definition for execution on the remote system
> > without creating libpqdump.
>
> IMHO, that is one of the really ugly problems that we haven't come up
> with a good solution for yet. If you want to replicate DDL, you have
> basically three choices:
>
> 1. copy over the statement text that was used on the origin server and
> hope none of the corner cases bite you
> 2. come up with some way of reconstituting a DDL statement based on
> (a) the parse tree or (b) what the server actually decided to do
> 3. reconstitute the state of the object from the catalogs after the
> command has run
>
> (2a) differs from (2b) for things like CREATE INDEX, where the index
> name might be left for the server to determine, but when replicating
> you'd like to get the same name out. (3) is workable for CREATE but
> not ALTER or DROP.
>
> The basic problem here is that (1) and (3) are not very
> reliable/complete and (2) is a lot of work and introduces a huge code
> maintenance burden.

I agree with that analysis.

> But it's unfair to pin that on the object-access
> hook mechanism

I agree. I don't think anybody has pinned it on it though.

> - any reverse-parsing or catalog-deconstruction
> solution for DDL is going to have that problem. The decision we have
> to make as a community is whether we're prepared to support and
> maintain that code for the indefinite future. Although I think it's
> easy to say "yes, because DDL replication would be really cool" - and
> I sure agree with that - I think it needs to be thought through a bit
> more deeply than that.
>
> I have been involved in PostgreSQL development for about 4.5 years
> now. This is less time than many people here, but it's still long
> enough to say a whole lot of people ask for some variant of this idea,
> and yet I have yet to see anybody produce a complete, working version
> of this functionality and maintain it outside of the PostgreSQL tree
> for one release cycle (did I miss something?).

I don't really think that's a fair argument.

For one, I didn't really ask for a libpgdump - I said that I don't see
any way to generate re-executable SQL without it if we don't get the
parse-tree but only the oid of the created object. Not to speak of the
complexities of supporting ALTER that way (which is, as you note,
basically not the way to do this).

Also, even though I don't think it's the right way *for this*, I think
pg_dump and pgadmin pretty much prove that it's possible? The latter is
even out-of-core and has existed for multiple years.

It's also not really fair to compare the out-of-tree effort of
maintaining such a library to in-core support. pg_dump *needs* to be
maintained, so the additional maintenance overhead once the initial
development is done shouldn't really be higher than now. Lower if
anything, due to getting rid of a good bit of ugly code ;). There's no
such effect out of core.

If you're also considering parsetree->SQL transformation under the
libpgdump umbrella, it's even less fair. The backend already has a *lot*
of infrastructure to regenerate SQL from trees, even if it's mostly
centered around DQL. A noticeable amount of that code is unavailable to
extensions (i.e. many are static funcs).
IMO it's completely unreasonable to expose that code to the outside and
expect it to have a stable interface. We *will* need to rewrite parts of
that regularly.
For those reasons I think the only reasonable place to create textual
DDL is in the backend; trying to allow outside code to do that will
impose far greater limitations.
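For reference, the deparse code that is already reachable from SQL is a
handful of object-specific functions, mostly covering queries and a few
DDL objects ('my_view' etc. are placeholders); there is nothing
comparable for an arbitrary table:

```sql
SELECT pg_get_viewdef('my_view'::regclass);     -- view body, rebuilt from the stored tree
SELECT pg_get_indexdef('my_index'::regclass);   -- full CREATE INDEX statement
SELECT pg_get_functiondef('my_func()'::regprocedure);  -- CREATE OR REPLACE FUNCTION
-- There is no pg_get_tabledef(); CREATE TABLE would have to be
-- reassembled by hand from the catalogs.
```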

What's the way you suggest to go about this?

> There's a totally legitimate debate to be had here, but my feeling
> about what you're calling libpqdump is:

> - It's completely separate from event triggers.
> - It's completely separate from object access hooks.

Well, it is one of the major use-cases (or even *the* major use case)
for event triggers that I've heard of. So it seems a valid point in a
discussion of how to design the whole mechanism, doesn't it?

Some version of the event trigger patch contained partial support for
regenerating the DDL, so it seems like a justified point there. You
complained that suggestions about reusing object access hooks were
ignored, so mentioning that they currently don't provide *enough* (they
*do* provide a part, but it doesn't seem to be the most critical one)
also seems justifiable.

> [ BTW: Sorry our IM session got cut off. It started erroring out
> every time I messaged you. But no problem, anyway. ]

NP, it's good to keep the technical stuff here anyway... Stupid
technology. What business are we in again?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 16:42:47
Message-ID: CA+Tgmoa-p-texRBzgetFgvaKwWJJ=a3XBCWMOQdg17aWtrDPNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 18, 2013 at 10:47 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> No, there's one also in heap_create_with_catalog. Took me a minute to
>> find it, as it does not use InvokeObjectAccessHook. The idea is that
>> OAT_POST_CREATE fires once per object creation, regardless of the
>> object type - table, column, whatever.
>
> Ah. Chose the wrong term to grep for. I am tempted to suggest adding a
> comment explaining why we InvokeObjectAccessHook isn't used just for
> enhanced grepability.

I cannot parse that sentence, but I agree that the ungreppability of
it is suboptimal. I resorted to git log -SInvokeObjectAccessHook -p
to find it.

>> I have been involved in PostgreSQL development for about 4.5 years
>> now. This is less time than many people here, but it's still long
>> enough to say a whole lot of people ask for some variant of this idea,
>> and yet I have yet to see anybody produce a complete, working version
>> of this functionality and maintain it outside of the PostgreSQL tree
>> for one release cycle (did I miss something?).
>
> I don't really think that's a fair argument.
>
> For one, I didn't really ask for a libpgdump - I said that I don't see
> any way to generate re-executable SQL without it if we don't get the
> parse-tree but only the oid of the created object. Not to speak of the
> complexities of supporting ALTER that way (which is, as you note,
> basically not the way to do this).

Oh, OK. I see. Well, I've got no problem making the parse tree
available via InvokeObjectAccessHook. Isn't that just a matter of
adding a new hook type and a new call site?
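
The shape Robert is gesturing at can be sketched roughly as below. This is an illustrative, self-contained simplification, not the actual PostgreSQL object_access_hook API (the real signatures live in objectaccess.h, and `OAT_POST_CREATE_PARSETREE` is a hypothetical new access type invented here):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model of the object access hook mechanism: one global hook
 * pointer and an enum of access types.  Supporting a new event is
 * just a new enum member plus a call site that fires it.
 */
typedef enum ObjectAccessType
{
    OAT_POST_CREATE,
    OAT_DROP,
    OAT_POST_CREATE_PARSETREE   /* hypothetical new type carrying the parse tree */
} ObjectAccessType;

typedef void (*object_access_hook_type)(ObjectAccessType access,
                                        unsigned int objectId,
                                        void *arg);

static object_access_hook_type object_access_hook = NULL;

/* The "new call site": fire the hook if one is installed. */
static void
InvokeHook(ObjectAccessType access, unsigned int objectId, void *arg)
{
    if (object_access_hook != NULL)
        object_access_hook(access, objectId, arg);
}

/* A consumer that records what it was handed. */
static ObjectAccessType last_access;
static unsigned int last_oid;
static const char *last_parsetree;

static void
my_hook(ObjectAccessType access, unsigned int objectId, void *arg)
{
    last_access = access;
    last_oid = objectId;
    last_parsetree = (const char *) arg;
}
```

With the hook installed, a call site placed after object creation would pass the parse tree along as `arg`, so the consumer sees both the new object's OID and the statement that produced it.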

> Also, even though I don't think it's the right way *for this*, I think
> pg_dump and pgadmin pretty much prove that it's possible? The latter
> is even out-of-core and has existed for multiple years.

Fair point.

> It's also not really fair to compare the out-of-tree effort of maintaining
> such a library to in-core support. pg_dump *needs* to be maintained, so
> the additional maintenance overhead once the initial development is done
> shouldn't really be higher than now. Lower if anything, due to getting
> rid of a good bit of ugly code ;). There's no such effect out of core.

This I don't follow. Nothing we might add to core to reverse-parse
either the catalogs or the parse tree is going to make pg_dump go
away.

> If you're also considering parsetree->SQL transformation under the
> libpgdump umbrella it's even less fair. The backend already has a *lot*
> of infrastructure to regenerate SQL from trees, even if it's mostly
> centered around DQL. A noticeable amount of that code is
> unavailable to extensions (i.e. many are static funcs).
> IMO it's completely unreasonable to expose that code to the outside and
> expect it to have a stable interface. We *will* need to rewrite parts of
> that regularly.
> For those reasons I think the only reasonable way to create textual DDL
> is in the backend; trying to allow outside code to do that will impose
> far greater limitations.

I'm having trouble following this. Can you restate? I wasn't sure
what you meant by libpgdump. I assumed you were speaking of a
parsetree->DDL or catalog->DDL facility.

> Well, it is one of the major use-cases (or even *the* major use case)
> for event triggers that I heard of. So it seems a valid point in a
> dicussion on how to design the whole mechanism, doesn't it?

Yes, but it is a separate problem from how we give control to the event
trigger (or object access hook).

> Some version of the event trigger patch contained partial support for
> regenerating the DDL so it seems like a justified point there. You
> complained that suggestions about reusing object access hooks were
> ignored, so mentioning that they currently don't provide *enough* (they
> *do* provide a part, but it doesn't seem to be the most critical one)
> also seems justifiable.

Sure, but we could if we wanted decide to commit the partial support
for regenerating the DDL and tell people to use it via object access
hooks. Of course, that would thin out even further the number of
people who would actually be using that code, which I fear will
already be too small to achieve bug-free operation in a reasonable
time. If we add some hooks now and someone maintains the DDL
reverse-parsing code as an out-of-core replication solution for a few
releases, and that doesn't turn out to be hideous, I'd be a lot more
sanguine about incorporating it at that point. I basically don't
think that there's any way we're going to commit something along those
lines now and not have it turn out to be a far bigger maintenance
headache than anyone wants. What that tends to end up meaning in
practice is that Tom gets suckered into fixing it because nobody else
can take the time, and that's neither fair nor good for the project.

> NP, it's good to keep the technical stuff here anyway... Stupid
> technology. Which business are we in again?

I don't know, maybe we can figure it out based on the objects in our
immediate vicinity. I see twelve cans of paint, most of them opened,
and something that looks sort of like a badly-made shed. Does that
help at all? :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 17:08:57
Message-ID: 20130118170857.GK29501@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-01-18 11:42:47 -0500, Robert Haas wrote:
> On Fri, Jan 18, 2013 at 10:47 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> No, there's one also in heap_create_with_catalog. Took me a minute to
> >> find it, as it does not use InvokeObjectAccessHook. The idea is that
> >> OAT_POST_CREATE fires once per object creation, regardless of the
> >> object type - table, column, whatever.
> >
> > Ah. Chose the wrong term to grep for. I am tempted to suggest adding a
> > comment explaining why we InvokeObjectAccessHook isn't used just for
> > enhanced grepability.
>
> I cannot parse that sentence, but I agree that the ungreppability of
> it is suboptimal. I resorted to git log -SInvokeObjectAccessHook -p
> to find it.

I was thinking of adding a comment like
/* We don't use InvokeObjectAccessHook here because we need to ... */
not because the comment in itself is important but because it allows to
grep ;)

> >> I have been involved in PostgreSQL development for about 4.5 years
> >> now. This is less time than many people here, but it's still long
> >> enough to say a whole lot of people ask for some variant of this idea,
> >> and yet I have yet to see anybody produce a complete, working version
> >> of this functionality and maintain it outside of the PostgreSQL tree
> >> for one release cycle (did I miss something?).
> >
> > I don't really think that's a fair argument.
> >
> > For one, I didn't really ask for a libpgdump - I said that I don't see
> > any way to generate re-executable SQL without it if we don't get the
> > parse-tree but only the oid of the created object. Not to speak of the
> > complexities of supporting ALTER that way (which is, as you note,
> > basically not the way to do this).
>
> Oh, OK. I see. Well, I've got no problem making the parse tree
> available via InvokeObjectAccessHook. Isn't that just a matter of
> adding a new hook type and a new call site?

Sure. It's probably not easy to choose the correct call sites though;
they need to be somewhat different for create & alter than for drop.

create & alter:
* after the object is created/altered
drop:
* after the object is locked, but *before* it's dropped

As far as I understood, that's basically the idea behind the
ddl_command_trace event trigger mentioned earlier.
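
The ordering described above can be modeled in a few lines. This is illustrative only, a simulation of the call-site placement rather than actual PostgreSQL code; the function names are invented:

```c
#include <assert.h>
#include <string.h>

/*
 * Simulated call-site ordering: for CREATE/ALTER the hook fires after
 * the catalog change is made; for DROP it fires after the object is
 * locked but before it is removed, so the hook can still inspect it.
 */
static char trace[128];

static void step(const char *what)
{
    strcat(trace, what);
    strcat(trace, ";");
}

static void fire_hook(const char *event)
{
    step(event);
}

static void create_object(void)
{
    step("insert-catalog-row");
    fire_hook("hook:post-create");   /* object already visible */
}

static void drop_object(void)
{
    step("lock-object");
    fire_hook("hook:pre-drop");      /* object still inspectable */
    step("delete-catalog-row");
}
```

The point of the pre-drop placement is that once the catalog row is gone, a hook handed only an OID has nothing left to look up.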

> > Also, even though I don't think it's the right way *for this*, I think
> > pg_dump and pgadmin pretty much prove that it's possible? The latter
> > is even out-of-core and has existed for multiple years.
>
> Fair point.

IMO it's already a good justification for a libpgdump; not that I am
volunteering, but ...

> > It's also not really fair to compare the out-of-tree effort of maintaining
> > such a library to in-core support. pg_dump *needs* to be maintained, so
> > the additional maintenance overhead once the initial development is done
> > shouldn't really be higher than now. Lower if anything, due to getting
> > rid of a good bit of ugly code ;). There's no such effect out of core.
>
> This I don't follow. Nothing we might add to core to reverse-parse
> either the catalogs or the parse tree is going to make pg_dump go
> away.

I was still talking about the hypothetical libpgdump we don't really
need for this ;)

> > If you're also considering parsetree->SQL transformation under the
> > libpgdump umbrella it's even less fair. The backend already has a *lot*
> > of infrastructure to regenerate SQL from trees, even if it's mostly
> > centered around DQL. A noticeable amount of that code is
> > unavailable to extensions (i.e. many are static funcs).
> > IMO it's completely unreasonable to expose that code to the outside and
> > expect it to have a stable interface. We *will* need to rewrite parts of
> > that regularly.
> > For those reasons I think the only reasonable way to create textual DDL
> > is in the backend; trying to allow outside code to do that will impose
> > far greater limitations.
>
> I'm having trouble following this. Can you restate? I wasn't sure
> what you meant by libpgdump. I assumed you were speaking of a
> parsetree->DDL or catalog->DDL facility.

Yea, that wasn't really clear, sorry for that.

What I was trying to say is that I think doing parsetree->text conversion
out of tree is noticeably more complex and essentially places a higher
maintenance burden on the project, because
1) the core already has a lot of infrastructure for such conversions
2) any external project would either need to copy that infrastructure or
make a large number of functions externally visible
3) any refactoring in that area would now break external tools even if
it would be trivial to adjust potential in-core support for such a
conversion

> > Some version of the event trigger patch contained partial support for
> > regenerating the DDL so it seems like a justified point there. You
> > complained that suggestions about reusing object access hooks were
> > ignored, so mentioning that they currently don't provide *enough* (they
> > *do* provide a part, but it doesn't seem to be the most critical one)
> > also seems justifiable.
>
> Sure, but we could if we wanted decide to commit the partial support
> for regenerating the DDL and tell people to use it via object access
> hooks.

Yes, that's a possibility.

> Of course, that would thin out even further the number of
> people who would actually be using that code, which I fear will
> already be too small to achieve bug-free operation in a reasonable
> time.

Well, replication solutions seem to be the main potential consumer of
such a capability and while the C level stuff wouldn't be used by many,
hopefully the stuff built on top will.

> If we add some hooks now and someone maintains the DDL
> reverse-parsing code as an out-of-core replication solution for a few
> releases, and that doesn't turn out to be hideous, I'd be a lot more
> sanguine about incorporating it at that point.

As I said above, I think due to the available infrastructure doing this
out of tree is *far* harder and I fear that a good part of that code
would turn out not to be applicable for inclusion into core.

> I basically don't
> think that there's any way we're going to commit something along those
> lines now and not have it turn out to be a far bigger maintenance
> headache than anyone wants. What that tends to end up meaning in
> practice is that Tom gets suckered into fixing it because nobody else
> can take the time, and that's neither fair nor good for the project.

Now as in "at this point of the cycle" or now as in "without a
several-year-old out-of-core project"?

> > NP, it's good to keep the technical stuff here anyway... Stupid
> > technology. Which business are we in again?
>
> I don't know, maybe we can figure it out based on the objects in our
> immediate vicinity. I see twelve cans of paint, most of them opened,
> and something that looks sort of like a badly-made shed. Does that
> help at all? :-)

By the smell and looks from where I am sitting I seem to be a barista
making rather delicious-smelling coffee, and my boss seems to suck at
decorating ;). Oh, and I seem to have a terrible taste in music.

Regards,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 17:44:13
Message-ID: 10183.1358531053@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2013-01-18 11:42:47 -0500, Robert Haas wrote:
>> I'm having trouble following this. Can you restate? I wasn't sure
>> what you meant by libpgdump. I assumed you were speaking of a
>> parsetree->DDL or catalog->DDL facility.

> Yea, that wasn't really clear, sorry for that.

> What I was trying to say is that I think doing parsetree->text conversion
> out of tree is noticeably more complex and essentially places a higher
> maintenance burden on the project, because

Ah. That's not pg_dump's job though, so you're choosing a bad name for
it. What I think you are really talking about is extending the
deparsing logic in ruleutils.c to cover utility statements. Which is
not a bad plan in principle; we've just never needed it before. It
would be quite a lot of new code though.
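
In the spirit of extending ruleutils.c-style deparsing to utility statements, the core idea is a per-node-type reverse-lister. The sketch below is a self-contained toy; the struct and function names are invented for illustration, and real parse trees (e.g. CreateStmt in parsenodes.h) carry far more fields:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Invented, drastically simplified "parse tree" for CREATE TABLE. */
typedef struct ToyColumnDef
{
    const char *colname;
    const char *typname;
} ToyColumnDef;

typedef struct ToyCreateStmt
{
    const char *schemaname;
    const char *relname;
    int         ncols;
    ToyColumnDef cols[8];
} ToyCreateStmt;

/*
 * Reverse-list the toy parse tree into SQL text, schema-qualifying the
 * relation name (part of what "command string normalization" buys you).
 * No truncation handling; a real implementation would use StringInfo.
 */
static void
deparse_create_stmt(const ToyCreateStmt *stmt, char *buf, size_t buflen)
{
    size_t off = 0;

    off += snprintf(buf + off, buflen - off, "CREATE TABLE %s.%s (",
                    stmt->schemaname, stmt->relname);
    for (int i = 0; i < stmt->ncols; i++)
        off += snprintf(buf + off, buflen - off, "%s%s %s",
                        i > 0 ? ", " : "",
                        stmt->cols[i].colname, stmt->cols[i].typname);
    snprintf(buf + off, buflen - off, ");");
}
```

One such function per utility-statement node type is exactly where the "quite a lot of new code" estimate comes from.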

An issue that would have to be thought about is that there are assorted
scenarios where we generate "parse trees" that contain options that
cannot be generated from SQL, so there's no way to reverse-list them
into working SQL. But often those hidden options are essential to know
what the statement will really do, so I'm not sure ignoring them will be
a workable answer for replication purposes.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 18:03:26
Message-ID: 20130118180326.GM29501@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-01-18 12:44:13 -0500, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2013-01-18 11:42:47 -0500, Robert Haas wrote:
> >> I'm having trouble following this. Can you restate? I wasn't sure
> >> what you meant by libpgdump. I assumed you were speaking of a
> >> parsetree->DDL or catalog->DDL facility.
>
> > Yea, that wasn't really clear, sorry for that.
>
> > What I was trying to say is that I think doing parsetree->text conversion
> > out of tree is noticeably more complex and essentially places a higher
> > maintenance burden on the project, because
>
> Ah. That's not pg_dump's job though, so you're choosing a bad name for
> it.

I really only mentioned libpgdump because the object access hooks at the
moment only get passed the object oid, and because Robert doubted the
need for deparsing the parse trees in the past. Without the parse tree you
obviously cannot deparse it ;)

I do *not* think that using something that reassembles the statement
solely based on the catalog context is a good idea.

> What I think you are really talking about is extending the
> deparsing logic in ruleutils.c to cover utility statements. Which is
> not a bad plan in principle; we've just never needed it before. It
> would be quite a lot of new code though.

Yes, I think that is the only realistic approach to providing somewhat
unambiguous SQL for replicating DDL.

> An issue that would have to be thought about is that there are assorted
> scenarios where we generate "parse trees" that contain options that
> cannot be generated from SQL, so there's no way to reverse-list them
> into working SQL. But often those hidden options are essential to know
> what the statement will really do, so I'm not sure ignoring them will be
> a workable answer for replication purposes.

Dimitri's earlier patches had support for quite a few commands via
deparsing, and while it's a noticeable amount of code it seemed to work
OK.
The last revision I could dig out is
https://github.com/dimitri/postgres/blob/d2996303c7bc6daa08cef23e3d5e07c3afb11191/src/backend/utils/adt/ddl_rewrite.c

I think some things in there would have to change (e.g. I doubt that it's
good that EventTriggerData is involved at that level) but I think it
shows very roughly how it could look.

Greetings,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 18:22:55
Message-ID: 20130118182255.GG4063@alvh.no-ip.org
Lists: pgsql-hackers

Andres Freund escribió:
> On 2013-01-18 12:44:13 -0500, Tom Lane wrote:

> > An issue that would have to be thought about is that there are assorted
> > scenarios where we generate "parse trees" that contain options that
> > cannot be generated from SQL, so there's no way to reverse-list them
> > into working SQL. But often those hidden options are essential to know
> > what the statement will really do, so I'm not sure ignoring them will be
> > a workable answer for replication purposes.
>
> Dimitri's earlier patches had support for quite a few commands via
> deparsing, and while it's a noticeable amount of code it seemed to work
> OK.
> The last revision I could dig out is
> https://github.com/dimitri/postgres/blob/d2996303c7bc6daa08cef23e3d5e07c3afb11191/src/backend/utils/adt/ddl_rewrite.c
>
> I think some things in there would have to change (e.g. I doubt that it's
> good that EventTriggerData is involved at that level) but I think it
> shows very roughly how it could look.

I looked at that code back then and didn't like it very much. The
problem I see (as Robert does, I think) is that it raises the burden
when you change the grammar -- you now need to edit not only gram.y, but
the ddl_rewrite.c stuff to ensure your new thing can be reverse-parsed.
One idea I had was to add annotations on gram.y so that an external
(Perl?) program could generate the ddl_rewrite.c equivalent, without
forcing the developer to do it by hand.

Another problem is what Tom mentions: there are internal options that
cannot readily be turned back into SQL. We would have to expose these
at the SQL level somehow; but to me that looks painful because we would
be offering users the option to break their systems by calling commands
that do things that should only be done internally.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 18:26:51
Message-ID: 20130118182651.GO29501@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-01-18 15:22:55 -0300, Alvaro Herrera wrote:
> Andres Freund escribió:
> > On 2013-01-18 12:44:13 -0500, Tom Lane wrote:
>
> > > An issue that would have to be thought about is that there are assorted
> > > scenarios where we generate "parse trees" that contain options that
> > > cannot be generated from SQL, so there's no way to reverse-list them
> > > into working SQL. But often those hidden options are essential to know
> > > what the statement will really do, so I'm not sure ignoring them will be
> > > a workable answer for replication purposes.
> >
> > Dimitri's earlier patches had support for quite a few commands via
> > deparsing, and while it's a noticeable amount of code it seemed to work
> > OK.
> > The last revision I could dig out is
> > https://github.com/dimitri/postgres/blob/d2996303c7bc6daa08cef23e3d5e07c3afb11191/src/backend/utils/adt/ddl_rewrite.c
> >
> > I think some things in there would have to change (e.g. I doubt that it's
> > good that EventTriggerData is involved at that level) but I think it
> > shows very roughly how it could look.
>
> I looked at that code back then and didn't like it very much. The
> problem I see (as Robert does, I think) is that it raises the burden
> when you change the grammar -- you now need to edit not only gram.y, but
> the ddl_rewrite.c stuff to ensure your new thing can be reverse-parsed.

Yea, but that's nothing new; you already need to do all that for normal
SQL. Changes to the grammar that don't involve modifying ruleutils.c et
al. are mostly trivial enough that this doesn't really place that much of a
burden.

And I have yet to hear any other proposal for how to do this.

> Another problem is what Tom mentions: there are internal options that
> cannot readily be turned back into SQL. We would have to expose these
> at the SQL level somehow; but to me that looks painful because we would
> be offering users the option to break their systems by calling commands
> that do things that should only be done internally.

Hm. Which options are you two thinking of right now?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 18:59:17
Message-ID: 11782.1358535557@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
Andres Freund escribió:
>> Dimitri's earlier patches had support for quite a few commands via
>> deparsing, and while it's a noticeable amount of code it seemed to work
>> OK.
>> The last revision I could dig out is
>> https://github.com/dimitri/postgres/blob/d2996303c7bc6daa08cef23e3d5e07c3afb11191/src/backend/utils/adt/ddl_rewrite.c

> I looked at that code back then and didn't like it very much. The
> problem I see (as Robert does, I think) is that it raises the burden
> when you change the grammar -- you now need to edit not only gram.y, but
> the ddl_rewrite.c stuff to ensure your new thing can be reverse-parsed.

Well, that burden already exists for non-utility statements --- why
should utility statements get a pass? Other than that we tend to invent
new utility syntax freely, which might be a good thing to discourage
anyhow.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 21:41:31
Message-ID: CA+TgmobaEsXomLihUy4KgTNqz95mf8S3bN3Lcs5oEddPpTkPew@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 18, 2013 at 1:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>> Andres Freund escribió:
>>> Dimitri's earlier patches had support for quite a few commands via
>>> deparsing, and while it's a noticeable amount of code it seemed to work
>>> OK.
>>> The last revision I could dig out is
>>> https://github.com/dimitri/postgres/blob/d2996303c7bc6daa08cef23e3d5e07c3afb11191/src/backend/utils/adt/ddl_rewrite.c
>
>> I looked at that code back then and didn't like it very much. The
>> problem I see (as Robert does, I think) is that it raises the burden
>> when you change the grammar -- you now need to edit not only gram.y, but
>> the ddl_rewrite.c stuff to ensure your new thing can be reverse-parsed.
>
> Well, that burden already exists for non-utility statements --- why
> should utility statements get a pass? Other than that we tend to invent
> new utility syntax freely, which might be a good thing to discourage
> anyhow.

My concerns are that (1) it will slow down the addition of new
features to PostgreSQL by adding yet another barrier to commit and (2)
it won't get enough use or regression test coverage to be, or
remain, bug-free.

There may be other concerns with respect to the patch-as-proposed, but
those are my high-level concerns.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 22:12:49
Message-ID: 4373.1358547169@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Jan 18, 2013 at 1:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, that burden already exists for non-utility statements --- why
>> should utility statements get a pass? Other than that we tend to invent
>> new utility syntax freely, which might be a good thing to discourage
>> anyhow.

> My concerns are that (1) it will slow down the addition of new
> features to PostgreSQL by adding yet another barrier to commit and (2)
> it won't get enough use or regression test coverage to be, or
> remain, bug-free.

Meh. The barriers to inventing new statements are already mighty tall.
As for (2), I agree there's risk of bugs, but what alternative have you
got that is likely to be less bug-prone? At least a reverse-list
capability could be tested standalone without having to set up a logical
replication configuration.

This should not be interpreted as saying I'm gung-ho to do this, mind
you. I'm just saying that if our intention is to support logical
replication of all DDL operations, none of the alternatives look cheap.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-18 22:20:51
Message-ID: CA+TgmobzEa+=2ufCJ-7SZBaRV_TSGZq2tKz0W4MdGpU=cLH2JQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 18, 2013 at 5:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Jan 18, 2013 at 1:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Well, that burden already exists for non-utility statements --- why
>>> should utility statements get a pass? Other than that we tend to invent
>>> new utility syntax freely, which might be a good thing to discourage
>>> anyhow.
>
>> My concerns are that (1) it will slow down the addition of new
>> features to PostgreSQL by adding yet another barrier to commit and (2)
>> it won't get enough use or regression test coverage to be, or
>> remain, bug-free.
>
> Meh. The barriers to inventing new statements are already mighty tall.
> As for (2), I agree there's risk of bugs, but what alternative have you
> got that is likely to be less bug-prone? At least a reverse-list
> capability could be tested standalone without having to set up a logical
> replication configuration.
>
> This should not be interpreted as saying I'm gung-ho to do this, mind
> you. I'm just saying that if our intention is to support logical
> replication of all DDL operations, none of the alternatives look cheap.

Well, we agree on that, anyway. :-)

I honestly don't know what to do about this. I think you, Alvaro, and
I all have roughly the same opinion of this, which is that it doesn't
sound fun, but there's probably nothing better. So, what do we do
when a really cool potential feature (logical replication of DDL)
butts up against an expensive future maintenance requirement? I'm not
even sure what the criteria should be for making a decision on whether
it's "worth it".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-21 17:27:12
Message-ID: m21udesaz3.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Ok. Will prepare a non controversial patch for ddl_command_end.
>
> Thanks. I will make a forceful effort to review that in a timely
> fashion when it's posted.

Please find it attached to this email.

COLUMNS=72 git diff --stat
doc/src/sgml/event-trigger.sgml | 93 ++++++-
src/backend/commands/event_trigger.c | 122 +++++++--
src/backend/tcop/utility.c | 273 ++++++++++---------
src/backend/utils/cache/evtcache.c | 2 +
src/include/commands/event_trigger.h | 1 +
src/include/utils/evtcache.h | 3 +-
src/test/regress/expected/event_trigger.out | 6 +-
src/test/regress/sql/event_trigger.sql | 4 +
8 files changed, 359 insertions(+), 145 deletions(-)

What is the next part you want to see separated away?

We have a choice of:

- Context exposing and filtering

My preference for that point is to include it and limit it to event
triggers written in C, leaving the door open to add new events in
the future, and allowing existing consumers of that framework to see
through implementation details should they need it.

It's better than just a ProcessUtility_hook because you can see it
in \dy, decide to disable it with ALTER EVENT TRIGGER (e.g. in a
transaction to avoid it recursively calling into itself), and it won't
run in single-user mode even when in shared_preload_libraries.

Another idea is to protect it with a GUC.

Yet another idea is to add a new property to the CREATE EVENT
TRIGGER command so that the user can explicitly request it, knowing
the feature may break at the next release and all the ones
after that:

CREATE EVENT TRIGGER foo ON ddl_command_start
WHEN show_internals in ('Yes I know what I''m doing',
'Thanks')
EXECUTE PROCEDURE …();

That particular phrasing and option choice might be revisited before
commit, though, I'm not wedded to it.

- Additional Information to PL/pgSQL

We're talking about :

- operation (CREATE, ALTER, DROP)
- object type (TABLE, FUNCTION, TEXT SEARCH DICTIONARY, …)
- object OID (when there's a single object target)
- object schema name (when there's a single object target)
- object name (when there's a single object target)

- Additional Information to PL/*

It's separated away because you opted out of reviewing that
part, so I guess we will continue to make that a separate patch, and
it will still need to happen. I've been told that I shouldn't be on
the hook to provide that, and I wonder, but anyway I did already
implement it in a previous version so I can do it again as soon as
we know what we expose.

- Command String Normalisation

I removed it already from the main patch because we needed to talk
about it more, and it's kind of a big part in itself. Meanwhile it
seems to me that an agreement has been reached and that I will be
able to follow the consensus and resume working on that part.

- ddl_command_trace

Now is the time to kill it (for 9.3) or allow it in.

It means another cache lookup at DDL event time (so 2 cache lookups
per DDL in 9.3), and allows users who just want the information not
to have to care about when it is available.
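For reference, usage of the trace event would presumably look like
this (a sketch; the event name follows the ddl_command_trace spelling
used earlier in the thread, and its exact semantics are what is being
decided here):

```sql
-- Sketch: ddl_command_trace fires wherever the information is
-- available, so the user need not choose between ddl_command_start
-- and ddl_command_end themselves.  Semantics still under discussion.
CREATE EVENT TRIGGER trace_all ON ddl_command_trace
  EXECUTE PROCEDURE test_event_trigger();
```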

- Non DDL related Events

Extra bonus item: a table_rewrite event. Unlikely for 9.3 at best,
unless someone else wants to have at it, maybe?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Attachment Content-Type Size
ddl_command_end.v0.patch.gz application/octet-stream 5.2 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-21 23:07:15
Message-ID: CA+TgmoZPhj6Rw-DottwresCJnOmRewbOxp2AABwyxP2k7vZf-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Please find it attached to this email.

Nice clean patch, thanks!

Committed, after tinkering with the CommandCounterIncrement() stuff a bit.

I will respond to the rest of your email later. Reading through this
patch left me with a slight concern regarding both ddl_command_start
and ddl_command_end: what happens if there's more than one event
trigger scheduled to fire, and one of them does something like drop
(with cascade) the function that a later one uses? Admittedly, that
seems like an unlikely case, but we probably want to check that
nothing too awful happens (e.g. crashing the server) and maybe add a
regression test to cover this scenario.

Another thing is that we might want to document that if a command
errors out, ddl_command_end will never be reached; and perhaps also
that if ddl_command_start errors out, the command itself will never be
reached. Perhaps this is so obvious as to not bear mentioning, I
don't know, but the thought crossed my mind that someone might fail to
realize it.
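A minimal illustration of the point (a sketch only; the trigger and
function names here are placeholders, not from the patch):

```sql
-- If the command itself errors out, ddl_command_end never fires;
-- likewise, if a ddl_command_start trigger errors out, the command
-- itself never runs.
CREATE EVENT TRIGGER note_end ON ddl_command_end
  EXECUTE PROCEDURE note_it();   -- note_it() just raises a NOTICE

CREATE TABLE t (id int, id int); -- fails: duplicate column name
-- ERROR only; note_it() is never called, because the error aborts
-- the command before the ddl_command_end event is reached.
```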

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Dickson S(dot) Guedes" <listas(at)guedesoft(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-21 23:29:34
Message-ID: CAHHcrep1YjPLFWONsjeCtucBHkMPvmhaFQ_3LaezTWvRE7zJYg@mail.gmail.com
Lists: pgsql-hackers

2013/1/21 Robert Haas <robertmhaas(at)gmail(dot)com>:
> Another thing is that we might want to document that if a command
> errors out, ddl_command_end will never be reached; and perhaps also
> that if ddl_command_start errors out, the command itself will never be
> reached. Perhaps this is so obvious as to not bear mentioning, I
> don't know, but the thought crossed my mind that someone might fail to
> realize it.

I think there should be a mention of that in the docs; someone could
expect ddl_command_end to be reached even when
ddl_command_start errors out, and try to use it in some way.

Regards,
--
Dickson S. Guedes
mail/xmpp: guedes(at)guedesoft(dot)net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-22 12:02:04
Message-ID: m2622pmnnn.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Nice clean patch, thanks!
>
> Committed, after tinkering with the CommandCounterIncrement() stuff a bit.

Cool, thanks!

> I will respond to the rest of your email later. Reading through this
> patch left me with a slight concern regarding both ddl_command_start
> and ddl_command_end: what happens if there's more than one event
> trigger scheduled to fire, and one of them does something like drop
> (with cascade) the function that a later one uses? Admittedly, that
> seems like an unlikely case, but we probably want to check that
> nothing too awful happens (e.g. crashing the server) and maybe add a
> regression test to cover this scenario.

I added some in the attached patch.

doc/src/sgml/event-trigger.sgml | 10 ++
src/backend/commands/event_trigger.c | 6 +-
src/test/regress/expected/event_trigger.out | 106 +++++++++++++++++++
src/test/regress/sql/event_trigger.sql | 47 ++++++++
4 files changed, 167 insertions(+), 2 deletions(-)

I had to quiet part of the regression tests because of an ERROR
message containing an OID, though I think it's the right error message.

I don't think we want to add complexity to the code just to be able
to report the name of a procedure that we depend on in the event
cache and that the user deleted while we were in the middle of
running the event trigger list we already got from the cache, right?

"Dickson S. Guedes" <listas(at)guedesoft(dot)net> writes:
> 2013/1/21 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> Another thing is that we might want to document that if a command
>> errors out, ddl_command_end will never be reached; and perhaps also
>> that if ddl_command_start errors out, the command itself will never be
>> reached. Perhaps this is so obvious as to not bear mentioning, I
>> don't know, but the thought crossed my mind that someone might fail to
>> realize it.
>
> I think there should be a mention of that in the docs; someone could
> expect ddl_command_end to be reached even when
> ddl_command_start errors out, and try to use it in some way.

Yeah, I share both your views. I've been playing with event triggers
for too long to think of such documentation, it seems; thanks for
reminding me about it.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Attachment Content-Type Size
event_trigger_harness.v0.patch.gz application/octet-stream 2.0 KB

From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-22 12:29:26
Message-ID: m2vcapl7tl.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> I added some in the attached patch.
>
> doc/src/sgml/event-trigger.sgml | 10 ++
> src/backend/commands/event_trigger.c | 6 +-
> src/test/regress/expected/event_trigger.out | 106 +++++++++++++++++++
> src/test/regress/sql/event_trigger.sql | 47 ++++++++
> 4 files changed, 167 insertions(+), 2 deletions(-)

And I dropped a comment line I didn't mean to while trying things out,
so here's the updated copy. There's a bug fix in there too, in both
versions of the patch, that the new regression tests exercise.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Attachment Content-Type Size
event_trigger_harness.v1.patch text/x-patch 10.7 KB

From: Thom Brown <thom(at)linux(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-22 12:56:02
Message-ID: CAA-aLv6Q4rtYyGYqp3e6qrikG0azxSYPigydBwwMgk=eBjeawQ@mail.gmail.com
Lists: pgsql-hackers

On 22 January 2013 12:02, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr> wrote:
>
> "Dickson S. Guedes" <listas(at)guedesoft(dot)net> writes:
> > 2013/1/21 Robert Haas <robertmhaas(at)gmail(dot)com>:
> >> Another thing is that we might want to document that if a command
> >> errors out, ddl_command_end will never be reached; and perhaps also
> >> that if ddl_command_start errors out, the command itself will never be
> >> reached. Perhaps this is so obvious as to not bear mentioning, I
> >> don't know, but the thought crossed my mind that someone might fail to
> >> realize it.
> >
> > I think there should be a mention of that in the docs; someone could
> > expect ddl_command_end to be reached even when
> > ddl_command_start errors out, and try to use it in some way.
>
> Yeah, I share both your views. I've been playing with event triggers for
> too long to think about such documentation it seems, thanks for
> reminding me about it.

I haven't followed the long discussions this patch has undergone, so
forgive me if what I'm about to ask has already been discussed and
discarded.

Would it be desirable to have ddl_command_success and
ddl_command_failed events? These would effectively be subsets of
ddl_command_end. Or should this be something handled by the event
trigger function? This would mean that you could have a command end
trigger that wouldn't necessarily fire, depending on the outcome of
the command.

Alternatively, and more elegantly, there could be a filter_variable of
(...thinks...) ERROR (?) which can have a value of true or false.
(previous paragraph seems like an awful idea now)

CREATE EVENT TRIGGER log_bad_event ON ddl_command_end WHEN ERROR IN (true)
EXECUTE PROCEDURE abort_any_command();

This, unfortunately, introduces awkwardness with the WHEN clause
restriction, which doesn't accommodate simple equality. And looking at
the IN part of the syntax, it looks awful: WHEN TAG IN ('DROP
SEQUENCE' AND 'CREATE TABLE').

This looks like a confusing hybrid of a standard WHERE clause and an
IN list, but is neither. You'd expect an IN to evaluate a list of
expressions, but instead it's saying what TAGs are allowed. And this
is where it starts to read like a WHERE clause, except such logic in a
WHERE clause wouldn't make any sense as AND might suggest it must
equal both. Please, please let it not stay like this. :)

Note that I'm coming into this syntax kinda fresh, so I'm just giving
first impressions of the current implementation.

--
Thom


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-22 13:28:54
Message-ID: m21uddjqi1.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Thom Brown <thom(at)linux(dot)com> writes:
> Would it be desirable to have ddl_command_success and
> ddl_command_failed events? These would effectively be subsets of

No, because you can't run any SQL in a failed transaction.

> This, unfortunately, introduces awkwardness with the WHEN clause
> restriction, which doesn't accommodate simple equality. And looking at
> the IN part of the syntax, it looks awful: WHEN TAG IN ('DROP
> SEQUENCE' AND 'CREATE TABLE').

The syntax is using a comma, not an "AND", as seen in the tests:

create event trigger regress_event_trigger2 on ddl_command_start
when tag in ('create table', 'CREATE FUNCTION')
execute procedure test_event_trigger();

> So note that I'm coming into this syntax kinda fresh, so just giving
> first impressions of the current implementation.

Thanks for that, I'm all for getting something better at the end!

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Thom Brown <thom(at)linux(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-22 13:37:21
Message-ID: CAA-aLv4BbPRCNdao_RVa2TNaW8kZxiScTLx5xCuDMnfgPWmXHA@mail.gmail.com
Lists: pgsql-hackers

On 22 January 2013 13:28, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr> wrote:
> Thom Brown <thom(at)linux(dot)com> writes:
>> Would it be desirable to have ddl_command_success and
>> ddl_command_failed events? These would effectively be subsets of
>
> No, because you can't run any SQL in a failed transaction.

Okay, I had misunderstood something someone wrote previously, and that
makes sense now.

>> This, unfortunately, introduces awkwardness with the WHEN clause
>> restriction, which doesn't accommodate simple equality. And looking at
>> the IN part of the syntax, it looks awful: WHEN TAG IN ('DROP
>> SEQUENCE' AND 'CREATE TABLE').
>
> The syntax is using a comma, not an "AND", as seen in the tests:
>
> create event trigger regress_event_trigger2 on ddl_command_start
> when tag in ('create table', 'CREATE FUNCTION')
> execute procedure test_event_trigger();

Ah, in that case, the docs are wrong:
http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html

--
Thom


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-22 14:45:40
Message-ID: m2ip6pi8dn.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Thom Brown <thom(at)linux(dot)com> writes:
> Ah, in that case, the docs are wrong:
> http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html

Oh. It's missing the comma and applying the AND at the wrong level,
here's a fix:

diff --git a/doc/src/sgml/ref/create_event_trigger.sgml b/doc/src/sgml/ref/create_event_trigger.sgml
index 040df11..3088ffa 100644
--- a/doc/src/sgml/ref/create_event_trigger.sgml
+++ b/doc/src/sgml/ref/create_event_trigger.sgml
@@ -23,7 +23,7 @@ PostgreSQL documentation
<synopsis>
CREATE EVENT TRIGGER <replaceable class="PARAMETER">name</replaceable>
ON <replaceable class="PARAMETER">event</replaceable>
- [ WHEN <replaceable class="PARAMETER">filter_variable</replaceable> IN (filter_value [ AND ... ] ) ]
+ [ WHEN <replaceable class="PARAMETER">filter_variable</replaceable> IN (filter_value [, ... ]) [ AND ... ] ]
EXECUTE PROCEDURE <replaceable class="PARAMETER">function_name</replaceable>()
</synopsis>
</refsynopsisdiv>
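Concretely, the corrected synopsis allows commands like this (a
sketch; the CONTEXT filter variable comes from the pending patch in
this thread, not from committed code):

```sql
-- Commas separate values inside one IN list, while AND joins
-- separate filter-variable clauses, matching the corrected synopsis.
CREATE EVENT TRIGGER audit_tables ON ddl_command_start
  WHEN TAG IN ('CREATE TABLE', 'DROP TABLE')
   AND CONTEXT IN ('TOPLEVEL')
  EXECUTE PROCEDURE test_event_trigger();
```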

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Thom Brown <thom(at)linux(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-22 14:47:28
Message-ID: CAA-aLv4QKJvLZvpyWi2ORpxeY01kvwe+W3gy2k6L6rFEvibQ=A@mail.gmail.com
Lists: pgsql-hackers

On 22 January 2013 14:45, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr> wrote:
> Thom Brown <thom(at)linux(dot)com> writes:
>> Ah, in that case, the docs are wrong:
>> http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html
>
> Oh. It's missing the comma and applying the AND at the wrong level,
> here's a fix:
>
> diff --git a/doc/src/sgml/ref/create_event_trigger.sgml b/doc/src/sgml/ref/create_event_trigger.sgml
> index 040df11..3088ffa 100644
> --- a/doc/src/sgml/ref/create_event_trigger.sgml
> +++ b/doc/src/sgml/ref/create_event_trigger.sgml
> @@ -23,7 +23,7 @@ PostgreSQL documentation
> <synopsis>
> CREATE EVENT TRIGGER <replaceable class="PARAMETER">name</replaceable>
> ON <replaceable class="PARAMETER">event</replaceable>
> - [ WHEN <replaceable class="PARAMETER">filter_variable</replaceable> IN (filter_value [ AND ... ] ) ]
> + [ WHEN <replaceable class="PARAMETER">filter_variable</replaceable> IN (filter_value [, ... ]) [ AND ... ] ]
> EXECUTE PROCEDURE <replaceable class="PARAMETER">function_name</replaceable>()
> </synopsis>
> </refsynopsisdiv>

Okay, that makes sense now. :)

--
Thom


From: Thom Brown <thom(at)linux(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-22 16:28:07
Message-ID: CAA-aLv5S+uooLiv6w5CQZ4TzUSL0hFaDeF5OBRBt=8WMx0DTBg@mail.gmail.com
Lists: pgsql-hackers

On 22 January 2013 14:47, Thom Brown <thom(at)linux(dot)com> wrote:
> On 22 January 2013 14:45, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr> wrote:
>> Thom Brown <thom(at)linux(dot)com> writes:
>>> Ah, in that case, the docs are wrong:
>>> http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html
>>
>> Oh. It's missing the comma and applying the AND at the wrong level,
>> here's a fix:
>>
>> diff --git a/doc/src/sgml/ref/create_event_trigger.sgml b/doc/src/sgml/ref/create_event_trigger.sgml
>> index 040df11..3088ffa 100644
>> --- a/doc/src/sgml/ref/create_event_trigger.sgml
>> +++ b/doc/src/sgml/ref/create_event_trigger.sgml
>> @@ -23,7 +23,7 @@ PostgreSQL documentation
>> <synopsis>
>> CREATE EVENT TRIGGER <replaceable class="PARAMETER">name</replaceable>
>> ON <replaceable class="PARAMETER">event</replaceable>
>> - [ WHEN <replaceable class="PARAMETER">filter_variable</replaceable> IN (filter_value [ AND ... ] ) ]
>> + [ WHEN <replaceable class="PARAMETER">filter_variable</replaceable> IN (filter_value [, ... ]) [ AND ... ] ]
>> EXECUTE PROCEDURE <replaceable class="PARAMETER">function_name</replaceable>()
>> </synopsis>
>> </refsynopsisdiv>
>
> Okay, that makes sense now. :)

ALTER INDEX is missing from the event trigger matrix.

When renaming a column on a foreign table, tg_tag reports 'ALTER
TABLE' instead of 'ALTER FOREIGN TABLE'. It doesn't do this for any
other ALTER FOREIGN TABLE operation, including altering, adding or
dropping a column.

DROP OWNED BY doesn't fire any type of event trigger. Is this right?

And also I can't get ddl_command_end to do anything:

test=# CREATE OR REPLACE FUNCTION public.cmd_trg_info()
test-# RETURNS event_trigger
test-# LANGUAGE plpgsql
test-# AS $function$
test$# BEGIN
test$# RAISE NOTICE E'Command trigger: tg_event=\'%\' tg_tag=\'%\'',
tg_event, tg_tag;
test$# END; $function$;
CREATE FUNCTION
test=# CREATE EVENT TRIGGER cmd_trg_before_create_table ON
ddl_command_end WHEN TAG IN ('CREATE TABLE') EXECUTE PROCEDURE
cmd_trg_info();
CREATE EVENT TRIGGER
test=# CREATE TABLE test4 AS SELECT 1::int id, ''::text test;
SELECT 1
test=#

Tried this with every possible command and the corresponding trigger
never fires. Was this actually tested or am I missing something? :S

Also, I'm assuming the 'ANY COMMAND' special tag is supposed to be removed?

postgres=# CREATE EVENT TRIGGER cmd_trg_before_any_command ON
ddl_command_start WHEN TAG IN ('ANY COMMAND') EXECUTE PROCEDURE
cmd_trg_info_any();
ERROR: filter value "ANY COMMAND" not recognized for filter variable "tag"

... but it lets me use it for ddl_command_end triggers:

postgres=# CREATE EVENT TRIGGER cmd_trg_after_any_command ON
ddl_command_end WHEN TAG IN ('ANY COMMAND') EXECUTE PROCEDURE
cmd_trg_info();
CREATE EVENT TRIGGER

This doesn't actually fire anyway due to the issue I mentioned above
where ddl_command_end doesn't appear to be functional.

However, I *can* get ddl_command_end to run on any command by omitting
the WHEN clause, but this is the only scenario where I can get a
trigger firing when using ddl_command_end:

postgres=# CREATE EVENT TRIGGER cmd_trg_after_any_command_all ON
ddl_command_end EXECUTE PROCEDURE cmd_trg_info();
CREATE EVENT TRIGGER
postgres=# CREATE TABLE moooo2(id serial);
NOTICE: Command trigger: tg_event='ddl_command_end' tg_tag='CREATE TABLE'
CREATE TABLE

--
Thom


From: Thom Brown <thom(at)linux(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-22 16:38:37
Message-ID: CAA-aLv647ziy1n3egwt5kbSn176GN5C4yvpnTfoj16Yv3SiO3A@mail.gmail.com
Lists: pgsql-hackers

On 22 January 2013 16:28, Thom Brown <thom(at)linux(dot)com> wrote:
> On 22 January 2013 14:47, Thom Brown <thom(at)linux(dot)com> wrote:
>
> postgres=# CREATE EVENT TRIGGER cmd_trg_after_any_command ON
> ddl_command_end WHEN TAG IN ('ANY COMMAND') EXECUTE PROCEDURE
> cmd_trg_info();
> CREATE EVENT TRIGGER

I can see why this works now. The ddl_command_end option stops the
TAG from being evaluated:

test2=# CREATE EVENT TRIGGER cmd_trg_before_any_command ON
ddl_command_end WHEN TAG IN ('MEOW LIKE A CAT') EXECUTE PROCEDURE
cmd_trg_info_any();
CREATE EVENT TRIGGER

...unless I've coincidentally stumbled upon the new MEOW LIKE A CAT
command coming in 9.3. ;)

--
Thom


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-22 16:53:07
Message-ID: m2r4ldf9cc.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Thom Brown <thom(at)linux(dot)com> writes:
> ALTER INDEX is missing from the event trigger matrix.
>
> When renaming a column on a foreign table, tg_tag reports 'ALTER
> TABLE' instead of 'ALTER FOREIGN TABLE'. It doesn't do this for any
> other ALTER FOREIGN TABLE operation, including altering, adding or
> dropping a column.

Will see about that later, thanks for reporting.

> DROP OWNED BY doesn't fire any type of event trigger. Is this right?

It's currently being discussed.

> And also I can't get ddl_command_end to do anything:

Due to a bug in the form of a stray & in the patch that got committed,
which I fixed this morning:

http://www.postgresql.org/message-id/m2vcapl7tl.fsf@2ndQuadrant.fr

> Also, I'm assuming the 'ANY COMMAND' special tag is supposed to be
> removed?

Yes, just omit the WHEN clause for the TAG.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-22 16:54:43
Message-ID: m2ip6pf99o.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Thom Brown <thom(at)linux(dot)com> writes:
> test2=# CREATE EVENT TRIGGER cmd_trg_before_any_command ON
> ddl_command_end WHEN TAG IN ('MEOW LIKE A CAT') EXECUTE PROCEDURE
> cmd_trg_info_any();
> CREATE EVENT TRIGGER
>
> ...unless I've coincidentally stumbled upon the new MEOW LIKE A CAT
> command coming in 9.3. ;)

Oe Noe, my carefully planted Easter Egg!

Seriously, it's covered in the regression tests and shouldn't work;
I'm surprised here. I will look into it tomorrow along with your other
findings, thanks again.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-22 21:15:00
Message-ID: 20130122211459.GA27577@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-18 17:20:51 -0500, Robert Haas wrote:
> On Fri, Jan 18, 2013 at 5:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> On Fri, Jan 18, 2013 at 1:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> Well, that burden already exists for non-utility statements --- why
> >>> should utility statements get a pass? Other than that we tend to invent
> >>> new utility syntax freely, which might be a good thing to discourage
> >>> anyhow.
> >
> >> My concerns are that (1) it will slow down the addition of new
> >> features to PostgreSQL by adding yet another barrier to commit and (2)
> >> it won't be get enough use or regression test coverage to be, or
> >> remain, bug-free.
> >
> > Meh. The barriers to inventing new statements are already mighty tall.
> > As for (2), I agree there's risk of bugs, but what alternative have you
> > got that is likely to be less bug-prone? At least a reverse-list
> > capability could be tested standalone without having to set up a logical
> > replication configuration.
> >
> > This should not be interpreted as saying I'm gung-ho to do this, mind
> > you. I'm just saying that if our intention is to support logical
> > replication of all DDL operations, none of the alternatives look cheap.
>
> Well, we agree on that, anyway. :-)
>
> I honestly don't know what to do about this. I think you, Alvaro, and
> I all have roughly the same opinion of this, which is that it doesn't
> sound fun, but there's probably nothing better.

Count me in.

> So, what do we do
> when a really cool potential feature (logical replication of DDL)
> butts up against an expensive future maintenance requirement?

I just tried to estimate the effort of that by looking at:
git log REL9_1_0..REL9_2_0 src/backend/parser/gram.y src/include/nodes/
which seems like it would catch most such patches.

First thing I have to say is that there are more patches potentially
affecting such a feature than I had previously thought.
On the other hand, just about all of those patches are so big that the
amount of changes to the ruleutils.c equivalent for DDL looks like a
minor pain in comparison to the rest.

The patches that seemed relevant during 9.1=>9.2 seem to be the
following, counting ones that would directly add new deparsing code:
* NOT VALID
* SECURITY LABEL
* per column fdw options
* range datatypes
* security barrier
* LEAKPROOF
* privileges on types
* ALTER FDW
* rename domain constraints

Then there were a bunch of patches adding new features that required
restructuring of existing code that would have affected possible
deparsing code:
* plancache restructuring
* parametrized indexscans
* DROP ... consolidation

In all of those but "ALTER FDW" and "rename domain constraints",
possible deparsing code looks like it would be an absolutely minor
part of the code. And in those two it's only a relevant part because
they are really small.

There were some bugfixes that would have required touching it as well,
but those seemed to be cases that are basically s/a/b/.

Not sure what the consequence of that is, but I wanted to share it
anyway after spending the time ;).

> I'm not
> even sure what the criteria should be for making a decision on whether
> it's "worth it".

In my experience it is the weakest point in the non-WAL based postgres
replication tools. So I personally think it is worth some overhead when
adding new features.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-23 02:55:14
Message-ID: CA+TgmoYgpub59RrQ+n=qoejmADhj-kppCfDgQjWP+rR+8m+39Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 22, 2013 at 7:29 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
>> I added some in the attached patch.
>>
>> doc/src/sgml/event-trigger.sgml | 10 ++
>> src/backend/commands/event_trigger.c | 6 +-
>> src/test/regress/expected/event_trigger.out | 106 +++++++++++++++++++
>> src/test/regress/sql/event_trigger.sql | 47 ++++++++
>> 4 files changed, 167 insertions(+), 2 deletions(-)
>
> And I dropped a comment line I didn't mean to while trying things out,
> so here's the updated copy. There's a bug fix in there too, in both
> versions of the patch, that the new regression tests exercise.

I think these new regression tests are no good, because I doubt that
the number of recursive calls that can fit into any given amount of
stack space is guaranteed to be the same on all platforms. I have
committed the bug fixes themselves, however.

I wasn't entirely happy with your proposed documentation so I'm
attaching a counter-proposal. My specific complaints are (1) telling
people that event triggers are run in a savepoint seems a little too
abstract; I have attempted to make the consequences more concrete; (2)
RAISE EXCEPTION is PL/pgsql specific and not the only possible reason
for an error; I have attempted to be more generic; and (3) in the
process of fiddling with this, I noticed that the ddl_command_end
documentation can, I believe, be made both shorter and more clear by
turning it into a rider on the previous paragraph.

Comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
event-trigger-docs.patch application/octet-stream 1.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-23 03:01:43
Message-ID: CA+Tgmobn7g04hpybaHoDFGzeqE_nfRwYgHWk_5RoA+k2frG1pg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 22, 2013 at 7:02 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> I had to quiet part of the regression tests because of an ERROR
> message containing an OID, though I think it's the right error message.

I have a feeling that's the sort of error message that we don't want
people to get. Can we arrange for something cleaner? Not sure
exactly what. Maybe something like this would do it:

ERROR: queued event trigger function not found

Or maybe we should just silently ignore failures to look up the event
trigger. That might be better, because the DBA could always do:

DROP FUNCTION myeventtrgfn() CASCADE;

...and it would be undesirable for other sessions to error out in that
case due to SnapshotNow effects.

> I don't think we want to add complexity to the code to be able to report
> the procedure name that we depend on in the event cache and that the
> user just deleted while in the middle of actually running the event
> trigger list we already got from the cache, right?

Yeah ... that seems like more code than the situation justifies.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-23 03:02:42
Message-ID: CA+TgmoaaLyKD_jnStBffe_5VCUJYhHjvhPD+21YoCKwO7PumVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 22, 2013 at 11:38 AM, Thom Brown <thom(at)linux(dot)com> wrote:
> ...unless I've coincidentally stumbled upon the new MEOW LIKE A CAT
> command coming in 9.3. ;)

+1 for adding that. I've been wanting it for years.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thom Brown <thom(at)linux(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-23 03:23:48
Message-ID: 50FF57C4.8010409@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 23/01/13 16:02, Robert Haas wrote:
> On Tue, Jan 22, 2013 at 11:38 AM, Thom Brown <thom(at)linux(dot)com> wrote:
>> ...unless I've coincidentally stumbled upon the new MEOW LIKE A CAT
>> command coming in 9.3. ;)
> +1 for adding that. I've been wanting it for years.
>
Can we have it back ported to 9.2, please Please, PLEASE!!!

(Or perhaps, it is scheduled for the day after Thursday Feb 28th?)

It is obviously far more important than any other change to pg! :-)

Cheers,
Gavin


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-23 03:37:57
Message-ID: CAB7nPqQAttVj1WiKJ=hu4JHuaCvQ0sNWy7XwfG83F-f49ccBNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 23, 2013 at 12:23 PM, Gavin Flower <
GavinFlower(at)archidevsys(dot)co(dot)nz> wrote:

> On 23/01/13 16:02, Robert Haas wrote:
>
> On Tue, Jan 22, 2013 at 11:38 AM, Thom Brown <thom(at)linux(dot)com> wrote:
>
> ...unless I've coincidentally stumbled upon the new MEOW LIKE A CAT
> command coming in 9.3. ;)
>
> +1 for adding that. I've been wanting it for years.
>
>
> Can we have it back ported to 9.2, please Please, PLEASE!!!
>
New features are not back-ported to maintenance branches ;)
But... It might be possible to do an exception?
--
Michael Paquier
http://michael.otacoo.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-23 04:17:27
Message-ID: CA+TgmoYimdGCiKFbKFbcZ_pJ5jT+sryxN40p24i8efq8j6BNgA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> - Context exposing and filtering
>
> My preference for that point is to include it and limit it to event
> triggers written in C, leaving the door open to add new events in
> the future, and allowing existing consumers of that framework see
> through implementation details should they need it.
>
> It's better than just a ProcessUtility_hook because you can see it
> in \dy, decide to disable it with ALTER EVENT TRIGGER (e.g. in a
> transaction to avoid recursive calling into itself), and it won't
> run in single user mode even when in shared_preload_libraries.
>
> Another idea is to protect it with a GUC.
>
> Yet another idea is to add a new property in the CREATE EVENT
> TRIGGER command so that the user can request explicitly to have that
> and know the feature will break at next release and all the ones
> after that:
>
> CREATE EVENT TRIGGER foo ON ddl_command_start
> WHEN show_internals in ('Yes I know what I''m doing',
> 'Thanks')
> EXECUTE PROCEDURE …();
>
> That particular phrasing and option choice might be revisited before
> commit, though, I'm not wedded to it.

I'm not feeling very sanguine about any of this. I feel strongly that
we should try to do something that's more principled than just
throwing random junk into ProcessUtility.

> - Additional Information to PL/pgSQL
>
> We're talking about :
>
> - operation (CREATE, ALTER, DROP)
> - object type (TABLE, FUNCTION, TEXT SEARCH DICTIONARY, …)
> - object OID (when there's a single object target)
> - object schema name (when there's a single object target)
> - object name (when there's a single object target)

In ddl_command_start, I believe you can only expose the information
that is available in the parse tree. That is, if the user typed
CREATE TABLE foo, you can tell the event trigger that the user didn't
specify a schema and that they mentioned the name foo. You cannot
predict what schema foo will actually end up in at that point, and we
shouldn't try. I don't have a problem with exposing the information
we have at this point with the documented caveat that if you rely on
it to mean something it doesn't you get to keep both pieces.

But I wonder: wouldn't it be better to just expose the raw string the
user typed? I mean, in all the cases we really care about, the
information we can *reliably* expose at this point is going to be
pretty nearly the same as extracting the third space-delimited word
out of the command text and splitting it on a period. And you can do
that easily enough even in PL/pgsql. You could also extract a whole lot
of OTHER potentially useful information from the command text that you
couldn't get if we just exposed the given schema name and object name.
For example, this would allow you to write an event trigger that lets
you enforce a policy that all column names must use Hungarian
notation. Note that this is a terrible idea and you shouldn't do it,
but you could if you wanted to. And, the trigger would be relatively
long and complex and you might have bugs, but that's still better than
the status quo where you are simply out of luck.
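Outside the backend, the naive extraction described above can be sketched in Python. This is purely illustrative (quoted identifiers, comments, and odd whitespace would all defeat it, which is part of the point being made):

```python
def naive_object_name(command_text):
    """Pull the object reference out of a DDL command string the way
    described above: take the third space-delimited word and split
    it on a period.  Returns a (schema, name) tuple, with schema set
    to None when the user did not qualify the name."""
    words = command_text.split()
    if len(words) < 3:
        return (None, None)
    # Trim anything glued onto the name, e.g. an opening parenthesis.
    target = words[2].split('(')[0]
    parts = target.split('.')
    if len(parts) == 2:
        return (parts[0], parts[1])
    return (None, parts[0])
```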

Now, in a ddl_command_end trigger, there's a lot more information that
you can usefully expose. In theory, if the command records "what it
did" somewhere, you can extract that information with as much
precision as it cared to record. However, I'm pretty skeptical of the
idea of exposing a single OID. I mean, if the "main" use case here is
replication, there are holes big enough to drive a truck through
there. It's not really better for other use cases, either: if you're
auditing or logging or whatever, you can drive the same truck through
the same hole. It seems to me that to get anything really useful
here, you need some more complex way of passing data, like oh say an
array of OIDs. Really, you're going to need something quite a bit
more complicated than that to describe something like ALTER TABLE, but
even for pretty simple cases a single OID doesn't cut it.

> - Additional Information to PL/*
>
> It's separated away because you did opt-out from reviewing that
> part, so I guess we will continue to make that a separate patch, and
> it will still need to happen. I've been said that I shouldn't be on
> the hook to provide for that, and I wonder, but anyway I did already
> implement it in a previous version so I can do it again as soon as
> we know what we expose.

I'm not sure what you mean that you shouldn't be on the hook to
provide that. I don't think that you're obliged to submit patches to
make this work for other PLs - indeed, I think it's too late for that
for 9.3 - but I don't think anyone else is required to do it, either.
It doesn't seem like it would be too hard to get the other PLs up to
the same level that PL/pgsql is at; I just didn't want to try to
familiarize myself with PL/perl, PL/python, and PL/tcl, none of which
I know the guts of and most of which I don't compile regularly, at the
same time I was trying to grok everything else in the patch.

> - Command String Normalisation
>
> I removed it already from the main patch because we needed to talk
> about it more, and it's kind of a big part in itself. Meanwhile it
> seems to me that an agreement has been reached and that I will be
> able to follow the consensus and resume working on that part.

I'm not sure we've reached a consensus here (what is it?), but let me
run an idea or two up the flagpole and see if anyone salutes. It
seems to me that there's some consensus that reverse-parsing may be a
reasonable thing to include in some form, but there are a lot of
unsolved problems there, including (1) that the parse tree might not
contain enough information for the reverse-parser to describe what the
command actually did, as opposed to what the user asked for, and (2)
there may be important decisions which were made at execution time and
which are needed for correct replication but which actually can't be
represented in the syntax at all (I am not sure what these would be:
but Tom seemed to think that such cases exist, so they need to be
understood and thought about).

If we're going to do reverse parsing, I would argue that we need to
put some effort into making that more flexible than the designs I've
seen so far. For example, I believe your concept of command string
normalization includes the idea that we'd fully schema-qualify the
object name. Like... instead of saying CREATE TABLE foo (a text),
we'd say CREATE TABLE public.foo (a text). That would guarantee that
when we replay the command on another machine, it ends up in the same
schema. But who is to say that's what the user wants? You can
imagine, for example, that a user might want to replicate schema
public on node A into schema node_a on node B. If we're back to
parsing the SQL at that point we are still ahead - but not very far
ahead. And, of course, the proper canonicalization is really CREATE
TABLE public.foo (a pg_catalog.text), because there could be another
type called text in some other schema on node B. That may seem like a
borderline case when the name of the type is "text" but if you
substitute "bar" for "text" it suddenly starts to seem totally
plausible that there could be more than one bar. Of course if you
substitute "hstore" for "text" but hstore is in a different schema in
the source and target databases then you've got a problem if you DO
schema-qualify things; and if the goal is just to pretty-print things
for the user's benefit, all of this may be overkill.
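To make the replication concern concrete, here is a toy Python sketch of the public-into-node_a remapping described above, assuming the trigger already has the name as structured (schema, name) data rather than a flat SQL string:

```python
def remap_object(schema, name, schema_map):
    """Rewrite the schema part of a qualified object name for replay
    on another node; objects in unmapped schemas pass through
    unchanged.  Given only a normalized command string, doing the
    same thing would mean parsing the SQL all over again."""
    return (schema_map.get(schema, schema), name)
```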

I recognize that it's probably impossible or unrealistic to get an
exactly-perfect solution to all of these problems, but I'm working my
way around to a proposal. Suppose we added a type pg_parse_tree, and
a bunch of accessor functions that operate on it. You could have
things like get_operation(pg_parse_tree) and
get_object_type(pg_parse_tree) and
get_array_of_object_names_and_schemas(pg_parse_tree), which would
largely obviate the need to pass 27 magic variables for all of the
things somebody might want - anything we decide we want to expose just
becomes another accessor function. Deparsing is also a function,
which means we don't incur the overhead of doing it unless the user
asks for it. And you might even be able to have mutators, so that you
can for example take a pg_parse_tree, make a new one by changing the
schema name that's mentioned, and then deparse that. I'm sure Tom is
scared now, if he's reading this, but maybe it's got some potential.
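The accessor-plus-mutator idea can be mocked up in Python; everything here (the field names, the function names) is invented purely to show the shape of the proposed API, not its real implementation:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class PgParseTree:
    """Toy stand-in for the proposed opaque pg_parse_tree type."""
    operation: str      # CREATE | ALTER | DROP
    object_type: str    # TABLE, FUNCTION, ...
    schema: str
    name: str
    body: str           # remainder of the command, e.g. "(a text)"

def get_operation(tree):
    return tree.operation

def get_object_type(tree):
    return tree.object_type

def with_schema(tree, new_schema):
    """A mutator: a new tree differing only in the schema name."""
    return replace(tree, schema=new_schema)

def deparse(tree):
    """Deparsing as just another function, so its cost is only
    paid when an event trigger actually asks for it."""
    return f"{tree.operation} {tree.object_type} {tree.schema}.{tree.name} {tree.body}"
```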

I certainly don't think this solves every problem. In particular, we
run the risk of exposing every internal detail of the parse tree,
something Tom and I have both vigorously resisted in every discussion
of event triggers since time immemorial. On the other hand, the
discussion of this topic suggests that the main thing that people want
out of event triggers is precisely to be able to get at information
from the parse tree, so we're kind of damned if we do and damned if we
don't. At least something like this provides a reasonable abstraction
boundary that allows the information to be exposed in a controlled
way. For example, if we add a function that tells you what object
name the user typed, it doesn't need to expose the fact that the parse
tree names that field differently in different DDL constructs, which
certainly increases our odds of not going insane after a release or
two of maintaining this code.

The other question is what to do about the information that's gathered
as the command is running that you afterwards wish to provide to the
event trigger - for example, for CREATE TABLE foo (), you want to
reveal the actually-selected creation schema. One idea is to have DDL
commands annotate the parse tree with these sorts of details, but the
disadvantage of this is that it might not be what everybody wants for
all purposes; and you still have the question of what to do with
information that can't be effectively represented in the parse tree at
all. I'm inclined to think a better approach might be to pass around
some kind of "container" data structure to which the DDL command can
glue in whatever bits of information it wants to include, and then
dump the result out as a 2-D text array or a JSON object or something.
I'm waving my hands at this problem a bit because of course one
important use case is to be able to do a reverse-parse that factors
this information into account, but we could do that, I think, by
allowing this information to be passed to the deparse API along with
the pg_parse_tree object. I'm waving my hands here, but maybe not any
harder than any other design thus far proposed, and a fuller
explanation would make this interminably long email even longer, so
I'll stop here for now.
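A rough Python sketch of that container idea, with invented key names, under the assumption that the command glues in execution-time facts as it goes and the result is handed to the trigger as JSON:

```python
import json

def create_table_container(resolved_schema, name):
    """While (hypothetically) executing CREATE TABLE foo (), the
    command records decisions made at execution time -- here, the
    schema the table actually ended up in -- into a container that
    is later serialized for the event trigger."""
    container = {
        "command": "CREATE TABLE",
        "object_name": name,
        "resolved_schema": resolved_schema,  # known only at execution time
    }
    return json.dumps(container, sort_keys=True)
```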

> - ddl_event_trace
>
> Now is the time to kill it (for 9.3) or allow it in.
>
> It means another cache lookup at DDL Event time (so 2 cache lookups
> per DDL in 9.3); and allows users who just want the information
> not to have to care about when it becomes available.

Unless we get some of the other stuff this doesn't have much value
anyway, and depending on how other parts of the design shake out this
might seem less relevant by the time we get done.

> - Non DDL related Events
>
> Extra bonus items, a table_rewrite event. Unlikely for 9.3 at best,
> unless someone else wants to have at it, maybe?

I'd like to take a crack at adding support for some kind of really
cool event for 9.4 - I suspect this one is not what I'd pick as a
first target, just because it seems hard. But I'll mull it over.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-23 09:57:52
Message-ID: m2zk00cjbz.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I think these new regression tests are no good, because I doubt that
> the number of recursive calls that can fit into any given amount of
> stack space is guaranteed to be the same on all platforms. I have
> committed the bug fixes themselves, however.

Thanks for committing the fixes. About the regression tests, I think
you're right, but then I can't see how to include such a test. Maybe you
could add the other one, though?

> I wasn't entirely happy with your proposed documentation so I'm
> attaching a counter-proposal. My specific complaints are (1) telling
> people that event triggers are run in a savepoint seems a little too
> abstract; I have attempted to make the consequences more concrete; (2)
> RAISE EXCEPTION is PL/pgsql specific and not the only possible reason
> for an error; I have attempted to be more generic; and (3) in the
> process of fiddling with this, I noticed that the ddl_command_end
> documentation can, I believe, be made both shorter and more clear by
> turning it into a rider on the previous paragraph.
>
> Comments?

+1 for this version, thanks.

--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-23 14:33:58
Message-ID: CA+TgmobaY1gsMW1_cbKKoYA1j5vd8Yx_eWN98-4apAMzj_tZcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 23, 2013 at 4:57 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Thanks for committing the fixes. About the regression tests, I think
> you're right, but then I can't see how to include such a test. Maybe you
> could add the other one, though?

Can you point me specifically at what you have in mind so I can make
sure I'm considering the right thing?

> +1 for this version, thanks.

OK, committed that also.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-23 19:36:18
Message-ID: 20130123193618.GA29229@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 23, 2013 at 09:33:58AM -0500, Robert Haas wrote:
> On Wed, Jan 23, 2013 at 4:57 AM, Dimitri Fontaine
> <dimitri(at)2ndquadrant(dot)fr> wrote:
> > Thanks for committing the fixes. About the regression tests, I think
> > you're right, but then I can't see how to include such a test. Maybe you
> > could add the other one, though?
>
> Can you point me specifically at what you have in mind so I can make
> sure I'm considering the right thing?
>
> > +1 for this version, thanks.
>
> OK, committed that also.

Also, I assume we no longer want after triggers on system tables, so I
removed that from the TODO list and added event triggers as a completed
item.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-23 20:02:24
Message-ID: CA+Tgmobzi9pDqWAtx_QfVq=JnP28kxW9camcuv3SGY=79D4nSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 23, 2013 at 2:36 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Wed, Jan 23, 2013 at 09:33:58AM -0500, Robert Haas wrote:
>> On Wed, Jan 23, 2013 at 4:57 AM, Dimitri Fontaine
>> <dimitri(at)2ndquadrant(dot)fr> wrote:
>> > Thanks for committing the fixes. About the regression tests, I think
>> > you're right, but then I can't see how to include such a test. Maybe you
>> > could add the other one, though?
>>
>> Can you point me specifically at what you have in mind so I can make
>> sure I'm considering the right thing?
>>
>> > +1 for this version, thanks.
>>
>> OK, committed that also.
>
> Also, I assume we no longer want after triggers on system tables, so I
> removed that from the TODO list and added event triggers as a completed
> item.

Seems reasonable. Event triggers are not completed in the sense that
there is a lot more stuff we can do with this architecture, but we've
got a basic implementation now and that's progress. And they do
address the use case that triggers on system tables would have
targeted, I think, but better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-23 20:04:43
Message-ID: 20130123200443.GI23670@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 23, 2013 at 03:02:24PM -0500, Robert Haas wrote:
> On Wed, Jan 23, 2013 at 2:36 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > On Wed, Jan 23, 2013 at 09:33:58AM -0500, Robert Haas wrote:
> >> On Wed, Jan 23, 2013 at 4:57 AM, Dimitri Fontaine
> >> <dimitri(at)2ndquadrant(dot)fr> wrote:
> >> > Thanks for committing the fixes. About the regression tests, I think
> >> > you're right, but then I can't see how to include such a test. Maybe you
> >> > could add the other one, though?
> >>
> >> Can you point me specifically at what you have in mind so I can make
> >> sure I'm considering the right thing?
> >>
> >> > +1 for this version, thanks.
> >>
> >> OK, committed that also.
> >
> > Also, I assume we no longer want after triggers on system tables, so I
> > removed that from the TODO list and added event triggers as a completed
> > item.
>
> Seems reasonable. Event triggers are not completed in the sense that
> there is a lot more stuff we can do with this architecture, but we've
> got a basic implementation now and that's progress. And they do
> address the use case that triggers on system tables would have
> targeted, I think, but better.

Right. Users would always be chasing implementation details if they
tried to trigger on system tables.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-24 09:35:41
Message-ID: m2a9ryc49e.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Or maybe we should just silently ignore failures to look up the event
> trigger. That might be better, because the DBA could always do:
>
> DROP FUNCTION myeventtrgfn() CASCADE;
>
> ...and it would be undesirable for other sessions to error out in that
> case due to SnapshotNow effects.

What about taking a lock on the functions we decide we will need to
run, maybe a ShareUpdateExclusiveLock, so that the function cannot
disappear under us due to concurrent activity?

Note to self, most probably using:
LockRelationOid(fnoid, ShareUpdateExclusiveLock);

After all, we might be right not to optimize for DDL concurrency…

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Steve Singer <ssinger(at)ca(dot)afilias(dot)info>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-24 10:43:48
Message-ID: m2k3r27tej.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine
> <dimitri(at)2ndquadrant(dot)fr> wrote:
>> - Context exposing and filtering
>
> I'm not feeling very sanguine about any of this. I feel strongly that
> we should try to do something that's more principled than just
> throwing random junk into ProcessUtility.

The goal is to allow advanced users of that feature to get at the
sequence, constraint and index names that PostgreSQL picks
automatically when running a CREATE TABLE statement that involves
them:

CREATE TABLE foo (id serial PRIMARY KEY, f1 text);
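For that command, PostgreSQL's usual conventions yield names like foo_id_seq for the serial's backing sequence and foo_pkey for the primary key index. A small sketch of what those conventions produce (ignoring the identifier-truncation and collision handling the real code does):

```python
def generated_names(table, serial_column):
    """Approximate the names PostgreSQL picks automatically for a
    'serial PRIMARY KEY' column: a sequence named
    <table>_<column>_seq and a primary key index named
    <table>_pkey.  The real naming also resolves collisions and
    truncates long identifiers, which this sketch ignores."""
    return {
        "sequence": f"{table}_{serial_column}_seq",
        "primary_key_index": f"{table}_pkey",
    }
```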

Andres and Steve, as you're the next ones to benefit directly from that
whole feature and at different levels, do you have a strong opinion on
whether you really need that to operate at the logical level?

The trick is not implementing the feature (that's quite easy really),
it's about being able to provide it without any API promises, because
it's an opening on the internals.

Maybe the answer is to provide the same level of information as you get
in an Event Trigger to new hooks.

>> - Additional Information to PL/pgSQL
>>
>> We're talking about :
>>
>> - operation (CREATE, ALTER, DROP)
>> - object type (TABLE, FUNCTION, TEXT SEARCH DICTIONARY, …)
>> - object OID (when there's a single object target)
>> - object schema name (when there's a single object target)
>> - object name (when there's a single object target)

I realise I omitted something important here. The current patch only
adds that information if the target object exists in the catalogs at
the time of the lookup.

> In ddl_command_start, I believe you can only expose the information
> that is available in the parse tree. That is, if the user typed
> CREATE TABLE foo, you can tell the event trigger that the user didn't
> specify a schema and that they mentioned the name foo. You cannot

I used to do that, and that's the reason why I had the Command String
Normalisation in the same patch: both are walking the parsetree to some
extent, and I didn't want to have different routines doing different
walks to get at the information.

> predict what schema foo will actually end up in at that point, and we
> shouldn't try. I don't have a problem with exposing the information
> we have at this point with the documented caveat that if you rely on
> it to mean something it doesn't you get to keep both pieces.

Fair enough. I retracted from doing that to be able to separate away the
parse tree walking code for later review, and I'm now wondering if
that's really what we want to do.

If auditing, you can log the information after the command has been run
(in case of CREATE and ALTER command) or before the object has been
deleted (in case of a DROP command).

If you want to cancel the whole operation depending on the name of the
object, or the schema it ends up in, then you can do it at the end of
the command's execution with a RAISE EXCEPTION.

> But I wonder: wouldn't it be better to just expose the raw string the
> user typed? I mean, in all the cases we really care about, the
> information we can *reliably* expose at this point is going to be
> pretty nearly the same as extracting the third space-delimited word
> out of the command text and splitting it on a period. And you can do
> that much easily even in PL/pgsql. You could also extract a whole lot

Totally agreed. That's another reason why I want to provide users with
the Normalized Command String; it will then be even easier for them
to do what you just said.

After working on that for some time, though, I'm leaning toward
Tom's view that you shouldn't try to expose information about
objects that don't exist in the catalogs, because "that's not code,
that's magic".

> of OTHER potentially useful information from the command text that you
> couldn't get if we just exposed the given schema name and object name.
> For example, this would allow you to write an event trigger that lets
> you enforce a policy that all column names must use Hungarian
> notation. Note that this is a terrible idea and you shouldn't do it,
> but you could if you wanted to. And, the trigger would be relatively
> long and complex and you might have bugs, but that's still better than
> the status quo where you are simply out of luck.

You seem to be talking about System Hungarian, not Apps Hungarian, but I
don't think that's the point here.

http://www.joelonsoftware.com/articles/Wrong.html

> Now, in a ddl_command_end trigger, there's a lot more information that
> you can usefully expose. In theory, if the command records "what it
> did" somewhere, you can extract that information with as much
> precision as it cared to record. However, I'm pretty skeptical of the
> idea of exposing a single OID. I mean, if the "main" use case here is

There's precedent in PostgreSQL: how do you get information about the
rows that were targeted from a FOR EACH STATEMENT trigger?

> array of OIDs. Really, you're going to need something quite a bit
> more complicated than that to describe something like ALTER TABLE, but
> even for pretty simple cases a single OID doesn't cut it.

If you want to solve that problem, let's talk about pseudo relations
that you can SELECT FROM, please. We can already expose a tuplestore in
PL code by means of cursors and SRF, but I'm not sure that's how we want
to expose the "statement level information" here.

The only commands that will target more than one object are:

- DROP, either in the command or by means of CASCADE;
- CREATE SCHEMA with its PROCESS_UTILITY_SUBCOMMAND usage;

That said, not all of the DropStmt variants allow more than one
object; only these are concerned:

TABLE, SEQUENCE, VIEW, INDEX, FOREIGN TABLE, EVENT TRIGGER,
TYPE, DOMAIN, COLLATION, CONVERSION, SCHEMA, EXTENSION,
TEXT SEARCH PARSER, TEXT SEARCH DICTIONARY, TEXT SEARCH
TEMPLATE, TEXT SEARCH CONFIGURATION

So my proposal was to address that by exposing non-TOPLEVEL commands to
Event Trigger users, so that the CREATE SCHEMA targets are found in
their respective subcommands. As far as DROP is concerned, my
thinking was to process a DROP targeting several objects at once as if
it were as many DROP commands as there are objects in the command,
producing parse tree nodes to feed back as SUBCOMMANDs.
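Sketching that expansion in Python terms (the real work would of course happen on parse tree nodes, not on strings):

```python
def expand_drop(object_type, names):
    """Treat one DROP naming several objects as a list of
    single-object DROP subcommands, each of which an event
    trigger would then see individually."""
    return [f"DROP {object_type} {name}" for name in names]
```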

The CASCADE case is something else entirely, and we need to be very
careful about its locking behavior. Also, in the CASCADE case I can
perfectly agree that we're not talking about a ddl_something event any
more, and that in 9.4 we will be able to provide a sql_drop generic
event that will know about that.

>> - Additional Information to PL/*
>
> I'm not sure what you mean that you shouldn't be on the hook to
> provide that. I don't think that you're obliged to submit patches to
> make this work for other PLs - indeed, I think it's too late for that
> for 9.3 - but I don't think anyone else is required to do it, either.

Mostly agreed. Do we want to ship with only PL/pgSQL support? I've not
been following how those PL features usually reach the other PLs…

> It doesn't seem like it would be too hard to get the other PLs up to
> the same level that PL/pgsql is at; I just didn't want to try to
> familiarize myself with PL/perl, PL/python, and PL/tcl, none of which
> I know the guts of and most of which I don't compile regularly, at the
> same time I was trying to grok everything else in the patch.

Same here. But I felt like I had to do it anyway before submitting the
patch, so the time-consuming part of the job is already done.

>> - Command String Normalisation
>
> I'm not sure we've reached a consensus here (what is it?), but let me

The consensus I feel we're reaching now is that it's an important
feature to have and there's no other practical way to have it.

> run an idea or two up the flagpole and see if anyone salutes. It
> seems to me that there's some consensus that reverse-parsing may be a
> reasonable thing to include in some form, but there are a lot of
> unsolved problems there, including (1) that the parse tree might not
> contain enough information for the reverse-parser to describe what the
> command actually did, as opposed to what the user asked for, and (2)

For having actually implemented the most complex of our commands, namely
CREATE TABLE and ALTER TABLE, the trick is to complement the parse tree
with some catalog (cache) lookups to have the exact information about
what the command did, or to tweak the command's implementation to expose
it, as I did for ALTER TABLE.

> there may be important decisions which were made at execution time and
> which are needed for correct replication but which actually can't be
> represented in the syntax at all (I am not sure what these would be:
> but Tom seemed to think that such cases exist, so they need to be
> understood and thought about).

Well, try and get the name of the exclusion constraint created when
running that command, then rewrite it with the name injected in the
command:

CREATE TABLE xyz(i int, exclude (i WITH =) where (i > 10) deferrable);

Another case is column DEFAULT values, but I don't think we care about
the constraint name in that case. It's possible to inject the column and
table level CHECK constraints names in the CREATE TABLE syntax, though.
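For instance, the CHECK constraint names can be spelled out explicitly
in the command, which is what makes the rewrite possible there (this is
plain SQL, nothing patch-specific):

```sql
-- Both column-level and table-level CHECK constraints accept an
-- explicit name, so the normalized command string can inject the names
-- PostgreSQL would otherwise generate:
CREATE TABLE xyz (
    i int CONSTRAINT xyz_i_positive CHECK (i > 0),
    CONSTRAINT xyz_i_small CHECK (i < 1000)
);
```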

I don't remember having hit another blocker here.

> If we're going to do reverse parsing, I would argue that we need to
> put some effort into making that more flexible than the designs I've
> seen so far. For example, I believe your concept of command string

While that seems fair enough, I have to stress that it's also known as
feature creep and a sure way to kill a patch idea. Sometimes it's
exactly what needs to be done. Sometimes not.

> normalization includes the idea that we'd fully schema-qualify the
> object name. Like... instead of saying CREATE TABLE foo (a text),
> we'd say CREATE TABLE public.foo (a text). That would guarantee that
> when we replay the command on another machine, it ends up in the same
> schema. But who is to say that's what the user wants? You can
> imagine, for example, that a user might want to replicate schema
> public on node A into schema node_a on node B. If we're back to

You have to remember at this point that the patch we're talking about
is not exposing *ONLY* the command string, but also the object name and
the schema name, in nicely separated variables.
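So a trigger that wants the public-to-node_a remapping described above
can use those variables directly, without parsing SQL at all. A sketch,
using the variable names from this proposal:

```sql
-- Sketch: remap schema names on the replication side using the
-- separately exposed variables rather than the command string.
create function remap_schema() returns event_trigger as $$
begin
  if tg_schemaname = 'public' then
    raise notice 'would replay % into schema node_a', tg_objectname;
  end if;
end
$$ language plpgsql;
```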

> parsing the SQL at that point we are still ahead - but not very far
> ahead. And, of course, the proper canonicalization is really CREATE
> TABLE public.foo (a pg_catalog.text), because there could be another
> type called text in some other schema on node B. That may seem like a
> borderline case when the name of the type is "text" but if you
> substitute "bar" for "text" it suddenly starts to seem totally
> plausible that there could be more than one bar. Of course if you

That's taken care of in my ddl rewriting branch, yes. Even int and
bigint could be rewritten with their real qualified names. See a
previously posted example in:

http://www.postgresql.org/message-id/m2txrsdzxa.fsf@2ndQuadrant.fr

[… cut out description of …]
> get_array_of_object_names_and_schemas(pg_parse_tree), which would

> I certainly don't think this solves every problem. In particular, we
> run the risk of exposing every internal detail of the parse tree,
> something Tom and I have both vigorously resisted in every discussion
> of event triggers since time immemorial. On the other hand, the

And which is why we have very limited preprocessed variables exposed in
the event triggers instead, including the normalized command string,
rather than the parse tree itself (available only in C).

> way. For example, if we add a function that tells you what object
> name the user typed, it doesn't need to expose the fact that the parse
> tree names that field differently in different DDL constructs, which
> certainly increases our odds of not going insane after a release or
> two of maintaining this code.

That's called tg_objectname and tg_schemaname in my proposal. Can you
come up with either a design proposal of an "external" stable
representation of the parse tree, or a more complete list of information
you want exposed?

My contention is that an external stable representation format is
another couple of years of work just to reach some traction on its
specifics, and that adding more preprocessed information is better done
in 9.4.

Meanwhile, you can always get at it from an extension coded in C.

> some kind of "container" data structure to which the DDL command can
> glue in whatever bits of information it wants to include, and then
> dump the result out as a 2-D text array or a JSON object or something.

Either you want to reach agreement on the container specs, which is a
two-year project at best, or you want to weigh in so that something
gets into 9.3 and then make incompatible changes to it in 9.4, in which
case all this work is wasted: the original goal is to prevent changing
that external format in incompatible ways between major versions.
Barring that goal, we would just be exposing the parse tree as it is.

This line of thought comes from the fact that I've been sitting in
different meetings with people wanting to solve that very problem for
external tools (SQL syntax highlighters, indentation, you name it) for
several years in a row now, and I have yet to see such a data structure
spec emerge.

>> - Non DDL related Events
>>
>> Extra bonus items, a table_rewrite event. Unlikely for 9.3 at best,
>> unless someone else wants to have at it, maybe?
>
> I'd like to take a crack at adding support for some kind of really
> cool event for 9.4 - I suspect this one is not what I'd pick as a
> first target, just because it seems hard. But I'll mull it over.

I'll hand you another one then: have INSERT INTO create the destination
TABLE when it does not exist, for the RAD users out there. You know,
those schema-less NoSQL guys…

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-24 10:51:12
Message-ID: m24ni67t27.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Can you point me specifically at what you have in mind so I can make
> sure I'm considering the right thing?

Something like this part:

+ -- now try something crazy to ensure we don't crash the backend
+ create function test_event_trigger_drop_function()
+ returns event_trigger
+ as $$
+ BEGIN
+ drop function test_event_trigger2() cascade;
+ END
+ $$ language plpgsql;
+
+ create function test_event_trigger2() returns event_trigger as $$
+ BEGIN
+ RAISE NOTICE 'test_event_trigger2: % %', tg_event, tg_tag;
+ END
+ $$ language plpgsql;
+
+ create event trigger drop_test_a on "ddl_command_start"
+ when tag in ('create table')
+ execute procedure test_event_trigger_drop_function();
+
+ create event trigger drop_test_b on "ddl_command_start"
+ execute procedure test_event_trigger2();
+
+ create table event_trigger_fire1 (a int);
+
+ -- then cleanup the crazy test
+ drop event trigger drop_test_a;
+ drop event trigger drop_test_b;
+ drop function test_event_trigger_drop_function();

The only problem for the regression tests is that there's an OID in the
output, but with your proposed error message that would go away.

--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Steve Singer <ssinger(at)ca(dot)afilias(dot)info>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-25 01:58:10
Message-ID: 5101E6B2.5010602@ca.afilias.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 13-01-24 05:43 AM, Dimitri Fontaine wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine
>> <dimitri(at)2ndquadrant(dot)fr> wrote:
>>> - Context exposing and filtering
>>
>> I'm not feeling very sanguine about any of this. I feel strongly that
>> we should try to do something that's more principled than just
>> throwing random junk into ProcessUtility.
>
> The goal is to allow for advanced users of that feature to get at the
> sequence, constraints and index names that have been picked
> automatically by PostgreSQL when doing some CREATE TABLE statement that
> involves them:
>
> CREATE TABLE foo (id serial PRIMARY KEY, f1 text);
>
> Andres and Steve, as you're the next ones to benefit directly from that
> whole feature and at different levels, do you have a strong opinion on
> whether you really need that to operate at the logical level?
>

I haven't been following this thread very closely and I'm not exactly
sure what you're asking.

If the question is what a replication system would need to replicate
DDL commands, then I need a method of translating whatever the DDL
trigger is passed into either 'CREATE TABLE public.foo(id serial primary
key, f1 text)' OR some set of equivalent commands that will allow me to
create the same objects on the replica. I don't have any brilliant
insight on how I would go about doing it. Honestly, I haven't spent a
lot of time thinking about it in the past 8 months.

If you're asking what I would need for a trigger to automatically add a
new table to replication, then I think I would only need to know (or be
able to obtain) the fully qualified name of the table, and then in
another trigger call be given the fully qualified name of the sequence.
The triggers would need to be able to do DML against configuration
tables. I don't need to know that a table and a sequence are connected
if all I want to do is replicate them.
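A minimal sketch of that kind of auto-configuration trigger, assuming a
user-defined replication_config table and the tg_* variables proposed
in this patch:

```sql
-- Hypothetical: register every newly created table for replication.
-- replication_config is an assumed user table, not part of the patch.
create function add_table_to_replication() returns event_trigger as $$
begin
  insert into replication_config (nspname, relname)
  values (tg_schemaname, tg_objectname);
end
$$ language plpgsql;

create event trigger auto_replicate on ddl_command_end
  when tag in ('create table')
  execute procedure add_table_to_replication();
```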


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-25 15:04:19
Message-ID: CA+TgmoaHO7nBS9_e_fE9KZ7nZo4zVihUirTWaiJfnpUM_Q91DA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ changing subject line to keep threads of discussion separate ]

On Thu, Jan 24, 2013 at 5:51 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Something like this part:
>
> + -- now try something crazy to ensure we don't crash the backend
> + create function test_event_trigger_drop_function()
> + returns event_trigger
> + as $$
> + BEGIN
> + drop function test_event_trigger2() cascade;
> + END
> + $$ language plpgsql;
> +
> + create function test_event_trigger2() returns event_trigger as $$
> + BEGIN
> + RAISE NOTICE 'test_event_trigger2: % %', tg_event, tg_tag;
> + END
> + $$ language plpgsql;
> +
> + create event trigger drop_test_a on "ddl_command_start"
> + when tag in ('create table')
> + execute procedure test_event_trigger_drop_function();
> +
> + create event trigger drop_test_b on "ddl_command_start"
> + execute procedure test_event_trigger2();
> +
> + create table event_trigger_fire1 (a int);
> +
> + -- then cleanup the crazy test
> + drop event trigger drop_test_a;
> + drop event trigger drop_test_b;
> + drop function test_event_trigger_drop_function();
>
> The only problem for the regression tests being that there's an OID in
> the ouput, but with your proposed error message that would go away.

This seems reasonable, but looking into it a little further, fixing
the error message is not quite as simple as might be hoped. It's
coming from fmgr_info, which can't be dissuaded from erroring out if
the function is gone. We could do the fmgr_info lookup earlier, but
it doesn't really help: if the tuple disappears, then
plpgsql_compile() will eventually be reached and will go splat anyway.

As far as I can see, the only reasonable way to keep this from
exploding is to take AccessShareLock on the function once we've looked
it up, and then repeat the syscache lookup immediately thereafter to
verify that the OID is still present. If it is, it can't subsequently
go away. This seems like a pretty expensive solution, though. At
present event triggers only fire during DDL commands so maybe this is
OK, but if you imagine using it for NOTIFY or DML then it starts to
sound like a cripplingly-high cost.

So I'm not sure what to do. I'm not thrilled with the idea of leaving
this the way it is, but it's not a big enough problem for me to argue
for ripping event triggers back out in their entirety, either. So
maybe we just have to live with it until somebody comes up with a
better idea.

Anybody got one?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Steve Singer <ssinger(at)ca(dot)afilias(dot)info>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-25 15:42:24
Message-ID: CA+TgmoZz6MXQ5zX6dopc_xaHVkdwhEhgDFJeAWsRNs+N7e_ueA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 24, 2013 at 5:43 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
>> But I wonder: wouldn't it be better to just expose the raw string the
>> user typed? I mean, in all the cases we really care about, the
>> information we can *reliably* expose at this point is going to be
>> pretty nearly the same as extracting the third space-delimited word
>> out of the command text and splitting it on a period. And you can do
>> that much easily even in PL/pgsql. You could also extract a whole lot
>
> Totally agreed. That's another reason why I want to provide users with
> the Normalized Command String; it will then be even easier for them
> to do what you just said.

OK, but can we lay the issue of a *normalized* command string to the
side for just one minute, and talk about exposing the *raw* command
string? It seems to me that this would be (1) very easy to do, (2)
reasonable to slip into 9.3, and (3) useful to some people. Arguably,
the normalized command string would be useful to MORE people, but I
think the raw command string is not without its uses, and it's at
least one order of magnitude less work.

>> Now, in a ddl_command_end trigger, there's a lot more information that
>> you can usefully expose. In theory, if the command records "what it
>> did" somewhere, you can extract that information with as much
>> precision as it cared to record. However, I'm pretty skeptical of the
>> idea of exposing a single OID. I mean, if the "main" use case here is
>
> There's precedent in PostgreSQL: how do you get information about each
> row that were in the target from a FOR EACH STATEMENT trigger?

You don't. But, the fact that you can't is an unpleasant limitation,
not a model of emulation.

We make trade-offs about the scope of features all the time. Somebody
evidently thought it was reasonable for each-row triggers to expose
the tuple but to leave the corresponding feature for each-statement
triggers unimplemented, and as much as I'd like to have that feature,
I agree that was a reasonable trade-off. But it would be quite
something else again if someone were to propose the idea of having NEW
and OLD available for each-statement triggers, but only give the
information about the first row affected by the statement, rather than
all of them. I think that proposal would quite rightly be rejected,
and I think it's morally equivalent to what you're proposing, or what
I understood you to be proposing, at any rate.

>> array of OIDs. Really, you're going to need something quite a bit
>> more complicated than that to describe something like ALTER TABLE, but
>> even for pretty simple cases a single OID doesn't cut it.
>
> If you want to solve that problem, let's talk about pseudo relations
> that you can SELECT FROM, please. We can already expose a tuplestore in
> PL code by means of cursors and SRF, but I'm not sure that's how we want
> to expose the "statement level information" here.

I'm not sure, either, but I think that exposing things as tables is a
neat idea. I had imagined passing either JSON or arrays, but tables
would be easier to work with, at least for PL/pgsql triggers.

> The only commands that will target more than one object are:
>
> - DROP, either in the command or by means of CASCADE;
> - CREATE SCHEMA with its PROCESS_UTILITY_SUBCOMMAND usage;

CREATE TABLE can also create subsidiary objects (indexes,
constraints); and there could be more things that do this in the
future.

> The CASCADE case is something else entirely, and we need to be very
> careful about its locking behavior. Also, in the CASCADE case I can
> perfectly agree that we're not talking about a ddl_something event any
> more, and that in 9.4 we will be able to provide a sql_drop generic
> event that will now about that.

I've felt from the beginning that we really need that to be able to do
anything worthwhile with this feature, and I said so last year around
this time. Meanwhile, absent that, if we put something in here that
only handles a subset of cases, the result will be DDL replication
that randomly breaks. Bill Gates used to have a reputation for being
able to give demos of beta software where he carefully avoided doing
things that would trigger the known crash bugs, and thus make it look
like everything was working great. That's fine for a marketing
presentation, but releasing in that kind of state gets you a bad
reputation.

> Mostly agreed. Do we want to ship with only PL/pgSQL support? I've not
> been following how those PL features usually reach the other PLs…

That doesn't bother me. If the facility is useful enough that people
want it in other PLs, they can submit patches to add it. I don't
believe there's any requirement that we must support this feature in
every PL before we can think about releasing.

> Well, try and get the name of the exclusion constraint created when
> running that command, then rewrite it with the name injected in the
> command:
>
> CREATE TABLE xyz(i int, exclude (i WITH =) where (i > 10) deferrable);

Oh. Ouch.

> Another case is column DEFAULT values, but I don't think we care about
> the constraint name in that case. It's possible to inject the column and
> table level CHECK constraints names in the CREATE TABLE syntax, though.

I don't think column DEFAULTs create constraints, do they?

>> way. For example, if we add a function that tells you what object
>> name the user typed, it doesn't need to expose the fact that the parse
>> tree names that field differently in different DDL constructs, which
>> certainly increases our odds of not going insane after a release or
>> two of maintaining this code.
>
> That's called tg_objectname and tg_schemaname in my proposal. Can you
> come up with either a design proposal of an "external" stable
> representation of the parse tree, or a more complete list of information
> you want exposed?

I'm not sure I can, but I think that the design I'm proposing gives a
lot of flexibility. If we create a pg_parse_tree type with no input
function and an output function that merely returns a dummy value, and
we pass a pg_parse_tree object to PL/pgsql event triggers, then it's a
small win even if it offers no accessors at all - because somebody
could put whatever accessor functions they want in a contrib module
without waiting for 9.4.

If we do offer some accessors (which I assume we would), then it's
even better. You can get at the information exposed by those
accessors, and if you want more, you don't need to wait for a new PG
release, you can add 'em via a home-grown contrib module, again,
without waiting for 9.4.

The point is - we wouldn't expose the whole parse tree. Rather, we'd
expose a parse-tree as an opaque object, and give the user functions
that would expose specific bits of it. Those bits could grow over
time without needing any changes to PLs or existing trigger functions,
and they wouldn't change if the parse tree internals changed, and
users could extend the amount of information exposed via contrib
functions (at their own risk, of course).
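To make the shape of that idea concrete, a hypothetical accessor over
the opaque type could look like this; none of these functions or
variables exist yet, this only illustrates the proposal:

```sql
-- Hypothetical: tg_parsetree would be an opaque pg_parse_tree value,
-- and accessor functions (core or contrib) would expose specific
-- fields without revealing the parse tree internals.
create function my_trigger() returns event_trigger as $$
begin
  raise notice 'command targets %',
    pg_parse_tree_object_name(tg_parsetree);  -- assumed accessor
end
$$ language plpgsql;
```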

>>> Extra bonus items, a table_rewrite event. Unlikely for 9.3 at best,
>>> unless someone else wants to have at it, maybe?
>>
>> I'd like to take a crack at adding support for some kind of really
>> cool event for 9.4 - I suspect this one is not what I'd pick as a
>> first target, just because it seems hard. But I'll mull it over.
>
> I'll hand you another one then: have INSERT INTO create the destination
> TABLE when it does not exists, for RAD users out there. You know, those
> schema less NoSQL guys…

Wow, you don't think small. I'm not sure how hard that would be, but
I agree it would be cool. I think the trigger might have to be
written in C to get the behavior people would likely want without an
unreasonable amount of screwing around.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Steve Singer <ssinger(at)ca(dot)afilias(dot)info>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-25 16:58:07
Message-ID: m238xptd28.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> OK, but can we lay the issue of a *normalized* command string to the
> side for just one minute, and talk about exposing the *raw* command
> string? It seems to me that this would be (1) very easy to do, (2)
> reasonable to slip into 9.3, and (3) useful to some people. Arguably,
> the normalized command string would be useful to MORE people, but I
> think the raw command string is not without its uses, and it's at
> least one order of magnitude less work.

My understanding is that if the command string we give to event
triggers is ambiguous (sub-object names, schema qualifications, etc),
it becomes useless for logical replication use. I'll leave it to the
consumers of that to speak up now.

[… no access to tuples in per-statement triggers …]
> I agree that was a reasonable trade-off. But it would be quite
> something else again if someone were to propose the idea of having NEW
> and OLD available for each-statement triggers, but only give the
> information about the first row affected by the statement, rather than
> all of them. I think that proposal would quite rightly be rejected,
> and I think it's morally equivalent to what you're proposing, or what
> I understood you to be proposing, at any rate.

No it's not.

+ /*
+ * we only support filling-in information for DROP command if we only
+ * drop a single object at a time, in all other cases the ObjectID,
+ * Name and Schema will remain NULL.
+ */
+ if (list_length(stmt->objects) != 1)

The current patch implementation is to fill in the object id, name and
schema with NULL when we have something else than a single object as the
target. I did that when I realized we have a precedent with statement
triggers and that we would maybe share the implementation of the "record
set variable" facility for PLs here.

> I'm not sure, either, but I think that exposing things as tables is a
> neat idea. I had imagined passing either JSON or arrays, but tables
> would be easier to work with, at least for PL/pgsql triggers.

Any idea about a practical implementation that we can do in the 9.3
timeframe? Barring that, my proposal is to allow ourselves not to
provide the information at all in that case in 9.3, and to complete the
feature in the 9.4 timeframe.

Possibly with OLD and NEW relations for per-statement triggers, if it
turns out as I think that we can re-use the same piece of PLpgSQL side
framework in both cases.

>> The only commands that will target more than one object are:
>>
>> - DROP, either in the command or by means of CASCADE;
>> - CREATE SCHEMA with its PROCESS_UTILITY_SUBCOMMAND usage;
>
> CREATE TABLE can also create subsidiary objects (indexes,
> constraints); and there could be more things that do this in the
> future.

Subsidiary objects are not the same thing at all as a command that
targets more than one object, and the difference is user visible.

>> The CASCADE case is something else entirely, and we need to be very
>> careful about its locking behavior. Also, in the CASCADE case I can
>> perfectly agree that we're not talking about a ddl_something event any
>> more, and that in 9.4 we will be able to provide a sql_drop generic
>> event that will know about that.
>
> I've felt from the beginning that we really need that to be able to do
> anything worthwhile with this feature, and I said so last year around
> this time. Meanwhile, absent that, if we put something in here that
> only handles a subset of cases, the result will be DDL replication
> that randomly breaks.

So we have two proposals here:

- Have the cascading drop call back into process utility with a new
context value of PROCESS_UTILITY_CASCADE and its parse node, wherein
you only stuff the ObjectAddress, and provide event triggers support
for this new cascade command;

- Implement a new event called "sql_drop" where you provide the same
amount of information as in a ddl_command event (same lookups,
etc), but without any parse tree, nor, I suppose, the command string
that the user would have to type to drop just that object.

You objected to the first on modularity-violation grounds, and to the
second on development-cycle timing grounds. And now you're saying that,
because we don't have a practical solution, this is apparently dead;
but then what is the way forward?

Please help me decipher your train of thought and conclusions.

> That doesn't bother me. If the facility is useful enough that people
> want it in other PLs, they can submit patches to add it. I don't
> believe there's any requirement that we must support this feature in
> every PL before we can think about releasing.

Ok, I won't bother either then.

> I'm not sure I can, but I think that the design I'm proposing gives a
> lot of flexibility. If we create a pg_parse_tree type with no input
> function and an output function that merely returns a dummy value, and
> we pass a pg_parse_tree object to PL/pgsql event triggers, then it's a
> small win even if it offers no accessors at all - because somebody
> could put whatever accessor functions they want in a contrib module
> without waiting for 9.4.

That's not solving the problem we want to solve, though. You can already
do what you say with the code that we have today, you just have to code
the extension in C.

> The point is - we wouldn't expose the whole parse tree. Rather, we'd
> expose a parse-tree as an opaque object, and give the user functions
> that would expose specific bits of it. Those bits could grow over
> time without needing any changes to PLs or existing trigger functions,
> and they wouldn't change if the parse tree internals changed, and
> users could extend the amount of information exposed via contrib
> functions (at their own risk, of course).

We already have that, barring the extension that grovels through the
parse tree. The benefit of maintaining that code as an extension is
relying on a stable enough API from the backend that you can provide
the same behaviour to users across major versions; and with pg_upgrade,
the same storage format.

Here I don't see how to reach those goals without anything new in core,
given that the main requirement from core about the parse tree is to be
able to whack it around in any release (even minor) without concerns
about API or ABI stability.

>> I'll hand you another one then: have INSERT INTO create the destination
>> TABLE when it does not exist, for the RAD users out there. You know,
>> those schema-less NoSQL guys…
>
> Wow, you don't think small. I'm not sure how hard that would be, but
> I agree it would be cool. I think the trigger might have to be
> written in C to get the behavior people would likely want without an
> unreasonable amount of screwing around.

What I'm chasing here is being able to extend DDL in user land, without
even having to code in C. We'll get there.

A practical example that we saw on this list very recently is
addressing the trusted bits for PLs: install an event trigger on CREATE
FUNCTION and have it run a SECURITY DEFINER function implemented in
PLpgSQL, which will do some checks and then run the CREATE FUNCTION for
you (taking care of disabling its own event trigger to avoid infinite
recursion).
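A rough sketch of that check, hedged heavily: it assumes INSTEAD OF
event triggers (which, as noted, were removed from this patch series)
and an illustrative plc_admins role:

```sql
-- Hypothetical: requires INSTEAD OF event triggers, which are not part
-- of this patch series; plc_admins is an illustrative role name.
create function check_create_function() returns event_trigger
security definer as $$
begin
  if not pg_has_role(current_user, 'plc_admins', 'member') then
    raise exception 'not allowed to CREATE FUNCTION here';
  end if;
  -- otherwise, disable this trigger and re-run the CREATE FUNCTION
end
$$ language plpgsql;
```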

Of course, to get there you need INSTEAD OF event triggers, and we had
to remove them from my patch series last year because we were not in a
position to assess that the call points in each command were safe
enough. Once the current framework is in place, we will be able to get
back to INSTEAD OF event triggers. We'll get there.

And, well, yes, when you embark on side projects spanning two or three
development cycles, you need to have something of a vision, I guess…
Event Triggers are but a building block to progress towards achieving
mine.
That feature appeared to me to be the Right Way™ to add a generic
facility in PostgreSQL that you can use for a number of different use
cases, many birds with a single stone. Well, a menhir maybe.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Steve Singer <ssinger(at)ca(dot)afilias(dot)info>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-25 18:06:00
Message-ID: CAFNqd5VuowUqXrHfWf_Ld-_szCUxaN3=RZD=XiVmNr_Yd=53QQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> OK, but can we lay the issue of a *normalized* command string to the
>> side for just one minute, and talk about exposing the *raw* command
>> string? It seems to me that this would be (1) very easy to do, (2)
>> reasonable to slip into 9.3, and (3) useful to some people. Arguably,
>> the normalized command string would be useful to MORE people, but I
>> think the raw command string is not without its uses, and it's at
>> least one order of magnitude less work.
>
> My understanding is that if the command string we give to event triggers
> is ambiguous (sub-object names, schema qualifications, etc), it becomes
> useless for logical replication use. I'll leave it to the consumers of
> that to speak up now.

"Useless" is a strong word, but it certainly injures usefulness pretty
substantially.

If it isn't normalized, then either we accept that:

a) If you fail to properly qualify your inputs when generating DDL,
it's pretty likely it will all smash on the floor when we try to
replicate it, or

b) We need to capture the active search_path from the
environment at the instant of the DDL capture event, and carry it
over along with the DDL. If we could assume that the
GUC, search_path, was the correct value, that's possibly not super
difficult, but I'm not certain that's the correct thing to capture.
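Option (b) could be sketched with the event trigger machinery already in the patch: capture the session's search_path GUC alongside the command text at the instant the trigger fires. A minimal sketch (the audit table and function names are made up for illustration):

```sql
-- Hypothetical sketch: record each DDL command together with the
-- search_path that was active when it ran, so a replayer can restore
-- the same name-resolution environment before replaying the command.
CREATE TABLE ddl_capture (
    captured_at  timestamptz NOT NULL DEFAULT now(),
    search_path  text,
    command      text
);

CREATE FUNCTION capture_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO ddl_capture(search_path, command)
    VALUES (current_setting('search_path'), current_query());
END;
$$;

CREATE EVENT TRIGGER capture_ddl_end ON ddl_command_end
    EXECUTE PROCEDURE capture_ddl();
```

Though, as noted above, it is not certain that the GUC value is the right thing to capture.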
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Steve Singer <ssinger(at)ca(dot)afilias(dot)info>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-27 04:11:42
Message-ID: CA+TgmoZ4U6Bwe-+HhUFWaj2fhG5S=Wo5zK5KR0x7RyU9qphKAQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> My understanding is that if the command string we give to event triggers
> is ambiguous (sub-object names, schema qualifications, etc), it becomes
> useless for logical replication use. I'll leave it to the consumers of
> that to speak up now.

Yeah, that's probably true. I think it might be useful for other
purposes, but I think we need a bunch of infrastructure we don't have
yet to make logical replication of DDL a reality.

> [… no access to tuples in per-statement triggers …]
>> I agree that was a reasonable trade-off. But it would be quite
>> something else again if someone were to propose the idea of having NEW
>> and OLD available for each-statement triggers, but only give the
>> information about the first row affected by the statement, rather than
>> all of them. I think that proposal would quite rightly be rejected,
>> and I think it's morally equivalent to what you're proposing, or what
>> I understood you to be proposing, at any rate.
>
> No it's not.
>
> + /*
> + * we only support filling-in information for DROP command if we only
> + * drop a single object at a time, in all other cases the ObjectID,
> + * Name and Schema will remain NULL.
> + */
> + if (list_length(stmt->objects) != 1)
>
> The current patch implementation is to fill in the object id, name and
> schema with NULL when we have something else than a single object as the
> target. I did that when I realized we have a precedent with statement
> triggers and that we would maybe share the implementation of the "record
> set variable" facility for PLs here.

But I don't see how *that's* going to be any good for logical
replication either. If the raw query string isn't useful, getting the
OID in some cases but not all surely can't be any better.

>> I'm not sure, either, but I think that exposing things as tables is a
>> neat idea. I had imagined passing either JSON or arrays, but tables
>> would be easier to work with, at least for PL/pgsql triggers.
>
> Any idea about a practical implementation that we can do in the 9.3
> timeframe? Barring that, my proposal is to allow ourselves not to provide
> the information at all in that case in 9.3, and complete the feature by
> 9.4 time frame.

No, I don't have a brilliant idea. I think that we've probably about
come to the end of what we can reasonably do for 9.3.

> Possibly with OLD and NEW relations for per-statement triggers, if it
> turns out as I think that we can re-use the same piece of PLpgSQL side
> framework in both cases.

That would be a GREAT thing to have.

>>> The only commands that will target more than one object are:
>>>
>>> - DROP, either in the command or by means of CASCADE;
>>> - CREATE SCHEMA with its PROCESS_UTILITY_SUBCOMMAND usage;
>>
>> CREATE TABLE can also create subsidiary objects (indexes,
>> constraints); and there could be more things that do this in the
>> future.
>
> Subsidiary objects are not the same thing at all as a command that
> targets more than one object, and the difference is user visible.

Hmm. Really?

I guess I'm not seeing why those cases are particularly different.

>>> The CASCADE case is something else entirely, and we need to be very
>>> careful about its locking behavior. Also, in the CASCADE case I can
>>> perfectly agree that we're not talking about a ddl_something event any
>>> more, and that in 9.4 we will be able to provide a sql_drop generic
>>> event that will know about that.
>>
>> I've felt from the beginning that we really need that to be able to do
>> anything worthwhile with this feature, and I said so last year around
>> this time. Meanwhile, absent that, if we put something in here that
>> only handles a subset of cases, the result will be DDL replication
>> that randomly breaks.
>
> So we have two proposals here:
>
> - Have the cascading drop calls back to process utility with a new
> context value of PROCESS_UTILITY_CASCADE and its parse node, wherein
> you only stuff the ObjectAddress, and provide event triggers support
> for this new cascade command;
>
> - Implement a new event called "sql_drop" where you provide the same
> amount of information as in a ddl_command event (same lookups,
> etc), but without any parsetree nor I suppose the command string
> that the user would have to type to drop just that object.
>
> You objected to the first on modularity violation grounds, and on the
> second on development cycle timing grounds. And now you're saying that
> because we don't have a practical solution, I'm not sure, apparently
> it's dead, but what is?
>
> Please help me decipher your process of thoughts and conclusions.

OK, so I object to the first one on modularity grounds (as did Tom).
I don't object to the second one at all except that, AFAIK, nobody's
written the code yet. Maybe I'm misunderstanding something.

Actually, I think we could probably use a similar set of
infrastructure *either* to implement sql_drop OR to provide the
information to ddl_command_end. What I'm thinking about is that we
might have something sort of like the after-trigger queue that we use for
ordinary triggers. Instead of calling a trigger as the drops happen,
we queue up a bunch of events that say "xyz got dropped". And then,
we can either fire something like ddl_command_end (and pass in the
whole list as an array or a table) or we can call something like
sql_drop (n times, passing details for one object per invocation). In
either approach, the triggers fire *after* the drops rather than in
the middle of them - it's just a question of whether you want one call
for all the drops or one call for each drop.
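For what it's worth, the "queue the drops, then fire" design would read quite naturally from PL/pgSQL; assuming a set-returning function over the queued drop events (the event and function names below follow what was eventually committed for 9.3 as sql_drop and pg_event_trigger_dropped_objects()), the "one call for all the drops" variant is:

```sql
-- Sketch: a single trigger invocation walks the whole list of queued
-- drop events, after the drops have already happened.
CREATE FUNCTION log_dropped_objects() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
    LOOP
        RAISE NOTICE 'dropped % %.%',
            obj.object_type, obj.schema_name, obj.object_name;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER log_drops ON sql_drop
    EXECUTE PROCEDURE log_dropped_objects();
```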

>> I'm not sure I can, but I think that the design I'm proposing gives a
>> lot of flexibility. If we create a pg_parse_tree type with no input
>> function and an output function that merely returns a dummy value, and
>> we pass a pg_parse_tree object to PL/pgsql event triggers, then it's a
>> small win even if it offers no accessors at all - because somebody
>> could put whatever accessor functions they want in a contrib module
>> without waiting for 9.4.
>
> That's not solving the problem we want to solve, though. You can already
> do what you say with the code that we have today, you just have to code
> the extension in C.

I don't agree, but I recognize there might be more than one valid opinion here.

>> The point is - we wouldn't expose the whole parse tree. Rather, we'd
>> expose a parse-tree as an opaque object, and give the user functions
>> that would expose specific bits of it. Those bits could grow over
>> time without needing any changes to PLs or existing trigger functions,
>> and they wouldn't change if the parse tree internals changed, and
>> users could extend the amount of information exposed via contrib
>> functions (at their own risk, of course).
>
> We already have that, barring the extension that grovels through the
> parse tree. The benefit from maintaining that code as an extension is
> relying on a stable enough API from the backend so that you can provide
> the same behaviour to users across major versions. And with pg_upgrade,
> the same storage format.
>
> Here I don't see how to reach those goals without anything new in core,
> given that core's main requirement for the parse tree is to be able to
> whack it around in any release (even a minor one) without concern for
> API or ABI stability.

Well, the point is that if you have a function that maps a parse tree
onto an object name, any API or ABI changes can be reflected in an
updated definition for that function. So suppose I have the command
"CREATE TABLE public.foo (a int)". And we have a call
pg_target_object_namespace(), which will return "public" given the
parse tree for the foregoing command. And we have a call
pg_target_object_name(), which will return "foo". We can whack around
the underlying parse tree representation all we want and still not
break anything - because any imaginable parse tree representation will
allow the object name and object namespace to be extracted. Were that
not possible it could scarcely be called a parse tree any longer.
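Under that scheme a PL/pgSQL event trigger would never touch parse-tree internals. The accessor names below are the hypothetical ones from the paragraph above, and the tg_parsetree variable is likewise made up for illustration:

```sql
-- Hypothetical accessor-based interface over an opaque pg_parse_tree.
-- For "CREATE TABLE public.foo (a int)" this would report 'public' and 'foo'.
CREATE FUNCTION audit_create() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'creating %.%',
        pg_target_object_namespace(tg_parsetree),
        pg_target_object_name(tg_parsetree);
END;
$$;
```

The internals of pg_parse_tree could then change freely between releases, as long as the extraction functions keep their contracts.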

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Steve Singer <ssinger(at)ca(dot)afilias(dot)info>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-27 17:08:33
Message-ID: 51055F11.6040208@ca.afilias.info
Lists: pgsql-hackers

On 13-01-26 11:11 PM, Robert Haas wrote:
> On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine
> <dimitri(at)2ndquadrant(dot)fr> wrote:
>> My understanding is that if the command string we give to event triggers
>> is ambiguous (sub-object names, schema qualifications, etc), it becomes
>> useless for logical replication use. I'll leave it to the consumers of
>> that to speak up now.
>
> Yeah, that's probably true. I think it might be useful for other
> purposes, but I think we need a bunch of infrastructure we don't have
> yet to make logical replication of DDL a reality.
>

I agree. Does anyone have a specific use case other than DDL
replication where an ambiguous command string would be useful? Even for
use cases like automatically removing a table from replication when it
is dropped, I would want to be able to determine which table is being
dropped unambiguously. Could I determine that from an oid? I suspect so,
but parsing a command string and then trying to figure out the table
from the search_path doesn't sound very appealing.

> Well, the point is that if you have a function that maps a parse tree
> onto an object name, any API or ABI changes can be reflected in an
> updated definition for that function. So suppose I have the command
> "CREATE TABLE public.foo (a int)". And we have a call
> pg_target_object_namespace(), which will return "public" given the
> parse tree for the foregoing command. And we have a call
> pg_target_object_name(), which will return "foo". We can whack around
> the underlying parse tree representation all we want and still not
> break anything - because any imaginable parse tree representation will
> allow the object name and object namespace to be extracted. Were that
> not possible it could scarcely be called a parse tree any longer.

How do you get the fully qualified type of the first column?
col1=pg_target_get_column(x, 0)
pg_target_get_type(col1);

or something similar.

I think that could work but we would be adding a lot of API functions to
get all the various bits of info one would want the API to expose. I
also suspect executing triggers that had to make lots of function calls
to walk a tree would be much slower than an extension that could just
walk the parse-tree or some other abstract tree-like structure.

Steve


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Steve Singer <ssinger(at)ca(dot)afilias(dot)info>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-27 17:20:18
Message-ID: CA+TgmobB1psa0HLAy6uTK5NrZd37Li4pMH+8fF8RNFiheZcGHg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 27, 2013 at 12:08 PM, Steve Singer <ssinger(at)ca(dot)afilias(dot)info> wrote:
> On 13-01-26 11:11 PM, Robert Haas wrote:
>> On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine
>> <dimitri(at)2ndquadrant(dot)fr> wrote:
>>>
>>> My understanding is that if the command string we give to event triggers
>>> is ambiguous (sub-object names, schema qualifications, etc), it becomes
>>> useless for logical replication use. I'll leave it to the consumers of
>>> that to speak up now.
>>
>> Yeah, that's probably true. I think it might be useful for other
>> purposes, but I think we need a bunch of infrastructure we don't have
>> yet to make logical replication of DDL a reality.
>
> I agree. Does anyone have a specific use case other than DDL replication
> where an ambiguous command string would be useful? Even for use cases like
> automatically removing a table from replication when it is dropped, I would
> want to be able to determine which table is being dropped unambiguously.
> Could I determine that from an oid? I suspect so, but parsing a command
> string and then trying to figure out the table from the search_path doesn't
> sound very appealing.

I was thinking about logging or auditing applications - e.g. log all
command strings whose first word is "drop". We do get customers
coming to us with that sort of request from time to time.

>> Well, the point is that if you have a function that maps a parse tree
>> onto an object name, any API or ABI changes can be reflected in an
>> updated definition for that function. So suppose I have the command
>> "CREATE TABLE public.foo (a int)". And we have a call
>> pg_target_object_namespace(), which will return "public" given the
>> parse tree for the foregoing command. And we have a call
>> pg_target_object_name(), which will return "foo". We can whack around
>> the underlying parse tree representation all we want and still not
>> break anything - because any imaginable parse tree representation will
>> allow the object name and object namespace to be extracted. Were that
>> not possible it could scarcely be called a parse tree any longer.
>
> How do you get the fully qualified type of the first column?
> col1=pg_target_get_column(x, 0)
> pg_target_get_type(col1);
>
> or something similar.

Or maybe return all the data for all the columns as an array of records...

> I think that could work but we would be adding a lot of API functions to get
> all the various bits of info one would want the API to expose.

Yeah, that's the principal disadvantage of this method, AFAICS. OTOH,
I am not sure anything else we might do is any better. You can avoid
the complexity of defining and specifying all that interface by just
taking the parse tree and running nodeToString() on it and lobbing it
over the fence, but that is basically abdicating the responsibility of
defining a sane interface in favor of just chucking whatever you
happen to have handy over the wall and hoping the user is OK with
calling that an interface. Assuming you think that's a bad idea (and
Tom and I do, at least), you've got to put in the effort to design
something, and that thing, however constituted, is going to be fairly
extensive.

> I also
> suspect executing triggers that had to make lots of function calls to walk a
> tree would be much slower than an extension that could just walk the
> parse-tree or some other abstract tree-like structure.

I am kind of doubtful that this is a real problem - why should it be
any slower than, say, parsing a JSON object into an AST?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Steve Singer <ssinger(at)ca(dot)afilias(dot)info>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-27 17:57:11
Message-ID: m2622io6fc.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> The current patch implementation is to fill in the object id, name and
>> schema with NULL when we have something else than a single object as the
>> target. I did that when I realized we have a precedent with statement
>> triggers and that we would maybe share the implementation of the "record
>> set variable" facility for PLs here.
>
> But I don't see how *that's* going to be any good for logical
> replication either. If the raw query string isn't useful, getting the
> OID in some cases but not all surely can't be any better.

Well I would think that "we don't yet support dropping several objects at
once in 9.3" is an acceptable answer. And for DROP, it's a special
case anyway as what happens is that you *stop* receiving any data from
those tables/objects, so the problem is one of cleanup.

>> So we have two proposals here:
>>
>> - Have the cascading drop calls back to process utility with a new
>> context value of PROCESS_UTILITY_CASCADE and its parse node, wherein
>> you only stuff the ObjectAddress, and provide event triggers support
>> for this new cascade command;
>>
>> - Implement a new event called "sql_drop" where you provide the same
>> amount of information as in a ddl_command event (same lookups,
>> etc), but without any parsetree nor I suppose the command string
>> that the user would have to type to drop just that object.
>>
>> You objected to the first on modularity violation grounds, and on the
>> second on development cycle timing grounds. And now you're saying that
>> because we don't have a practical solution, I'm not sure, apparently
>> it's dead, but what is?
>>
>> Please help me decipher your process of thoughts and conclusions.
>
> OK, so I object to the first one on modularity grounds (as did Tom).
> I don't object to the second one at all except that, AFAIK, nobody's
> written the code yet. Maybe I'm misunderstanding something.

I mean object as in "not in 9.3, too late", which you seem to me to be
confirming here. So what do we want to provide in 9.3 exactly?

Note that I have enough time allocated to finishing that patch that I
surely can come up with an implementation of whatever we finally decide
within this commit fest, if the implementation is straightforward
enough. As I've been playing in and around these topics for 2 years
now, plenty of things are a SMOP: the problem is reaching an
agreement about how to implement things.

> Actually, I think we could probably use a similar set of
> infrastructure *either* to implement sql_drop OR to provide the
> information to ddl_command_end. What I'm thinking about is that we
> might have something sort of like the after-trigger queue that we use for
> ordinary triggers. Instead of calling a trigger as the drops happen,
> we queue up a bunch of events that say "xyz got dropped". And then,
> we can either fire something like ddl_command_end (and pass in the
> whole list as an array or a table) or we can call something like
> sql_drop (n times, passing details for one object per invocation). In
> either approach, the triggers fire *after* the drops rather than in
> the middle of them - it's just a question of whether you want one call
> for all the drops or one call for each drop.

So, do you want to see a patch implementing that to be able to provide
information about DROP commands targeting more than one object and same
information in other cases (object id, name, schema, operation name,
object type)? That could be the next patch of the series, by Thursday.

> Well, the point is that if you have a function that maps a parse tree
> onto an object name, any API or ABI changes can be reflected in an
> updated definition for that function. So suppose I have the command
> "CREATE TABLE public.foo (a int)". And we have a call
> pg_target_object_namespace(), which will return "public" given the
> parse tree for the foregoing command. And we have a call
> pg_target_object_name(), which will return "foo". We can whack around
> the underlying parse tree representation all we want and still not
> break anything - because any imaginable parse tree representation will
> allow the object name and object namespace to be extracted. Were that
> not possible it could scarcely be called a parse tree any longer.

In my view, we have no DDL operation where we don't want to be able to
get at the Object ID, name and schema of the operation. So while more
complex cases might well require a full function-based API, I think a
set of magic variables TG_* for basic information is going to make life
easier for everybody.
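A sketch of what that magic-variable interface would look like from a trigger function (TG_TAG already exists for event triggers; TG_OBJECTID, TG_OBJECTNAME and TG_SCHEMANAME are the additions this patch proposes):

```sql
-- Proposed interface: basic facts about the target object exposed as
-- PL/pgSQL special variables, no parse-tree access required.
CREATE FUNCTION notify_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE '% on %.% (oid %)',
        TG_TAG, TG_SCHEMANAME, TG_OBJECTNAME, TG_OBJECTID;
END;
$$;
```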

Then, more complex cases are left alone in 9.3 and we have a whole
development cycle to address them, with some data type and a set of
functions maybe.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Steve Singer <ssinger(at)ca(dot)afilias(dot)info>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-27 19:08:47
Message-ID: CA+TgmoZYxSEm2Pzpif-KGGBpFHAxODqSpJL+zT-ic_QgAH-8kA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 27, 2013 at 12:57 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> The current patch implementation is to fill in the object id, name and
>>> schema with NULL when we have something else than a single object as the
>>> target. I did that when I realized we have a precedent with statement
>>> triggers and that we would maybe share the implementation of the "record
>>> set variable" facility for PLs here.
>>
>> But I don't see how *that's* going to be any good for logical
>> replication either. If the raw query string isn't useful, getting the
>> OID in some cases but not all surely can't be any better.
>
> Well I would think that "we don't yet support dropping several objects at
> once in 9.3" is an acceptable answer.

On that point, I disagree.

>>> So we have two proposals here:
>>>
>>> - Have the cascading drop calls back to process utility with a new
>>> context value of PROCESS_UTILITY_CASCADE and its parse node, wherein
>>> you only stuff the ObjectAddress, and provide event triggers support
>>> for this new cascade command;
>>>
>>> - Implement a new event called "sql_drop" where you provide the same
>>> amount of information as in a ddl_command event (same lookups,
>>> etc), but without any parsetree nor I suppose the command string
>>> that the user would have to type to drop just that object.
>>>
>>> You objected to the first on modularity violation grounds, and on the
>>> second on development cycle timing grounds. And now you're saying that
>>> because we don't have a practical solution, I'm not sure, apparently
>>> it's dead, but what is?
>>>
>>> Please help me decipher your process of thoughts and conclusions.
>>
>> OK, so I object to the first one on modularity grounds (as did Tom).
>> I don't object to the second one at all except that, AFAIK, nobody's
>> written the code yet. Maybe I'm misunderstanding something.
>
> I mean object as in "not in 9.3, too late", which you seem to me to be
> confirming here. So what do we want to provide in 9.3 exactly?
>
> Note that I have enough time allocated to finishing that patch that I
> surely can come up with an implementation of whatever we finally decide
> within this commit fest, if the implementation is straightforward
> enough. As I've been playing in and around these topics for 2 years
> now, plenty of things are a SMOP: the problem is reaching an
> agreement about how to implement things.

I don't really know how to respond to this. The CommitFest time is
supposed to be when you stop working on your own stuff and work on
reviewing other people's stuff, and yet, both this year and last year,
you seem to have no compunctions whatever about continuing to code
right through the CommitFest, which to me misses the point. The
CommitFest, and especially the last one, is supposed to be the time to
commit what is committable now, not to go write new code. And it is
not as if I didn't make these same design points last year around this
time.

On the other hand, if it's really true that you can bang this out in a
couple of days and come up with something committable, will I commit
it? Yeah, probably. But I think that is likely to take several
months of work, not several days, and the longer we go on with this
patch the less time there is for me to review other things that might
be a whole lot more ready to go than code that isn't written yet.

>> Actually, I think we could probably use a similar set of
>> infrastructure *either* to implement sql_drop OR to provide the
>> information to ddl_command_end. What I'm thinking about is that we
>> might have something sort of like the after-trigger queue that we use for
>> ordinary triggers. Instead of calling a trigger as the drops happen,
>> we queue up a bunch of events that say "xyz got dropped". And then,
>> we can either fire something like ddl_command_end (and pass in the
>> whole list as an array or a table) or we can call something like
>> sql_drop (n times, passing details for one object per invocation). In
>> either approach, the triggers fire *after* the drops rather than in
>> the middle of them - it's just a question of whether you want one call
>> for all the drops or one call for each drop.
>
> So, do you want to see a patch implementing that to be able to provide
> information about DROP commands targeting more than one object and same
> information in other cases (object id, name, schema, operation name,
> object type)? That could be the next patch of the series, by Thursday.

I think this is approximately correct as a next logical step, but are
we really clear on what the design for that looks like? I thought you
were saying upthread that you wanted to make this information
available via some kind of pseudo-table for 9.4. In any event, adding
a new event type (sql_drop) is a separate patch, and adding
information to the existing types of event triggers is a separate
patch, so you should try to do one of those two, not both.

>> Well, the point is that if you have a function that maps a parse tree
>> onto an object name, any API or ABI changes can be reflected in an
>> updated definition for that function. So suppose I have the command
>> "CREATE TABLE public.foo (a int)". And we have a call
>> pg_target_object_namespace(), which will return "public" given the
>> parse tree for the foregoing command. And we have a call
>> pg_target_object_name(), which will return "foo". We can whack around
>> the underlying parse tree representation all we want and still not
>> break anything - because any imaginable parse tree representation will
>> allow the object name and object namespace to be extracted. Were that
>> not possible it could scarcely be called a parse tree any longer.
>
> In my view, we have no DDL operation where we don't want to be able to
> get at the Object ID, name and schema of the operation. So while more
> complex cases might well require a full function-based API, I think a
> set of magic variables TG_* for basic information is going to make life
> easier for everybody.
>
> Then, more complex cases are left alone in 9.3 and we have a whole
> development cycle to address them, with some data type and a set of
> functions maybe.

Doing some of it one way and some of it the other way seems kind of
grotty, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Steve Singer <ssinger(at)ca(dot)afilias(dot)info>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-27 20:51:54
Message-ID: m27gmyjqmt.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Well I would think that "we don't yet support dropping several objects at
>> once in 9.3" is an acceptable answer.
>
> On that point, I disagree.

OK, I will provide a patch for that soon, then. I had given higher priority
to exposing more information, but OK, let's switch. Read on for the caveats.

> I don't really know how to respond to this. The CommitFest time is
> supposed to be when you stop working on your own stuff and work on
> reviewing other people's stuff, and yet, both this year and last year,
> you seem to have no compunctions whatever about continuing to code
> right through the CommitFest, which to me misses the point. The

You might realize that I'm not a committer, so a commit fest is when I get
some review and a chance to react quickly so that we get enough momentum
to reach a consensus and maybe commit something. Also, it seems to me
that I've been very careful to review as many patches as I could in all
commit fests where I had a patch involved.

Also, please consider that if I only allowed myself to write code in
between commit fests, I would still have zero patches in PostgreSQL.
None. Zilch.

> CommitFest, and especially the last one, is supposed to be the time to
> commit what is committable now, not to go write new code. And it is
> not as if I didn't make these same design points last year around this
> time.

Well, a commit fest for me is the only time when we can reach a common
design, call it a consensus and have matching code to evaluate and maybe
commit. And it's not as if I hadn't already rewritten this patch series
several times from scratch to follow your design reviews.

The points where I didn't rewrite the code to match your ideas are those
that I didn't give priority to, or for which I still think my proposal
is better than yours, and so we didn't get to a consensus yet. It seems
to me to be a pretty normal state of affairs, and again, if I am to
follow your advice and never come up with new code while a commit fest
is in progress, I'd better just stop trying to contribute to PostgreSQL.

>> So, do you want to see a patch implementing that to be able to provide
>> information about DROP commands targeting more than one object and same
>> information in other cases (object id, name, schema, operation name,
>> object type)? That could be the next patch of the series, by Thursday.
>
> I think this is approximately correct as a next logical step, but are
> we really clear on what the design for that looks like? I thought you
> were saying upthread that you wanted to make this information
> available via some kind of pseudo-table for 9.4. In any event, adding
> a new event type (sql_drop) is a separate patch, and adding
> information to the existing types of event triggers is a separate
> patch, so you should try to do one of those two, not both.

That's exactly what I've been saying from the beginning too, arguing
that the first step should be providing enough information to user
functions called by event triggers.

If I'm to provide a solution for the multiple object drops without any
specific information associated with firing a trigger, I don't much see
the point of the exercise.

So in my view, either we have specific information and a proper DROP
CASCADE support, or only specific information without support for
multiple DROP targets, or both of them.

Support for CASCADE with no information about the objects being DROPed?
Why would we want that?

> Doing some of it one way and some of it the other way seems kind of
> grotty, though.

It's only following existing design:

http://www.postgresql.org/docs/9.2/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

39.6.6.1. Obtaining information about an error

There are two ways to get information about the current exception in
PL/pgSQL: special variables and the GET STACKED DIAGNOSTICS command.

And I think that having special variables for the most common cases (ID,
name, schema) and a more complex API (later) for more complex cases only
makes sense. And in 9.3, for the most complex cases, you will need to
code your Event Trigger in C and walk the parse tree.
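To make that concrete, here is a sketch of what the special-variable
style could look like in PL/pgSQL. Note that TG_TAG is the committed
spelling; the other TG_* names below follow this patch proposal and are
not a committed API:

```sql
-- Sketch only: TG_OBJECTID, TG_OBJECTNAME and TG_SCHEMANAME are the
-- proposed special variables from this patch series, not committed API.
CREATE OR REPLACE FUNCTION log_ddl() RETURNS event_trigger AS $$
BEGIN
  RAISE NOTICE 'command % on %.% (oid %)',
    TG_TAG, TG_SCHEMANAME, TG_OBJECTNAME, TG_OBJECTID;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER log_ddl_end ON ddl_command_end
  EXECUTE PROCEDURE log_ddl();
```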

What's wrong with that, in detail?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-28 23:07:32
Message-ID: CAFNqd5U7fGviRXssSE6TPHio6NUvaZqq5zu3E598f1KwTFHJXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm poking at event triggers a bit; I would like to set up some
examples (and see if they work, or break down badly; both are
interesting results) to do some validation of schema for Slony.

What I'm basically thinking about is to set up some event triggers that run on
DROP TABLE / DROP SEQUENCE, and see about cleaning up the replication
side of things (e.g. - inject a request to drop the table/sequence
from replication).
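
A minimal sketch of that idea, using only syntax shown elsewhere in
this thread (the cleanup helper is hypothetical, standing in for
whatever Slony-side action is appropriate):

```sql
-- Sketch: replication_drop_cleanup() is a hypothetical helper that
-- would queue the table/sequence removal on the replication side.
CREATE OR REPLACE FUNCTION on_drop_cleanup() RETURNS event_trigger AS $$
BEGIN
  RAISE NOTICE 'cleaning up replication after %', TG_TAG;
  PERFORM replication_drop_cleanup();  -- hypothetical
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER drop_cleanup ON ddl_command_end
  WHEN TAG IN ('DROP TABLE', 'DROP SEQUENCE')
  EXECUTE PROCEDURE on_drop_cleanup();
```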

I have a bit of a complaint as to what documentation is included; I don't see
any references in the documentation to ddl_command_start / ddl_command_end,
which seem to be necessary values for event triggers.

I'd tend to think that there should be a new subsection in the "man page" for
CREATE TRIGGER that includes at least two fully formed examples of event
triggers, involving the two events in question. Is change of that
sort in progress?
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-28 23:19:07
Message-ID: m2txq0aob8.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Browne <cbbrowne(at)gmail(dot)com> writes:
> I'm poking at event triggers a bit; would like to set up some examples
> (and see if they
> work, or break down badly; both are interesting results) to do some
> validation of schema
> for Slony.

Cool, thanks!

> What I'm basically thinking about is to set up some event triggers that run on
> DROP TABLE / DROP SEQUENCE, and see about cleaning up the replication
> side of things (e.g. - inject a request to drop the table/sequence
> from replication).

Sure. In what got committed from the current patch series, you will only
know that a DROP TABLE (or DROP SEQUENCE) occurred, and we're trying to
reach an agreement with Robert on whether we should add visibility to
such events when they occur in a CASCADE statement, or rather add the
OID (and maybe the name) of the object that's going to be dropped.

Your opinion is worth a lot on that matter, if you have one to share! :)

> I have a bit of a complaint as to what documentation is included; I don't see
> any references in the documentation to ddl_command_start / ddl_command_end,
> which seem to be necessary values for event triggers.

What we have now here:

http://www.postgresql.org/docs/devel/static/event-trigger-matrix.html
http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html
http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER

Is it not visible enough, or really missing the point?

> I'd tend to think that there should be a new subsection in the "man page" for
> CREATE TRIGGER that includes at least two fully formed examples of event
> triggers, involving the two events in question. Is change of that
> sort in progress?

The event triggers are addressed in a whole new chapter of the docs,
maybe that's why you didn't find the docs?

--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-29 17:04:11
Message-ID: CAFNqd5UwRZ8kctJV=WQ0VfzaMux4tHf_0cyjWQKO=pjfJ68K8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 28, 2013 at 6:19 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Christopher Browne <cbbrowne(at)gmail(dot)com> writes:
>> I'm poking at event triggers a bit; would like to set up some examples
>> (and see if they
>> work, or break down badly; both are interesting results) to do some
>> validation of schema
>> for Slony.
>
> Cool, thanks!
>
>> What I'm basically thinking about is to set up some event triggers that run on
>> DROP TABLE / DROP SEQUENCE, and see about cleaning up the replication
>> side of things (e.g. - inject a request to drop the table/sequence
>> from replication).
>
> Sure. In what got committed from the current patch series, you will only
> know that a DROP TABLE (or DROP SEQUENCE) occurred, and we're trying to
> reach an agreement with Robert on whether we should add visibility to
> such events when they occur in a CASCADE statement, or rather add the
> OID (and maybe the name) of the object that's going to be dropped.
>
> Your opinion is worth a lot on that matter, if you have one to share! :)

Hmm. I think some information about the object is pretty needful.

For the immediate case I'm poking at, namely looking for dropped tables,
I could determine which object is gone by inference; if I run the
trigger as part of the ddl_command_end event, then I can run a query
that searches the Slony table sl_table, and if I find any entries for
which there is no longer a corresponding table in pg_catalog.pg_class,
then I infer which table got dropped.
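
The inference query could look something like this (a sketch; the
"_slony" schema name is illustrative, the real schema is named after
your cluster, and the sl_table column names are from Slony-I):

```sql
-- Sketch: sl_table rows whose table no longer exists in pg_class
-- were presumably just dropped.
SELECT t.tab_id, t.tab_nspname, t.tab_relname
  FROM _slony.sl_table t
 WHERE NOT EXISTS (
       SELECT 1
         FROM pg_catalog.pg_class c
         JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
        WHERE c.relname = t.tab_relname
          AND n.nspname = t.tab_nspname);
```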

But I think I'd really rather know more explicitly which table is being dropped.

Having the oid available in some trigger variable should suffice.

It appears to me as though it's relevant to return an OID for all of the command
tags.

Something useful to clarify in the documentation is what differences are
meaningful between ddl_command_start and ddl_command_end to make
it easier to determine which event one would most want to use.

Musing a bit... It seems to me that it might be a slick idea to run a
trigger at both _start and _end, capturing metadata about the object
into temp tables at both times, which would then allow the _end function
to compare the data in the temp table to figure out what to do next.
I wouldn't think that's apropos as default behaviour; that's something
for the crafty developer that's building a trigger function to do.
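
A sketch of that start/end comparison, assuming one trigger on each of
ddl_command_start and ddl_command_end:

```sql
-- Sketch: snapshot pg_class at _start, diff it at _end to see what
-- the command removed. Illustrative only, not default behaviour.
CREATE OR REPLACE FUNCTION snap_start() RETURNS event_trigger AS $$
BEGIN
  CREATE TEMP TABLE ddl_snapshot AS
    SELECT c.oid, n.nspname, c.relname
      FROM pg_catalog.pg_class c
      JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION snap_end() RETURNS event_trigger AS $$
BEGIN
  -- anything present before the command but gone now was dropped
  RAISE NOTICE 'dropped: %',
    (SELECT string_agg(s.nspname || '.' || s.relname, ', ')
       FROM ddl_snapshot s
      WHERE NOT EXISTS (SELECT 1 FROM pg_catalog.pg_class c
                         WHERE c.oid = s.oid));
  DROP TABLE ddl_snapshot;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER snap_start_trg ON ddl_command_start
  EXECUTE PROCEDURE snap_start();
CREATE EVENT TRIGGER snap_end_trg ON ddl_command_end
  EXECUTE PROCEDURE snap_end();
```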

Having a parse tree for the query that initiates the event would be
mighty useful, as would be a canonicalized form of the query.

I think we could add some useful "protection" (e.g. my earlier example
of an event trigger that generates "DROP TABLE FROM REPLICATION") using
the present functionality, even perhaps without OIDs, but I don't think
I'd want to get into trying to forward arbitrary DDL without having the
canonicalized query available.

>> I have a bit of a complaint as to what documentation is included; I don't see
>> any references in the documentation to ddl_command_start / ddl_command_end,
>> which seem to be necessary values for event triggers.
>
> What we have now here:
>
> http://www.postgresql.org/docs/devel/static/event-trigger-matrix.html
> http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html
> http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER
>
> Is it not visible enough, or really missing the point?

Ah, I missed the second one; I was looking under CREATE TRIGGER, and
didn't notice that CREATE EVENT TRIGGER was separately available; that
resolves most of what I thought was missing.

I think a bit more needs to be said about the meanings of the events
and the command tags, but what I imagined missing wasn't.

>> I'd tend to think that there should be a new subsection in the "man page" for
>> CREATE TRIGGER that includes at least two fully formed examples of event
>> triggers, involving the two events in question. Is change of that
>> sort in progress?
>
> The event triggers are addressed in a whole new chapter of the docs,
> maybe that's why you didn't find the docs?

I found that chapter, just not the command :-).
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Event Triggers: adding information
Date: 2013-01-29 22:51:57
Message-ID: m2pq0nzjoy.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Browne <cbbrowne(at)gmail(dot)com> writes:
> Hmm. I think some information about the object is pretty needful.
>
> For the immediate case I'm poking at, namely looking for dropped tables,
> I could determine which object is gone by inference; if I run the
> trigger as part of the ddl_command_end event, then I can run a query
> that searches the Slony table sl_table, and if I find any entries for
> which there is no longer a corresponding table in pg_catalog.pg_class,
> then I infer which table got dropped.
>
> But I think I'd really rather know more explicitly which table is being dropped.
>
> Having the oid available in some trigger variable should suffice.

Ack. I think I'm going to send a patch tomorrow with support for
exposing the OID of the object (when it still exists at the time of
firing the event trigger) and support for DROP CASCADE objects, thanks
to an SRF that you will be able to call from your event trigger
function.
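
As a sketch of how such an SRF could be used from a trigger function
(the function name and result columns here are illustrative; this is
roughly the shape of what later shipped in 9.3 as
pg_event_trigger_dropped_objects):

```sql
-- Sketch: iterate over every object removed by the current DROP,
-- including objects reached via CASCADE.
CREATE OR REPLACE FUNCTION report_drops() RETURNS event_trigger AS $$
DECLARE
  obj record;
BEGIN
  FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
  LOOP
    RAISE NOTICE 'dropped % %', obj.object_type, obj.object_identity;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```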

> It appears to me as though it's relevant to return an OID for all of the command
> tags.

Yeah, I've been working on that before, and the refactoring in utility.c
is already committed by Robert, so that's definitely in scope.

> Something useful to clarify in the documentation is what differences are
> meaningful between ddl_command_start and ddl_command_end to make
> it easier to determine which event one would most want to use.

Will add.

> Musing a bit... It seems to me that it might be a slick idea to run a
> trigger at
> both _start and _end, capturing metadata about the object into temp tables
> at both times, which would then allow the _end function to compare the data
> in the temp table to figure out what to do next. I wouldn't think
> that's apropos
> as default behaviour; that's something for the crafty developer that's building
> a trigger function to do.

Please remember that at the _end of a DROP the object no longer exists
in the catalogs, so you will get a NULL for its OID; the same is true
at the _start of a CREATE command, and, for implementation reasons,
ALTER behaves like CREATE here.

> Having a parse tree for the query that initiates the event would be
> mighty useful,
> as would be a canonicalized form of the query.
>
> I think we could add some useful "protection" (e.g. - such as my example of
> an event trigger that generates "DROP TABLE FROM REPLICATION") using
> the present functionality, even perhaps without OIDs, but I don't
> think I'd want
> to get into trying to forward arbitrary DDL without having the canonicalized
> query available.

Thanks. As soon as the OID+CASCADE patch is done, I'm back on working on
Command String Normalisation.

> I think a bit more needs to be said about the meanings of the events
> and the command tags,
> but what I imagined missing wasn't.

Definitely, another pass on the docs will be needed here. I'd like to be
able to postpone that to beta times, as I'm arranging my schedule so as
to be available and participating then.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support