Deparsing DDL command strings

From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Deparsing DDL command strings
Date: 2012-10-05 12:48:32
Message-ID: m2zk41m6e7.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Hi,

Working on the next Event Trigger patch series, one of the challenges to
address is deparsing the DDL commands so that the user-defined function
used by the trigger definition has that information.

I'm making good progress on that; it's some amount of code but pretty
straightforward. The only blocker for now is best summarized by the
following comment in src/backend/commands/tablecmds.c:

* Now add any newly specified column default values and CHECK constraints
* to the new relation. These are passed to us in the form of raw
* parsetrees; we need to transform them to executable expression trees
* before they can be added. The most convenient way to do that is to
* apply the parser's transformExpr routine, but transformExpr doesn't
* work unless we have a pre-existing relation. So, the transformation has
* to be postponed to this final step of CREATE TABLE.

So I have a Node *parsetree containing some CHECK and DEFAULT raw
expressions to work with. Those can reference non-existing tables,
either to-be-created or already-dropped.
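
For instance, with a command like the following, the CHECK and DEFAULT
expressions come in as raw parse trees referencing a relation that does
not exist yet:

    create table foo(id serial primary key,
                     f2 text default 'plop' check (f2 != ''));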

Should I work on transformExpr() so that it knows how to work with a
non-existing relation, or should I write a new transformRawExpr() that
knows how to handle this case?

Or do we want to limit the deparsing feature and not accept some CHECK
and DEFAULT expressions (though not being able to cope with T_A_Const is
a bummer)? (I don't mean to advocate that option, but I still have to
mention the choice.)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-10-05 14:03:03
Message-ID: 8117.1349445783@sss.pgh.pa.us
Lists: pgsql-hackers

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> So I have a Node *parsetree containing some CHECK and DEFAULT raw
> expressions to work with. Those can reference non existing tables,
> either to-be-created or already-dropped.

Why don't you just pass the original query string, instead of writing
a mass of maintenance-requiring new code to reproduce it?

This would require (1) making sure the query string is still available
where needed. I think we are 99% of the way there but maybe not 100%.
(2) being able to identify the substring corresponding to the current
command, when we're processing a multi-command string. The parser could
easily provide that, I think --- we've just never insisted that it do
so before.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject: Re: Deparsing DDL command strings
Date: 2012-10-05 14:13:47
Message-ID: 201210051613.48648.andres@2ndquadrant.com
Lists: pgsql-hackers

On Friday, October 05, 2012 04:03:03 PM Tom Lane wrote:
> Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> > So I have a Node *parsetree containing some CHECK and DEFAULT raw
> > expressions to work with. Those can reference non existing tables,
> > either to-be-created or already-dropped.
>
> Why don't you just pass the original query string, instead of writing
> a mass of maintenance-requiring new code to reproduce it?
It's not easy to know which tables are referenced in e.g. an ALTER TABLE
statement if the original statement didn't schema-qualify everything.

It's also not really possible to trigger cascading effects like the
creation of a sequence from a serial column that way...
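
For instance:

    create table t(id serial);
    -- implicitly creates the sequence t_id_seq and the column default
    -- nextval('t_id_seq'), neither of which appears in the original string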

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject: Re: Deparsing DDL command strings
Date: 2012-10-05 14:24:55
Message-ID: 8552.1349447095@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On Friday, October 05, 2012 04:03:03 PM Tom Lane wrote:
>> Why don't you just pass the original query string, instead of writing
>> a mass of maintenance-requiring new code to reproduce it?

> Its not easy to know which tables are referenced in e.g. an ALTER TABLE
> statement if the original statement didn't schema qualify everything.

What he's talking about is deparsing the raw grammar output, which by
definition contains no more information than is in the query string.
Deparsing post-parse-analysis trees is a different problem (for which
code already exists, unlike the raw-tree case).

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Subject: Re: Deparsing DDL command strings
Date: 2012-10-05 15:28:27
Message-ID: 201210051728.28251.andres@2ndquadrant.com
Lists: pgsql-hackers

On Friday, October 05, 2012 04:24:55 PM Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On Friday, October 05, 2012 04:03:03 PM Tom Lane wrote:
> >> Why don't you just pass the original query string, instead of writing
> >> a mass of maintenance-requiring new code to reproduce it?
> >
> > Its not easy to know which tables are referenced in e.g. an ALTER TABLE
> > statement if the original statement didn't schema qualify everything.
>
> What he's talking about is deparsing the raw grammar output, which by
> definition contains no more information than is in the query string.
> Deparsing post-parse-analysis trees is a different problem (for which
> code already exists, unlike the raw-tree case).

Sure. I am not saying Dimitri's approach is perfect. I am just listing
some of the reasons why I think just using the raw input string isn't
sufficient...

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-10-05 15:54:46
Message-ID: m2k3v4lxrt.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Why don't you just pass the original query string, instead of writing
> a mass of maintenance-requiring new code to reproduce it?

Do we have that original query string in all cases, including
EXECUTE-like SPI calls from any PL? What about commands that internally
build a parsetree to feed ProcessUtility() directly? Do we want to
refactor them all just now as a prerequisite?
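
The first category is easy enough to produce, e.g. DDL run through
dynamic SQL in a PL:

    do $$
    begin
      execute 'create table t(i int)';
    end;
    $$;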

Also, we need to normalize that command string. Tools needing to look at
it won't want to depend on random whitespace and other oddities.
Those tools could also use the Node *parsetree and be written only in C,
but then what about giving them a head start by having a parsetree
walker in our code base?

Then we want to qualify object names. Some type names have apparently
already been taken care of by the parser here; relation names have not,
and we need to cope with non-existing relation names.

My freshly grown, limited understanding is that we currently only know
how to produce a "cooked" parse tree from the raw one if all referenced
objects do exist in the catalogs, so we have to postpone some "cooking"
(transform*) until the main object of a CREATE command is defined,
right?

Is that something we want to revisit?

Another option would be to capture search_path and other parse time
impacting GUCs, call that the query environment, and have a way to
serialize and pass in the environment and restore it either on the same
host or on another (replication is an important use case here).
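
A sketch of that idea, assuming search_path is the only setting that
matters for the command at hand (myschema is just a made-up name here):

    -- on the origin, at event time, capture the environment
    select current_setting('search_path');  -- say it returns 'myschema, public'

    -- on the replica, restore it before replaying the original string
    set search_path to myschema, public;
    create table foo(id int);   -- resolves to myschema.foo, as on the origin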

Yet another option would be to output both the original query string and
something that's meant for easy machine parsing yet is not the internal
representation of the query, so that we're free to hack the parser at
will in between releases, even minor ones. Building that new
code-friendly document will require about the same amount of code as
spitting out normalized SQL, I believe.

Yet another option would be to go the "SAX" way rather than the "DOM"
one: instead of spitting out a new command string, have the user
register callbacks and only implement walking down the parsetree and
calling those. I'm not sure how much maintenance work we would save
here, and I don't see another reason to go that way.

Yet another option would be to only provide a hook and some space in
the EventTriggerData structure for extensions to register themselves and
provide whatever deparsing they need. But then we need to figure out a
way for the user-defined function to use the resulting opaque data, from
any PL language, if only to be able to call some extension's API to
process it. It looks like a very messy way to punt the work outside of
core.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-10-05 16:15:01
Message-ID: 10767.1349453701@sss.pgh.pa.us
Lists: pgsql-hackers

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Why don't you just pass the original query string, instead of writing
>> a mass of maintenance-requiring new code to reproduce it?

> Do we have that original query string in all cases, including EXECUTE
> like spi calls from any PL?

As I said, I believe we are pretty close to that if not there already.
There are a number of utility statements that *require* the source
string to be provided, because they do parse analysis and since 8.4 or
so the original string has been required for that. So we certainly have
the string available at ProcessUtility. I've not looked at the event
trigger patch at all, so I don't know what amount of refactoring is
going to be required below that ... but the point I'm trying to make is
that it would be a one-and-done task. Adding a requirement to be able
to decompile raw parse trees will be a permanent drag on every type of
SQL feature addition.

> Also, we need to normalize that command string. Tools needing to look at
> it won't want to depend on random white spacing and other oddities.

Instead, they'll get to depend on the oddities of parse transformations
(ie, what's done in the raw grammar vs. what's done in parse_analyze),
which is something we whack around regularly. Besides which, you've
merely asserted this requirement without providing any evidence that
it's important at all, let alone important enough to justify the kind of
maintenance burden that you want to saddle us with.

> Those tools could also use the Node *parsetree and be written only in C,
> but then what about giving them a head start by having a parsetree
> walker in our code base?

Well, as far as a raw parsetree *walker* goes, there already is one in
nodeFuncs.c. It does not follow that we need something that tries to
reconstruct SQL from that. It's not clear to me that there is any
production-grade use-case for which reconstructed SQL is more useful
than examining the parse tree. Now, if you're talking about half-baked
code that gets only some cases right, then yeah grepping reconstructed
SQL might serve. But I'm not excited about doing a lot of work to
support partial solutions.

> Then we want to qualify object names. Some type names have already been
> taken care of apparently by the parser here, relation names not yet and
> we need to cope with non existing relation names.

Which is exactly what you *won't* be able to do anything about when
looking at a raw parse tree. It's just a different presentation of the
query string.

> My freshly grown limited understanding is that we currently only know
> how to produce a "cooked" parse tree from the raw one if all referenced
> objects do exist in the catalogs,

Well, yeah. Anything else is magic, not code.

regards, tom lane


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-10-08 13:30:05
Message-ID: m2lifhgkgy.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> going to be required below that ... but the point I'm trying to make is
> that it would be a one-and-done task. Adding a requirement to be able
> to decompile raw parse trees will be a permanent drag on every type of
> SQL feature addition.

I'll show some examples of very involved commands (CREATE and ALTER
TABLE are the most complex we have, I think) and some very simple
commands (DROP TABLE is one of the simplest), so that we can make up our
minds on that angle.

>> Then we want to qualify object names. Some type names have already been
>> taken care of apparently by the parser here, relation names not yet and
>> we need to cope with non existing relation names.
>
> Which is exactly what you *won't* be able to do anything about when
> looking at a raw parse tree. It's just a different presentation of the
> query string.

So, I'm currently adding the deparsing to the only existing event we
have, which is ddl_command_start. That's maybe not the best place to do
it; I now even wonder whether we can do it there at all.

Doing the same thing at ddl_command_end would allow us to have all the
information we need and leave nothing to magic guesses: full schema
qualification of all objects involved, main object(s) OIDs available,
all that jazz.

> Well, yeah. Anything else is magic not code.

Well, prefixing an object name with the first entry of the current
search_path as its schema is not that far a stretch when the object is
being created, as far as I can see. It's more reasonable, though, to
document that the rewritten, ambiguity-free command string is only
available for ddl_command_end events.
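
For instance (a made-up example of the name resolution involved):

    set search_path to myschema, public;
    create table bar(id int);
    -- the table is created as myschema.bar, so the rewritten string
    -- would read: CREATE TABLE myschema.bar (id integer);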

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-10-08 23:39:56
Message-ID: 5073644C.6060208@nasby.net
Lists: pgsql-hackers

On 10/5/12 11:15 AM, Tom Lane wrote:
>> Also, we need to normalize that command string. Tools needing to look at
>> it won't want to depend on random white spacing and other oddities.
> Instead, they'll get to depend on the oddities of parse transformations
> (ie, what's done in the raw grammar vs. what's done in parse_analyze),
> which is something we whack around regularly. Besides which, you've
> merely asserted this requirement without providing any evidence that
> it's important at all, let alone important enough to justify the kind of
> maintenance burden that you want to saddle us with.

I definitely want to be able to parse DDL commands so that I can either enforce things or drive other parts of the system based on what's changing. Without the ability to capture (and parse) DDL commands I'm stuck creating wrapper functions around anything I want to capture and then trying to ensure that everyone uses the wrappers and not the raw DDL commands.

Event triggers that just spit out raw SQL give me the first part of this, but not the second part: I'm still stuck trying to parse raw SQL on my own. Having normalized SQL to parse should make that a bit easier, but ideally I'd like to be able to pull specific elements out of a command. I'd want to be able to do things like:

IF command is ALTER TABLE THEN
    FOR EACH subcommand
        IF subcommand IS DROP COLUMN THEN
            do something that needs to know what column is being dropped
        ELSE IF subcommand IS ADD COLUMN THEN
            do something that needs to know the definition of the column being added

I don't think every bit of that has to be dealt with by the event trigger code itself. For example, if you're adding a column to a table and the entries have already been made in the catalog, you could query to get the details of the column definition if you were given an OID into pg_attribute.
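
Something along these lines, for example (table and column names are just placeholders):

    SELECT attname, format_type(atttypid, atttypmod) AS coltype, attnotnull
      FROM pg_attribute
     WHERE attrelid = 'some_table'::regclass
       AND attname = 'new_col'
       AND NOT attisdropped;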

Having said all that, an event system that spits back the raw SQL would certainly be better than nothing. But realize that people would still need to do parsing on it (i.e., replication solutions will need to know what table just got ALTER'd).
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-10-09 07:57:07
Message-ID: m28vbgcc30.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Jim Nasby <jim(at)nasby(dot)net> writes:
> I definitely want to be able to parse DDL commands to be able to
> either enforce things or to drive other parts of the system based on
> what's changing. Without the ability to capture (and parse) DDL
> commands I'm stuck creating wrapper functions around anything I want
> to capture and then trying to ensure that everyone uses the wrappers
> and not the raw DDL commands.

Are you mainly working on some auditing system?

> Event triggers that just spit out raw SQL give me the first part of
> this, but not the second part: I'm still stuck trying to parse raw SQL
> on my own. Having normalized SQL to parse should make that a bit
> easier, but ideally I'd like to be able to pull specific elements out
> of a command. I'd want to be able to do things like:

The current design for event triggers is to spit out several things:

- command tag (already committed)
- object id, can be null
- schema name, can be null
- object name
- operation: either ALTER, CREATE or DROP, …
- object type: TABLE, VIEW, FUNCTION, …
- normalized command string
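
In PL/pgSQL terms, a trigger consuming that information could look
roughly like the sketch below (only TG_TAG exists today; the other
variable names are placeholders, nothing is settled yet):

    CREATE FUNCTION trace_ddl() RETURNS event_trigger AS $$
    BEGIN
      -- TG_TAG is already there; the other TG_* names are hypothetical
      RAISE NOTICE 'tag=% op=% type=% schema=% name=%',
                   TG_TAG, TG_OPERATION, TG_OBJECTTYPE,
                   TG_SCHEMANAME, TG_OBJECTNAME;
    END;
    $$ LANGUAGE plpgsql;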

After some more thinking, it appears that in several cases you want to
have all that information filled in, and you don't want to care whether
that means your trigger needs to run at ddl_command_start or
ddl_command_end.

The proposal I want to make here is to introduce a generic event (or an
event alias) named ddl_command_trace that the system provides at the
right spot where you have the information. That's useful when you don't
intend to divert the execution of the DDL and need to know all about it.

For a DROP operation, ddl_command_trace would be ddl_command_start, and
for a CREATE operation, that would be ddl_command_end, so that the
target object (still|already) exists when the trigger is fired.

> IF command is ALTER TABLE THEN

That's called TG_TAG, see

http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER
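
For example:

    IF TG_TAG = 'ALTER TABLE' THEN
      -- react to table alterations here
    END IF;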

> FOR EACH subcommand
> IF subcommand IS DROP COLUMN THEN
> do something that needs to know what column is being dropped
> ELSE IF subcommand IS ADD COLUMN THEN
> do something that needs to know the definition of the column being added

We decided not to publish any notion of subcommand at this stage.

> I don't think every bit of that has to be dealt with by the event
> trigger code itself. For example, if you're adding a column to a table
> and the entries have already been made in the catalog, you could query
> to get the details of the column definition if you were given an OID
> into pg_attributes.

It's easy enough to provide the OID of the newly created main command
target object; it's harder to provide in a generic way all the OIDs of
the objects you might be interested in, because each command has its own
set of them.

DROP can target multiple objects, and they are all the main target.
ALTER targets only a single object, but can link to dependent objects.
CREATE an operator class or a cast and you're talking about a bunch of
operators and functions to tie together. It's not that easy.
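
Compare, say (made-up object names):

    DROP TABLE a, b, c CASCADE;                             -- several main targets
    CREATE CAST (int AS mytype) WITH FUNCTION mytype(int);  -- ties a function to two types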

> Having said all that, an event system that spits back the raw SQL
> would certainly be better than nothing. But realize that people would
> still need to do parsing on it (ie: replication solutions will need to
> know what table just got ALTER'd).

You would have most of what you're asking for. I think that looking into
the normalized string to get the information you need, when you already
know you're looking at an ALTER TABLE statement and you already have the
object references (schema, name, oid), is going to make things easier.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-10-12 16:55:49
Message-ID: m2ipafy6i2.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> I'll show some examples of very involved command (CREATE and ALTER TABLE
> are the most complex we have I think) and some very simple commands
> (DROP TABLE is one of the simplest), so that we can make up our minds on
> that angle.

So please find attached a demo patch showing what it takes to deparse
complex command strings, and here, inline, are some examples of why it's
a good idea to actually deparse them rather than hand out whatever the
user typed in:

\dy
List of event triggers
-[ RECORD 1 ]--------------------------
Name | regress_event_trigger_trace
Event | ddl_command_trace
Owner | dim
Enabled | enabled
Procedure | test_event_trigger
Tags |
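
The setup behind that is roughly the following (a sketch; the actual
test function also prints the extra fields added by the patch):

    CREATE FUNCTION test_event_trigger() RETURNS event_trigger AS $$
    BEGIN
      RAISE NOTICE 'test_event_trigger: % %', TG_EVENT, TG_TAG;
    END;
    $$ LANGUAGE plpgsql;

    CREATE EVENT TRIGGER regress_event_trigger_trace
      ON ddl_command_trace
      EXECUTE PROCEDURE test_event_trigger();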

foo=# drop table foo;
NOTICE: test_event_trigger: ddl_command_start DROP TABLE
NOTICE: test_event_trigger: DROP, TABLE
NOTICE: test_event_trigger: DROP TABLE public.foo RESTRICT;
DROP TABLE

foo=# create table foo(id serial primary key,
f2 text default 'plop' check (f2 != ''));
NOTICE: test_event_trigger: ddl_command_end CREATE TABLE
NOTICE: test_event_trigger: CREATE, TABLE
NOTICE: test_event_trigger: CREATE TABLE public.foo (id integer PRIMARY KEY DEFAULT nextval('foo_id_seq'::regclass) NOT NULL, f2 text DEFAULT 'plop' CHECK ((f2 <> ''::text)), CHECK ((f2 <> ''::text)));
CREATE TABLE

The user of that command string still has to know what to look for, and
maybe should include a proper SQL parser, but at least it doesn't need
to do much guesswork about how the sequence attached to a serial column
will get named by the system, and other such oddities.

The attached patch also includes support for the complete ALTER TABLE
command and some more (CREATE SEQUENCE, CREATE EXTENSION).

> Doing the same thing at ddl_command_end would allow us have all the
> information we need and leave nothing to magic guesses: full schema
> qualification of all objects involved, main object(s) OIDs available,
> all the jazz.

That's what is happening now in the attached patch, also with a new
event called 'ddl_command_trace', which maps to either _start or _end
depending on the operation (we want _start when doing DROP TABLE, and we
want the operation to be complete before tracing it when talking about a
CREATE or an ALTER TABLE).

And here's the scope we're talking about, including new command types,
new information passed down to user triggers, and the rewrite support
itself, isolated away:

git diff --stat postgres/master..
src/backend/catalog/heap.c | 5 +-
src/backend/commands/event_trigger.c | 241 ++++-
src/backend/tcop/utility.c | 187 ++--
src/backend/utils/adt/Makefile | 2 +-
src/backend/utils/adt/ddl_rewrite.c | 1415 +++++++++++++++++++++++++++
src/backend/utils/adt/ruleutils.c | 9 +-
src/backend/utils/cache/evtcache.c | 4 +
src/include/catalog/heap.h | 4 +
src/include/commands/event_trigger.h | 43 +-
src/include/utils/builtins.h | 14 +
src/include/utils/evtcache.h | 4 +-
src/pl/plpgsql/src/pl_comp.c | 40 +
src/pl/plpgsql/src/pl_exec.c | 53 +-
src/pl/plpgsql/src/plpgsql.h | 5 +
src/test/regress/expected/event_trigger.out | 40 +-
src/test/regress/sql/event_trigger.sql | 36 +-
16 files changed, 1938 insertions(+), 164 deletions(-)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Attachment Content-Type Size
ddl_rewrite.1.patch.gz application/octet-stream 16.2 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-10-15 14:10:29
Message-ID: CA+TgmoapZG3=q2=bBLN_D=9YeC9m_pEPPeXacAQZzXn5H9L1BA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 12, 2012 at 12:55 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> The user of that command string still has to know what to look for and
> maybe should include a proper SQL parser, but at least it doesn't need
> to do much guesswork about how the serial attached sequence will get
> named by the system and such oddities.

IMHO, it should be our explicit goal for clients not to need to parse
any SQL at all. I think that the communication between the server and
event triggers should be accomplished using magic variables. If the
data is too complex to be structured that way, then I think we should
use an established serialization format such as XML or (my personal
preference) JSON for which many parsers already exist. If we pass
sanitized SQL, then everyone who wants to do anything complicated will
need a parser for
sanitized-SQL-as-generated-by-the-PostgreSQL-event-trigger-mechanism,
and I think that's likely to be strictly worse than using one of the
serialization methods that already exists and is well understood.

Consider a hypothetical new feature where a table can be located in
either Paris or Pittsburgh. We add two new keywords to the grammar:
PARIS and PITTSBURGH. The syntax is extended to CREATE { PARIS |
PITTSBURGH } {schemaname}.{tablename} etc. Now, if we use
sanitized-SQL as a way of passing data to triggers, they all
immediately break, because the new keyword is in exactly the location
where the table name used to be. If we use magic variables or JSON or
XML, we'll just add in another magic variable, or another field in the
JSON or XML object, and well-written triggers will ignore it and keep
working. Now, it may seem like I've chosen this example somewhat
unfairly since most syntax changes are a bit less obtrusive than that,
but we did do something not dissimilar to the above in 9.1, with
UNLOGGED.

There's a careful line to be walked here, because in order for event
triggers to be useful, we're going to have to pass them information -
and the way we do that becomes part of the API, and might get broken
by future changes. That sucks, but there's nothing we can do to
completely prevent it except not have event triggers at all, and I
want event triggers, and so do many other people. What we can and I
think must do is minimize the difficulty of writing and maintaining
event triggers, and that means making the API as clean as possible.
Like Tom, I'm very skeptical that this is the right horse to bet on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-10-15 14:41:57
Message-ID: m2391fu79m.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> IMHO, it should be our explicit goal for clients not to need to parse
> any SQL at all. I think that the communication between the server and
> event triggers should be accomplished using magic variables. If the

+1 on that. There's a but.

> data is too complex to be structured that way, then I think we should
> use an established serialization format such as XML or (my personal
> preference) JSON for which many parsers already exist. If we pass
> sanitized SQL, then everyone who wants to do anything complicated will
> need a parser for
> sanitized-SQL-as-generated-by-the-PostgreSQL-event-trigger-mechanism,
> and I think that's likely to be strictly worse than using one of the
> serialization methods that already exists and is well understood.

That's not an easy task by any means: we're talking about inventing a
stable alternative to the parse tree format (nodeToString) so that we
can whack the parsenodes as much as we need and still produce something
compatible for our users. Well, at least we've already heard about more
than one use case, e.g. exposing the parser for syntax highlighting…

Now, if all you want to do is replay the exact same DDL on another
PostgreSQL instance, and if you happen to trust the master's server,
then the command string I'm currently spitting out is exactly what you
need. And having the more generalized representation of the parser data
would only mean that the trigger now has to rewrite the command string
itself.
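
For that use case the trigger boils down to something like this sketch
(the queue table and the variable carrying the rewritten command string
are placeholders, the latter isn't even named yet):

    CREATE FUNCTION queue_ddl() RETURNS event_trigger AS $$
    BEGIN
      -- TG_COMMAND stands in for the normalized command string
      INSERT INTO ddl_queue(command) VALUES (TG_COMMAND);
    END;
    $$ LANGUAGE plpgsql;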

> There's a careful line to be walked here, because in order for event
> triggers to be useful, we're going to have to pass them information -
> and the way we do that becomes part of the API, and might get broken
> by future changes. That sucks, but there's nothing we can do to

+1 here, again, we're on the same page, just having different use cases
in mind, it seems.

> completely prevent it except not have event triggers at all, and I
> want event triggers, and so do many other people. What we can and I
> think must do is minimize the difficulty of writing and maintaining
> event triggers, and that means making the API as clean as possible.
> Like Tom, I'm very skeptical that this is the right horse to bet on.

I would think that having one doesn't preclude having the other, and
while I hear Tom and you saying that it's a lot of maintenance work down
the road to have the command string, I'm highly skeptical of being able
to produce that external, stable (enough) parser format in a way that
imposes less code maintenance down the road.

Also, I'm thinking that publishing the normalized command string is
something that must be maintained in the core code, whereas the external
stable format might be done as a contrib extension, coded in C, working
with the Node *parsetree. Because it needs to output a compatible format
when applied to different major versions of PostgreSQL, I think it suits
the model of C-coded extensions quite well.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Jim Nasby <jim(at)nasby(dot)net>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-10-29 20:10:31
Message-ID: 508EE2B7.4000804@nasby.net
Lists: pgsql-hackers

On 10/9/12 2:57 AM, Dimitri Fontaine wrote:
> Jim Nasby <jim(at)nasby(dot)net> writes:
>> I definitely want to be able to parse DDL commands to be able to
>> either enforce things or to drive other parts of the system based on
>> what's changing. Without the ability to capture (and parse) DDL
>> commands I'm stuck creating wrapper functions around anything I want
>> to capture and then trying to ensure that everyone uses the wrappers
>> and not the raw DDL commands.
>
> Are you mainly working on some Auditing system?

Definitely not.

We need to deal with questions like "If we rename a table, what do we have to do in londiste to accommodate that?" Except we're dealing with more than just londiste. We have internal code that does things like track "seed tables", which are tables that we need to dump data for when we dump schema. We have other systems that care about the schema that's defined in a database, and changes that happen to that schema.

>> Event triggers that just spit out raw SQL give me the first part of
>> this, but not the second part: I'm still stuck trying to parse raw SQL
>> on my own. Having normalized SQL to parse should make that a bit
>> easier, but ideally I'd like to be able to pull specific elements out
>> of a command. I'd want to be able to do things like:
>
> The current design for event triggers is to spit out several things:
>
> - command tag is already commited
> - object id, can be null
> - schema name, can be null
> - object name
> - operation either ALTER, CREATE or DROP, …
> - object type TABLE, VIEW, FUNCTION, …
> - normalized command string
>
> After some more thinking, it appears that in several case you want to
> have all those information filled in and you don't want to care if that
> means your trigger needs to run at ddl_command_start or ddl_command_end.
>
> The proposal I want to make here is to introduce a generic event (or an
> event alias) named ddl_command_trace that the system provides at the
> right spot where you have the information. That's useful when you don't
> intend to divert the execution of the DDL and need to know all about it.
>
> For a DROP operation, ddl_command_trace would be ddl_command_start, and
> for a CREATE operation, that would be ddl_command_end, so that the
> target object (still|already) exists when the trigger is fired.

In some cases we may need to divert or reject DDL, but that's a secondary concern.

>> Having said all that, an event system that spits back the raw SQL
>> would certainly be better than nothing. But realize that people would
>> still need to do parsing on it (ie: replication solutions will need to
>> know what table just got ALTER'd).
>
> You would have most of what you're asking. I think that looking into the
> normalized string to get the information you need when you already know
> you're looking at an ALTER TABLE statement and you already have the
> object references (schema, name, oid) is going to make things easier.

Possibly. We certainly have cases where we need to know what's happening *inside* the DDL.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-10-29 21:30:40
Message-ID: m2d301yni7.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Jim Nasby <jim(at)nasby(dot)net> writes:
> We need to deal with questions like "If we rename a table, what do we have
> to do in londiste to accommodate that?" Except we're dealing with more than
> just londiste. We have internal code that does things like track "seed
> tables", which are tables that we need to dump data for when we dump schema.
> We have other systems that care about the schema that's defined in a
> database, and changes that happen to that schema.

Interesting use case, I hope we're offering you good progress.

>> - command tag is already commited
>> - object id, can be null
>> - schema name, can be null
>> - object name
>> - operation either ALTER, CREATE or DROP, …
>> - object type TABLE, VIEW, FUNCTION, …
>> - normalized command string
>>
>> The proposal I want to make here is to introduce a generic event (or an
>> event alias) named ddl_command_trace that the system provides at the
>> right spot where you have the information. That's useful when you don't
>> intend to divert the execution of the DDL and need to know all about it.
>
> In some cases we may need to divert or reject DDL, but that's a
> secondary concern.

Reject is already in: just RAISE ERROR from the trigger code. Divert is
another sell entirely; we currently lack that capability. I'm interested
in it for some DDLs. Which commands do you want to divert, and at
exactly what point in their execution? (Think about privilege checks,
lock acquisition, etc.)

>> You would have most of what you're asking. I think that looking into the
>> normalized string to get the information you need when you already know
>> you're looking at an ALTER TABLE statement and you already have the
>> object references (schema, name, oid) is going to make things easier.
>
> Possibly. We certainly have cases where we need to know what's
> happening *inside* the DDL.

In those cases you would probably need to resort to coding the trigger
in C so that you can abuse the parsetree. At least the fact that you're
doing funny things with some commands is easy to see when doing \dy
from a psql prompt, information that's completely lost when using the
standard process utility hook directly.

I don't see a way to pass down the parse tree in a format easy to use in
PLpgSQL anyway, but maybe we'll get there at some point. I want to say
that having to resort to C in some complex cases is good enough for a
first version of the feature.

Regards,
--
Dimitri Fontaine 06 63 07 10 78
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Jim Nasby <jim(at)nasby(dot)net>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-10-30 00:54:40
Message-ID: 508F2550.4030807@nasby.net
Lists: pgsql-hackers

On 10/29/12 4:30 PM, Dimitri Fontaine wrote:
>> In some cases we may need to divert or reject DDL, but that's a
>> secondary concern.
> Reject is already in, just RAISE ERROR from the trigger code. Divert is
> another sell entirely, we currently miss that capability. I'm interested
> into it for some DDLs. Which commands do you want to divert, and at
> exactly what point in their execution? (think about privilege checks,
> lock aquisition, etc).

After further thought... I can't think of any case (right now) where we'd need to divert. We only care about firing up secondary effects from DDL.

>>> You would have most of what you're asking. I think that looking into the
>>> normalized string to get the information you need when you already know
>>> you're looking at an ALTER TABLE statement and you already have the
>>> object references (schema, name, oid) is going to make things easier.
>>
>> Possibly. We certainly have cases where we need to know what's
>> happening *inside* the DDL.
> In that cases you would probably need to resort to coding the trigger in
> C so that you can abuse the parsetree. At least the fact that you're
> doing funny things with some commands is easy to get at when doing \dy
> from a psql prompt, an information that's completely lost when using the
> standard process utility hook directly.
>
> I don't see a way to pass down the parse tree in a format easy to use in
> PLpgSQL anyway, but maybe we'll get there at some point. I want to say
> that having to resort to C in some complex cases is good enough for a
> first version of the feature.

+1
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Deparsing DDL command strings
Date: 2012-11-04 20:19:46
Message-ID: m2liehktnh.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> The current design for event triggers is to spit out several things:
>
> - command tag is already commited
> - object id, can be null
> - schema name, can be null
> - object name
> - operation either ALTER, CREATE or DROP, …
> - object type TABLE, VIEW, FUNCTION, …
> - normalized command string
>
> After some more thinking, it appears that in several case you want to
> have all those information filled in and you don't want to care if that
> means your trigger needs to run at ddl_command_start or ddl_command_end.
>
> The proposal I want to make here is to introduce a generic event (or an
> event alias) named ddl_command_trace that the system provides at the
> right spot where you have the information. That's useful when you don't
> intend to divert the execution of the DDL and need to know all about it.

Please find attached a patch that includes support for such a design.

> For a DROP operation, ddl_command_trace would be ddl_command_start, and
> for a CREATE operation, that would be ddl_command_end, so that the
> target object (still|already) exists when the trigger is fired.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Attachment Content-Type Size
event_trigger_infos.1.patch.gz application/octet-stream 22.0 KB