Batch API for After Triggers

Lists: pgsql-hackers
From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Batch API for After Triggers
Date: 2013-06-08 21:00:18
Message-ID: CA+U5nM+D4+2kWgVCjQb9f5sPTWk+MD=29k_wpMvTJzUrMvebCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

While fiddling with FK tuning, Noah suggested batching trigger
executions together to avoid execution overhead.

It turns out there is no easy way to write triggers that can take
advantage of the knowledge that they are being executed as a set of
trigger executions. Some API is required to allow a trigger to
understand that there may be other related trigger executions in the
very near future, so it can attempt to amortise call overhead across
many invocations ("batching").

The attached patch adds two fields to the TriggerDesc trigger
functions are handed, allowing them to inspect (if they choose) the
additional fields and thus potentially use some form of batching.

This is backwards compatible with earlier trigger API.

Two fields are

int tg_tot_num_events;
int tg_event_num

So your trigger can work out it is e.g. number 3 of 56 invocations in
the current set of after triggers.

Going back to Noah's example, this would allow you to collect all 56
values and then execute a single statement with an array of 56 values
in it. Knowing there are 56 means you can wait until the 56th
invocation before executing the batched statement, without risking
skipping some checks because you've only got half a batch left.

If you don't do this, then you'd need to introduce the concept of a
"final function" similar to the way aggregates work. But that seems
much too complex to be real world useful.

This seemed a generally useful approach for any after trigger author,
not just for RI.

Comments please.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
batch_api_after_triggers.v1.patch application/octet-stream 6.8 KB

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-08 21:25:06
Message-ID: 1370726706.21008.YahooMailNeo@web162903.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> Comments please.

How much of this problem space do you think could be addressed by
providing OLD and NEW *relations* to AFTER EACH STATEMENT triggers?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-09 04:08:50
Message-ID: 20130609040850.GO7200@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
> While fiddling with FK tuning, Noah suggested batching trigger
> executions together to avoid execution overhead.

I like the general idea, but I'd prefer a way to avoid having to queue
up tons of trigger events to be executed in the first place.
Aggregates do this by providing a way to store up information to be
processed by an eventual 'final' function. Another option, as Kevin
asked about, would be statement level triggers which are able to see
both the OLD and the NEW versions of the relation.

The per-row trigger option with a way to be called immediately and then
store what it cares about for a later final function strikes me as very
generalized and able to do things that the statement-level option
couldn't, but I'm not sure if there's a use-case that could solve which
the OLD/NEW statement trigger capability couldn't.

Thanks,

Stephen


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-09 08:58:49
Message-ID: CA+U5nMJS4424ocF9LKpnYx3U8t8+9nTV2qNM5cKn5nDNHDqqbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 8 June 2013 22:25, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>
>> Comments please.
>
> How much of this problem space do you think could be addressed by
> providing OLD and NEW *relations* to AFTER EACH STATEMENT triggers?

It's a reasonable question because those two things sound a little
like they might be related.

Providing the proposed additional info costs almost nothing since the
work to calculate that info is already performed. I've written this
patch since it was trivial to do so, while inspecting the code to see
if it was possible. As it now turns out, I'll be putting most effort
into the WHEN clause approach for FKs, but there's no reason why
others like Slony or pgmemcache wouldn't benefit here also - hence
posting the patch. The proposed API changes don't conflict in any way
with the feature you propose.

Providing the whole OLD and NEW sets as relations to a trigger would
require significant resources and wouldn't be done for performance
reasons AFAICS. There are also difficulties in semantics, since when
we have OLD and NEW at row level we know we are discussing the same
row. With sets of OLD and NEW we'd need to be able to link the
relations back together somehow, which couldn't be done by PK since
that could change. So we'd need to invent some semantics for a
"linking identifier" of some description. Which once we've done it
would be used by people to join them back together again, which is
what we already had in the first place. So my take is that it sounds
expressive, but definitely not performant.

Since my objective is performance, not standards, I don't see a reason
to work on that, yet. I might have time to work on it later, lets see.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-09 09:15:09
Message-ID: CA+U5nMKDRD_oqG1DwA4pk4LL+8yk=swiYtj=6Aaq8otr8tCBtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9 June 2013 05:08, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
>> While fiddling with FK tuning, Noah suggested batching trigger
>> executions together to avoid execution overhead.
>
> I like the general idea, but I'd prefer a way to avoid having to queue
> up tons of trigger events to be executed in the first place.

There's already a thread on that exact topic, for FKs, which is what
spawned this thread.

> Aggregates do this by providing a way to store up information to be
> processed by an eventual 'final' function.

As I mentioned in my post, I did consider that and then chose not to
do that. However, having a final func is a major modification in the
way that we specify trigger functions. We'd also need to cope with
recursive trigger execution, which would mean the final func would get
called potentially many times, so there's no way of knowing if the
final func is actually the last call needed. That sounded complex and
confusing to me.

The proposed API allows you to do exactly that anyway, more easily, by
just waiting until tg_event_num == tg_tot_num_events.

> Another option, as Kevin
> asked about, would be statement level triggers which are able to see
> both the OLD and the NEW versions of the relation.
>
> The per-row trigger option with a way to be called immediately

... it already exists and is known as the WHEN clause. This is the
mechanism I expect to use to tune FKs

> and then
> store what it cares about for a later final function strikes me as very
> generalized and able to do things that the statement-level option
> couldn't,

> but I'm not sure if there's a use-case that could solve which
> the OLD/NEW statement trigger capability couldn't.

I think the two things are quite different, as I explained on a
separate post to Kevin.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-09 10:33:24
Message-ID: 20130609103323.GB12924@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jun 09, 2013 at 10:15:09AM +0100, Simon Riggs wrote:
> As I mentioned in my post, I did consider that and then chose not to
> do that. However, having a final func is a major modification in the
> way that we specify trigger functions. We'd also need to cope with
> recursive trigger execution, which would mean the final func would get
> called potentially many times, so there's no way of knowing if the
> final func is actually the last call needed. That sounded complex and
> confusing to me.
>
> The proposed API allows you to do exactly that anyway, more easily, by
> just waiting until tg_event_num == tg_tot_num_events.

Can you signal partial completion? For example, if a trigger know that
blocks of 10,000 are optimal and it sees tg_tot_num_events == 1,000,000
that it could do work every 10,000 entries, as in when:

(tg_event_num % 10000) == 0 || tg_event_num == tg_tot_num_events

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-09 11:58:42
Message-ID: 51B46DF2.5060109@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/09/2013 04:58 PM, Simon Riggs wrote:
> There are also difficulties in semantics, since when
> we have OLD and NEW at row level we know we are discussing the same
> row. With sets of OLD and NEW we'd need to be able to link the
> relations back together somehow, which couldn't be done by PK since
> that could change.

We don't currently have OLD and NEW relations so we're free to define
how this works pretty freely.

Rather than having OLD and NEW as separate relations, we could just have
one OLD_AND_NEW relation. In that relation we exploit Pg's composite
types to nest the old and new tuples in a single outer change record.

OLD_AND_NEW would look to PL/PgSQL as if it were:

CREATE TEMPORARY TABLE OLD_AND_NEW (
OLD tabletype NOT NULL,
NEW tabletype NOT NULL
);

...though presumably without the ability to create indexes on it and the
other things you can do to a real temp table. Though I can see cases
where that'd be awfully handy too.

For DELETE and INSERT we'd either provide different relations named OLD
and NEW respectively, or we'd use OLD_AND_NEW with one field or the
other blank. I'm not sure which would be best.

Alternately, we could break the usual rules for relations and define OLD
and NEW as ordered, so lock-step iteration would always return matching
pairs of rows. That's useless in SQL since there's no way to achieve
lock-step iteration, but if we provide a
"for_each_changed_row('some_function'::regproc)" that scans them in
lock-step and invokes `some_function` for each one...? (I haven't yet
done enough in the core to have any idea if this approach is completely
and absurdly impossible, or just ugly. Figured I'd throw it out there
anyway.)

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-09 13:56:11
Message-ID: 1370786171.54947.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> On 8 June 2013 22:25, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> There are also difficulties in semantics, since when
> we have OLD and NEW at row level we know we are discussing the same
> row. With sets of OLD and NEW we'd need to be able to link the
> relations back together somehow, which couldn't be done by PK since
> that could change. So we'd need to invent some semantics for a
> "linking identifier" of some description. Which once we've done it
> would be used by people to join them back together again, which is
> what we already had in the first place. So my take is that it sounds
> expressive, but definitely not performant.

I have used a feature like this in other database products, and can
say from experience that these relations in a statement trigger can
be very useful without the linkage you propose.  I can see how the
linkage could potentially be useful, but if that is the only
hang-up, we would be adding a powerful feature without it.

> Since my objective is performance, not standards, I don't see a reason
> to work on that, yet. I might have time to work on it later, lets see.

This seems like it has some overlap with the delta relations I will
need to generate for incremental maintenance of materialized views,
so we should coordinate on those efforts if they happen to occur
around the same time.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-10 21:50:34
Message-ID: CA+Tgmob9xbiyCpNTZ3qTQWv4bzou1qjCbHiVCDxQ2CB5dd0UEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jun 8, 2013 at 5:00 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> While fiddling with FK tuning, Noah suggested batching trigger
> executions together to avoid execution overhead.
>
> It turns out there is no easy way to write triggers that can take
> advantage of the knowledge that they are being executed as a set of
> trigger executions. Some API is required to allow a trigger to
> understand that there may be other related trigger executions in the
> very near future, so it can attempt to amortise call overhead across
> many invocations ("batching").
>
> The attached patch adds two fields to the TriggerDesc trigger
> functions are handed, allowing them to inspect (if they choose) the
> additional fields and thus potentially use some form of batching.

I'm unclear how this could be used in practice. Are the events in a
"batch" guaranteed to, say, all be related to the same relation?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-11 08:01:10
Message-ID: CA+U5nMJkiAo1Cx0fNbt-s8i=eGk=VZONmdUuuuLZ=UdEcLbMuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10 June 2013 22:50, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Jun 8, 2013 at 5:00 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> While fiddling with FK tuning, Noah suggested batching trigger
>> executions together to avoid execution overhead.
>>
>> It turns out there is no easy way to write triggers that can take
>> advantage of the knowledge that they are being executed as a set of
>> trigger executions. Some API is required to allow a trigger to
>> understand that there may be other related trigger executions in the
>> very near future, so it can attempt to amortise call overhead across
>> many invocations ("batching").
>>
>> The attached patch adds two fields to the TriggerDesc trigger
>> functions are handed, allowing them to inspect (if they choose) the
>> additional fields and thus potentially use some form of batching.
>
> I'm unclear how this could be used in practice.

As described, you can use it to prepare batches.

> Are the events in a
> "batch" guaranteed to, say, all be related to the same relation?

Good point. I was too focused on the single large statements I was
considering. There would need to some logic to look at relations as
well.

For individual statements it can work well, since all or at least the
majority of events are for a particular relation and we can know which
one that is.

It probably wouldn't work that well for deferred trigger events that
covered a spread of different relations.

Perhaps that's an argument for a final function after all.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-15 18:58:30
Message-ID: CA+U5nMKm6LuR-wpQEdQGTca2PGPQYbqQ5A3tREyM2D1ZsZa7kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9 June 2013 14:56, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>> On 8 June 2013 22:25, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>>> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>
>> There are also difficulties in semantics, since when
>> we have OLD and NEW at row level we know we are discussing the same
>> row. With sets of OLD and NEW we'd need to be able to link the
>> relations back together somehow, which couldn't be done by PK since
>> that could change. So we'd need to invent some semantics for a
>> "linking identifier" of some description. Which once we've done it
>> would be used by people to join them back together again, which is
>> what we already had in the first place. So my take is that it sounds
>> expressive, but definitely not performant.
>
> I have used a feature like this in other database products, and can
> say from experience that these relations in a statement trigger can
> be very useful without the linkage you propose. I can see how the
> linkage could potentially be useful, but if that is the only
> hang-up, we would be adding a powerful feature without it.

What I'm trying to do is tune FKs, which are currently implemented as
after row triggers. Hence why I'm looking at ways to speed up after
row triggers.

Suggesting I work on after statement triggers would imply you think
that FKs should/could be rewritten as after statement triggers. Is
that what you mean?

Doing it that way would mean rewriting a lot of code and would still
have problems 2 and 3 identified above. I can't imagine anybody would
go for that, but if you have a sketch of how it might work we can
consider it.

>> Since my objective is performance, not standards, I don't see a reason
>> to work on that, yet. I might have time to work on it later, lets see.
>
> This seems like it has some overlap with the delta relations I will
> need to generate for incremental maintenance of materialized views,
> so we should coordinate on those efforts if they happen to occur
> around the same time.

IMHO you should use Andres' logical changeset extraction for
incremental maintenance for mat views. Doing mat view maintenance
using triggers would be choosing the lower performance option.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-15 19:46:13
Message-ID: 1371325573.99485.YahooMailNeo@web162901.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> On 9 June 2013 14:56, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>>> On 8 June 2013 22:25, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>>>> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>>
>>> There are also difficulties in semantics, since when
>>> we have OLD and NEW at row level we know we are discussing the same
>>> row. With sets of OLD and NEW we'd need to be able to link the
>>> relations back together somehow, which couldn't be done by PK since
>>> that could change. So we'd need to invent some semantics for a
>>> "linking identifier" of some description. Which once we've done it
>>> would be used by people to join them back together again, which is
>>> what we already had in the first place. So my take is that it sounds
>>> expressive, but definitely not performant.
>>
>> I have used a feature like this in other database products, and can
>> say from experience that these relations in a statement trigger can
>> be very useful without the linkage you propose.  I can see how the
>> linkage could potentially be useful, but if that is the only
>> hang-up, we would be adding a powerful feature without it.
>
> What I'm trying to do is tune FKs, which are currently implemented as
> after row triggers. Hence why I'm looking at ways to speed up after
> row triggers.
>
> Suggesting I work on after statement triggers would imply you think
> that FKs should/could be rewritten as after statement triggers. Is
> that what you mean?

Well, I didn't make any suggestion -- I was asking questions, and
then when you commented on a hypothetical feature I responded.

That said, I was asking the question because I used SQL Server for
years when it had triggers but no foreign key definitions.  There
was a pattern for enforcing foreign key relationships in AFTER EACH
STATEMENT triggers using the old and new relations (with no linkage
between particular updated rows) which we used so heavily I could
write the triggers for a given foreign key mechanically in just a
minute or two.  This technique had good performance and seems to
have all the semantics you're looking for, so I was wondering
whether that might be a better approach to this problem.  It sure
seems like it would have fewer moving parts.  I know it was very
reliable with S2PL concurrency control, but there may be problems
with adapting it to MVCC that I'm not seeing without a deeper look.

If you're interested, I could try to prod those areas of my memory
to recall the pattern, or find examples of it somewhere.

> Doing it that way would mean rewriting a lot of code and would still
> have problems 2 and 3 identified above. I can't imagine anybody would
> go for that, but if you have a sketch of how it might work we can
> consider it.

I don't, but if there is interest I could probably sketch the outlines.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-17 13:12:05
Message-ID: CA+U5nMK690nqQeDbZxin+rVPa6-ntHcU-qw0ZRgnc5C84t_TTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9 June 2013 12:58, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 06/09/2013 04:58 PM, Simon Riggs wrote:
>> There are also difficulties in semantics, since when
>> we have OLD and NEW at row level we know we are discussing the same
>> row. With sets of OLD and NEW we'd need to be able to link the
>> relations back together somehow, which couldn't be done by PK since
>> that could change.
>
> We don't currently have OLD and NEW relations so we're free to define
> how this works pretty freely.
>
> Rather than having OLD and NEW as separate relations, we could just have
> one OLD_AND_NEW relation. In that relation we exploit Pg's composite
> types to nest the old and new tuples in a single outer change record.
>
> OLD_AND_NEW would look to PL/PgSQL as if it were:
>
> CREATE TEMPORARY TABLE OLD_AND_NEW (
> OLD tabletype NOT NULL,
> NEW tabletype NOT NULL
> );
>
> ...though presumably without the ability to create indexes on it and the
> other things you can do to a real temp table. Though I can see cases
> where that'd be awfully handy too.
>
> For DELETE and INSERT we'd either provide different relations named OLD
> and NEW respectively, or we'd use OLD_AND_NEW with one field or the
> other blank. I'm not sure which would be best.
>
> Alternately, we could break the usual rules for relations and define OLD
> and NEW as ordered, so lock-step iteration would always return matching
> pairs of rows. That's useless in SQL since there's no way to achieve
> lock-step iteration, but if we provide a
> "for_each_changed_row('some_function'::regproc)" that scans them in
> lock-step and invokes `some_function` for each one...? (I haven't yet
> done enough in the core to have any idea if this approach is completely
> and absurdly impossible, or just ugly. Figured I'd throw it out there
> anyway.)

I think the best way, if we did do this, would be to have a number of
different relations defined:

OLD
NEW
INSERTED
DELETED
all of which would be defined same as main table

and also one called
UPDATED
which would have two row vars called OLD and NEW
so you would access it like e.g. IF UPDATED.OLD.id = 7

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-17 17:25:03
Message-ID: CAFj8pRCzr4Mev8CAqdoM9gnzoxFfk9kB_49EH6_GT7anDpLZxQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/6/17 Simon Riggs <simon(at)2ndquadrant(dot)com>:
> On 9 June 2013 12:58, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> On 06/09/2013 04:58 PM, Simon Riggs wrote:
>>> There are also difficulties in semantics, since when
>>> we have OLD and NEW at row level we know we are discussing the same
>>> row. With sets of OLD and NEW we'd need to be able to link the
>>> relations back together somehow, which couldn't be done by PK since
>>> that could change.
>>
>> We don't currently have OLD and NEW relations so we're free to define
>> how this works pretty freely.
>>
>> Rather than having OLD and NEW as separate relations, we could just have
>> one OLD_AND_NEW relation. In that relation we exploit Pg's composite
>> types to nest the old and new tuples in a single outer change record.
>>
>> OLD_AND_NEW would look to PL/PgSQL as if it were:
>>
>> CREATE TEMPORARY TABLE OLD_AND_NEW (
>> OLD tabletype NOT NULL,
>> NEW tabletype NOT NULL
>> );
>>
>> ...though presumably without the ability to create indexes on it and the
>> other things you can do to a real temp table. Though I can see cases
>> where that'd be awfully handy too.
>>
>> For DELETE and INSERT we'd either provide different relations named OLD
>> and NEW respectively, or we'd use OLD_AND_NEW with one field or the
>> other blank. I'm not sure which would be best.
>>
>> Alternately, we could break the usual rules for relations and define OLD
>> and NEW as ordered, so lock-step iteration would always return matching
>> pairs of rows. That's useless in SQL since there's no way to achieve
>> lock-step iteration, but if we provide a
>> "for_each_changed_row('some_function'::regproc)" that scans them in
>> lock-step and invokes `some_function` for each one...? (I haven't yet
>> done enough in the core to have any idea if this approach is completely
>> and absurdly impossible, or just ugly. Figured I'd throw it out there
>> anyway.)
>
>
> I think the best way, if we did do this, would be to have a number of
> different relations defined:
>
> OLD
> NEW
> INSERTED
> DELETED
> all of which would be defined same as main table
>
> and also one called
> UPDATED
> which would have two row vars called OLD and NEW
> so you would access it like e.g. IF UPDATED.OLD.id = 7
>

nice idea

+1

Pavel

> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-17 19:53:45
Message-ID: 1371498825.21517.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> On 9 June 2013 12:58, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

>> We don't currently have OLD and NEW relations so we're free to
>> define how this works pretty freely.

> I think the best way, if we did do this, would be to have a
> number of different relations defined:
>
> OLD
> NEW
> INSERTED
> DELETED
> all of which would be defined same as main table
>
> and also one called
> UPDATED
> which would have two row vars called OLD and NEW
> so you would access it like e.g. IF UPDATED.OLD.id = 7

Well, there is the SQL standard, which has a couple paragraphs on
the topic which we might want to heed.  For a delete there is just
an old table; for an insert just a new one.  For an update you have
both, with the same cardinality.  The rows in the old and new
tables have a correspondence, but that is only visible to FOR EACH
ROW triggers.  For something like RI, why would you need to
establish correspondence?  A row with the referenced key either
exists after the statement completes, or it doesn't -- why would we
care whether it is an updated version of the same row?

Syntax for how to refer to the these is defined by the standard.

As usual, I don't object to adding capabilities as long as the
standard syntax is also supported with standard semantics.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-17 22:30:17
Message-ID: 51BF8DF9.6050706@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/18/2013 01:25 AM, Pavel Stehule wrote:
>> > and also one called
>> > UPDATED
>> > which would have two row vars called OLD and NEW
>> > so you would access it like e.g. IF UPDATED.OLD.id = 7
>> >
> nice idea
>
> +1
Much better naming than OLD_AND_NEW.

I'm not so sure about

OLD
NEW
INSERTED
DELETED

in that I imagine we'd want to pick one pair and stick with it. Since
using "INSERTED" / "DELETED" makes "UPDATED" make sense, and since "OLD"
and "NEW" are already used to refer to the magic variables of those
names in for each row triggers, I think INSERTED / UPDATED / DELETED is
the way to go.

INSERTED and UPDATED could just be views of the same data as UPDATED
that show only the OLD or only the NEW composite type fields. That'd
allow you to write a trigger without TG_OP tests in many cases, as
UPDATED would always contain what you wanted. It seems slightly weird to
have INSERTED and DELETED populated for an UPDATE, but when an UPDATE is
logically an INSERT+DELETE anyway...

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-18 08:04:47
Message-ID: CA+U5nM+cxuBU=pLYZD=9JmAnG0n4c5U4NoCyrzXZnTfiVGcDSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17 June 2013 20:53, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>> On 9 June 2013 12:58, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>
>>> We don't currently have OLD and NEW relations so we're free to
>>> define how this works pretty freely.
>
>> I think the best way, if we did do this, would be to have a
>> number of different relations defined:
>>
>> OLD
>> NEW
>> INSERTED
>> DELETED
>> all of which would be defined same as main table
>>
>> and also one called
>> UPDATED
>> which would have two row vars called OLD and NEW
>> so you would access it like e.g. IF UPDATED.OLD.id = 7
>
> Well, there is the SQL standard, which has a couple paragraphs on
> the topic which we might want to heed.

Yes, I already did in my proposal above.
OLD and NEW are all we need to fulfill the standard.

> For a delete there is just
> an old table; for an insert just a new one. For an update you have
> both, with the same cardinality. The rows in the old and new
> tables have a correspondence, but that is only visible to FOR EACH
> ROW triggers.

Yes, those are the relevant parts. SQL:2008 4.38 is the paragraphs
that describe this (for later reference).

What the standard doesn't cover is recursive calls, that might
generate new events of different kinds. So an UPDATE statement might
have caused DELETEs etc.. So we'd need a way to get access to DELETED
rows even when the OLD relation covers only the UPDATED rows.

For row level triggers we support macros like TRIGGER_FIRED_BY_INSERT.
Having an INSERT relation is just the logical equivalent for statement
level triggers.

> For something like RI, why would you need to
> establish correspondence? A row with the referenced key either
> exists after the statement completes, or it doesn't -- why would we
> care whether it is an updated version of the same row?

I wasn't doing this for RI specifically, I was looking at what we'd
need to provide a full facilitiy.

It's not very easy to see how we can support RI via statement level triggers.

By definiton, statement level triggers happen after all row level
triggers have fired. So implementing row level RI by using statement
level triggers that follow the standard isn't possible. The main
things we'd need to cope with would be recursive trigger calls, for
example DELETE cascades. The firing of the triggers generates more
trigger events which delete more rows etc.. If we want to implement RI
triggers using some form of set processing we would need to do that in
the middle of handling the after row events, i.e. execute a set, then
re-check for a new set of events and execute them.

Directly using the statement-level standard triggers isn't the way, I
conclude after some detailed thinking. But there could be some
intermediate form that makes sense.

> Syntax for how to refer to the these is defined by the standard.
>
> As usual, I don't object to adding capabilities as long as the
> standard syntax is also supported with standard semantics.

Agreed.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Batch API for After Triggers
Date: 2013-06-18 08:05:56
Message-ID: CA+U5nMLNmZBWkAqxUddYowmaF=eHsEDh9wzbW7HoAyNeGpUFPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17 June 2013 23:30, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> INSERTED and UPDATED could just be views...

Yes, that would be my suggestion.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services