Re: autonomous transactions

Lists: pgsql-hackers
From: "Colin 't Hart" <colinthart(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: TODO note
Date: 2010-09-15 07:37:50
Message-ID: AANLkTi=uogmYxLKWmUfFSg-Ki2bejsQiO2g5GTMxvdW2@mail.gmail.com

Hi,

I note that the implementation of tab completion for SET TRANSACTION in PSQL
could benefit from the implementation of autonomous transactions (also
TODO).

Regards,

Colin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Colin 't Hart" <colinthart(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: TODO note
Date: 2010-09-15 17:30:10
Message-ID: AANLkTik0ZBV1GcG6GeN2+swe2PKe1_osG1Z62J0SVXbm@mail.gmail.com

On Wed, Sep 15, 2010 at 3:37 AM, Colin 't Hart <colinthart(at)gmail(dot)com> wrote:
> I note that the implementation of tab completion for SET TRANSACTION in PSQL
> could benefit from the implementation of autonomous transactions (also
> TODO).

I think it's safe to say that if we ever manage to get autonomous
transactions working, there are a GREAT MANY things which will benefit
from that. There's probably an easier way to get at that Todo item,
though, if someone feels like beating on it.

One problem with autonomous transactions is that you have to figure
out where to store all the state associated with the autonomous
transaction and its subtransactions. Another is that you have to
avoid an unacceptable slowdown in the tuple-visibility checks in the
process.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>
Subject: autonomous transactions (was Re: TODO note)
Date: 2010-09-15 18:32:55
Message-ID: 4C911157.30908@darrenduncan.net

Robert Haas wrote:
> On Wed, Sep 15, 2010 at 3:37 AM, Colin 't Hart <colinthart(at)gmail(dot)com> wrote:
>> I note that the implementation of tab completion for SET TRANSACTION in PSQL
>> could benefit from the implementation of autonomous transactions (also
>> TODO).
>
> I think it's safe to say that if we ever manage to get autonomous
> transactions working, there are a GREAT MANY things which will benefit
> from that. There's probably an easier way to get at that Todo item,
> though, if someone feels like beating on it.
>
> One problem with autonomous transactions is that you have to figure
> out where to store all the state associated with the autonomous
> transaction and its subtransactions. Another is that you have to
> avoid an unacceptable slowdown in the tuple-visibility checks in the
> process.

As I understand it, autonomous transactions are in many ways like distinct
database client sessions, except that the client in this case is another database
session. This is especially true if the autonomous transaction can make a commit
that persists even if the initiating session afterwards does a rollback.

Similarly, using autonomous transactions is akin to multi-processing. Normal
distinct database client sessions are like distinct processes that are usually
started externally to the DBMS, whereas autonomous transactions are like processes
started within the DBMS.

Also, under the assumption that everything in a DBMS session should be subject
to transactions, so that both data-manipulation and data-definition can be
rolled back, autonomous transactions are like a generalization of supporting
sequence generators that remember their incremented state even when the action
that incremented it is rolled back; the sequence generator update is effectively
an autonomous transaction, in that case.
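To make the sequence analogy concrete, here is a toy C model. It is not how PostgreSQL actually implements sequences, and `ToyDb` and the `toy_*` functions are hypothetical: ordinary inserts are buffered until commit, while the sequence counter is bumped immediately and never undone, so it behaves like a tiny autonomous transaction.

```c
/* Toy model only: "committed_rows" is durable state, "pending_rows" is the
 * open transaction's uncommitted work. All names are hypothetical. */
typedef struct {
    long committed_rows;   /* durable table state */
    long pending_rows;     /* uncommitted inserts in the open transaction */
    long sequence_value;   /* sequence state: updated at once, never undone */
} ToyDb;

/* Insert one row, drawing its id from the sequence. */
long toy_insert(ToyDb *db)
{
    db->pending_rows++;            /* transactional: can be rolled back */
    return ++db->sequence_value;   /* "autonomous": takes effect immediately */
}

void toy_commit(ToyDb *db)
{
    db->committed_rows += db->pending_rows;
    db->pending_rows = 0;
}

void toy_rollback(ToyDb *db)
{
    db->pending_rows = 0;   /* sequence_value is deliberately left alone */
}
```

Two inserts followed by a rollback leave committed_rows at 0 but sequence_value at 2, so the next id handed out is 3.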

The point being, the answer to how to implement autonomous transactions could be
as simple as doing, more or less, what you already do to manage multiple concurrent
client sessions. If each client gets its own Postgres OS process, then
an autonomous transaction just farms out to another one of those, which does the
work. Or maybe there could be a lighter-weight version of this.

Does this design principle seem reasonable?

If autonomous transactions are used a lot, then maybe the other process
could be kept connected and fed subsequent autonomous actions, for example
if it is being used to implement an activity log, so some kind of IPC would be
going on.
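A minimal POSIX sketch of the "keep the other process connected and feed it actions" idea, assuming a plain pipe as the IPC channel. `run_log_worker` is hypothetical, and a real worker would run each action in its own transaction rather than merely counting it. The count travels back through the exit status, so it is only meaningful for fewer than 256 actions, and the sketch assumes no action contains a newline.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: fork one long-lived worker, send it n newline-terminated
 * actions over a pipe, and return how many it applied (or -1 on error). */
int run_log_worker(const char *const *actions, int n)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Worker: read until EOF, counting newline-terminated actions. */
        close(fds[1]);
        char buf[256];
        ssize_t got;
        int applied = 0;
        while ((got = read(fds[0], buf, sizeof buf)) > 0) {
            for (ssize_t i = 0; i < got; i++) {
                if (buf[i] == '\n')
                    applied++;
            }
        }
        close(fds[0]);
        _exit(applied & 0xff);   /* an exit status carries only 8 bits */
    }

    /* Controlling process: feed the worker, then signal end of input. */
    close(fds[0]);
    for (int i = 0; i < n; i++) {
        size_t len = strlen(actions[i]);
        if (write(fds[1], actions[i], len) != (ssize_t) len ||
            write(fds[1], "\n", 1) != 1) {
            close(fds[1]);
            waitpid(pid, NULL, 0);
            return -1;
        }
    }
    close(fds[1]);

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the same worker receives every action, the per-action cost is a pipe write rather than a fork, which is the point of keeping it connected.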

-- Darren Duncan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Darren Duncan <darren(at)darrenduncan(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, "Colin 't Hart" <colinthart(at)gmail(dot)com>
Subject: Re: autonomous transactions (was Re: TODO note)
Date: 2010-09-15 18:57:29
Message-ID: AANLkTikoC3QBdLb6cju+-Yanq6SL4HJxvM3tvwRFgNpL@mail.gmail.com

On Wed, Sep 15, 2010 at 2:32 PM, Darren Duncan <darren(at)darrenduncan(dot)net> wrote:
> The point being, the answer to how to implement autonomous transactions
> could be as simple as doing, more or less, what you already do to manage
> multiple concurrent client sessions.  If each client gets its own
> Postgres OS process, then an autonomous transaction just farms out to
> another one of those, which does the work.  Or maybe there could be a
> lighter-weight version of this.
>
> Does this design principle seem reasonable?

I guess so, but the devil is in the details. I suspect that we don't
actually want to fork a new backend for every autonomous transaction.
That would be pretty expensive, and we already have an expensive way
of emulating this functionality using dblink. Finding all of the bits
that think there's only one top-level transaction per backend and
generalizing them to support multiple top-level transactions per
backend doesn't sound easy, though, especially since you must do it
without losing performance.

--
Robert Haas


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: autonomous transactions (was Re: TODO note)
Date: 2010-09-15 19:22:27
Message-ID: 4C911CF3.7060305@darrenduncan.net

Robert Haas wrote:
> On Wed, Sep 15, 2010 at 2:32 PM, Darren Duncan <darren(at)darrenduncan(dot)net> wrote:
>> The point being, the answer to how to implement autonomous transactions
>> could be as simple as doing, more or less, what you already do to manage
>> multiple concurrent client sessions. If each client gets its own
>> Postgres OS process, then an autonomous transaction just farms out to
>> another one of those, which does the work. Or maybe there could be a
>> lighter-weight version of this.
>>
>> Does this design principle seem reasonable?
>
> I guess so, but the devil is in the details. I suspect that we don't
> actually want to fork a new backend for every autonomous transaction.
> That would be pretty expensive, and we already have an expensive way
> of emulating this functionality using dblink. Finding all of the bits
> that think there's only one top-level transaction per backend and
> generalizing them to support multiple top-level transactions per
> backend doesn't sound easy, though, especially since you must do it
> without losing performance.

As you say, the devil is in the details, but I see this as mainly being an
implementation issue, where essentially the same abstraction could sit on top of
different possible implementations, some lighter-weight and some heavier-weight.

This is loosely how I look at the issue conceptually, that is, the illusion
that the DBMS presents to the user:

The DBMS is a multi-process virtual machine, the database being worked on is the
file system or disk, and uncommitted transactions are data structures in memory
that may have multiple versions. Each autonomous transaction is associated with
a single process. A process can either be started by the user (client
connection) or by another process (autonomous transaction). Regardless of how a
process is started, the way to manage multiple autonomous tasks is that each has
its own process. Tasks that are not mutually autonomous would be within the
same process. Child transactions or savepoints share their parent's process
when the parent can roll back their commits.

Whether the DBMS uses multiple OS threads or multiple OS processes or uses
coroutines or whatever is an implementation detail.

A point here is that over time Postgres can evolve to use multiple OS
processes, multiple threads, or a coroutine system within a single
thread/process to provide the illusion of each autonomous transaction being an
independent process, and the data structures and algorithms for managing
autonomous transactions can be similar to or the same as those for multiple
client connections, since conceptually they are alike.

-- Darren Duncan


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Darren Duncan <darren(at)darrenduncan(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Colin 't Hart <colinthart(at)gmail(dot)com>
Subject: Re: autonomous transactions (was Re: TODO note)
Date: 2010-09-15 22:21:36
Message-ID: 1284589049-sup-6998@alvh.no-ip.org

Excerpts from Robert Haas's message of mié sep 15 14:57:29 -0400 2010:

> I guess so, but the devil is in the details. I suspect that we don't
> actually want to fork a new backend for every autonomous transaction.
> That would be pretty expensive, and we already have an expensive way
> of emulating this functionality using dblink. Finding all of the bits
> that think there's only one top-level transaction per backend and
> generalizing them to support multiple top-level transactions per
> backend doesn't sound easy, though,

Yeah, and the transaction handling code is already pretty complex.

> especially since you must do it without losing performance.

Presumably we'd have fast paths for the main transaction, and
any autonomous transactions besides that one would incur some
slowdown.
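The fast-path idea can be sketched as follows. This is a hypothetical illustration of the cost structure only (`BackendXids`, `xid_is_ours`, and `MAX_AUTONOMOUS` are made up, not PostgreSQL code), and it glosses over the real visibility rules, under which an autonomous transaction must not see its parent's uncommitted changes. The main transaction's id costs one comparison; autonomous ids are scanned only when some exist.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;      /* stand-in for a transaction id */

#define MAX_AUTONOMOUS 8   /* hypothetical per-backend limit */

/* Hypothetical per-backend bookkeeping: the main transaction's id plus the
 * ids of any autonomous transactions opened on top of it. */
typedef struct {
    Xid main_xid;
    Xid autonomous_xids[MAX_AUTONOMOUS];
    int n_autonomous;
} BackendXids;

/* Does xid belong to one of this backend's open top-level transactions?
 * With no autonomous transactions the loop body never runs, so only
 * backends that actually use them pay for the scan. */
bool xid_is_ours(const BackendXids *b, Xid xid)
{
    if (xid == b->main_xid)
        return true;                              /* fast path */
    for (int i = 0; i < b->n_autonomous; i++)     /* slow path */
        if (b->autonomous_xids[i] == xid)
            return true;
    return false;
}
```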

I think the complex parts are, first, figuring out what to do with
global variables that currently represent a transaction (they are
sprinkled all over the place); and second, how to represent the
autonomous transactions in shared memory without requiring the PGPROC
array to be arbitrarily resizable.
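One way to sidestep a resizable PGPROC array, offered here only as an assumption and not as what was proposed, is to reserve a fixed number of extra slots at startup and hand them out to autonomous transactions. The sketch uses a plain static array in place of shared memory and omits the locking a real shared structure would need; all names are hypothetical.

```c
#include <stdbool.h>

#define MAX_BACKENDS 4           /* hypothetical, fixed at startup */
#define MAX_AUTONOMOUS_TOTAL 4   /* hypothetical extra slots, also fixed */
#define NSLOTS (MAX_BACKENDS + MAX_AUTONOMOUS_TOTAL)

/* Stand-in for a PGPROC-like entry: whether it is taken and by whom. */
typedef struct {
    bool in_use;
    int  owner_backend;
} ProcSlot;

/* Sized once; in a real server this would live in shared memory, which
 * cannot simply be grown later. Slots below MAX_BACKENDS belong to the
 * regular backends and are never handed out here. */
static ProcSlot slots[NSLOTS];

/* Claim a slot for an autonomous transaction; -1 means the pool is full
 * and the caller must raise an error instead of growing the array. */
int autonomous_slot_acquire(int backend)
{
    for (int i = MAX_BACKENDS; i < NSLOTS; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = true;
            slots[i].owner_backend = backend;
            return i;
        }
    }
    return -1;
}

void autonomous_slot_release(int slot)
{
    slots[slot].in_use = false;
}
```

The trade-off is a hard cap on concurrent autonomous transactions, chosen at startup, in exchange for never resizing the shared array.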

The other alternative would be to bolt the autonomous transaction
somehow onto the current subtransaction stack thing and mark it in some
different way, so that we can reuse the games we play with "push/pop"
there. That still leaves us with the PGPROC problem.
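A toy model of the "mark it differently on the subtransaction stack" alternative, tracking only the fate of written rows: an ordinary subtransaction's work is merged into its parent on commit, while a level flagged as autonomous makes its work durable at once. All types and functions here are hypothetical, and the model ignores both visibility and the PGPROC question.

```c
#include <stdbool.h>

#define MAX_DEPTH 16   /* hypothetical nesting limit */

/* One stack level: rows written so far, and whether it is autonomous. */
typedef struct {
    bool autonomous;
    int  pending_rows;
} TxLevel;

typedef struct {
    TxLevel levels[MAX_DEPTH];
    int depth;          /* levels[0] is the top-level transaction */
    int durable_rows;   /* rows that have really committed */
} TxStack;

void tx_begin(TxStack *s)
{
    s->depth = 1;
    s->levels[0] = (TxLevel){ false, 0 };
}

/* Push a savepoint (autonomous == false) or an autonomous transaction. */
bool tx_push(TxStack *s, bool autonomous)
{
    if (s->depth == MAX_DEPTH)
        return false;
    s->levels[s->depth++] = (TxLevel){ autonomous, 0 };
    return true;
}

void tx_write(TxStack *s)
{
    s->levels[s->depth - 1].pending_rows++;
}

/* Commit the innermost level (caller ensures depth > 0). An ordinary
 * subtransaction hands its work to its parent; an autonomous one, or the
 * top level, makes it durable, so no later rollback can undo it. */
void tx_pop_commit(TxStack *s)
{
    TxLevel top = s->levels[--s->depth];
    if (top.autonomous || s->depth == 0)
        s->durable_rows += top.pending_rows;
    else
        s->levels[s->depth - 1].pending_rows += top.pending_rows;
}

/* Roll back the innermost level, discarding its uncommitted work. */
void tx_pop_rollback(TxStack *s)
{
    s->depth--;
}
```

For example, if the main transaction writes one row, starts an autonomous transaction that writes two rows and commits, releases a savepoint that wrote one more row, and finally rolls back, only the autonomous transaction's two rows survive.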

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Darren Duncan <darren(at)darrenduncan(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Colin 't Hart" <colinthart(at)gmail(dot)com>
Subject: Re: autonomous transactions (was Re: TODO note)
Date: 2010-09-16 00:43:29
Message-ID: AANLkTik9deBX0NWZ8CQEu75DtSWf_vRXo5d73w=Va+47@mail.gmail.com

On Wed, Sep 15, 2010 at 6:21 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Excerpts from Robert Haas's message of mié sep 15 14:57:29 -0400 2010:
>
>> I guess so, but the devil is in the details.  I suspect that we don't
>> actually want to fork a new backend for every autonomous transaction.
>>  That would be pretty expensive, and we already have an expensive way
>> of emulating this functionality using dblink.  Finding all of the bits
>> that think there's only one top-level transaction per backend and
>> generalizing them to support multiple top-level transactions per
>> backend doesn't sound easy, though,
>
> Yeah, and the transaction handling code is already pretty complex.

Yep.

>> especially since you must do it without losing performance.
>
> Presumably we'd have fast paths for the main transaction, and
> any autonomous transactions besides that one would incur some
> slowdown.
>
> I think the complex parts are, first, figuring out what to do with
> global variables that currently represent a transaction (they are
> sprinkled all over the place); and second, how to represent the
> autonomous transactions in shared memory without requiring the PGPROC
> array to be arbitrarily resizable.
>
> The other alternative would be to bolt the autonomous transaction
> somehow onto the current subtransaction stack thing and mark it in some
> different way, so that we can reuse the games we play with "push/pop"
> there.  That still leaves us with the PGPROC problem.

I wonder if we could use/generalize pg_subtrans in some way to handle
the PGPROC problem. I haven't thought about it much, though.

One thing that strikes me (maybe this is obvious) is that the
execution of the main transaction and the autonomous transaction are
not interleaved: it's a stack. So in terms of globals and stuff,
assuming you knew which things needed to be updated, you could push
all that stuff off to the side, do whatever with the new transaction,
and then restore all the context afterwards. That doesn't help in
terms of PGPROC, of course, but for backend-local state it seems
workable.
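Because the nesting is strict, setting the outer transaction's backend-local state aside can be as simple as copying it onto a stack and copying it back. The sketch assumes, purely for illustration, that the relevant globals have been gathered into one struct (`TxGlobals`, `CurrentTx`, and the `autonomous_*` functions are hypothetical); in the real backend they are scattered across many modules, which is the hard part.

```c
#include <stdbool.h>

/* Hypothetical bundle of backend-local "current transaction" state. */
typedef struct {
    unsigned int current_xid;
    int          nest_level;
    bool         read_only;
} TxGlobals;

/* The state the rest of the backend reads and writes. */
TxGlobals CurrentTx;

#define MAX_SAVED 8   /* hypothetical nesting limit */
static TxGlobals saved_stack[MAX_SAVED];
static int n_saved = 0;

/* Set the outer transaction's state aside and start a fresh one. This is
 * safe only because execution is strictly nested: the outer transaction
 * does nothing until the autonomous one has finished. */
bool autonomous_enter(unsigned int new_xid)
{
    if (n_saved == MAX_SAVED)
        return false;
    saved_stack[n_saved++] = CurrentTx;
    CurrentTx = (TxGlobals){ .current_xid = new_xid };
    return true;
}

/* Restore the outer transaction's state exactly as it was. Must be paired
 * with a successful autonomous_enter(). */
void autonomous_exit(void)
{
    CurrentTx = saved_stack[--n_saved];
}
```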

--
Robert Haas


From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Colin 't Hart <colinthart(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: TODO note
Date: 2010-09-16 09:02:49
Message-ID: 4C91DD39.8070709@bluegap.ch

Hi,

On 09/15/2010 07:30 PM, Robert Haas wrote:
> One problem with autonomous transactions is that you have to figure
> out where to store all the state associated with the autonomous
> transaction and its subtransactions. Another is that you have to
> avoid an unacceptable slowdown in the tuple-visibility checks in the
> process.

It just occurs to me that this is the other potential use case for
bgworkers: autonomous transactions. Simply store any kind of state in
the bgworker and use one per autonomous transaction.

What's left to be done: implement communication between the controlling
backend (with the client connection) and the bgworker (imessages), drop
the bgworker's session to user privileges (and re-raise to superuser
after the job) and implement better error handling, as errors would have
to be propagated back to the controlling backend.

Regards

Markus Wanner


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Darren Duncan <darren(at)darrenduncan(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Colin 't Hart" <colinthart(at)gmail(dot)com>
Subject: Re: autonomous transactions
Date: 2010-09-16 09:19:47
Message-ID: 87zkvihwp8.fsf@hi-media-techno.com

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> One thing that strikes me (maybe this is obvious) is that the
> execution of the main transaction and the autonomous transaction are
> not interleaved: it's a stack. So in terms of globals and stuff,
> assuming you knew which things needed to be updated, you could push
> all that stuff off to the side, do whatever with the new transaction,
> and then restore all the context afterwards.

I think they call that dynamic scope in advanced programming
languages. I guess that's calling for a quote of Greenspun's Tenth Rule:

Any sufficiently complicated C or Fortran program contains an ad hoc
informally-specified bug-ridden slow implementation of half of Common
Lisp.

So the name of the game could be to find a way to implement (a
limited form of) dynamic scoping in PostgreSQL, in C, then find each and
every backend-local variable that needs it to support autonomous
transactions, then make it happen… Right?
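A limited form of dynamic scoping in C could look like the following hypothetical registry: each module registers the address and size of its transaction-dependent globals, `scoped_save` snapshots them all before an autonomous transaction starts, and `scoped_restore` puts them back afterwards. None of these functions exist in PostgreSQL, and registration must be complete before the first save.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_VARS 32   /* hypothetical registry capacity */

/* One registered backend-local variable. */
typedef struct {
    void  *addr;
    size_t size;
} ScopedVar;

static ScopedVar vars[MAX_VARS];
static int n_vars = 0;

/* Register a variable; returns its index, or -1 if the registry is full. */
int scoped_register(void *addr, size_t size)
{
    if (n_vars == MAX_VARS)
        return -1;
    vars[n_vars].addr = addr;
    vars[n_vars].size = size;
    return n_vars++;
}

/* Copy every registered variable into one freshly allocated buffer (the
 * saved "binding"); returns NULL on allocation failure. */
void *scoped_save(void)
{
    size_t total = 0;
    for (int i = 0; i < n_vars; i++)
        total += vars[i].size;
    char *buf = malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;
    char *p = buf;
    for (int i = 0; i < n_vars; i++) {
        memcpy(p, vars[i].addr, vars[i].size);
        p += vars[i].size;
    }
    return buf;
}

/* Write the saved values back and free the buffer, undoing every change
 * made to registered variables since the matching scoped_save(). */
void scoped_restore(void *saved)
{
    const char *p = saved;
    for (int i = 0; i < n_vars; i++) {
        memcpy(vars[i].addr, p, vars[i].size);
        p += vars[i].size;
    }
    free(saved);
}
```

Anything that is never registered, such as a static variable in an add-on module, is silently left alone, which is the weakness raised later in the thread.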

Regards,
--
dim


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Darren Duncan <darren(at)darrenduncan(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Colin 't Hart" <colinthart(at)gmail(dot)com>
Subject: Re: autonomous transactions
Date: 2010-09-16 14:19:53
Message-ID: AANLkTikc_mUooPjKk0nE7UwuDtm5L3+BMtdU=tq6yV4S@mail.gmail.com

On Thu, Sep 16, 2010 at 5:19 AM, Dimitri Fontaine
<dfontaine(at)hi-media(dot)com> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> One thing that strikes me (maybe this is obvious) is that the
>> execution of the main transaction and the autonomous transaction are
>> not interleaved: it's a stack.  So in terms of globals and stuff,
>> assuming you knew which things needed to be updated, you could push
>> all that stuff off to the side, do whatever with the new transaction,
>> and then restore all the context afterwards.
>
> I think they call that dynamic scope in advanced programming
> languages. I guess that's calling for a quote of Greenspun's Tenth Rule:
>
>  Any sufficiently complicated C or Fortran program contains an ad hoc
>  informally-specified bug-ridden slow implementation of half of Common
>  Lisp.
>
> So the name of the game could be to find a way to implement (a
> limited form of) dynamic scoping in PostgreSQL, in C, then find each and
> every backend-local variable that needs it to support autonomous
> transactions, then make it happen… Right?

Interestingly, PostgreSQL was originally written in LISP, and there
are remnants of that in the code today; for example, our heavy use of
List nodes. But I don't think that has much to do with this project.
I plan to reserve judgment on the best way of managing the relevant
state until such time as someone has gone to the trouble of
identifying what state that is.

--
Robert Haas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Darren Duncan <darren(at)darrenduncan(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Colin 't Hart" <colinthart(at)gmail(dot)com>
Subject: Re: autonomous transactions
Date: 2010-09-16 14:42:05
Message-ID: 12288.1284648125@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I plan to reserve judgment on the best way of managing the relevant
> state until such time as someone has gone to the trouble of
> identifying what state that is.

The really fundamental problem here is that you never will be able to
identify all such state. Even assuming that you successfully completed
the herculean task of fixing the core backend, what of add-on code?

(This is also why I'm quite unimpressed with the idea of trying to
get backends to switch to a different database after startup.)

regards, tom lane


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autonomous transactions
Date: 2010-09-17 03:28:12
Message-ID: 4C92E04C.6030707@darrenduncan.net

Robert Haas wrote:
> On Thu, Sep 16, 2010 at 5:19 AM, Dimitri Fontaine <dfontaine(at)hi-media(dot)com> wrote:
>> I think they call that dynamic scope in advanced programming
>> languages. I guess that's calling for a quote of Greenspun's Tenth Rule:
>>
>> Any sufficiently complicated C or Fortran program contains an ad hoc
>> informally-specified bug-ridden slow implementation of half of Common
>> Lisp.
>>
>> So the name of the game could be to find a way to implement (a
>> limited form of) dynamic scoping in PostgreSQL, in C, then find each and
>> every backend-local variable that needs it to support autonomous
>> transactions, then make it happen… Right?
>
> Interestingly, PostgreSQL was originally written in LISP, and there
> are remnants of that in the code today; for example, our heavy use of
> List nodes. But I don't think that has much to do with this project.
> I plan to reserve judgment on the best way of managing the relevant
> state until such time as someone has gone to the trouble of
> identifying what state that is.

It would probably do Pg some good to try to recapture its functional-language
roots where reasonably possible. I believe that, design-wise, functional
languages really are the best way to do object-relational databases, given that
pure functions and immutable data structures are typically the best way to
express anything one would do with them.

-- Darren Duncan