Quick Links

Portals and nested transactions

Lists:	pgsql-hackers

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Portals and nested transactions
Date:	2004-07-13 20:57:06
Message-ID:	18506.1089752226@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I've been thinking about what to do with cursors in subtransactions.
The problem really includes both cursors (created with DECLARE CURSOR)
and portals (created with the V3-protocol Bind message) since they are
the same kind of animal internally, namely a Portal. In previous
discussion I think everyone agreed that we would like the following
properties:

1. A Portal created within a successful (committed) subtransaction
remains open and usable by the parent transaction, as well as by subsequent
child subtransactions.

2. If a subtransaction uses (fetches from) a pre-existing Portal, the
Portal state change persists after subxact commit.

What was not totally settled was what to do on subtransaction abort:

Q1: Should Portals successfully created within the failed subxact
be closed? Or should they remain open?

Q2: If the subxact changed the state of a pre-existing Portal, should
that state change roll back? In particular, can a Close Portal
operation roll back?

Taking a "transactional" view means answering "yes" to both questions
(so that all portal state returns to what it was at subxact entry).
But there was also support for a "nontransactional" view in which both
questions are answered "no".

The discussion sort of trailed off there because we had no ideas how to
implement either. I will now sketch some implementation ideas about how
to do the nontransactional way. We could support the transactional
behavior as well, but not very efficiently (at least not in the first
cut).

An important limitation that I think we must make is that any error
occurring while a specific Portal is executing "kills" that Portal;
you cannot do anything further with it except close it, even if the
Portal would otherwise have survived the subtransaction abort caused
by the error. The reason for this is that we can't be sure we have
consistent internal state for the Portal when an error occurred at a
random point. (Example: a btree index scan could have released lock on
one buffer and gotten an error while trying to read the next page of the
index; it's not certain that the scan data structures accurately reflect
this intermediate state.) Later on we might be able to relax this
restriction, but it will take a lot of care to decide which errors are
"safe". So for the moment, an error during a FETCH (or protocol-level
Execute) leaves that Portal in a state where any subsequent fetch or
execute draws "ERROR: portal execution cannot be continued".

How to do it non-transactionally
--------------------------------

The key insight I had while thinking about this is that subtransactions
are the wrong units for managing ownership of resources used by queries
(buffers, locks, etc). When portals can outlive subtransactions, those
resources really need to be thought of as belonging to the portals not
the subtransactions.

However I think we *also* need to allow subtransaction to own resources
--- at least locks. We usually want to hold table locks until main
transaction end, and it would be bad to have to keep Portals around
just to remember some locks. It would be better to reassign ownership
of the locks to the current transaction when a Portal is closed.

What I think we ought to do to support this is to invent the concept
of "ResourceOwner" objects, which will be very much like MemoryContexts
except that they represent held buffer pins, table locks, and anything
else that we decide needs to be managed in this fashion (rtree index scans
are one example). In particular we'll let ResourceOwners have child
ResourceOwner objects so that there can be forests of the things, just
as with MemoryContexts.

There would be a CurrentResourceOwner global variable analogous to
CurrentMemoryContext, which would for instance tell PinBuffer which
ResourceOwner to affix ownership of the pin to.

(I am half tempted to unify ResourceOwners and MemoryContexts completely,
but that's probably overkill, since we have many short-lived
MemoryContexts that would never be appropriate owners of query-level
resources. In particular I think CurrentResourceOwner would usually be
different from CurrentMemoryContext.)

Depending on the resource in question, we could let ResourceOwners point
to owned objects (for instance, I'm thinking of storing an array of Buffer
numbers in a ResourceOwner to represent buffer pins) or vice versa (for
instance, the best way to keep track of rtree indexscan ownership is
probably to store a pointer to the ResourceOwner object in the rtree
indexscan struct).

This infrastructure shouldn't be much work to create, since we have the
MemoryContext stuff available to serve as a model. Once we have it,
we'll create a ResourceOwner for each transaction or subtransaction as
well as one for each Portal. (We need one for a transaction because, for
example, query parsing requires buffer and lock access, and that happens
before we create a Portal to execute the query.)

The reason this solves our problems is that while executing a portal's
query, CurrentResourceOwner will point to the portal's ResourceOwner
not the current subtransaction's. Therefore any buffers, locks, etc
acquired or released by the query are effectively owned by the portal
and not by the subtransaction. Subtransaction abort would release only
resources associated with the subtransaction itself, not those associated
with the portals it has happened to touch.

A nice property of this solution is that we can get rid of much of the
subtransaction entry/exit overhead that exists in current CVS tip.
There's no particular reason for the buffer manager to save and restore
buffer pin counts, for example.

One of the reasons why the pre-existing code needed to check and zero
buffer pin counts at transaction abort is that it cannot assume that all
pins held on behalf of a query were properly released by query abort; that
would have required making global assumptions about all code everywhere
being careful to record held pins in places that query abort cleanup would
find out about them. In the ResourceOwner paradigm, the equivalent
correctness guarantee needs only local correctness in a few routines:
for instance, PinBuffer has to be sure it cannot error out between
acquiring a buffer pin and recording the pin in CurrentResourceOwner.
(Thus, for instance, it should make sure there is room in the
ResourceOwner's buffer-number array *before* it grabs the pin.)
This should give us much the same level of confidence that we have now
for memory management: there are no permanent memory leaks when you use
palloc.

How to do it transactionally
----------------------------

With the above design, we can handle nontransactional Portals easily:
we just don't touch the state of (non-failed) Portals when we exit a
subtransaction.

It's easy to see how to handle Portal creation and deletion
transactionally; it's pretty much the same algorithm we use already
in other places such as OnCommit. Just stamp each Portal with the
creating or would-be-deleting subxact's XID.

The hard part is rolling back a pre-existing Portal to its prior state
at subxact abort. It seems completely infeasible to do this at a low
level --- we'd never find all the state involved, and if we could get it
all there would be too much to save/restore efficiently.

What I think we could do, though, is record the Portal's high-level state
as the number of rows fetched from it. On abort, rewind the Portal and
then fetch that number of rows again (this is the same method used by
MOVE ABSOLUTE). We could optimize things a little bit by not doing this
repositioning until and unless the Portal is actually used again. Still,
it wouldn't be cheap...

Of course this only handles SELECT-query portals, not portals that contain
data-modification commands. But the latter cannot be suspended partway
through anyhow, so there is no scenario where we need to recover to a
partly-executed state. (Recall what I said before about not allowing
continuation of a portal that itself got an error.)

Comments?

regards, tom lane

From:	Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Portals and nested transactions
Date:	2004-07-14 17:12:18
Message-ID:	20040714171218.GA2595@dcc.uchile.cl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jul 13, 2004 at 04:57:06PM -0400, Tom Lane wrote:
> I've been thinking about what to do with cursors in subtransactions.
> The problem really includes both cursors (created with DECLARE CURSOR)
> and portals (created with the V3-protocol Bind message) since they are
> the same kind of animal internally, namely a Portal.

So within this proposal, a query executed by normal means will get its
resources saved in the transaction ResourceOwner? How is the "unnamed
portal" affected by it? Supposing that the unnamed portal is treated
like any other portal (with its own ResourceOwner), we have to make sure
to shut it down properly if something "goes wrong". Not sure how this
applies to portals created by SPI.

> Q1: Should Portals successfully created within the failed subxact
> be closed? Or should they remain open?
>
> Q2: If the subxact changed the state of a pre-existing Portal, should
> that state change roll back? In particular, can a Close Portal
> operation roll back?

IMHO the transactional view is better; if we take the other approach,
then users can't just use a simple "retry loop" around a subtransaction.

> The discussion sort of trailed off there because we had no ideas how to
> implement either. I will now sketch some implementation ideas about how
> to do the nontransactional way.

Sounds excellent to me.

> We could support the transactional behavior as well, but not very
> efficiently (at least not in the first cut).

I think we should decide what behavior is best now, and not change it in
a later release. If it's going to be somewhat inefficient, try to
minimise it. But just as I decided not to support the "nested
transaction" syntax and instead change to the savepoint syntax, lets
keep things consistent. IMHO anyway.

On the other hand, some people supported the idea that v3 Bind portals
should behave nontransactionally, while DECLARE portals should behave
transactionally. Maybe we could make that a property of the portal, or
even a user-selectable property (where we would define a reasonable
default behavior).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"In a specialized industrial society, it would be a disaster
to have kids running around loose." (Paul Graham)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Portals and nested transactions
Date:	2004-07-14 19:11:54
Message-ID:	8633.1089832314@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> On Tue, Jul 13, 2004 at 04:57:06PM -0400, Tom Lane wrote:
>> I've been thinking about what to do with cursors in subtransactions.

> So within this proposal, a query executed by normal means will get its
> resources saved in the transaction ResourceOwner?

No, *all* queries are executed within portals. The reason we need a
transaction ResourceOwner is because query parsing/planning happens in
advance of creating the portal, so we need someplace to keep track of
resources acquired during that process.

> How is the "unnamed portal" affected by it?

Same as the rest.

I don't recall whether SPI creates actual portals, but we'd definitely
want it to create a new ResourceOwner for queries it runs.

> On the other hand, some people supported the idea that v3 Bind portals
> should behave nontransactionally, while DECLARE portals should behave
> transactionally. Maybe we could make that a property of the portal, or
> even a user-selectable property (where we would define a reasonable
> default behavior).

This is certainly possible. Whether it's a good idea needs further
discussion...

regards, tom lane

From:	Mike Rylander <miker(at)purplefrog(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Portals and nested transactions
Date:	2004-07-14 19:41:07
Message-ID:	cd437v$1s8e$1@news.hub.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:

> Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
>> On Tue, Jul 13, 2004 at 04:57:06PM -0400, Tom Lane wrote:
>>> I've been thinking about what to do with cursors in subtransactions.
>
>> So within this proposal, a query executed by normal means will get its
>> resources saved in the transaction ResourceOwner?
>
> No, *all* queries are executed within portals. The reason we need a
> transaction ResourceOwner is because query parsing/planning happens in
> advance of creating the portal, so we need someplace to keep track of
> resources acquired during that process.
>
>> How is the "unnamed portal" affected by it?
>
> Same as the rest.
>
> I don't recall whether SPI creates actual portals, but we'd definitely
> want it to create a new ResourceOwner for queries it runs.
>
>> On the other hand, some people supported the idea that v3 Bind portals
>> should behave nontransactionally, while DECLARE portals should behave
>> transactionally. Maybe we could make that a property of the portal, or
>> even a user-selectable property (where we would define a reasonable
>> default behavior).
>
> This is certainly possible. Whether it's a good idea needs further
> discussion...

I didn't want to be the first to speak up on this as I'm relatively new to
the group (so thank you Alvaro), but I would definitely perfer the option
of either trans or non-trans behavior. I can see using the non-trans
behavior in a cursor based FOR loop with a savepoint/subtrans allowing me
to fail on row x and continue on to row x+1 immediately. Then, after
choosing trans-mode, I could implement a multi-strategy row processor.

Of course, just to be difficult, my ideal default would be:

Q1 -- Portals close
Q2 -- Portals do NOT roll back to previous state.

However, I do see the logical inconsistency in that. But then again,
subtransactions/savepoints are not ACID, so it seems to be implementation
dependent.

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)

--
--miker

From:	Mike Rylander <miker(at)purplefrog(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Portals and nested transactions
Date:	2004-07-14 19:52:42
Message-ID:	cd43tr$1s8e$2@news.hub.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Mike Rylander wrote:

> Tom Lane wrote:
>
>> Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
>>> On Tue, Jul 13, 2004 at 04:57:06PM -0400, Tom Lane wrote:
>>>> I've been thinking about what to do with cursors in subtransactions.
>>
>>> So within this proposal, a query executed by normal means will get its
>>> resources saved in the transaction ResourceOwner?
>>
>> No, *all* queries are executed within portals. The reason we need a
>> transaction ResourceOwner is because query parsing/planning happens in
>> advance of creating the portal, so we need someplace to keep track of
>> resources acquired during that process.
>>
>>> How is the "unnamed portal" affected by it?
>>
>> Same as the rest.
>>
>> I don't recall whether SPI creates actual portals, but we'd definitely
>> want it to create a new ResourceOwner for queries it runs.
>>
>>> On the other hand, some people supported the idea that v3 Bind portals
>>> should behave nontransactionally, while DECLARE portals should behave
>>> transactionally. Maybe we could make that a property of the portal, or
>>> even a user-selectable property (where we would define a reasonable
>>> default behavior).
>>
>> This is certainly possible. Whether it's a good idea needs further
>> discussion...
>
> I didn't want to be the first to speak up on this as I'm relatively new to
> the group (so thank you Alvaro), but I would definitely perfer the option
> of either trans or non-trans behavior. I can see using the non-trans
> behavior in a cursor based FOR loop with a savepoint/subtrans allowing me
> to fail on row x and continue on to row x+1 immediately. Then, after
> choosing trans-mode, I could implement a multi-strategy row processor.
>
> Of course, just to be difficult, my ideal default would be:
>
> Q1 -- Portals close
> Q2 -- Portals do NOT roll back to previous state.
>
> However, I do see the logical inconsistency in that. But then again,
> subtransactions/savepoints are not ACID, so it seems to be implementation
> dependent.
>

To make that a little more specific, something along the lines of:

DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
CURSOR [ { WITH | WITHOUT } HOLD ] FOR query
[ FOR { READ ONLY | UPDATE [ OF column [, ...] ] } ]
[ IN { LEXICAL | GLOBAL } SCOPE
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

... or some such... I think in perl. :)

>>
>> regards, tom lane
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 2: you can get off all lists at once with the unregister command
>> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

--
--miker

From:	Oliver Jowett <oliver(at)opencloud(dot)com>
To:	Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Portals and nested transactions
Date:	2004-07-14 21:56:36
Message-ID:	40F5AC14.1030500@opencloud.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera wrote:

If this is going to happen, either the protocol-level portals need
access to all the functionality of DECLARE, or it needs to be done as a
user-selectable property of DECLARE. Currently the JDBC driver uses only
protocol-level portals, but as soon as we want to support large
scrollable or holdable ResultSets (effectively unsupported by the
current driver) it will have to use DECLARE to get access to SCROLL /
WITH HOLD.

-O

From:	Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Portals and nested transactions
Date:	2004-07-15 02:46:41
Message-ID:	20040715024641.GB29825@dcc.uchile.cl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jul 14, 2004 at 03:11:54PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> > On Tue, Jul 13, 2004 at 04:57:06PM -0400, Tom Lane wrote:
> >> I've been thinking about what to do with cursors in subtransactions.
>
> > So within this proposal, a query executed by normal means will get its
> > resources saved in the transaction ResourceOwner?
>
> No, *all* queries are executed within portals. The reason we need a
> transaction ResourceOwner is because query parsing/planning happens in
> advance of creating the portal, so we need someplace to keep track of
> resources acquired during that process.

Ah-ha, got it (should have known better).

Do you want me to do the legwork for this to happen, or was your initial
plan to do it yourself? Either way is OK with me ...

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Granting software the freedom to evolve guarantees only different results,
not better ones." (Zygo Blaxell)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Portals and nested transactions
Date:	2004-07-15 04:20:59
Message-ID:	16357.1089865259@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> Do you want me to do the legwork for this to happen, or was your initial
> plan to do it yourself? Either way is OK with me ...

I'm working on it, should have it done in a day or so.

regards, tom lane