Re: nested transactions

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: nested transactions
Date: 2002-11-29 17:03:56
Message-ID: 72ueuukn2vleinke8008vsbcd8o7kqkd2n@4ax.com
Lists: pgsql-hackers

On Thu, 28 Nov 2002 12:59:21 -0500 (EST), Bruce Momjian
<pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>Yes, locking is one possible solution, but no one likes that. One hack
>lock idea would be to create a subtransaction-only lock, [...]
>
>> [...] without
>> having to touch the xids in the tuple headers.
>
>Yes, you could do that, but we can easily just set the clog bits
>atomically,

From what I read above I don't think we can *easily* set more than one
transaction's bits atomically.

> and it will not be needed --- the tuple bits really don't
>help us, I think.

Yes, this is what I said, or at least tried to say. I just wanted to
make clear how this new approach (use the fourth status) differs from
older proposals (replace subtransaction ids in tuple headers).

>OK, we put it in a file. And how do we efficiently clean it up?
>Remember, it is only to be used for a _brief_ period of time. I think a
>file system solution is doable if we can figure out a way not to create
>a file for every xid.

I don't want to create one file for every transaction, but rather a
huge (sparse) array of parent xids. This array is divided into
manageable chunks, represented by files, "pg_subtrans_NNNN". These
files are only created when necessary. At any time only a tiny part
of the whole array is kept in shared buffers. This concept is nearly
identical to pg_clog, which is an array of two-bit status entries.
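
To make the addressing concrete, here is a minimal sketch of how an xid
could be mapped to its slot in such a chunked array. The constants
BYTES_PER_ENTRY and ENTRIES_PER_FILE, the hexadecimal NNNN suffix, and
both function names are assumptions made for illustration; the proposal
does not fix any of them.

```python
# Hypothetical layout constants (assumptions, not part of the proposal).
BYTES_PER_ENTRY = 4            # one 32-bit parent xid per transaction
ENTRIES_PER_FILE = 1 << 20     # chunk size picked arbitrarily for illustration


def subtrans_location(xid):
    """Return (file name, byte offset) of the parent-xid slot for xid."""
    segment, index = divmod(xid, ENTRIES_PER_FILE)
    # The hexadecimal NNNN suffix is an assumed naming scheme.
    return "pg_subtrans_%04X" % segment, index * BYTES_PER_ENTRY


def clog_location(xid):
    """Return (byte index, bit shift) of the two-bit status entry for xid."""
    byte_index, slot = divmod(xid, 4)   # four two-bit entries fit in one byte
    return byte_index, slot * 2
```

Because a file is only needed once some xid in its range actually
records a parent, ranges without subtransactions never cause a file to
be created.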

>Maybe we write the xid's to a file in a special directory in sorted
>order, and backends can do a btree search of each file in that directory
>looking for the xid, and then knowing the master xid, look up that
>status, and once all the children xid's are updated, you delete the
>file.

Yes, dense arrays or btrees are other possible implementations. But
for simplicity I'd do it pg_clog style.

>Yes, but again, the xid status of subtransactions is only updated just
>before commit of the main transaction, so there is little value to
>having those visible.

Having them visible solves the atomicity problem without requiring
long locks. Updating the status of a single (main or sub) transaction
is atomic, just like it is now.

Here is what is to be done for some operations:

BEGIN main transaction:
    Get a new xid (no change to current behaviour).
    pg_clog[xid] is still 00, meaning active.
    pg_subtrans[xid] is still 0, meaning no parent.

BEGIN subtransaction:
    Push current transaction info onto local stack.
    Get a new xid.
    Record parent xid in pg_subtrans[xid].
    pg_clog[xid] is still 00.

ROLLBACK subtransaction:
    Set pg_clog[xid] to 10 (aborted).
    Optionally set clog bits for subsubtransactions to 10.
    Pop transaction info from stack.

COMMIT subtransaction:
    Set pg_clog[xid] to 11 (committed subtrans).
    Don't touch clog bits for subsubtransactions!
    Pop transaction info from stack.

ROLLBACK main transaction:
    Set pg_clog[xid] to 10 (aborted).
    Optionally set clog bits for subtransactions to 10.

COMMIT main transaction:
    Set pg_clog[xid] to 01 (committed).
    Optionally set clog bits for subtransactions from 11 to 01.
    Don't touch clog bits for aborted subtransactions!
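
The operations above can be modelled in a few lines. This is only a toy
sketch for a single backend: the class ToyXactSystem, its dictionaries
standing in for pg_clog and pg_subtrans, and the method names are all
hypothetical. The local stack here holds the open xids rather than full
transaction info, and the optional steps are left out.

```python
ACTIVE, COMMITTED, ABORTED, SUBCOMMITTED = 0b00, 0b01, 0b10, 0b11


class ToyXactSystem:
    """Toy single-backend model; dicts stand in for pg_clog and pg_subtrans."""

    def __init__(self):
        self.clog = {}       # xid -> two-bit status; a missing key means 00
        self.subtrans = {}   # xid -> parent xid; a missing key means no parent
        self.next_xid = 1
        self.stack = []      # open xids of this backend, innermost last

    def begin(self):
        """BEGIN a main transaction, or a subtransaction if one is open."""
        xid = self.next_xid
        self.next_xid += 1
        if self.stack:
            self.subtrans[xid] = self.stack[-1]   # record the parent xid
        self.stack.append(xid)
        return xid

    def commit(self):
        """COMMIT the innermost open (sub)transaction."""
        xid = self.stack.pop()
        # A subtransaction gets 11; only the main transaction gets 01.
        self.clog[xid] = SUBCOMMITTED if self.stack else COMMITTED
        return xid

    def rollback(self):
        """ROLLBACK the innermost open (sub)transaction."""
        xid = self.stack.pop()
        self.clog[xid] = ABORTED
        return xid
```

Each call changes exactly one status entry, which is why no long lock
is needed in this model: committing the main transaction is a single
update, and committed subtransactions keep status 11 until something
replaces it later.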

Visibility check by other transactions: If a tuple is visited and its
XMIN/XMAX_IS_COMMITTED/ABORTED flags are not yet set, pg_clog has to
be consulted to find out the status of the inserting/deleting
transaction xid. If pg_clog[xid] is ...

00: transaction still active

10: aborted

01: committed

11: committed subtransaction, have to check parent

Only in this last case do we have to get parentxid from pg_subtrans.
Now we look at pg_clog[parentxid]. If we find ...

00: parent still active, so xid is considered active, too

10: parent aborted, so xid is considered aborted,
optionally set pg_clog[xid] = 10

01: parent committed, so xid is considered committed,
optionally set pg_clog[xid] = 01

11: recursively check grandparent(s) ...
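
The lookup described above can be sketched as follows. The function
name resolve_status and the use of dictionaries for pg_clog and
pg_subtrans are assumptions for illustration. The optional write-back
is applied to every xid on the walked chain, which is one possible
reading of "optionally set pg_clog[xid]".

```python
ACTIVE, COMMITTED, ABORTED, SUBCOMMITTED = 0b00, 0b01, 0b10, 0b11


def resolve_status(xid, clog, subtrans):
    """Return the effective status (ACTIVE, COMMITTED or ABORTED) of xid."""
    chain = []
    current = xid
    # Walk up through parents while we keep seeing "committed subtransaction".
    while clog.get(current, ACTIVE) == SUBCOMMITTED:
        chain.append(current)
        current = subtrans[current]
    status = clog.get(current, ACTIVE)
    if status != ACTIVE:
        # Optional side effect: replace 11 with the final status so that
        # later checks need not consult pg_subtrans again.
        for visited in chain:
            clog[visited] = status
    return status
```

While the ancestor is still active nothing is written back, so status
11 stays in place until the main transaction has finished.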

For brevity the following operations are not covered in detail:
. Visibility checks for tuples inserted/deleted by a (sub)transaction
belonging to the current transaction tree (have to check local
transaction stack whenever we look at an xid or switch to a parent xid)
. HeapTupleSatisfiesUpdate (sometimes has to wait for parent
transaction)

The trick here is that subtransaction status is updated in pg_clog
immediately on commit/abort. Main transaction commit is atomic (just
set its commit bit). Status 11 is short-lived; it is replaced with
the final status by one or more of

- COMMIT/ROLLBACK of the main transaction
- a later visibility check (as a side effect)
- VACUUM

pg_subtrans cleanup: A pg_subtrans_NNNN file covers a known range of
transaction ids. As soon as none of these transactions has a pg_clog
status of 11, the pg_subtrans_NNNN file can be removed. VACUUM can do
this, and it won't even have to check the heap.
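
A minimal sketch of that cleanup test, assuming a hypothetical chunk
size ENTRIES_PER_FILE and a dictionary standing in for pg_clog (the
function name removable_segments is also made up for illustration):

```python
SUBCOMMITTED = 0b11
ENTRIES_PER_FILE = 1 << 20   # hypothetical number of xids covered per file


def removable_segments(existing_segments, clog):
    """Return the segment numbers whose xid range holds no status-11 entry."""
    busy = {
        xid // ENTRIES_PER_FILE
        for xid, status in clog.items()
        if status == SUBCOMMITTED
    }
    return sorted(set(existing_segments) - busy)
```

Only pg_clog is scanned here, which matches the point that the heap
does not have to be checked.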

Servus
Manfred
