Re: pg_subtrans and WAL

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: pg_subtrans and WAL
Date: 2004-08-10 16:24:06
Message-ID: 29088.1092155046@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

My discovery last night of a WAL synchronization error in pg_clog led me
to take a look at pg_subtrans too. I soon realized that in fact we are
not WAL-logging pg_subtrans updates at all: subtransaction start sets up
a pg_subtrans entry but makes no WAL entry for this action.

Seems like this is a problem.

It may be that we do not care because pg_subtrans doesn't have to be
valid after a crash, but I haven't seen any proof of that theory.
And if that theory is correct, then it is a seriously bad design to be
using the same code infrastructure for both pg_clog and pg_subtrans.
Every fsync on pg_subtrans is wasted effort if that is going to be our
approach. We should in fact just delete pg_subtrans and re-init it to
zeroes during postmaster start...

regards, tom lane


From: Alvaro Herrera Munoz <alvherre(at)dcc(dot)uchile(dot)cl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_subtrans and WAL
Date: 2004-08-10 17:49:19
Message-ID: 20040810174919.GB29997@dcc.uchile.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 10, 2004 at 12:24:06PM -0400, Tom Lane wrote:
> My discovery last night of a WAL synchronization error in pg_clog led me
> to take a look at pg_subtrans too. I soon realized that in fact we are
> not WAL-logging pg_subtrans updates at all: subtransaction start sets up
> a pg_subtrans entry but makes no WAL entry for this action.
>
> Seems like this is a problem.

Rats :-(

You are probably right. In fact this was on my todo list and I overlooked
it -- it was mentioned during early discussion.

Right now I am heading to Bolivia for a conference, so I am not able to
do anything codewise right now, and I'll most likely have very intermitent
connection ... I'll be back on monday 16.

--
Alvaro Herrera (<alvherre[(at)]dcc(dot)uchile(dot)cl>)
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera Munoz <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_subtrans and WAL
Date: 2004-08-10 17:53:40
Message-ID: 29830.1092160420@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera Munoz <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> ... I'll be back on monday 16.

Okay. It's certainly not something we must fix this week.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Alvaro Herrera Munoz <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_subtrans and WAL
Date: 2004-08-16 16:09:31
Message-ID: 200408161609.i7GG9VQ20496@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Added to open items:

* determine proper crash recovery/logging for pg_subtrans

---------------------------------------------------------------------------

Alvaro Herrera Munoz wrote:
> On Tue, Aug 10, 2004 at 12:24:06PM -0400, Tom Lane wrote:
> > My discovery last night of a WAL synchronization error in pg_clog led me
> > to take a look at pg_subtrans too. I soon realized that in fact we are
> > not WAL-logging pg_subtrans updates at all: subtransaction start sets up
> > a pg_subtrans entry but makes no WAL entry for this action.
> >
> > Seems like this is a problem.
>
> Rats :-(
>
> You are probably right. In fact this was on my todo list and I overlooked
> it -- it was mentioned during early discussion.
>
> Right now I am heading to Bolivia for a conference, so I am not able to
> do anything codewise right now, and I'll most likely have very intermitent
> connection ... I'll be back on monday 16.
>
> --
> Alvaro Herrera (<alvherre[(at)]dcc(dot)uchile(dot)cl>)
> "Estoy de acuerdo contigo en que la verdad absoluta no existe...
> El problema es que la mentira s? existe y tu est?s mintiendo" (G. Lama)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_subtrans and WAL
Date: 2004-08-20 05:55:35
Message-ID: 20040820055535.GA10856@dcc.uchile.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 10, 2004 at 12:24:06PM -0400, Tom Lane wrote:

> It may be that we do not care because pg_subtrans doesn't have to be
> valid after a crash, but I haven't seen any proof of that theory.

Let's suppose we crash between creating a child transaction and marking
it as done. What we have to ensure after a crash is that if we marked
the parent as committed, the child has to be marked committed too. The
WAL record carries this information already, and on recovery, the child
will be marked COMMIT.

The whole point of the subtrans info is to be available _while_ the
transaction tree is running. If there is a crash, then by definition no
backend can be running when we return, so pg_subtrans info is useless at
that point. We only need pg_clog to be correct.

> And if that theory is correct, then it is a seriously bad design to be
> using the same code infrastructure for both pg_clog and pg_subtrans.
> Every fsync on pg_subtrans is wasted effort if that is going to be our
> approach.

Right, but AFAICS both pg_clog and pg_subtrans are only fsync'ed during
checkpoint and shutdown, so it doesn't seem that costly. We could
certainly skip calling CheckPointSUBTRANS() or making it a noop ...

> We should in fact just delete pg_subtrans and re-init it to zeroes
> during postmaster start...

Is it worth the duplicated code? It won't be consulted anyway for
pre-crash Xids, because TransactionIdIsInProgress will return early by
means of RecentGlobalXmin.

On a related note: if we mark a Xid with SUBTRANS COMMIT and later crash
without updating it, the main Xid will remain in in-progress status. At
what point is it marked aborted? I can see such a status change only in
XactLockTableWait. This may be important because we will change the
subtransaction to aborted state only if we see the parent in aborted
state too.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_subtrans and WAL
Date: 2004-08-20 17:36:39
Message-ID: 751.1093023399@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> On Tue, Aug 10, 2004 at 12:24:06PM -0400, Tom Lane wrote:
>> It may be that we do not care because pg_subtrans doesn't have to be
>> valid after a crash, but I haven't seen any proof of that theory.

> The whole point of the subtrans info is to be available _while_ the
> transaction tree is running. If there is a crash, then by definition no
> backend can be running when we return, so pg_subtrans info is useless at
> that point. We only need pg_clog to be correct.

But we also have to be sure that we don't try to access the useless info
anyway. For instance some pre-crash subxacts might remain marked
SUBCOMMITTED in clog indefinitely. I think this could be worked around:
for example, TransactionIdDidCommit could assume that any SUBCOMMITTED
xact older than RecentGlobalXmin must represent a child of a crashed
parent. It shouldn't be too hard to guarantee that we never touch
pg_subtrans for XIDs older than RecentGlobalXmin. We don't have that
guarantee in place at the moment though.

>> And if that theory is correct, then it is a seriously bad design to be
>> using the same code infrastructure for both pg_clog and pg_subtrans.
>> Every fsync on pg_subtrans is wasted effort if that is going to be our
>> approach.

> Right, but AFAICS both pg_clog and pg_subtrans are only fsync'ed during
> checkpoint and shutdown, so it doesn't seem that costly. We could
> certainly skip calling CheckPointSUBTRANS() or making it a noop ...

The point is that the behaviors are fundamentally different. We have no
need for any WAL log entries for pg_subtrans; we should never fsync it;
and the rules for deciding when and where to truncate it are a lot
different (or at least should be different). I thought from the
beginning that the slru layer underneath pg_clog was bad from the point
of view of obfuscating the code, because it forced an awkward division
of labor between clog.c and slru.c. Now that I realize that there's not
that much behavior that we really want to share, I wonder whether we
shouldn't revert that change and make subtrans.c stand on its own.

> On a related note: if we mark a Xid with SUBTRANS COMMIT and later crash
> without updating it, the main Xid will remain in in-progress status. At
> what point is it marked aborted?

I do not think there's any guarantee that it ever will be so marked.
Certainly it could be a very long time until someone exhibits any
interest in that particular Xid's status...

regards, tom lane


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_subtrans and WAL
Date: 2004-08-20 19:56:46
Message-ID: 20040820195645.GA7160@dcc.uchile.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Aug 20, 2004 at 01:36:39PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> > On Tue, Aug 10, 2004 at 12:24:06PM -0400, Tom Lane wrote:
> >> It may be that we do not care because pg_subtrans doesn't have to be
> >> valid after a crash, but I haven't seen any proof of that theory.
>
> > The whole point of the subtrans info is to be available _while_ the
> > transaction tree is running. If there is a crash, then by definition no
> > backend can be running when we return, so pg_subtrans info is useless at
> > that point. We only need pg_clog to be correct.
>
> But we also have to be sure that we don't try to access the useless info
> anyway.

Hmm. I've skimmed over callers of TransactionIdDid{Commit,Abort} and
they are in the following places:

- tqual.c (all after TransactionIdIsInProgress)
- heapam.c (all after XactLockTableWait)
- one in xact.c (while aborting, assert that it isn't committed)
- one in xlog.c (in #ifdef UNDO)

- lmgr.c (in XactLockTableWait)
- two in xlogutils.c

Those last two are the ones that could cause trouble; I think the others
are safe. Actually I'm not sure what the ones in xlogutils.c are about.

> For instance some pre-crash subxacts might remain marked SUBCOMMITTED
> in clog indefinitely. I think this could be worked around: for
> example, TransactionIdDidCommit could assume that any SUBCOMMITTED
> xact older than RecentGlobalXmin must represent a child of a crashed
> parent.

This would be a slight modification. sinval.c needs to export
RecentGlobalXmin, and TransactionIdDidCommit check it. It's a small
change AFAICS. Do you want me to submit a patch?

> The point is that the behaviors are fundamentally different. We have no
> need for any WAL log entries for pg_subtrans; we should never fsync it;
> and the rules for deciding when and where to truncate it are a lot
> different (or at least should be different). I thought from the
> beginning that the slru layer underneath pg_clog was bad from the point
> of view of obfuscating the code, because it forced an awkward division
> of labor between clog.c and slru.c. Now that I realize that there's not
> that much behavior that we really want to share, I wonder whether we
> shouldn't revert that change and make subtrans.c stand on its own.

Maybe you are right. I think this is a bigger change and it's not badly
needed, so it can wait to 8.1. Do you agree?

> > On a related note: if we mark a Xid with SUBTRANS COMMIT and later crash
> > without updating it, the main Xid will remain in in-progress status. At
> > what point is it marked aborted?
>
> I do not think there's any guarantee that it ever will be so marked.
> Certainly it could be a very long time until someone exhibits any
> interest in that particular Xid's status...

Right, that's what I thought.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Granting software the freedom to evolve guarantees only different results,
not better ones." (Zygo Blaxell)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_subtrans and WAL
Date: 2004-08-20 20:18:53
Message-ID: 2147.1093033133@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> On Fri, Aug 20, 2004 at 01:36:39PM -0400, Tom Lane wrote:
>> I thought from the
>> beginning that the slru layer underneath pg_clog was bad from the point
>> of view of obfuscating the code, because it forced an awkward division
>> of labor between clog.c and slru.c. Now that I realize that there's not
>> that much behavior that we really want to share, I wonder whether we
>> shouldn't revert that change and make subtrans.c stand on its own.

> Maybe you are right. I think this is a bigger change and it's not badly
> needed, so it can wait to 8.1. Do you agree?

Not really. I think truncating pg_subtrans in a more aggressive fashion
is a "must do" for 8.0 because otherwise it will be a disk hog (it is 16
times bigger than pg_clog remember). And I'm unconvinced that we don't
have crash-safety issues right now, seeing that subtrans updates are not
WAL-logged but we are using pg_clog logic that is built around the
assumption that commits *are* WAL-logged. What I want to do at minimum
is install patches that ensure we won't look back further than
RecentGlobalXmin; truncate subtrans on that basis; and change the
subtrans code to forcibly zero the current active page of pg_subtrans
during postmaster start, rather than trusting what's on disk.

By the time we do that it might be easiest to just go ahead and make the
split. I am not fearful of reverting the clog code to a prior state as
far as testing goes --- we already know the pre-slru code worked.

I'll try to look at this over the weekend.

regards, tom lane