Re: PITR - Some data is not recovered.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: OKADA Satoshi <okada(dot)satoshi(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PITR - Some data is not recovered.
Date: 2004-08-10 21:31:07
Message-ID: 1690.1092173467@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

OKADA Satoshi <okada(dot)satoshi(at)lab(dot)ntt(dot)co(dot)jp> writes:
> I'm testing PITR using pgbench and postgresql ver.8.0bata.
> I think that result of recovery is wrong.

Many thanks for this bug report. I have developed a patch (attached)
that seems to fix it for me. Please test and see if you can still
cause the problem.

BTW, I found that it was much easier to cause the bug to happen if you
force a long commit delay. To do this you need to have
fsync = on
commit_siblings = 1
commit_delay = 100000
and while the pgbench run is going, start a new psql and do
begin;
create table foo(f1 int);
(or any other command that modifies the database). If you are watching
things with "top" you should notice a drastic decrease in the CPU
consumption of the backend connected to pgbench as soon as you do this.
Then proceed with taking the on-line backup. You can end the extra psql
session after the backup is done, to allow pgbench to go back to
normal speed.

Doing it this way I got a failure just about every time (3 failures in 3
tries) whereas without, I had seen only one failure in four tries.

regards, tom lane

Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/access/transam/xact.c,v
retrieving revision 1.177
diff -c -r1.177 xact.c
*** src/backend/access/transam/xact.c 3 Aug 2004 15:57:26 -0000 1.177
--- src/backend/access/transam/xact.c 10 Aug 2004 20:55:48 -0000
***************
*** 574,586 ****
START_CRIT_SECTION();

/*
! * We only need to log the commit in XLOG if the transaction made
! * any transaction-controlled XLOG entries or will delete files.
* (If it made no transaction-controlled XLOG entries, its XID
* appears nowhere in permanent storage, so no one else will ever care
! * if it committed.)
*/
madeTCentries = (MyLastRecPtr.xrecoff != 0);
if (madeTCentries || nrels > 0)
{
XLogRecData rdata[3];
--- 574,601 ----
START_CRIT_SECTION();

/*
! * If our transaction made any transaction-controlled XLOG entries,
! * we need to lock out checkpoint start between writing our XLOG
! * record and updating pg_clog. Otherwise it is possible for the
! * checkpoint to set REDO after the XLOG record but fail to flush the
! * pg_clog update to disk, leading to loss of the transaction commit
! * if we crash a little later. Slightly klugy fix for problem
! * discovered 2004-08-10.
! *
* (If it made no transaction-controlled XLOG entries, its XID
* appears nowhere in permanent storage, so no one else will ever care
! * if it committed; so it doesn't matter if we lose the commit flag.)
! *
! * Note we only need a shared lock.
*/
madeTCentries = (MyLastRecPtr.xrecoff != 0);
+ if (madeTCentries)
+ LWLockAcquire(CheckpointStartLock, LW_SHARED);
+
+ /*
+ * We only need to log the commit in XLOG if the transaction made
+ * any transaction-controlled XLOG entries or will delete files.
+ */
if (madeTCentries || nrels > 0)
{
XLogRecData rdata[3];
***************
*** 668,673 ****
--- 683,692 ----
TransactionIdCommitTree(nchildren, children);
}

+ /* Unlock checkpoint lock if we acquired it */
+ if (madeTCentries)
+ LWLockRelease(CheckpointStartLock);
+
END_CRIT_SECTION();
}

***************
*** 850,855 ****
--- 869,876 ----
*
* We do not flush XLOG to disk unless deleting files, since the
* default assumption after a crash would be that we aborted, anyway.
+ * For the same reason, we don't need to worry about interlocking
+ * against checkpoint start.
*/
if (MyLastRecPtr.xrecoff != 0 || nrels > 0)
{
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/access/transam/xlog.c,v
retrieving revision 1.158
diff -c -r1.158 xlog.c
*** src/backend/access/transam/xlog.c 9 Aug 2004 16:26:01 -0000 1.158
--- src/backend/access/transam/xlog.c 10 Aug 2004 20:55:49 -0000
***************
*** 4699,4704 ****
--- 4699,4713 ----
checkPoint.ThisTimeLineID = ThisTimeLineID;
checkPoint.time = time(NULL);

+ /*
+ * We must hold CheckpointStartLock while determining the checkpoint
+ * REDO pointer. This ensures that any concurrent transaction commits
+ * will be either not yet logged, or logged and recorded in pg_clog.
+ * See notes in RecordTransactionCommit().
+ */
+ LWLockAcquire(CheckpointStartLock, LW_EXCLUSIVE);
+
+ /* And we need WALInsertLock too */
LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);

/*
***************
*** 4731,4736 ****
--- 4740,4746 ----
ControlFile->checkPointCopy.redo.xrecoff)
{
LWLockRelease(WALInsertLock);
+ LWLockRelease(CheckpointStartLock);
LWLockRelease(CheckpointLock);
END_CRIT_SECTION();
return;
***************
*** 4789,4794 ****
--- 4799,4807 ----
* GetSnapshotData needs to get XidGenLock while holding SInvalLock,
* so there's a risk of deadlock. Need to find a better solution. See
* pgsql-hackers discussion of 17-Dec-01.
+ *
+ * XXX actually, the whole UNDO code is dead code and unlikely to ever
+ * be revived, so the lack of a good solution here is not troubling.
*/
#ifdef NOT_USED
checkPoint.undo = GetUndoRecPtr();
***************
*** 4798,4808 ****
#endif

/*
! * Now we can release insert lock, allowing other xacts to proceed
! * even while we are flushing disk buffers.
*/
LWLockRelease(WALInsertLock);

/*
* Get the other info we need for the checkpoint record.
*/
--- 4811,4823 ----
#endif

/*
! * Now we can release insert lock and checkpoint start lock, allowing
! * other xacts to proceed even while we are flushing disk buffers.
*/
LWLockRelease(WALInsertLock);

+ LWLockRelease(CheckpointStartLock);
+
/*
* Get the other info we need for the checkpoint record.
*/
Index: src/include/storage/lwlock.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/include/storage/lwlock.h,v
retrieving revision 1.12
diff -c -r1.12 lwlock.h
*** src/include/storage/lwlock.h 11 Jun 2004 16:43:24 -0000 1.12
--- src/include/storage/lwlock.h 10 Aug 2004 20:55:49 -0000
***************
*** 36,41 ****
--- 36,42 ----
WALWriteLock,
ControlFileLock,
CheckpointLock,
+ CheckpointStartLock,
RelCacheInitLock,
BgWriterCommLock,

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gavin Sherry 2004-08-10 21:31:45 Re: ErrorContextCallback
Previous Message Matthew T. O'Connor 2004-08-10 20:40:40 Re: pg_autovacuum Win32 Service Code