Re: [PATCHES] Infrastructure changes for recovery (v8)

Lists: pgsql-hackerspgsql-patches
From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Infrastructure changes for recovery (v8)
Date: 2008-09-30 22:52:31
Message-ID: 1222815151.4445.1397.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Patch now includes all previous agreed changes, plus I've found what
looks to be a workable method of removing the shutdown checkpoint
without loss of robustness.

Patch summary

Tuning
* Bgwriter performs dirty block cleaning during recovery
* Bgwriter performs restartpoints, offloading this task from Startup
process to allow it to continue with recovery actions
* Shutdown checkpoint removed at end of recovery. Bgwriter performs
immediate checkpoint instead, so we have same protection, but
connections and transactions can be started earlier than previously.
* PreAllocXLogs() not performed by startup process, so we do not delay
startup while we write zeroes to next WAL file. bgwriter does that now.
* XLogCtl structure padding for enhanced scalability

Recovery State Changes
* If archive recovery proceeds past a safe stopping point we signal the
postmaster that database is now in a consistent state, PM_RECOVERY. This
state change is also linked to startup of the bgwriter and stats
processes (and will in the future be the place where read only backends
may connect also)
* optional recovery_safe_start_location parameter now provided in
recovery.conf, to allow a consistency point to be manually defined if a
base backup was not taken using standard pg_start/stop backup functions
* New minSafeStopPoint added to controlfile to allow us to determine
consistency if archive recovery crashes/restarts. Value is updated each
time we access new WAL file.
* stats file removed earlier in recovery, so we may accumulate new stats
during recovery
* End of recovery is now marked by a clear global state change. Change
is global, atomic and fast - tested for using IsRecoveryProcessingMode()

Additional Safeguards
* Locks are placed around all ControlFile operations
* XLogInsert() and AssignTransactionId() now have specific checks to
prevent their use during recovery
* Makes StartupMultiXact() atomic. Adds comments to show that
StartCLOG() is already atomic, though StartupSUBTRANS() is not (this
will be addressed in a later patch, so not touched here)
* recovery.conf is not removed until slightly later now, to protect
against crash at the end of startup
* New WAL record XLOG_RECOVERY_END is now only place where timelineid
may change

Other Changes
* log_restartpoints removed, use log_checkpoints in postgresql.conf
* pg_controldata and pg_resetxlog changed to show safe start point
* designed to work in EXEC_BACKEND mode for Windows
* additional function signature for pg_start_backup('label', true |
false) to allow definition of immediate checkpoint/not
* doc changes for recovery.conf parameters
* fixes bug discovered while other testing: if pg_stop_backup() is run
when xlogswitch has just occurred then we do not switch log files, yet
we return current filename even though nothing of value in it. If
archive_timeout not enabled we would wait forever for pg_stop_backup()
to return.
* Substantial comments throughout

Patch is now v8.

doc/src/sgml/backup.sgml | 30 !
doc/src/sgml/func.sgml | 12
src/backend/access/transam/clog.c | 3
src/backend/access/transam/multixact.c | 14
src/backend/access/transam/subtrans.c | 3
src/backend/access/transam/xact.c | 3
src/backend/access/transam/xlog.c | 783 ++++++++++++++-!!!!!!!!!!!!!!!
src/backend/postmaster/bgwriter.c | 418 +++--!!!!!!!!!
src/backend/postmaster/postmaster.c | 62 +!
src/backend/storage/buffer/README | 9
src/bin/pg_controldata/pg_controldata.c | 3
src/bin/pg_resetxlog/pg_resetxlog.c | 2
src/include/access/xlog.h | 14
src/include/access/xlog_internal.h | 4
src/include/catalog/pg_control.h | 3
src/include/postmaster/bgwriter.h | 6
src/include/storage/pmsignal.h | 1
src/test/regress/expected/opr_sanity.out | 7
18 files changed, 579 insertions(+), 79 deletions(-), 719 modifications(!)

Please review everybody. Many thanks.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachment Content-Type Size
recovery_infrastruc.v8.patch text/x-patch 85.6 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-10-08 11:43:01
Message-ID: 48EC9CC5.6060803@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:
> * optional recovery_safe_start_location parameter now provided in
> recovery.conf, to allow a consistency point to be manually defined if a
> base backup was not taken using standard pg_start/stop backup functions

Do we want to cater for that? It only seems safe if you have
full_page_writes turned on, and you perform a checkpoint first. But if
you do that, why don't you just use pg_start_backup()?

> Other Changes
> * log_restartpoints removed, use log_checkpoints in postgresql.conf

Is this something that would make sense regardless of the rest of the
patch? If so, we could apply that separately, which would make this
patch a little less overwhelming to review.

> * additional function signature for pg_start_backup('label', true |
> false) to allow definition of immediate checkpoint/not

Wouldn't this need a new entry in pg_proc.h? Again, perhaps we should do
this as a separate patch.

> * fixes bug discovered while other testing: if pg_stop_backup() is run
> when xlogswitch has just occurred then we do not switch log files, yet
> we return current filename even though nothing of value in it. If
> archive_timeout not enabled we would wait forever for pg_stop_backup()
> to return.
> * Substantial comments throughout

These comments on CheckPointLock seem contradictory:

> --- 247,256 ----
> * ControlFileLock: must be held to read/update control file or create
> * new log file.
> *
> ! * CheckpointLock: must be held to do a checkpoint or restartpoint, ensuring
> ! * we get just one of those at any time. In 8.4+ recovery, both startup and
> ! * bgwriter processes may take restartpoints, so this locking must be strict
> ! * to ensure there are no mistakes.
> *
> *----------
> */

and

> --- 5901,5916 ----
> XLogRecPtr recptr;
> XLogCtlInsert *Insert = &XLogCtl->Insert;
> XLogRecData rdata;
> uint32 _logId;
> uint32 _logSeg;
> TransactionId *inCommitXids;
> int nInCommit;
> + bool leavingArchiveRecovery = false;
>
> /*
> * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
> ! * That shouldn't be happening, but checkpoints are an important aspect
> ! * of our resilience, so we take no chances.
> */
> LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
>

If I've understood the patch correctly, only bgwriter does checkpoints
and restart points now?

There's a trivial merge conflict in bgwriter.c, due to the FSM patch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-10-08 12:24:20
Message-ID: 1223468660.4747.294.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Wed, 2008-10-08 at 14:43 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > * optional recovery_safe_start_location parameter now provided in
> > recovery.conf, to allow a consistency point to be manually defined if a
> > base backup was not taken using standard pg_start/stop backup functions
>
> Do we want to cater for that? It only seems safe if you have
> full_page_writes turned on, and you perform a checkpoint first. But if
> you do that, why don't you just use pg_start_backup()?

I'm easy on that one. It is a supported backup method, so without this
it would not be possible to utilise Hot Standby in conjunction with this
backup technique. Not many people use it, but I guess some do.

> > Other Changes
> > * log_restartpoints removed, use log_checkpoints in postgresql.conf
>
> Is this something that would make sense regardless of the rest of the
> patch? If so, we could apply that separately, which would make this
> patch a little less overwhelming to review.

Maybe, it's fairly minor.

> > * additional function signature for pg_start_backup('label', true |
> > false) to allow definition of immediate checkpoint/not
>
> Wouldn't this need a new entry in pg_proc.h? Again, perhaps we should do
> this as a separate patch.

That's concerning. I remember adding the entry and assigning a new oid,
but it isn't in the patch. The multi-argument version was definitely
tested, that's how I discovered the bug also fixed in the patch.

> > * fixes bug discovered while other testing: if pg_stop_backup() is run
> > when xlogswitch has just occurred then we do not switch log files, yet
> > we return current filename even though nothing of value in it. If
> > archive_timeout not enabled we would wait forever for pg_stop_backup()
> > to return.

OK, I'll strip all of the above out, for separate consideration.

> > * Substantial comments throughout
>
> These comments on CheckPointLock seem contradictory:
>
> > --- 247,256 ----
> > * ControlFileLock: must be held to read/update control file or create
> > * new log file.
> > *
> > ! * CheckpointLock: must be held to do a checkpoint or restartpoint, ensuring
> > ! * we get just one of those at any time. In 8.4+ recovery, both startup and
> > ! * bgwriter processes may take restartpoints, so this locking must be strict
> > ! * to ensure there are no mistakes.
> > *
> > *----------
> > */
>
> and
>
> > --- 5901,5916 ----
> > XLogRecPtr recptr;
> > XLogCtlInsert *Insert = &XLogCtl->Insert;
> > XLogRecData rdata;
> > uint32 _logId;
> > uint32 _logSeg;
> > TransactionId *inCommitXids;
> > int nInCommit;
> > + bool leavingArchiveRecovery = false;
> >
> > /*
> > * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
> > ! * That shouldn't be happening, but checkpoints are an important aspect
> > ! * of our resilience, so we take no chances.
> > */
> > LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
> >
>
> If I've understood the patch correctly, only bgwriter does checkpoints
> and restart points now?

Tom requested that we retain the ability for Startup process to perform
restartpoints up until the point that bgwriter spawns, then after that
bgwriter performs them.

The form is this

PM_START startup process performs restartpoints
transition when database is consistent state
PM_RECOVERY bgwriter process performs restartpoints
delicate transition between two states
PM_RUN bgwriter process performs checkpoints

> There's a trivial merge conflict in bgwriter.c, due to the FSM patch.

OK, will look.

Thanks for looking.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-10-08 13:34:58
Message-ID: 1223472898.4747.310.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Wed, 2008-10-08 at 13:24 +0100, Simon Riggs wrote:
> On Wed, 2008-10-08 at 14:43 +0300, Heikki Linnakangas wrote:

> > Again, perhaps we should do
> > this as a separate patch.

New patch enclosed with other stuff stripped out. 15% lighter...

New thread spawned for bug fix patch. Will resubmit others when we can
be sure they'll not cause patch conflicts.

> OK, I'll strip all of the above out, for separate consideration.
> The form is this

> PM_START startup process performs restartpoints
> transition when database is consistent state
> PM_RECOVERY bgwriter process performs restartpoints
> delicate transition between two states
> PM_RUN bgwriter process performs checkpoints

Above added as comments in patch.

The patch agonises over the two state transitions above. First
transition needs to be exactly correct otherwise we might be using an
inconsistent database during recovery. Second transition is harder
because it isn't just the startup process working alone any more.

The key to understanding it is all in concurrent behaviour. Startup and
bgwriter chat together through bgwriter shared memory and call functions
back and forth between xlog.c and bgwriter.c.

I haven't retested the patch yet, but it passes make check. I'll be
rechecking it later today, starting in about 2 hours time.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachment Content-Type Size
recovery_infrastruc.v9.patch text/x-patch 76.2 KB

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-07 23:44:51
Message-ID: 1226101491.31701.29.camel@dell.linuxdev.us.dell.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, 2008-09-30 at 23:52 +0100, Simon Riggs wrote:
> * optional recovery_safe_start_location parameter now provided in
> recovery.conf, to allow a consistency point to be manually defined if a
> base backup was not taken using standard pg_start/stop backup functions

If using synchronous replication, it seems like this may be useful. For
instance, if the primary server A fails (let's assume power off
failure), then you make the secondary server B the new primary and start
committing transactions, and then you want to bring A back up as a
secondary to B.

Will server A know where to start recovering from, even if many
checkpoints have happened on server B in the meantime? Is there a way to
avoid wiping A and making a new base backup?

Are the safety issues that Heikki brought up potentially solvable, or am
I asking for the impossible?

And also, what if server A is shut down cleanly? Is there any way at all
to get it into recovery mode to catch up with B, or would it require a
new base backup?

I haven't read through the entire thread, so I apologize if this
question has been answered elsewhere.

Regards,
Jeff Davis


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-08 09:49:36
Message-ID: 1226137776.27904.189.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Fri, 2008-11-07 at 15:44 -0800, Jeff Davis wrote:

> Is there a way to avoid wiping A and making a new base backup?

rsync

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-17 13:51:30
Message-ID: 492176E2.1000006@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:
> diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/
> index 063b366..5e64cb4 100644
> --- a/src/backend/access/transam/subtrans.c
> +++ b/src/backend/access/transam/subtrans.c
> @@ -226,6 +226,9 @@ ZeroSUBTRANSPage(int pageno)
> *
> * oldestActiveXID is the oldest XID of any prepared transaction, or nextXid
> * if there are none.
> + *
> + * Note that this is not atomic and is not yet safe to perform while other
> + * processes might access subtrans.
> */
> void
> StartupSUBTRANS(TransactionId oldestActiveXID)

I'm a bit confused by that comment. Does that need to be fixed? It
sounds like it does, because other processes might access subtrans when
StartupSUBTRANS is called, with the patch to allow read-only queries
during recovery. Or is that done in the hot standby patch?

However, I don't see why that isn't safe. StartupSUBTRANS takes the
SubtransControlLock in exclusive mode while it zeroes out subtrans.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-17 14:18:39
Message-ID: 49217D3F.9050603@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:
> @@ -3845,6 +3850,52 @@ sigusr1_handler(SIGNAL_ARGS)
>
> PG_SETMASK(&BlockSig);
>
> + if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_START))
> + {
> + Assert(pmState == PM_STARTUP);
> +
> + /*
> + * Go to shutdown mode if a shutdown request was pending.
> + */
> + if (Shutdown > NoShutdown)
> + {
> + pmState = PM_WAIT_BACKENDS;
> + /* PostmasterStateMachine logic does the rest */
> + }
> + else
> + {
> + /*
> + * Startup process has entered recovery
> + */
> + pmState = PM_RECOVERY;

Hmm, I smell a race condition here:

1. Startup process goes into consistent state, and signals postmaster
2. Startup process finishes WAL replay and dies
3. Postmaster wakes up in reaper(), noting that the startup process
dies, and goes into PM_RUN mode.
4. The above signal handler for postmaster is run, causing an assertion
failure, or putting postmaster back into PM_RECOVERY mode if assertions
are disabled.

Highly unlikely in practice, given how much code needs to run in the
startup process between signaling the postmaster and exiting, but it
seems theoretically possible. Do we care, and if we do, how can we fix it?

> +
> + /*
> + * Load the flat authorization file into postmaster's ca
> + * startup process won't have recomputed this from the d
> + * yet, so it may change following recovery.
> + */
> + load_role();

Is there a race condition here too, if the startup process is writing
the auth file at the same time? I guess we'd have the same problem with
flat files in general, so maybe there's something else that mitigates
the problem?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-17 14:54:12
Message-ID: 49218594.5040000@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

This comment in XLogFlush is no longer accurate:

> * The current approach is to ERROR under normal conditions, but only
> * WARNING during recovery, so that the system can be brought up even if
> * there's a corrupt LSN. Note that for calls from xact.c, the ERROR will
> * be promoted to PANIC since xact.c calls this routine inside a critical
> * section. However, calls from bufmgr.c are not within critical sections
> * and so we will not force a restart for a bad LSN on a data page.
> */
> if (XLByteLT(LogwrtResult.Flush, record))
> elog(ERROR,
> "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
> record.xlogid, record.xrecoff,
> LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);

Because of this hunk:

> ***************
> *** 1822,1828 **** XLogFlush(XLogRecPtr record)
> * and so we will not force a restart for a bad LSN on a data page.
> */
> if (XLByteLT(LogwrtResult.Flush, record))
> ! elog(InRecovery ? WARNING : ERROR,
> "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
> record.xlogid, record.xrecoff,
> LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);
> --- 1874,1880 ----
> * and so we will not force a restart for a bad LSN on a data page.
> */
> if (XLByteLT(LogwrtResult.Flush, record))
> ! elog(ERROR,
> "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
> record.xlogid, record.xrecoff,
> LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);

I'm not sure what the most robust behavior would be.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-17 15:31:16
Message-ID: 1226935876.3790.10.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Mon, 2008-11-17 at 15:51 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/
> > index 063b366..5e64cb4 100644
> > --- a/src/backend/access/transam/subtrans.c
> > +++ b/src/backend/access/transam/subtrans.c
> > @@ -226,6 +226,9 @@ ZeroSUBTRANSPage(int pageno)
> > *
> > * oldestActiveXID is the oldest XID of any prepared transaction, or nextXid
> > * if there are none.
> > + *
> > + * Note that this is not atomic and is not yet safe to perform while other
> > + * processes might access subtrans.
> > */
> > void
> > StartupSUBTRANS(TransactionId oldestActiveXID)
>
> I'm a bit confused by that comment. Does that need to be fixed?

It is, in a later version. Apologies if you're reviewing the wrong one.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-17 15:33:13
Message-ID: 1226935993.3790.12.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Mon, 2008-11-17 at 16:18 +0200, Heikki Linnakangas wrote:

> > +
> > + /*
> > + * Load the flat authorization file into
> postmaster's ca
> > + * startup process won't have recomputed
> this from the d
> > + * yet, so it may change following recovery.
> > + */
> > + load_role();
>
> Is there a race condition here too, if the startup process is writing
> the auth file at the same time? I guess we'd have the same problem
> with flat files in general, so maybe there's something else that
> mitigates the problem?

The flat file is not race condition proof. When the file is read it is
just a guide and the real data is re-accessed from catalog. So the
problem you see does exist, but is handled elsewhere - not in this
patch.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-17 15:39:42
Message-ID: 1226936382.3790.19.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Mon, 2008-11-17 at 16:18 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > @@ -3845,6 +3850,52 @@ sigusr1_handler(SIGNAL_ARGS)
> >
> > PG_SETMASK(&BlockSig);
> >
> > + if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_START))
> > + {
> > + Assert(pmState == PM_STARTUP);
> > +
> > + /*
> > + * Go to shutdown mode if a shutdown request was pending.
> > + */
> > + if (Shutdown > NoShutdown)
> > + {
> > + pmState = PM_WAIT_BACKENDS;
> > + /* PostmasterStateMachine logic does the rest */
> > + }
> > + else
> > + {
> > + /*
> > + * Startup process has entered recovery
> > + */
> > + pmState = PM_RECOVERY;
>
>
> Hmm, I smell a race condition here:
>
> 1. Startup process goes into consistent state, and signals postmaster
> 2. Startup process finishes WAL replay and dies
> 3. Postmaster wakes up in reaper(), noting that the startup process
> dies, and goes into PM_RUN mode.
> 4. The above signal handler for postmaster is run, causing an assertion
> failure, or putting postmaster back into PM_RECOVERY mode if assertions
> are disabled.
>
> Highly unlikely in practice, given how much code needs to run in the
> startup process between signaling the postmaster and exiting, but it
> seems theoretically possible. Do we care, and if we do, how can we fix it?

Might be possible - it does depend on the sequence of actions its true.
Agree not likely to happen, except as the result of another bug.

I'll change it to a test for

if (pmState == PM_STARTUP)
pmState = PM_RECOVERY;

The assertion was mainly for documentation, its not protecting anything
critical (IIRC).

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-18 01:53:13
Message-ID: 3f0b79eb0811171753u76c2b78du8e77acfc3b5280c0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, Nov 18, 2008 at 12:39 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Mon, 2008-11-17 at 16:18 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>> > @@ -3845,6 +3850,52 @@ sigusr1_handler(SIGNAL_ARGS)
>> >
>> > PG_SETMASK(&BlockSig);
>> >
>> > + if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_START))
>> > + {
>> > + Assert(pmState == PM_STARTUP);
>> > +
>> > + /*
>> > + * Go to shutdown mode if a shutdown request was pending.
>> > + */
>> > + if (Shutdown > NoShutdown)
>> > + {
>> > + pmState = PM_WAIT_BACKENDS;
>> > + /* PostmasterStateMachine logic does the rest */
>> > + }
>> > + else
>> > + {
>> > + /*
>> > + * Startup process has entered recovery
>> > + */
>> > + pmState = PM_RECOVERY;
>>
>>
>> Hmm, I smell a race condition here:
>>
>> 1. Startup process goes into consistent state, and signals postmaster
>> 2. Startup process finishes WAL replay and dies
>> 3. Postmaster wakes up in reaper(), noting that the startup process
>> dies, and goes into PM_RUN mode.
>> 4. The above signal handler for postmaster is run, causing an assertion
>> failure, or putting postmaster back into PM_RECOVERY mode if assertions
>> are disabled.
>>
>> Highly unlikely in practice, given how much code needs to run in the
>> startup process between signaling the postmaster and exiting, but it
>> seems theoretically possible. Do we care, and if we do, how can we fix it?
>
> Might be possible - it does depend on the sequence of actions its true.
> Agree not likely to happen, except as the result of another bug.
>
> I'll change it to a test for
>
> if (pmState == PM_STARTUP)
> pmState = PM_RECOVERY;

Likewise, should we also change the assertion against the pid of the
background process (BgWriterPID, PgStatPID)?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-20 05:36:21
Message-ID: 2e78013d0811192136o5b722ddyb1562658508dcc5a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, Nov 17, 2008 at 9:01 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

>
> It is, in a later version. Apologies if you're reviewing the wrong one.
>
>
The most recent version I can find is v9, but I remember you mentioned v10
somewhere else.
Can you please confirm if v9 is the latest version and point to the latest
version if not ? I've some free cycles and would like to help with the
review process.

Thanks,
Pavan

--

Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-20 09:42:02
Message-ID: 1227174122.7015.5.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-11-20 at 11:06 +0530, Pavan Deolasee wrote:
>
>
> On Mon, Nov 17, 2008 at 9:01 PM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
>
>
> It is, in a later version. Apologies if you're reviewing the
> wrong one.
>
>
>
> The most recent version I can find is v9, but I remember you mentioned
> v10 somewhere else.
> Can you please confirm if v9 is the latest version and point to the
> latest version if not ? I've some free cycles and would like to help
> with the review process.

The latest Hot Standby patch includes the latest version of
"infrastructure changes" patch. Thanks for reviewing.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-20 09:49:36
Message-ID: 2e78013d0811200149t29173598m5eb70f62c1dc02d4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, Nov 20, 2008 at 3:12 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

>
>
> The latest Hot Standby patch includes the latest version of
> "infrastructure changes" patch. Thanks for reviewing.
>
>
Do you intend to split the patch into smaller pieces ? The latest hot
standby patch is almost 10K+ lines. Splitting that would definitely help the
review process.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-11-20 10:10:05
Message-ID: 1227175805.7015.22.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-11-20 at 15:19 +0530, Pavan Deolasee wrote:

> Do you intend to split the patch into smaller pieces ? The latest hot
> standby patch is almost 10K+ lines. Splitting that would definitely
> help the review process.

If it helps you, then I'll do it. Hang on an hour or so.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-12-18 01:31:21
Message-ID: 1229563881.4793.264.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-11-20 at 10:10 +0000, Simon Riggs wrote:
> On Thu, 2008-11-20 at 15:19 +0530, Pavan Deolasee wrote:
>
> > Do you intend to split the patch into smaller pieces ? The latest hot
> > standby patch is almost 10K+ lines. Splitting that would definitely
> > help the review process.
>
> If it helps you, then I'll do it. Hang on an hour or so.

I've posted a slightly subdivided patch now via Wiki.

Putting "infrastructure" and "hot standby" together was fairly easy, but
splitting them apart has not been and I was unable to complete that
after a lot of hacking.

If you wouldn't mind looking at the major subsystems some more, I'm
happy to attempt some further parceling to make it easier for you to
review. I'm not completely certain the "infra" v "hot standby" is a good
split point anyway.

Please let me know how I can make the reviewer's job easier. Diagrams,
writeups, whatever. Thanks,

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-12-18 02:32:12
Message-ID: 20081218023212.GO4453@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs escribió:

> Please let me know how I can make the reviewer's job easier. Diagrams,
> writeups, whatever. Thanks,

A link perhaps?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-12-18 02:50:49
Message-ID: 1229568649.4793.296.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Wed, 2008-12-17 at 23:32 -0300, Alvaro Herrera wrote:
> Simon Riggs escribió:
>
> > Please let me know how I can make the reviewer's job easier. Diagrams,
> > writeups, whatever. Thanks,
>
> A link perhaps?

There is much confusion on this point for which I'm very sorry.

I originally wrote "infra" patch to allow it to be committed separately
in the Sept commitfest, to reduce size of the forthcoming hotstandby
patch. That didn't happen (no moans there) so the eventual "hotstandby"
patch includes all of what was the infra patch, plus the new code.

So currently there is no separate "infra" patch. The two line items on
the CommitFest page are really just one large project. I would be in
favour of removing the "infra" lines from the CommitFest page.

Of course we can consider "hotstandby" patch in parts, but
deconstructing it into wholly separate patches doesn't make much sense
now and would raise many questions about why certain code exists with no
apparent function or why certain design choices made.

If you were to review a part of this, I might ask that you look at the
changes to XidInMVCCSnapshot(), GetSnapshotData() and
AssignTransactionId(), which relate specifically to subtransaction
handling. Comments explain the new approach.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-12-22 20:18:57
Message-ID: 494FF631.90908@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:
> On Wed, 2008-12-17 at 23:32 -0300, Alvaro Herrera wrote:
>> Simon Riggs escribió:
>>
>>> Please let me know how I can make the reviewer's job easier. Diagrams,
>>> writeups, whatever. Thanks,
>> A link perhaps?
>
> There is much confusion on this point for which I'm very sorry.
>
> I originally wrote "infra" patch to allow it to be committed separately
> in the Sept commitfest, to reduce size of the forthcoming hotstandby
> patch. That didn't happen (no moans there) so the eventual "hotstandby"
> patch includes all of what was the infra patch, plus the new code.
>
> So currently there is no separate "infra" patch. The two line items on
> the CommitFest page are really just one large project. I would be in
> favour of removing the "infra" lines from the CommitFest page.

I think it's useful to review the "infra" part of the patch separately,
so I split it out of the big patch again. I haven't looked at this in
detail yet, but it compiles and passes regression tests.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
recovery-infra-separated-again-1.patch text/x-diff 86.9 KB

From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-12-23 05:30:10
Message-ID: 3f0b79eb0812222130i5f0bfb00s9cc8edd712e11377@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, Dec 23, 2008 at 5:18 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Simon Riggs wrote:
>>
>> On Wed, 2008-12-17 at 23:32 -0300, Alvaro Herrera wrote:
>>>
>>> Simon Riggs escribió:
>>>
>>>> Please let me know how I can make the reviewer's job easier. Diagrams,
>>>> writeups, whatever. Thanks,
>>>
>>> A link perhaps?
>>
>> There is much confusion on this point for which I'm very sorry.
>>
>> I originally wrote "infra" patch to allow it to be committed separately
>> in the Sept commitfest, to reduce size of the forthcoming hotstandby
>> patch. That didn't happen (no moans there) so the eventual "hotstandby"
>> patch includes all of what was the infra patch, plus the new code.
>>
>> So currently there is no separate "infra" patch. The two line items on
>> the CommitFest page are really just one large project. I would be in
>> favour of removing the "infra" lines from the CommitFest page.
>
> I think it's useful to review the "infra" part of the patch separately, so I
> split it out of the big patch again. I haven't looked at this in detail yet,
> but it compiles and passes regression tests.

Super! I would fix synch rep code based on this patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-12-23 09:54:33
Message-ID: 1230026073.4793.752.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Mon, 2008-12-22 at 22:18 +0200, Heikki Linnakangas wrote:

> I think it's useful to review the "infra" part of the patch separately,
> so I split it out of the big patch again. I haven't looked at this in
> detail yet, but it compiles and passes regression tests.

OK, thanks, much appreciated.

The patches were fairly distinct in their features, though choosing an
exact split line could be done in a number of ways.

I'll look through this in more detail today and get back to you.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-12-29 06:06:58
Message-ID: 3f0b79eb0812282206w5e930a54m8c9e7768cc3d819b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi,

On Tue, Nov 18, 2008 at 12:39 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Mon, 2008-11-17 at 16:18 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>> > @@ -3845,6 +3850,52 @@ sigusr1_handler(SIGNAL_ARGS)
>> >
>> > PG_SETMASK(&BlockSig);
>> >
>> > + if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_START))
>> > + {
>> > + Assert(pmState == PM_STARTUP);
>> > +
>> > + /*
>> > + * Go to shutdown mode if a shutdown request was pending.
>> > + */
>> > + if (Shutdown > NoShutdown)
>> > + {
>> > + pmState = PM_WAIT_BACKENDS;
>> > + /* PostmasterStateMachine logic does the rest */
>> > + }
>> > + else
>> > + {
>> > + /*
>> > + * Startup process has entered recovery
>> > + */
>> > + pmState = PM_RECOVERY;
>>
>>
>> Hmm, I smell a race condition here:
>>
>> 1. Startup process goes into consistent state, and signals postmaster
>> 2. Startup process finishes WAL replay and dies
>> 3. Postmaster wakes up in reaper(), noting that the startup process
>> dies, and goes into PM_RUN mode.
>> 4. The above signal handler for postmaster is run, causing an assertion
>> failure, or putting postmaster back into PM_RECOVERY mode if assertions
>> are disabled.
>>
>> Highly unlikely in practice, given how much code needs to run in the
>> startup process between signaling the postmaster and exiting, but it
>> seems theoretically possible. Do we care, and if we do, how can we fix it?
>
> Might be possible - it does depend on the sequence of actions its true.
> Agree not likely to happen, except as the result of another bug.
>
> I'll change it to a test for
>
> if (pmState == PM_STARTUP)
> pmState = PM_RECOVERY;
>
> The assertion was mainly for documentation, its not protecting anything
> critical (IIRC).

This seems to have not been fixed yet in the latest patch.

http://archives.postgresql.org/message-id/494FF631.90908@enterprisedb.com
recovery-infra-separated-again-1.patch

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery (v8)
Date: 2008-12-29 10:27:56
Message-ID: 1230546476.4793.1192.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Mon, 2008-12-29 at 15:06 +0900, Fujii Masao wrote:

> This seems to have not been fixed yet in the latest patch.
>
> http://archives.postgresql.org/message-id/494FF631.90908@enterprisedb.com
> recovery-infra-separated-again-1.patch

I'll add it to my issues-reported list so we can check for regressions.

Thanks,

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support