Re: [HACKERS] Infrastructure changes for recovery

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Infrastructure changes for recovery
Date: 2008-08-07 11:44:19
Message-ID: 1218109459.4549.423.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches

I would like to propose some changes to the infrastructure for recovery.
These changes are beneficial in themselves, but also form the basis for
other work we might later contemplate.

Currently
* the startup process performs restartpoints during recovery
* the death of the startup process is tied directly to the change of
state in the postmaster following recovery

I propose to
* have startup process signal postmaster when it starts Redo phase (if
it starts it)
* have startup process signal postmaster again when it has completed
recovery, so that the change of state is via explicit signal rather than
death of the child process

Decoupling things in this way allows us to
1. arrange for the bgwriter to start during Redo, so it can:
i) clean dirty blocks for the startup process
ii) perform restartpoints in background
Both of these aspects will increase performance of recovery

2. provide a starting point for other changes in both startup process
and postmaster. These would include
i) have startup process do other work after startup completes, such as
executing transactions to rebuild damaged indexes, etc

ii) have postmaster allow connections while Redo is taking place, as one
part of allowing query access to standby database

The above two points have not been discussed and require separate
justification. However, any work on them is impossible without these
infrastructure changes.
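
For reference, the signalling proposed above might end up looking
something like the sketch below. It reuses the existing pmsignal
mechanism, but the two PMSIGNAL_* reason codes are invented purely for
illustration; the eventual patch may do this differently.

#include "storage/pmsignal.h"

/* called by the startup process once it decides to enter the Redo phase */
static void
ReportRedoStarted(void)
{
    SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED);    /* hypothetical reason code */
}

/* called again once recovery has completed, replacing "death of the
 * child process" as the postmaster's cue to change state */
static void
ReportRecoveryCompleted(void)
{
    SendPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED);  /* hypothetical reason code */
}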

These changes are part of a general strategy of moving in beneficial
steps towards various other goals, rather than attempting to create a
super-patch on 1 Nov that conflicts with other patches incoming at that
time. These parts are likely to conflict with synch replication work, so
I want to resolve as much as possible in the September Commitfest.

The patch would include the required changes for bgwriter also.

Any objections/alterations to these proposals, please?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Infrastructure changes for recovery
Date: 2008-08-07 14:48:06
Message-ID: 29481.1218120486@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> I propose to
> * have startup process signal postmaster when it starts Redo phase (if
> it starts it)

Doesn't seem like "starts recovery" is the point at which you can start
letting clients into the DB. What you want is to reach a point at which
you're sure that the DB is internally consistent, though perhaps not
fully synced with the master. In a PITR recovery scenario this would
correspond to reaching the minimum safe stop point. In true crash
recovery I don't think you can let people in till you're done.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Infrastructure changes for recovery
Date: 2008-08-07 14:56:10
Message-ID: 20080807145610.GD4171@alvh.no-ip.org
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:

> I propose to
> * have startup process signal postmaster when it starts Redo phase (if
> it starts it)

I think the first is a good idea -- at least, if you can get the startup
process to use the normal ReadBuffer code path instead of
XLogReadBuffer. I don't really know what's needed for this to work, but
there seem to be several layers of stuff that need to be up first.

> * have startup process signal postmaster again when it has completed
> recovery, so that the change of state is via explicit signal rather than
> death of the child process

I'm not sure that this is very useful, because the startup process
cannot do anything much. It is an auxiliary process, which means it
can't connect to a database and it can't run transactions.

This other idea:

> ii) have postmaster allow connections while Redo is taking place, as one
> part of allowing query access to standby database

is interesting and I'm sure it would be very welcome. Of course, it is
first necessary to be able to run transactions in true "read only" mode,
which perhaps is nowadays not so difficult given the work with VXids.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Infrastructure changes for recovery
Date: 2008-08-07 15:28:20
Message-ID: 1218122900.4549.538.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-08-07 at 10:48 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > I propose to
> > * have startup process signal postmaster when it starts Redo phase (if
> > it starts it)
>
> Doesn't seem like "starts recovery" is the point at which you can start
> letting clients into the DB. What you want is to reach a point at which
> you're sure that the DB is internally consistent, though perhaps not
> fully synced with the master. In a PITR recovery scenario this would
> correspond to reaching the minimum safe stop point. In true crash
> recovery I don't think you can let people in till you're done.

Ack to both, no worries: just worded it a little too loosely.

For crash recovery we could let them in earlier, but I think it's going
to recover faster if we don't. So, yes, only during archive recovery and
therefore only from min safe stopping point. That will mean bgwriter is
only active during archive recovery, but that's not important, since we
(almost) never perform restartpoints during crash recovery.

For other background I should also mention that this architecture
proposal is different from Florian's SoC proposals, which had a separate
recovery process to perform the work after the min safe stopping point.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Infrastructure changes for recovery
Date: 2008-08-07 15:55:42
Message-ID: 1218124543.4549.552.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-08-07 at 10:56 -0400, Alvaro Herrera wrote:

> > * have startup process signal postmaster again when it has completed
> > recovery, so that the change of state is via explicit signal rather than
> > death of the child process
>
> I'm not sure that this is very useful, because the startup process
> cannot do anything much. It is an auxiliary process, which means it
> can't connect to a database and it can't run transactions.

Yes, but it's going to be a lot easier to make it do transactions than it
will be to spawn something else that does? Or so I thought...

Specifically, I want to have a rmgr_post_cleanup() call to perform such
actions.
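
Roughly, as a sketch only (rm_post_cleanup is not an existing callback,
and the other members are abbreviated here):

typedef struct RmgrData
{
    const char *rm_name;
    void       (*rm_redo) (XLogRecPtr lsn, XLogRecord *record);
    void       (*rm_startup) (void);
    void       (*rm_cleanup) (void);
    /* ... other existing callbacks ... */

    /*
     * Hypothetical addition: run once redo has finished, possibly inside
     * a transaction, e.g. to mark damaged indexes as invalid in pg_class.
     */
    void       (*rm_post_cleanup) (void);
} RmgrData;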

Who do you think should run those calls? They might need transactions,
or not, depending upon the rmgr. I envisage initially that rmgrs could
mark indexes as Invalid in pg_class, which requires a transaction (or
does it?).

If we did alter rmgrs to allow them to actually rebuild an index, then
it would be desirable to have them do it in parallel, just like AV
workers.

Would it be possible to have startup run rmgr_post_cleanup, and then
provide a facility to have them run tasks through AV workers?

Sounds like a separate patch anyway.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Infrastructure changes for recovery
Date: 2008-08-07 16:05:14
Message-ID: 1218125114.4549.562.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-08-07 at 10:56 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:

> > ii) have postmaster allow connections while Redo is taking place, as
> one
> > part of allowing query access to standby database
>
> is interesting and I'm sure it would be very welcome. Of course, it
> is first necessary to be able to run transactions in true "read only"
> mode, which perhaps is nowadays not so difficult given the work with
> VXids.

Maybe I am assuming that part is easier than I thought. Florian's
contribution with the VXid work should not be understated.

I was going to spread the setting of the global variable InRedo to all
backends, so they would be able to answer the question IsInRedo().

Anybody attempting to assign a new TransactionId would check this and
then exit with an ERROR. There would also be a check in XLogInsert(), in
case the first case doesn't catch everything or everyone. Transactions
would also run as read-only transactions to keep the front door locked.
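
As a sketch only, the two guards would be something like this (IsInRedo()
being the hypothetical test mentioned above; the error wording is
illustrative):

/* near the top of GetNewTransactionId(): refuse to hand out new XIDs */
if (IsInRedo())
    elog(ERROR, "cannot assign TransactionIds during recovery");

/* and a belt-and-braces check near the top of XLogInsert() */
if (IsInRedo())
    elog(ERROR, "cannot insert WAL records during recovery");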

Anything else you think needs bolting down?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Infrastructure changes for recovery
Date: 2008-08-31 20:04:34
Message-ID: 1220213074.4371.173.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-08-07 at 12:44 +0100, Simon Riggs wrote:
> I would like to propose some changes to the infrastructure for recovery.
> These changes are beneficial in themselves, but also form the basis for
> other work we might later contemplate.
>
> Currently
> * the startup process performs restartpoints during recovery
> * the death of the startup process is tied directly to the change of
> state in the postmaster following recovery
>
> I propose to
> * have startup process signal postmaster when it starts Redo phase (if
> it starts it)

> Decoupling things in this way allows us to
> 1. arrange for the bgwriter to start during Redo, so it can:
> i) clean dirty blocks for the startup process
> ii) perform restartpoints in background
> Both of these aspects will increase performance of recovery

Taking into account comments from Tom and Alvaro

Included patch with the following changes:

* new postmaster mode known as consistent recovery, entered only when
recovery passes safe/consistent point. InRedo is now set in all
processes when started/connected in consistent recovery mode.

* bgwriter and stats process starts in consistent recovery mode.
bgwriter changes mode when startup process completes.

* bgwriter now performs restartpoints and also cleans shared_buffers
while the startup process performs redo apply

* recovery.conf parameter log_restartpoints is now deprecated, since its
function overlaps with log_checkpoints too much. I've kept the
distinction between restartpoints and checkpoints in code, to avoid
convoluted code. Minor change, not critical.

[Replying to one of Alvaro's other comments: Startup process still uses
XLogReadBuffer. I'm not planning on changing that either, at least not
in this patch.]

Patch doesn't conflict with rmgr plugin patch.

Passes make check, but that's easy.
Various other tests all seem to be working.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachment Content-Type Size
recovery_infrastruc.v2.patch text/x-patch 31.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Infrastructure changes for recovery
Date: 2008-09-08 17:34:01
Message-ID: 1905.1220895241@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Included patch with the following changes:

> * new postmaster mode known as consistent recovery, entered only when
> recovery passes safe/consistent point. InRedo is now set in all
> processes when started/connected in consistent recovery mode.

I looked at this and didn't like the InRedo signaling at all. In the
first place, it looks like you're expecting the flag to be passed down
via fork(), but that won't work in EXEC_BACKEND mode. (Yes, easily
fixed, but it's wrong as-is.) In the second place, the method of
signaling it to the bgwriter looks to have race conditions: in
principle, at least, I think the startup process could try to clear
the shared-memory InRedo flag before the bgwriter has gotten as far as
setting it. (This might work if the startup process can't exit redo
mode without the bgwriter having finished a checkpoint, but such a
dependency is too rube goldbergian for me.)

ISTM that it would probably be better if there were exactly one InRedo
flag in shared memory, probably in xlog.c's shared state, with the
postmaster not being responsible for setting or clearing it; rather
the startup process should do those things.

> * bgwriter and stats process starts in consistent recovery mode.
> bgwriter changes mode when startup process completes.

I'm not sure about the interaction of this. In particular, what about
recovery restart points before we have reached the safe stop point?
I don't think we want to give up the capability of having those.

Also, it seems pretty bogus to update the in-memory ControlFile
checkpoint values before the restart point is actually done. It looks
to me like what you have done is to try to use those fields as signaling
for the restart request in addition to their existing purposes, which
I think is confusing and probably dangerous. I'd rather there were a
different signaling path and ControlFile maintains its current
definition.

I found the changes in bgwriter.c unreadable. You've evidently
moved blocks of code around, but exactly what did you mean to
change? Why is there so much duplicated code now?

> I've kept the distinction between restartpoints and checkpoints in
> code, to avoid convoluted code.

Hmm, I dunno, it seems like that might be a bad choice. Are you sure
it's not cleaner to just use the regular checkpoint code?

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Infrastructure changes for recovery
Date: 2008-09-08 18:01:50
Message-ID: 1220896910.3913.242.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Mon, 2008-09-08 at 13:34 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > Included patch with the following changes:
>
> > * new postmaster mode known as consistent recovery, entered only when
> > recovery passes safe/consistent point. InRedo is now set in all
> > processes when started/connected in consistent recovery mode.
>
> I looked at this and didn't like the InRedo signaling at all. In the
> first place, it looks like you're expecting the flag to be passed down
> via fork(), but that won't work in EXEC_BACKEND mode. (Yes, easily
> fixed, but it's wrong as-is.)

OK, thanks. I didn't check that.

> In the second place, the method of
> signaling it to the bgwriter looks to have race conditions: in
> principle, at least, I think the startup process could try to clear
> the shared-memory InRedo flag before the bgwriter has gotten as far as
> setting it. (This might work if the startup process can't exit redo
> mode without the bgwriter having finished a checkpoint, but such a
> dependency is too rube goldbergian for me.)

We never interrupt what the bgwriter does. If it is in the middle of
doing a checkpoint when recovery finishes, then Startup process just
waits behind it and then issues a shutdown checkpoint. If there's loads
of blocks to be written it doesn't matter who writes them. There's only
one soup spoon, and it doesn't matter who stirs the soup.

> ISTM that it would probably be better if there were exactly one InRedo
> flag in shared memory, probably in xlog.c's shared state, with the
> postmaster not being responsible for setting or clearing it; rather
> the startup process should do those things.

So replace InRedo with an IsInRedo() function that accesses XLogCtl?

Sounds good
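
As a sketch, assuming the names are still up for grabs:

/* one flag in xlog.c's shared state (XLogCtlData); startup is the only writer */
bool        SharedInRedo;       /* true while redo is being applied */

/* in StartupXLOG(), just before the redo apply loop begins */
XLogCtl->SharedInRedo = true;

/* in StartupXLOG() again, once recovery has completed */
XLogCtl->SharedInRedo = false;

/* IsInRedo() would then simply read XLogCtl->SharedInRedo */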

> > * bgwriter and stats process starts in consistent recovery mode.
> > bgwriter changes mode when startup process completes.
>
> I'm not sure about the interaction of this. In particular, what about
> recovery restart points before we have reached the safe stop point?
> I don't think we want to give up the capability of having those.

Maybe. I felt it made the code cleaner to give that up. Realistically
it's a fairly short window of time. But it shouldn't be too hard to put
back.

> Also, it seems pretty bogus to update the in-memory ControlFile
> checkpoint values before the restart point is actually done. It looks
> to me like what you have done is to try to use those fields as signaling
> for the restart request in addition to their existing purposes, which
> I think is confusing and probably dangerous. I'd rather there were a
> different signaling path and ControlFile maintains its current
> definition.

OK

> I found the changes in bgwriter.c unreadable. You've evidently
> moved blocks of code around, but exactly what did you mean to
> change? Why is there so much duplicated code now?

The patch utility did that. Some code reuse confused it. It's real easy
to read though if you apply the patch and then read the main loop.
You'll see what I mean.

> > I've kept the distinction between restartpoints and checkpoints in
> > code, to avoid convoluted code.
>
> Hmm, I dunno, it seems like that might be a bad choice. Are you sure
> it's not cleaner to just use the regular checkpoint code?

When I tried to write it, it just looked to my eyes like every single
line had a caveat which looked ugly and multiplied the testing. You're
the code dude, always happy to structure things as you suggest. If
you're sure, that is.

Thanks for the review.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Infrastructure changes for recovery
Date: 2008-09-08 18:18:16
Message-ID: 2792.1220897896@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Mon, 2008-09-08 at 13:34 -0400, Tom Lane wrote:
>> Hmm, I dunno, it seems like that might be a bad choice. Are you sure
>> it's not cleaner to just use the regular checkpoint code?

> When I tried to write it, it just looked to my eyes like every single
> line had a caveat which looked ugly and multiplied the testing. You're
> the code dude, always happy to structure things as you suggest. If
> you're sure, that is.

No, was just wondering if the other way would be better. If you're
sure not, that's fine.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Infrastructure changes for recovery
Date: 2008-09-10 21:06:37
Message-ID: 1221080797.3913.768.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Mon, 2008-09-08 at 13:34 -0400, Tom Lane wrote:

> ISTM that it would probably be better if there were exactly one InRedo
> flag in shared memory, probably in xlog.c's shared state, with the
> postmaster not being responsible for setting or clearing it; rather
> the startup process should do those things.

Done

> > * bgwriter and stats process starts in consistent recovery mode.
> > bgwriter changes mode when startup process completes.
>
> I'm not sure about the interaction of this. In particular, what about
> recovery restart points before we have reached the safe stop point?
> I don't think we want to give up the capability of having those.
>
> Also, it seems pretty bogus to update the in-memory ControlFile
> checkpoint values before the restart point is actually done. It looks
> to me like what you have done is to try to use those fields as signaling
> for the restart request in addition to their existing purposes, which
> I think is confusing and probably dangerous. I'd rather there were a
> different signaling path and ControlFile maintains its current
> definition.

Done

Testing takes a while on this, I probably won't complete it until
Friday. So enclosed patch is for eyeballs only at this stage.

I added in the XLogCtl padding we've discussed before, while I'm there.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachment Content-Type Size
recovery_infrastruc.v3.patch text/x-patch 37.6 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Infrastructure changes for recovery
Date: 2008-09-12 18:14:21
Message-ID: 20080912181421.GI8854@alvh.no-ip.org
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:

> --- 5716,5725 ----
> CheckpointStats.ckpt_sync_end_t,
> &sync_secs, &sync_usecs);
>
> ! elog(LOG, "%s complete: wrote %d buffers (%.1f%%); "
> "%d transaction log file(s) added, %d removed, %d recycled; "
> "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
> + (checkpoint ? " checkpoint" : "restartpoint"),
> CheckpointStats.ckpt_bufs_written,
> (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
> CheckpointStats.ckpt_segs_added,

Very minor nit: this really needs a rework. It is relatively OK in the
previous code, but it was already stuffing too much in a single message.
Maybe

ereport(LOG,
        (errmsg(checkpoint ? "checkpoint complete" : "restartpoint complete"),
         errdetail("Wrote %d buffers (%.1f%%); "
                   "%d transaction log file(s) added, %d removed, %d recycled; "
                   "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s.",
                   ...
                   )))

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Infrastructure changes for recovery
Date: 2008-09-12 18:56:06
Message-ID: 1221245766.3913.1130.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Fri, 2008-09-12 at 14:14 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
>
> > --- 5716,5725 ----
> > CheckpointStats.ckpt_sync_end_t,
> > &sync_secs, &sync_usecs);
> >
> > ! elog(LOG, "%s complete: wrote %d buffers (%.1f%%); "
> > "%d transaction log file(s) added, %d removed, %d recycled; "
> > "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
> > + (checkpoint ? " checkpoint" : "restartpoint"),
> > CheckpointStats.ckpt_bufs_written,
> > (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
> > CheckpointStats.ckpt_segs_added,
>
> Very minor nit: this really needs a rework.

All I changed was the word "restartpoint"... it's otherwise identical to
the existing message. I'd rather not change that.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Infrastructure changes for recovery
Date: 2008-09-12 19:31:44
Message-ID: 20080912193144.GJ8854@alvh.no-ip.org
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:
>
> On Fri, 2008-09-12 at 14:14 -0400, Alvaro Herrera wrote:
> > Simon Riggs wrote:
> >
> > > --- 5716,5725 ----
> > > CheckpointStats.ckpt_sync_end_t,
> > > &sync_secs, &sync_usecs);
> > >
> > > ! elog(LOG, "%s complete: wrote %d buffers (%.1f%%); "
> > > "%d transaction log file(s) added, %d removed, %d recycled; "
> > > "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
> > > + (checkpoint ? " checkpoint" : "restartpoint"),
> > > CheckpointStats.ckpt_bufs_written,
> > > (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
> > > CheckpointStats.ckpt_segs_added,
> >
> > Very minor nit: this really needs a rework.
>
> All I changed was the word "restartpoint"... it's otherwise identical to
> the existing message. I'd rather not change that.

The new message is not translatable, the original was.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Infrastructure changes for recovery
Date: 2008-09-12 19:55:25
Message-ID: 1221249325.3913.1161.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Fri, 2008-09-12 at 15:31 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
> >
> > All I changed was the word "restartpoint"... it's otherwise identical to
> > the existing message. I'd rather not change that.
>
> The new message is not translatable, the original was.

OK, perhaps the word "restartpoint" is redundant anyhow. Anybody looking
at the log can see we're in recovery. So I'll switch it back to say just
"checkpoint", whether we do restartpoints or checkpoints.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Infrastructure changes for recovery
Date: 2008-09-16 19:45:18
Message-ID: 15776.1221594318@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> Testing takes a while on this, I probably won't complete it until
> Friday. So enclosed patch is for eyeballs only at this stage.

What's the status on that patch?

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Infrastructure changes for recovery
Date: 2008-09-17 08:31:45
Message-ID: 1221640305.3913.2027.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Tue, 2008-09-16 at 15:45 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> > Testing takes a while on this, I probably won't complete it until
> > Friday. So enclosed patch is for eyeballs only at this stage.
>
> What's the status on that patch?

Checking EXEC_BACKEND case and that reporting back later today.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-18 08:05:45
Message-ID: 1221725145.3913.2315.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Tue, 2008-09-16 at 15:45 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> > Testing takes a while on this, I probably won't complete it until
> > Friday. So enclosed patch is for eyeballs only at this stage.
>
> What's the status on that patch?

Having some trouble trying to get a clean state change from recovery to
normal mode.

Startup needs to be able to write WAL at the end of recovery so it can
write a ShutdownCheckpoint, yet must not be allowed to write WAL before
that. Other services are still not fully up yet. So every other process
apart from Startup mustn't write WAL until Startup has fully completed
its actions, which is sometime later.

bgwriter needs to know about the impending state change so it can finish
off any checkpoint it's currently working on. But it then can't start doing
normal processing yet either. So it must have a third state that is
between recovery and normal processing. When normal processing is
reached, it *must* write WAL immediately from that point onwards, yet
using the new timeline that startup decides upon.

So the idea of a single database-wide boolean state seems impossible. We
need a local canInsertWAL flag that is set at different times in
different processes.
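
As a sketch only (canInsertWAL is the name used above; where each process
sets it is the whole point):

static bool canInsertWAL = false;   /* process-local, not in shared memory */

/* startup:  set just before it writes its end-of-recovery checkpoint  */
/* bgwriter: set only once it has switched to normal processing        */
/* backends: started only after the postmaster leaves recovery anyway  */

/* XLogInsert() could then refuse to run too early, e.g. */
if (!canInsertWAL)
    elog(ERROR, "cannot write WAL in this process yet");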

Global state changes are now:

not started
started - postmaster process then startup process
safestoppingpoint - bgwriter starts
startup_does_shutdown_checkpoint - bgwriter finishes up, just waits,
startup is now allowed to insert WAL for checkpoint record
startup completes StartupXLog - bgwriter begins normal processing
startup exits - postmaster in normal state

We *might* solve this by making the final checkpoint that Startup writes
into an online checkpoint. That would then allow a straight and safe
transition for bgwriter from just one state to the other. But that
leaves some other ugliness that I don't want to deal with, 'cos there's
other fish frying.

Feels like I should shutdown the bgwriter after recovery and then allow
it to be cranked up again after normal processing starts, and do all of
this through postmaster state changes. That way bgwriter doesn't need to
do a dynamic state change.

Comments anyone?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-18 12:18:00
Message-ID: 1221740280.3913.2398.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-09-18 at 09:05 +0100, Simon Riggs wrote:

> Feels like I should shutdown the bgwriter after recovery and then
> allow it to be cranked up again after normal processing starts, and do
> all of this through postmaster state changes. That way bgwriter
> doesn't need to do a dynamic state change.

This approach appears to be working nicely so far. Some ugly bits of
former patch removed.

Patch passes basic tests and changes state cleanly.

Restarting test cycle on this patch now, confirm tomorrow.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachment Content-Type Size
recovery_infrastruc.v5.patch text/x-patch 36.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-18 13:06:36
Message-ID: 16992.1221743196@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Having some trouble trying to get a clean state change from recovery to
> normal mode.

> Startup needs to be able to write WAL at the end of recovery so it can
> write a ShutdownCheckpoint, yet must not be allowed to write WAL before
> that. Other services are still not fully up yet. So every other process
> apart from Startup musn't write WAL until Startup has fully completed
> its actions, which is sometime later.
> ...
> We *might* solve this by making the final checkpoint that Startup writes
> into an online checkpoint.

Do we really need a checkpoint there at all? I can't recall right now
if there was a good reason why Vadim made it do that. I have a feeling
that it might have had to do with an assumption that the recovery
environment was completely distinct from live environment; which is a
statement that's certainly not going to be true in a query-answering
slave.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-18 13:50:47
Message-ID: 1221745847.3913.2408.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > Having some trouble trying to get a clean state change from recovery to
> > normal mode.
>
> > Startup needs to be able to write WAL at the end of recovery so it can
> > write a ShutdownCheckpoint, yet must not be allowed to write WAL before
> > that. Other services are still not fully up yet. So every other process
> > apart from Startup musn't write WAL until Startup has fully completed
> > its actions, which is sometime later.
> > ...
> > We *might* solve this by making the final checkpoint that Startup writes
> > into an online checkpoint.
>
> Do we really need a checkpoint there at all? I can't recall right now
> if there was a good reason why Vadim made it do that. I have a feeling
> that it might have had to do with an assumption that the recovery
> environment was completely distinct from live environment; which is a
> statement that's certainly not going to be true in a query-answering
> slave.

Hmm, a question I hadn't considered.

We definitely don't want to do PreallocXlogFiles().

"Timelines only change at shutdown checkpoints". But its just as easy to
write a short timeline change record rather than the checkpoint itself,
so we can sort that out.

It should be sufficient to request bgwriter to perform an immediate
checkpoint, rather than have startup process perform it. That way
startup can exit and we can change to normal processing faster.
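
As a sketch, at the end of recovery in StartupXLOG() that might be as
small as the following (the flag combination is a guess, not taken from
the patch):

/* hand the checkpoint to the already-running bgwriter and move on */
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE);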

If we crash before the next checkpoint then we would revert to archive
recovery or crash recovery. The last restartpoint might well be
somewhere else. In streaming mode we would presumably have that data
locally, so not really a problem. We mustn't mark the control file as in
production yet, but we can do so following the next checkpoint.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-18 14:09:43
Message-ID: 18549.1221746983@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
>> Do we really need a checkpoint there at all?

> "Timelines only change at shutdown checkpoints".

Hmm. I *think* that that is just a debugging crosscheck rather than a
critical property. But yeah, it would take some close investigation,
which maybe isn't warranted if you have a less-invasive solution.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-18 14:24:55
Message-ID: 1221747895.3913.2432.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-09-18 at 10:09 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> >> Do we really need a checkpoint there at all?
>
> > "Timelines only change at shutdown checkpoints".
>
> Hmm. I *think* that that is just a debugging crosscheck rather than a
> critical property. But yeah, it would take some close investigation,
> which maybe isn't warranted if you have a less-invasive solution.

The important thing is that everybody uses the new timelineid.

AFAICS we already write new timelineids into the WAL file even when they
differ from the previous timelineid, so it can't be that risky.

Ignore v5 then and look for v6 on Monday. Hopefully the final one :-)

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-22 22:06:01
Message-ID: 1222121161.4445.248.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-09-18 at 10:09 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> >> Do we really need a checkpoint there at all?
>
> > "Timelines only change at shutdown checkpoints".
>
> Hmm. I *think* that that is just a debugging crosscheck rather than a
> critical property. But yeah, it would take some close investigation,
> which maybe isn't warranted if you have a less-invasive solution.

OK, new patch, version 6. Some major differences to previous patch.

* new IsRecoveryProcessingMode() in shmem
* padding in XLogCtl to ensure above call is cheap
* specific part of bgwriter shmem for passing restartpoint data
* avoid Shutdown checkpoint at end of recovery, with carefully
considered positioning of statements (beware!)
* only one new postmaster mode, PM_RECOVERY
* bgwriter changes state without stopping/starting

Modes I have tested so far
* make check
* Start, Stop
* Crash Recovery
* Archive Recovery
* Archive Recovery, switch in middle of restartpoint

Modes not yet tested
* EXEC_BACKEND

Ready for serious review prior to commit. I will be performing further
testing also.

backend/access/transam/multixact.c | 2
backend/access/transam/xlog.c | 328 ++++++++++++---!!!!!!!!!!!!
backend/postmaster/bgwriter.c | 371 +++++---!!!!!!!!!!!!!!!!!!!!!
backend/postmaster/postmaster.c | 62 ++++!!
backend/storage/buffer/README | 5
backend/storage/buffer/bufmgr.c | 34 +!!
include/access/xlog.h | 14 !
include/access/xlog_internal.h | 3
include/catalog/pg_control.h | 2
include/postmaster/bgwriter.h | 2
include/storage/bufmgr.h | 2
include/storage/pmsignal.h | 1
12 files changed, 279 insertions(+), 56 deletions(-), 491 mods(!)

There's a few subtle points along the way. I've tried to explain them
all in code comments, but questions welcome. At v6, most things are now
done a particular way for a specific reason.

Look especially at InRecovery, which is used extensively in different
parts of the code. The meaning of this has been subdivided into two
meanings, so only *some* of the places that use it have been changed.
All have been checked.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachment Content-Type Size
recovery_infrastruc.v6.patch text/x-patch 51.2 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-23 15:42:21
Message-ID: 1222184541.4445.400.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Mon, 2008-09-22 at 23:06 +0100, Simon Riggs wrote:
> On Thu, 2008-09-18 at 10:09 -0400, Tom Lane wrote:
> > Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > > On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> > >> Do we really need a checkpoint there at all?
> >
> > > "Timelines only change at shutdown checkpoints".
> >
> > Hmm. I *think* that that is just a debugging crosscheck rather than a
> > critical property. But yeah, it would take some close investigation,
> > which maybe isn't warranted if you have a less-invasive solution.
>
> OK, new patch, version 6. Some major differences to previous patch.

> Ready for serious review prior to commit. I will be performing further
> testing also.

Version 7

I've removed the concept of interrupting a restartpoint halfway
through; I found a fault there. It was more ugly than the alternative
and less robust. The code now waits at the end of recovery if we are in
the middle of a restartpoint, but forces it to finish more quickly. That
means we won't always get a fast start even though we skip the shutdown
checkpoint, but at least we're sure there's no chance of breakage
because of concurrent activity, state changes etc.

I'm happy with this now.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachment Content-Type Size
recovery_infrastruc.v7.patch text/x-patch 50.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-25 22:28:39
Message-ID: 28973.1222381719@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Version 7

After reading this for awhile, I realized that there is a rather
fundamental problem with it: it switches into "consistent recovery"
mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint.
In a crash recovery situation that typically is before the last
checkpoint (if indeed it's not still zero), and what that means is
that this patch will activate the bgwriter and start letting in
backends instantaneously after a crash, long before we can have any
certainty that the DB state really is consistent.

In a normal crash recovery situation this would be easily fixed by
simply not letting it go to "consistent recovery" state at all, but
what about recovery from a restartpoint? We don't want a slave that's
crashed once to never let backends in again. But I don't see how to
determine that we're far enough past the restartpoint to be consistent
again. In crash recovery we assume (without proof ;-)) that we're
consistent once we reach the end of valid-looking WAL, but that rule
doesn't help for a slave that's following a continuing WAL sequence.

Perhaps something could be done based on noting when we have to pull in
a WAL segment from the recovery_command, but it sounds like a pretty
fragile assumption.

Anyway, that's sufficiently bad that I'm bouncing the patch for
reconsideration. Some other issues I noted before giving up:

* I'm a bit uncomfortable with the fact that the
IsRecoveryProcessingMode flag is read and written with no lock.
There's no atomicity or concurrent-write problem, sure, but on
a multiprocessor with weak memory ordering guarantees (eg PPC)
readers could get a fairly stale value of the flag. The false
to true transition happens before anyone except the startup process is
running, so that's no problem; the net effect is then that backends
might think that recovery mode was still active for awhile after it
wasn't. This seems a bit scary, eg in the patch as it stands that'll
disable XLogFlush calls that should have happened. You could fix that
by grabbing/releasing some spinlock (any old one) around the accesses,
but are any of the call sites performance-critical? The one in
XLogInsert looks like it is, if nothing else.

* I kinda think you broke XLogFlush anyway. It's certainly clear
that the WARNING case at the bottom is unreachable with the patch,
and I think that means that you've messed up an important error
robustness behavior. Is it still possible to get out of recovery mode
if there are any bad LSNs in the shared buffer pool?

* The use of InRecovery in CreateCheckPoint seems pretty bogus, since
that function can be called from the bgwriter in which the flag will
never be true. Either this needs to be IsRecoveryProcessingMode(),
or it's useless because we'll never create ordinary checkpoints until
after subtrans.c is up anyway.

* The patch moves the clearing of InRecovery from after to before
StartupCLOG, RecoverPreparedTransactions, etc. Is that really a
good idea? It makes it hard for those modules to know if they are
coming up after a normal or recovery startup. I think they may not
care at the moment, but I'd leave that alone without a darn good
reason to change it.

* The comment about CheckpointLock being only pro forma is now wrong,
if you are going to use locking it to implement a delay in exitRecovery.
But I don't understand why the delay there. If it's needed, seems like
the path where a restartpoint *isn't* in progress is wrong --- don't you
actually need to start one and wait for it? Also I note that if the
LWLockConditionalAcquire succeeds, you neglect to release the lock,
which surely can't be right.

* The tail end of StartupXLOG() looks pretty unsafe to me. Surely
we mustn't clear IsRecoveryProcessingMode until after we have
established the safe-recovery checkpoint. The comments there seem to
be only vaguely related to the current state of the patch, too.

* Logging of recoveryLastXTime seems pretty bogus now. It's supposed to
be associated with a restartpoint completion report, but now it's just
out in the ether somewhere and doesn't represent a guarantee that we're
synchronized that far.

* backup.sgml needs to be updated to say that log_restartpoints is
obsolete.

* There are a bunch of disturbing assumptions in the SLRU-related
modules about their StartUp calls being executed without any concurrent
access. This isn't really a problem that needs to be dealt with in this
patch, I think, but that will all have to be cleaned up before we dare
allow any backends to run concurrently with recovery. btree's
suppression of relcache invals for metapage updates will be a problem
too.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-26 10:20:15
Message-ID: 1222424415.4445.917.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-09-25 at 18:28 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > Version 7

> Anyway, that's sufficiently bad that I'm bouncing the patch for
> reconsideration.

No problem, I understand this needs discussion.

There's less detail here than first appears. There are some basic points
to consider from which all else follows.

> After reading this for awhile, I realized that there is a rather
> fundamental problem with it: it switches into "consistent recovery"
> mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint.
> In a crash recovery situation that typically is before the last
> checkpoint (if indeed it's not still zero), and what that means is
> that this patch will activate the bgwriter and start letting in
> backends instantaneously after a crash, long before we can have any
> certainty that the DB state really is consistent.
>
> In a normal crash recovery situation this would be easily fixed by
> simply not letting it go to "consistent recovery" state at all, but
> what about recovery from a restartpoint? We don't want a slave that's
> crashed once to never let backends in again. But I don't see how to
> determine that we're far enough past the restartpoint to be consistent
> again. In crash recovery we assume (without proof ;-)) that we're
> consistent once we reach the end of valid-looking WAL, but that rule
> doesn't help for a slave that's following a continuing WAL sequence.
>
> Perhaps something could be done based on noting when we have to pull in
> a WAL segment from the recovery_command, but it sounds like a pretty
> fragile assumption.

Seems like we just say we only signal the postmaster if
InArchiveRecovery. Archive recovery from a restartpoint is still archive
recovery, so this shouldn't be a problem in the way you mention. The
presence of recovery.conf overrides all other cases.

> Some other issues I noted before giving up:

All of these issues raised can be addressed, but I think the main
decision we need to make is not so much about running other processes
but about when they can start and when they have to change mode.

When they can start seems solvable, as above.

When/how they must change state from recovery to normal mode seems more
difficult. State change must be atomic across all processes, but also
done at a micro level so that XLogFlush tests for the state change. The
overhead of continually checking seems high, so I am tempted to say let's
just kick 'em all off and then let them back on again. That's easily
accomplished for bgwriter without anybody noticing much. For Hot Standby
that would mean that a failover would kick off all query backends. I can
see why that would be both desirable and undesirable.

Anyway, from here I propose:
* we keep the shutdown checkpoint
* we kick off bgwriter (and any children) then let 'em back on again so
they can initialise in a different mode.

To do that, I just need to dust off a previous version of the patch. So
we can sort this out quickly if we have a clear way to proceed.

------------------------------------------------------------------
Other comments relate to this current patch, so further discussion of
the points below may not be required, if we agree how to proceed as
above.

> * I'm a bit uncomfortable with the fact that the
> IsRecoveryProcessingMode flag is read and written with no lock.
> There's no atomicity or concurrent-write problem, sure, but on
> a multiprocessor with weak memory ordering guarantees (eg PPC)
> readers could get a fairly stale value of the flag. The false
> to true transition happens before anyone except the startup process is
> running, so that's no problem; the net effect is then that backends
> might think that recovery mode was still active for awhile after it
> wasn't. This seems a bit scary, eg in the patch as it stands that'll
> disable XLogFlush calls that should have happened. You could fix that
> by grabbing/releasing some spinlock (any old one) around the accesses,
> but are any of the call sites performance-critical? The one in
> XLogInsert looks like it is, if nothing else.

Agreed.

It's not a dynamic state, so I can fix that inside
IsRecoveryProcessingMode() with a local state to make the check faster.

static bool RecoveryProcessingMode = true;      /* process-local copy */

bool
IsRecoveryProcessingMode(void)
{
    /* once we have seen the shared flag clear, it never becomes set again */
    if (!RecoveryProcessingMode)
        return false;

    {
        /* use volatile pointer to prevent code rearrangement */
        volatile XLogCtlData *xlogctl = XLogCtl;

        SpinLockAcquire(&xlogctl->mode_lck);
        RecoveryProcessingMode = xlogctl->IsRecoveryProcessingMode;
        SpinLockRelease(&xlogctl->mode_lck);
    }

    return RecoveryProcessingMode;
}

This only applies if we decide not to kick everybody off, change state
and then let them back on again.

> * I kinda think you broke XLogFlush anyway. It's certainly clear
> that the WARNING case at the bottom is unreachable with the patch,
> and I think that means that you've messed up an important error
> robustness behavior. Is it still possible to get out of recovery mode
> if there are any bad LSNs in the shared buffer pool?

Perhaps. But the WARNING could only occur during shutdown checkpoints.
This patch specifically avoids those, so the case would never arise and
needs no special handling. Yes, you can still leave recovery
mode if there are bad LSNs with this patch, but you won't know what they
are because of the lack of the shutdown checkpoint. Probably an argument
in favour of allowing shutdown checkpoints.

> * The use of InRecovery in CreateCheckPoint seems pretty bogus, since
> that function can be called from the bgwriter in which the flag will
> never be true. Either this needs to be IsRecoveryProcessingMode(),
> or it's useless because we'll never create ordinary checkpoints until
> after subtrans.c is up anyway.

Exactly. bgwriter never needs this to be set because it writes
restartpoints before this, using a different code path.

> * The patch moves the clearing of InRecovery from after to before
> StartupCLOG, RecoverPreparedTransactions, etc. Is that really a
> good idea? It makes it hard for those modules to know if they are
> coming up after a normal or recovery startup. I think they may not
> care at the moment, but I'd leave that alone without a darn good
> reason to change it.

I didn't move this as you say. It already was before StartupCLOG.

I moved it into exitRecovery() only, so it was unset in the same way
exitArchiveRecovery sets InArchiveRecovery to false. So it's refactoring,
with no change of sequencing.

> * The comment about CheckpointLock being only pro forma is now wrong,
> if you are going to use locking it to implement a delay in exitRecovery.
> But I don't understand why the delay there. If it's needed, seems like
> the path where a restartpoint *isn't* in progress is wrong --- don't you
> actually need to start one and wait for it?

All of this ducking and diving is because of the bgwriter needing to
perform a state change. That's the ball to keep our eye on.

After much thrashing, I decided that interrupting a restartpoint is too
dangerous a thing to want to do. If we're in the middle of one, we
finish it; if not, there's no need to interrupt it.

> Also I note that if the
> LWLockConditionalAcquire succeeds, you neglect to release the lock,
> which surely can't be right.

Doh. Thanks.
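
For the record, the corrected shape of that path would be roughly
(sketch only, not yet what the next patch version does):

if (!LWLockConditionalAcquire(CheckpointLock, LW_EXCLUSIVE))
{
    /* a restartpoint is in progress: wait for the bgwriter to finish it */
    LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
}
LWLockRelease(CheckpointLock);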

> * The tail end of StartupXLOG() looks pretty unsafe to me. Surely
> we mustn't clear IsRecoveryProcessingMode until after we have
> established the safe-recovery checkpoint. The comments there seem to
> be only vaguely related to the current state of the patch, too.

The whole point was to remove the ShutdownCheckpoint, but it sounds like
you're not keen on that any more.

I'm neutral on this point: I can see why people would want it removed -
it will speed up failover. I can see why people would want it kept -
there is a slight window where if we crash we will need to go back to
archive recovery.

I had a workable solution that kept it, so will revert to it.

> * Logging of recoveryLastXTime seems pretty bogus now. It's supposed to
> be associated with a restartpoint completion report, but now it's just
> out in the ether somewhere and doesn't represent a guarantee that we're
> synchronized that far.

That last one was there before so we knew where the log ended. It was
not supposed to be associated with a restartpoint, just with the end of
the log, purely for information (by user request, for when a log file is
corrupted and we need to know "when" we are up to).

> * backup.sgml needs to be updated to say that log_restartpoints is
> obsolete.

Yes

> * There are a bunch of disturbing assumptions in the SLRU-related
> modules about their StartUp calls being executed without any concurrent
> access. This isn't really a problem that needs to be dealt with in this
> patch, I think, but that will all have to be cleaned up before we dare
> allow any backends to run concurrently with recovery.

Well spotted, thanks.

> btree's
> suppression of relcache invals for metapage updates will be a problem
> too.

Again thanks. This patch is stand-alone from later work, that's why.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-26 11:42:06
Message-ID: 1222429326.4445.945.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Fri, 2008-09-26 at 11:20 +0100, Simon Riggs wrote:

> > After reading this for awhile, I realized that there is a rather
> > fundamental problem with it: it switches into "consistent recovery"
> > mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint.
> > In a crash recovery situation that typically is before the last
> > checkpoint (if indeed it's not still zero), and what that means is
> > that this patch will activate the bgwriter and start letting in
> > backends instantaneously after a crash, long before we can have any
> > certainty that the DB state really is consistent.
> >
> > In a normal crash recovery situation this would be easily fixed by
> > simply not letting it go to "consistent recovery" state at all, but
> > what about recovery from a restartpoint? We don't want a slave that's
> > crashed once to never let backends in again. But I don't see how to
> > determine that we're far enough past the restartpoint to be consistent
> > again. In crash recovery we assume (without proof ;-)) that we're
> > consistent once we reach the end of valid-looking WAL, but that rule
> > doesn't help for a slave that's following a continuing WAL sequence.
> >
> > Perhaps something could be done based on noting when we have to pull in
> > a WAL segment from the recovery_command, but it sounds like a pretty
> > fragile assumption.
>
> Seems like we just say we only signal the postmaster if
> InArchiveRecovery. Archive recovery from a restartpoint is still archive
> recovery, so this shouldn't be a problem in the way you mention. The
> presence of recovery.conf overrides all other cases.

Anticipating your possible responses, I would add this also:

There has long been an annoying hole in the PITR scheme which is the
support of recovery using a crashed database. That is there to support
split mirror snapshots, but it creates a loophole where we don't know
the min recovery location, circumventing the care we (you!) took to put
stop/start backup in place.

I think we need to add a parameter to recovery.conf that people can use
to specify a minRecoveryPoint iff there is no backup label file. They
can work out what this should be by following this procedure, which we
should document:
* split mirror, so you have offline copy of crashed database
* copy database away to backup
* go to the running database and run pg_current_xlog_insert_location()
* use the value to specify recovery_min_location

If they don't specify this, then bgwriter will not start and you cannot
run in Hot Standby mode. Their choice, so we need not worry about the
loophole any more.
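
To illustrate the value involved (a standalone sketch, not part of any
patch; all names here are assumptions): pg_current_xlog_insert_location()
returns a text LSN of the form X/X, which a hypothetical
recovery_min_location option would need to parse back into an
xlogid/xrecoff pair, roughly like this:

/*
 * Sketch only: turn the output of pg_current_xlog_insert_location(),
 * e.g. "16/B374D848", into the xlogid/xrecoff pair that a
 * recovery_min_location setting would carry.
 */
#include <stdio.h>

typedef struct
{
    unsigned int xlogid;     /* log file #, high half of the LSN */
    unsigned int xrecoff;    /* byte offset within the log file */
} XLogRecPtr;

static int
parse_lsn(const char *str, XLogRecPtr *ptr)
{
    /* two hex fields separated by a slash */
    return sscanf(str, "%X/%X", &ptr->xlogid, &ptr->xrecoff) == 2;
}

int
main(void)
{
    XLogRecPtr  loc;

    if (parse_lsn("16/B374D848", &loc))
        printf("minRecoveryPoint would be %X/%X\n", loc.xlogid, loc.xrecoff);
    return 0;
}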

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-28 18:02:25
Message-ID: 9794.1222624945@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Thu, 2008-09-25 at 18:28 -0400, Tom Lane wrote:
>> After reading this for awhile, I realized that there is a rather
>> fundamental problem with it: it switches into "consistent recovery"
>> mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint.
>> In a crash recovery situation that typically is before the last
>> checkpoint (if indeed it's not still zero), and what that means is
>> that this patch will activate the bgwriter and start letting in
>> backends instantaneously after a crash, long before we can have any
>> certainty that the DB state really is consistent.
>>
>> In a normal crash recovery situation this would be easily fixed by
>> simply not letting it go to "consistent recovery" state at all, but
>> what about recovery from a restartpoint? We don't want a slave that's
>> crashed once to never let backends in again. But I don't see how to
>> determine that we're far enough past the restartpoint to be consistent
>> again. In crash recovery we assume (without proof ;-)) that we're
>> consistent once we reach the end of valid-looking WAL, but that rule
>> doesn't help for a slave that's following a continuing WAL sequence.
>>
>> Perhaps something could be done based on noting when we have to pull in
>> a WAL segment from the recovery_command, but it sounds like a pretty
>> fragile assumption.

> Seems like we just say we only signal the postmaster if
> InArchiveRecovery. Archive recovery from a restartpoint is still archive
> recovery, so this shouldn't be a problem in the way you mention. The
> presence of recovery.conf overrides all other cases.

What that implements is my comment that we don't have to let anyone in
at all during a plain crash recovery. It does nothing AFAICS for the
problem that when restarting archive recovery from a restartpoint,
it's not clear when it is safe to start letting in backends. You need
to get past the highest LSN that has made it out to disk, and there is
no good way to know what that is.

Unless we can get past this problem the whole thing seems a bit dead in
the water :-(

>> * I'm a bit uncomfortable with the fact that the
>> IsRecoveryProcessingMode flag is read and written with no lock.

> It's not a dynamic state, so I can fix that inside
> IsRecoveryProcessingMode() with a local state to make the check faster.

Erm, this code doesn't look like it can allow IsRecoveryProcessingMode
to become locally true in the first place? I guess you could fix it
by initializing IsRecoveryProcessingMode to true, but that seems likely
to break other places. Maybe better is to have an additional local
state variable showing whether the flag has ever been fetched from
shared memory.
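
For illustration only (a rough sketch of that suggestion; the variable
names and the SharedRecoveryProcessingMode field are assumptions, and
locking/ordering details are glossed over):

static bool LocalRecoveryProcessingMode = true;
static bool LocalRecoveryModeKnown = false; /* ever fetched from shmem? */

bool
IsRecoveryProcessingMode(void)
{
    /* once the shared flag has been seen false, it can never go true again */
    if (LocalRecoveryModeKnown && !LocalRecoveryProcessingMode)
        return false;

    {
        /* use a volatile pointer to prevent code rearrangement */
        volatile XLogCtlData *xlogctl = XLogCtl;

        LocalRecoveryProcessingMode = xlogctl->SharedRecoveryProcessingMode;
        LocalRecoveryModeKnown = true;
    }

    return LocalRecoveryProcessingMode;
}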

The other issues don't seem worth arguing about ...

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-29 00:54:19
Message-ID: 1222649659.4445.1061.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Sun, 2008-09-28 at 14:02 -0400, Tom Lane wrote:

> It does nothing AFAICS for the
> problem that when restarting archive recovery from a restartpoint,
> it's not clear when it is safe to start letting in backends. You need
> to get past the highest LSN that has made it out to disk, and there is
> no good way to know what that is.
>
> Unless we can get past this problem the whole thing seems a bit dead
> in
> the water :-(

I agree on the importance of the problem but don't fully understand the
circumstances under which you see it arising.

AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in
the controlfile, so whenever we restart we can see that it has been set
previously and now we are beyond it. So if we crash during recovery and
then restart *after* we reached minRecoveryLoc then we resume in safe
mode almost immediately. If we crash during recovery before we reached
minRecoveryLoc then we continue until we find it.

There is a loophole, as described in a separate post, but that can be
plugged by offering explicit setting of the minRecoveryLoc from
recovery.conf. Most people use pg_start_backup() so do not experience
the need for that.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-29 01:16:01
Message-ID: 22856.1222650961@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> It does nothing AFAICS for the
>> problem that when restarting archive recovery from a restartpoint,
>> it's not clear when it is safe to start letting in backends. You need
>> to get past the highest LSN that has made it out to disk, and there is
>> no good way to know what that is.

> AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in
> the controlfile, so whenever we restart we can see that it has been set
> previously and now we are beyond it.

Right ...

> So if we crash during recovery and
> then restart *after* we reached minRecoveryLoc then we resume in safe
> mode almost immediately.

Wrong.

What minRecoveryLoc is is an upper bound for the LSNs that might be
on-disk in the filesystem backup that an archive recovery starts from.
(Defined as such, it never changes during a restartpoint crash/restart.)
Once you pass that, the on-disk state as modified by any dirty buffers
inside the recovery process represents a consistent database state.
However, the on-disk state alone is not guaranteed consistent. As you
flush some (not all) of your shared buffers you enter other
not-certainly-consistent on-disk states. If we crash in such a state,
we know how to use the last restartpoint plus WAL replay to recover to
another state in which disk + dirty buffers are consistent. However,
we reach such a state only when we have read WAL to beyond the highest
LSN that has reached disk --- and in recovery mode there is no clean
way to determine what that was.

Perhaps a solution is to make XLogFlush not be a no-op in recovery mode,
but have it scribble a highest-LSN somewhere on stable storage (maybe
scribble on pg_control itself, or maybe better someplace else). I'm
not totally sure about that. But I am sure that doing nothing will
be unreliable.
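
A rough sketch of that idea, assuming pg_control is the scribble target
(ControlFile and UpdateControlFile() are the existing mechanisms; the
recovery-mode branch and any locking around ControlFile are the
assumptions being illustrated):

void
XLogFlush(XLogRecPtr record)
{
    if (IsRecoveryProcessingMode())
    {
        /* a buffer with this LSN is about to reach disk */
        if (XLByteLT(ControlFile->minRecoveryPoint, record))
        {
            ControlFile->minRecoveryPoint = record;
            UpdateControlFile();    /* write and fsync pg_control */
        }
        return;
    }

    /* ... existing flush-to-WAL path, unchanged ... */
}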

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-29 12:33:16
Message-ID: 1222691596.4445.1188.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Sun, 2008-09-28 at 21:16 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> >> It does nothing AFAICS for the
> >> problem that when restarting archive recovery from a restartpoint,
> >> it's not clear when it is safe to start letting in backends. You need
> >> to get past the highest LSN that has made it out to disk, and there is
> >> no good way to know what that is.
>
> > AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in
> > the controlfile, so whenever we restart we can see that it has been set
> > previously and now we are beyond it.
>
> Right ...
>
> > So if we crash during recovery and
> > then restart *after* we reached minRecoveryLoc then we resume in safe
> > mode almost immediately.
>
> Wrong.

OK, see where you're coming from now. Solution is needed, I agree.

> What minRecoveryLoc is is an upper bound for the LSNs that might be
> on-disk in the filesystem backup that an archive recovery starts from.
> (Defined as such, it never changes during a restartpoint crash/restart.)
> Once you pass that, the on-disk state as modified by any dirty buffers
> inside the recovery process represents a consistent database state.
> However, the on-disk state alone is not guaranteed consistent. As you
> flush some (not all) of your shared buffers you enter other
> not-certainly-consistent on-disk states. If we crash in such a state,
> we know how to use the last restartpoint plus WAL replay to recover to
> another state in which disk + dirty buffers are consistent. However,
> we reach such a state only when we have read WAL to beyond the highest
> LSN that has reached disk --- and in recovery mode there is no clean
> way to determine what that was.
>
> Perhaps a solution is to make XLogFlush not be a no-op in recovery mode,
> but have it scribble a highest-LSN somewhere on stable storage (maybe
> scribble on pg_control itself, or maybe better someplace else). I'm
> not totally sure about that. But I am sure that doing nothing will
> be unreliable.

No need to write highest LSN to disk constantly...

If we restart from a restartpoint then initially the current apply LSN
will be potentially/probably earlier than the latest on-disk LSN, as you
say. But once we have completed the next restartpoint *after* the point
pg_control records, we will be guaranteed that the two LSNs are the
same, since otherwise we would have restarted at a later point.

That kinda works, but the problem is that restartpoints are time based,
not log based. We need them to be deterministic for us to rely upon them
in the above way. If we crash and then replay we can only be certain we
are safe when we have found a restartpoint that the previous recovery
will definitely have reached.

So we must have log-based restartpoints, using either a constant LSN
offset, or a parameter like checkpoint_segments. But if it is changeable
then it needs to be written into the control file, so we don't make a
mistake about it.

So we need to:
* add an extra test to delay safe point if required
* write restart_segments value to control file
* force a restartpoint on first valid checkpoint WAL record after we
have passed restart_segments worth of log
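
A rough sketch of that last test (restart_segments as a hypothetical
pg_control field; the *Loc variable names are assumptions):

/*
 * Force a restartpoint at the first checkpoint record seen once at
 * least restart_segments segments have been replayed since the last
 * restartpoint.
 */
uint32      segs_replayed;

segs_replayed =
    (checkPointLoc.xlogid - lastRestartPointLoc.xlogid) * XLogSegsPerFile
    + checkPointLoc.xrecoff / XLogSegSize
    - lastRestartPointLoc.xrecoff / XLogSegSize;

if (segs_replayed >= ControlFile->restart_segments)
    RecoveryRestartPoint(&checkPoint);  /* existing entry point in xlog.c */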

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-29 12:46:46
Message-ID: 10307.1222692406@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> ... That kinda works, but the problem is that restartpoints are time based,
> not log based. We need them to be deterministic for us to rely upon them
> in the above way.

Right, but the performance disadvantages of making them strictly
log-distance-based are pretty daunting. We don't really want slaves
doing that while they're in catchup mode.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-29 13:53:28
Message-ID: 1222696408.4445.1229.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Mon, 2008-09-29 at 08:46 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > ... That kinda works, but the problem is that restartpoints are time based,
> > not log based. We need them to be deterministic for us to rely upon them
> > in the above way.
>
> Right, but the performance disadvantages of making them strictly
> log-distance-based are pretty daunting. We don't really want slaves
> doing that while they're in catchup mode.

I don't think we need to perform restartpoints actually, now I think
about it. It's only the LSN that is important.

I think we can get away with writing the LSN value to disk, as you
suggested, but only every so often. No need to do it after every WAL
record, just consistently every so often, so it gives us a point at
which we know we are safe. We will need to have the Startup process
block momentarily while the value is written.

I propose the Startup process writes/flushes the LSN to pg_control every
time we change xlogid. That's independent of WAL file size and fairly clear.

When we reach that LSN + 1 we will know that no LSNs higher than that
value can have reached disk.

OK?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-29 14:13:59
Message-ID: 11540.1222697639@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> I think we can get away with writing the LSN value to disk, as you
> suggested, but only every so often. No need to do it after every WAL
> record, just consistently every so often, so it gives us a point at
> which we know we are safe.

Huh? How does that make you safe? What you need to know is the max
LSN that could possibly be on disk.

Hmm, actually we could get away with tying this to fetching WAL files
from the archive. When switching to a new WAL file, write out the
*ending* WAL address of that file to pg_control. Then process the WAL
records in it. Whether or not any of the affected pages get to disk,
we know that there is no LSN on disk exceeding what we already put in
pg_control. If we crash and restart, we'll have to get to the end
of this file before we start letting backends in; which might be further
than we actually got before the crash, but not too much further because
we already know the whole WAL file is available.
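
In code terms, roughly (a sketch only, not from the patch; readId and
readSeg are the existing xlog.c variables tracking the segment being
read, and the wrap at the end of a logical log file is glossed over):

/* just after restoring a segment from the archive, before replaying it */
if (InArchiveRecovery)
{
    XLogRecPtr  segEnd;

    /* end address of this segment = start address of the next one */
    segEnd.xlogid = readId;
    segEnd.xrecoff = (readSeg + 1) * XLogSegSize;

    if (XLByteLT(ControlFile->minRecoveryPoint, segEnd))
    {
        ControlFile->minRecoveryPoint = segEnd;
        UpdateControlFile();    /* write and fsync pg_control */
    }
}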

Or is that the same thing you were saying? The detail about using
the end address seems fairly critical, and you didn't mention it...

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-29 15:06:44
Message-ID: 1222700804.4445.1271.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Mon, 2008-09-29 at 10:13 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > I think we can get away with writing the LSN value to disk, as you
> > suggested, but only every so often. No need to do it after every WAL
> > record, just consistently every so often, so it gives us a point at
> > which we know we are safe.
>
> Huh? How does that make you safe? What you need to know is the max
> LSN that could possibly be on disk.
>
> Hmm, actually we could get away with tying this to fetching WAL files
> from the archive. When switching to a new WAL file, write out the
> *ending* WAL address of that file to pg_control. Then process the WAL
> records in it. Whether or not any of the affected pages get to disk,
> we know that there is no LSN on disk exceeding what we already put in
> pg_control. If we crash and restart, we'll have to get to the end
> of this file before we start letting backends in; which might be further
> than we actually got before the crash, but not too much further because
> we already know the whole WAL file is available.
>
> Or is that the same thing you were saying? The detail about using
> the end address seems fairly critical, and you didn't mention it...

Same! I just said the safe point was "LSN + 1", and the end of one file = the start of the next.

Looks like we've got a solution, no matter how it's described. (I
actually have a more detailed proof of safety using snapshots/MVCC
considerations, so I wasn't overly worried, but what we've discussed is
much easier to understand and agree on. Proof of safety is all we need,
and this simpler proof is more secure.)

Don't want to make it per file though. Big systems can whizz through WAL
files very quickly, so we either make it a big number e.g. 255 files per
xlogid, or we make it settable (and recorded in pg_control).

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-29 15:24:08
Message-ID: 12423.1222701848@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Mon, 2008-09-29 at 10:13 -0400, Tom Lane wrote:
>> ... If we crash and restart, we'll have to get to the end
>> of this file before we start letting backends in; which might be further
>> than we actually got before the crash, but not too much further because
>> we already know the whole WAL file is available.

> Don't want to make it per file though. Big systems can whizz through WAL
> files very quickly, so we either make it a big number e.g. 255 files per
> xlogid, or we make it settable (and recorded in pg_control).

I think you are missing the point I made above. If you set the
okay-to-resume point N files ahead, and then the master stops generating
files so quickly, you've got a problem --- it might be a long time until
the slave starts letting backends in after a crash/restart.

Fetching a new WAL segment from the archive is expensive enough that an
additional write/fsync per cycle doesn't seem that big a problem to me.
There's almost certainly a few fsync-equivalents going on in the
filesystem to create and delete the retrieved segment files.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-09-29 15:54:55
Message-ID: 1222703695.4445.1283.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Mon, 2008-09-29 at 11:24 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On Mon, 2008-09-29 at 10:13 -0400, Tom Lane wrote:
> >> ... If we crash and restart, we'll have to get to the end
> >> of this file before we start letting backends in; which might be further
> >> than we actually got before the crash, but not too much further because
> >> we already know the whole WAL file is available.
>
> > Don't want to make it per file though. Big systems can whizz through WAL
> > files very quickly, so we either make it a big number e.g. 255 files per
> > xlogid, or we make it settable (and recorded in pg_control).
>
> I think you are missing the point I made above. If you set the
> okay-to-resume point N files ahead, and then the master stops generating
> files so quickly, you've got a problem --- it might be a long time until
> the slave starts letting backends in after a crash/restart.
>
> Fetching a new WAL segment from the archive is expensive enough that an
> additional write/fsync per cycle doesn't seem that big a problem to me.
> There's almost certainly a few fsync-equivalents going on in the
> filesystem to create and delete the retrieved segment files.

Didn't miss yer point, just didn't agree. :-)

I'll put it at one (1) and then wait for any negative perf reports. No
need to worry about things like that until later.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: List pgsql-patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-10-02 22:11:28
Message-ID: 1222985488.4445.1625.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-09-25 at 18:28 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > Version 7
>
> After reading this for awhile, I realized that there is a rather
> fundamental problem with it: it switches into "consistent recovery"
> mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint.

Just seen this patch has been bounced into November CommitFest, even
though the new patch fixes all of the concerns raised.

I'm concerned that this is going to make the final Hot Standby patch
fairly large, which will make it even harder to review, test and
generally get accepted.

What's the best way to make this easier for you/others to review?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-10-02 23:07:55
Message-ID: 48E5544B.5050103@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:
> On Thu, 2008-09-25 at 18:28 -0400, Tom Lane wrote:
>
>> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>>
>>> Version 7
>>>
>> After reading this for awhile, I realized that there is a rather
>> fundamental problem with it: it switches into "consistent recovery"
>> mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint.
>>
>
> Just seen this patch has been bounced into November CommitFest, even
> though the new patch fixes all of the concerns raised.
>
> I'm concerned that this is going to make the final Hot Standby patch
> fairly large, which will make it even harder to review, test and
> generally get accepted.
>
> What's the best way to make this easier for you/others to review?
>
>

The fact that it's been put on the November list doesn't mean it can't
be reviewed and committed before then.

cheers

andrew


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-10-07 08:17:25
Message-ID: 1223367445.4747.105.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Thu, 2008-10-02 at 19:07 -0400, Andrew Dunstan wrote:

> Simon Riggs wrote:

> > Just seen this patch has been bounced into November CommitFest, even
> > though the new patch fixes all of the concerns raised.
> >
> > I'm concerned that this is going to make the final Hot Standby patch
> > fairly large, which will make it even harder to review, test and
> > generally get accepted.
> >
> > What's the best way to make this easier for you/others to review?

> The fact that it's been put on the November list doesn't mean it can't
> be reviewed and committed before then.

But that seems unlikely to be the case.

A patch specifically marked as "required for other work" has been
delayed by more than 5 weeks on queue and nobody was ever assigned to
review it. That was exactly the problem CommitFests were supposed to
resolve and from my perspective this is a systemic failure. If I had
submitted the patch a month late it wouldn't have got reviewed any
earlier, yet many people would cry foul (why?). The current system means
I have to code up to the deadline, officially do nothing for a month,
then respond within hours to code reviews or have the patch rejected for
another month. It works great for minor patches, but it's simply not
working for bigger features. It's just not possible to be fully
available to respond to reviews while at the same time being unable to
work for more than about 25% of the development calendar.

Luckily Tom reviewed it, but with no commit and no guidance on how to
proceed this still leaves me in a difficult position.

I'm now forced to leave much of this code behind, since I cannot
complete Hot Standby at the same time as having bgwriter active during
recovery if that code is at risk of causing the whole thing to be
rejected. Are the two together a risk? No. Is developing them together
harder? Yes. Do *I* trust my own code? Yes, but it's reviewers that
count. Is it a good thing for Postgres to leave this code behind?
Probably not. Can I add it later? Maybe.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-10-07 14:08:47
Message-ID: 13866.1223388527@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> A patch specifically marked as "required for other work" has been
> delayed by more than 5 weeks on queue and nobody was ever assigned to
> review it. That was exactly the problem CommitFests were supposed to
> resolve and from my perspective this is a systemic failure.

To be blunt, that patch spent most of September in "waiting for author"
state. Looking in the archives, I see that

* Original patch was posted on 31-Aug.

* I reviewed that patch on 8-Sep.

* You posted a revised patch on 10-Sep, but it was explicitly marked
as not ready to be actioned.

* It was not until 23-Sep that a patch was posted that you stated
you were happy with.

* I reviewed that one on 25-Sep.

* The patch now in the queue was posted on 30-Sep (all of 8 minutes
before midnight).

I don't see any systemic failure here.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-10-07 14:27:17
Message-ID: 1223389637.4747.194.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Tue, 2008-10-07 at 10:08 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > A patch specifically marked as "required for other work" has been
> > delayed by more than 5 weeks on queue and nobody was ever assigned to
> > review it. That was exactly the problem CommitFests were supposed to
> > resolve and from my perspective this is a systemic failure.
>
> To be blunt...

The bluntness was mine, for which I apologise. The last review uncovered
essential behaviour I was missing, no doubt about that.

I'm just grumpy because I can't see a way to do the
patch-on-patch-on-patch that I'll need to make this all work for Nov 1.
So big patch here we come. But that's just the way it is and I'll stop
honking about it.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-10-07 14:37:07
Message-ID: 20081007143707.GA19440@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:

> I'm just grumpy because I can't see a way to do the
> patch-on-patch-on-patch that I'll need to make this all work for Nov 1.
> So big patch here we come. But that's just the way it is and I'll stop
> honking about it.

This is one of the problems that DVCSs are supposed to solve ... have
you tried Git?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-10-07 14:46:28
Message-ID: 603c8f070810070746mdec86f7o8721dd1a8374ae94@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

>> I'm just grumpy because I can't see a way to do the
>> patch-on-patch-on-patch that I'll need to make this all work for Nov 1.
>> So big patch here we come. But that's just the way it is and I'll stop
>> honking about it.
>
> This is one of the problems that DVCSs are supposed to solve ... have
> you tried Git?

I think the other problem here is that the difficulty of getting the
patch landed increases more than linearly with its size. If it's hard
to get a patch of size X landed in one CommitFest, what are the
chances of landing one three times as large, with three times as many
changes to argue about? Getting things done in stages makes it easier
to build on earlier work without worrying that you'll be asked to go
back and redo everything.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-10-07 15:05:45
Message-ID: 14677.1223391945@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
>>> I'm just grumpy because I can't see a way to do the
>>> patch-on-patch-on-patch that I'll need to make this all work for Nov 1.
>>> So big patch here we come. But that's just the way it is and I'll stop
>>> honking about it.
>>
>> This is one of the problems that DVCSs are supposed to solve ... have
>> you tried Git?

> I think the other problem here is that the difficulty of getting the
> patch landed increases more than linearly with its size.

Yeah. Tools aren't really the key problem there IMHO.

I agree with Simon that it would be a good thing if this patch got
landed before he went on to the next part. But personally I'm a bit
burned out from commitfest and am not eager to go take a third look
at the patch right now. Does anyone else want to review it?

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-10-07 15:08:58
Message-ID: 1223392138.4747.215.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Tue, 2008-10-07 at 10:37 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
>
> > I'm just grumpy because I can't see a way to do the
> > patch-on-patch-on-patch that I'll need to make this all work for Nov 1.
> > So big patch here we come. But that's just the way it is and I'll stop
> > honking about it.
>
> This is one of the problems that DVCSs are supposed to solve ... have
> you tried Git?

I have no doubt there's a better way, but no, I don't know it. My
braintime is devoted to databases not computing per se. Most of the time
that's a good time allocation decision.

Not convinced this is the right time to invest in side activities, but
if you think so, I'll look into it.

Anybody wanting to write or link to a Simon's Guide, most welcome.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Dave Page" <dpage(at)pgadmin(dot)org>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-10-07 15:19:21
Message-ID: 937d27e10810070819y5f6be128p88428d636f5b1976@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, Oct 7, 2008 at 4:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> Not convinced this is the right time to invest in side activities, but
> if you think so, I'll look into it.
>
> Anybody wanting to write or link to a Simon's Guide, most welcome.

Heikki will be presenting a talk about GIT + PG at PGDay next week.
Might be useful.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-10-07 15:19:45
Message-ID: 48EB7E11.4010603@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> Simon Riggs wrote:
>> I'm just grumpy because I can't see a way to do the
>> patch-on-patch-on-patch that I'll need to make this all work for Nov 1.
>> So big patch here we come. But that's just the way it is and I'll stop
>> honking about it.
>
> This is one of the problems that DVCSs are supposed to solve ... have
> you tried Git?

Yeah, I maintained a relforks patch + FSM patch on top of that for quite
some time using git. It's still a bit of work, for sure, but it's possible.

The bottom line is that hot standby is a big feature, and probably a big
patch. No amount of version control will work around that. Finishing all
that in a few weeks is a very ambitious goal. I wish you luck, and I
wish I could do more to help you with that; it's a feature I'd really
like to see in 8.4. Given how many iterations just this part of the
patch needed, and it's still not there, the little project manager in me
says that we really need to be seeing the complete patch or patches very
soon, and do a first round of review well before the commit fest. The
risk that we have to drop it as unfinished in November is very high
otherwise.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2008-10-07 16:07:46
Message-ID: 1223395666.4747.228.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Tue, 2008-10-07 at 18:19 +0300, Heikki Linnakangas wrote:

> The bottom line is that hot standby is a big feature, and probably a big
> patch. No amount of version control will work around that. Finishing all
> that in a few weeks is a very ambitious goal. I wish you luck, and I
> wish I could do more to help you with that; it's a feature I'd really
> like to see in 8.4. Given how many iterations just this part of the
> patch needed, and it's still not there, the little project manager in me
> says that we really need to be seeing the complete patch or patches very
> soon, and do a first round of review well before the commit fest. The
> risk that we have to drop it as unfinished in November is very high
> otherwise.

Agreed and understood. The coding isn't the problem, it's the risk of
being blindsided by a showstopper. Coding is quick when you know exactly
what you are trying to achieve.

That's one reason to strip out the bgwriter stuff. It's the postmaster
state change stuff that's most needed. Anyway, watch this space.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2009-01-27 14:09:51
Message-ID: 497F15AF.1040800@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

(replying to a very old message, since I just bumped into this in review)

Alvaro Herrera wrote:
> Simon Riggs wrote:
>> On Fri, 2008-09-12 at 14:14 -0400, Alvaro Herrera wrote:
>>> Simon Riggs wrote:
>>>
>>>> --- 5716,5725 ----
>>>> CheckpointStats.ckpt_sync_end_t,
>>>> &sync_secs, &sync_usecs);
>>>>
>>>> ! elog(LOG, "%s complete: wrote %d buffers (%.1f%%); "
>>>> "%d transaction log file(s) added, %d removed, %d recycled; "
>>>> "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
>>>> + (checkpoint ? " checkpoint" : "restartpoint"),
>>>> CheckpointStats.ckpt_bufs_written,
>>>> (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
>>>> CheckpointStats.ckpt_segs_added,
>>> Very minor nit: this really needs a rework.
>> All I changed was the word "restartpoint"... its otherwise identical to
>> existing message. I'd rather not change that.
>
> The new message is not translatable, the original was.

Doesn't really matter since it's an elog(), not ereport().

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Infrastructure changes for recovery
Date: 2009-01-27 15:37:53
Message-ID: 15190.1233070673@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> (replying to a very old message, since I just bumped into this in review)
> Alvaro Herrera wrote:
>> The new message is not translatable, the original was.

> Doesn't really matter since it's an elog(), not ereport().

... which is wrong in itself, since it's certainly meant as a
user-facing (or at least DBA-facing) message. elog should generally
only be used for debugging or "can't happen" messages, not for stuff
that users are expected to see on a routine basis.
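
For illustration (a sketch, not a proposed patch; message text
abbreviated): using ereport() with two complete format strings, rather
than splicing a word in with %s, would address both points at once and
keep each message translatable:

if (checkpoint)
    ereport(LOG,
            (errmsg("checkpoint complete: wrote %d buffers (%.1f%%)",
                    CheckpointStats.ckpt_bufs_written,
                    (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers)));
else
    ereport(LOG,
            (errmsg("restartpoint complete: wrote %d buffers (%.1f%%)",
                    CheckpointStats.ckpt_bufs_written,
                    (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers)));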

regards, tom lane