Quick Links

Re: Hot Backup with rsync fails at pg_clog if under load

Lists:	pgsql-hackers

From:	Chris Redekop <chris(at)replicon(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-16 01:33:33
Message-ID:	CAC2SuRJTtsXL67EuTdd25GThYZBZTcepfG7Z0Vvv2pZQWzJD=w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> > Linas, could you capture the output of pg_controldata *and* increase the
> > log level to DEBUG1 on the standby? We should then see nextXid value of
> > the checkpoint the recovery is starting from.
>
> I'll try to do that whenever I'm in that territory again... Incidentally,
> recently there was a lot of unrelated-to-this-post work to polish things
up
> for a talk being given at PGWest 2011 Today :)
>
> > I also checked what rsync does when a file vanishes after rsync computed
the
> > file list, but before it is sent. rsync 3.0.7 on OSX, at least,
complains
> > loudly, and doesn't sync the file. It BTW also exits non-zero, with a
special
> > exit code for precisely that failure case.
>
> To be precise, my script has logic to accept the exit code 24, just as
> stated in PG manual:
>
> Docs> For example, some versions of rsync return a separate exit code for
> Docs> "vanished source files", and you can write a driver script to accept
> Docs> this exit code as a non-error case.

I also am running into this issue and can reproduce it very reliably. For
me, however, it happens even when doing the "fast backup" like so:
pg_start_backup('whatever', true)...my traffic is more write-heavy than
linas's tho, so that might have something to do with it. Yesterday it
reliably errored out on pg_clog every time, but today it is
failing sporadically on pg_subtrans (which seems to be past where the
pg_clog error was)....the only thing that has changed is that I've changed
the log level to debug1....I wouldn't think that could be related though.
I've linked the requested pg_controldata and debug1 logs for both errors.
Both links contain the output from pg_start_backup, rsync, pg_stop_backup,
pg_controldata, and then the postgres debug1 log produced from a subsequent
startup attempt.

pg_clog: http://pastebin.com/mTfdcjwH
pg_subtrans: http://pastebin.com/qAXEHAQt

Any workarounds would be very appreciated.....would copying clog+subtrans
before or after the rest of the data directory (or something like that) make
any difference?

Thanks!

From:	Chris Redekop <chris(at)replicon(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-17 20:13:50
Message-ID:	CAC2SuRKyt0K35ExCTNnYuf+3XfAjSJJQV_sVX+O4jVTfDuAxog@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I can confirm that both the pg_clog and pg_subtrans errors do occur when
using pg_basebackup instead of rsync. The data itself seems to be fine
because using the exact same data I can start up a warm standby no problem,
it is just the hot standby that will not start up.

On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop <chris(at)replicon(dot)com> wrote:

> > > Linas, could you capture the output of pg_controldata *and* increase
> the
> > > log level to DEBUG1 on the standby? We should then see nextXid value of
> > > the checkpoint the recovery is starting from.
> >
> > I'll try to do that whenever I'm in that territory again... Incidentally,
> > recently there was a lot of unrelated-to-this-post work to polish things
> up
> > for a talk being given at PGWest 2011 Today :)
> >
> > > I also checked what rsync does when a file vanishes after rsync
> computed the
> > > file list, but before it is sent. rsync 3.0.7 on OSX, at least,
> complains
> > > loudly, and doesn't sync the file. It BTW also exits non-zero, with a
> special
> > > exit code for precisely that failure case.
> >
> > To be precise, my script has logic to accept the exit code 24, just as
> > stated in PG manual:
> >
> > Docs> For example, some versions of rsync return a separate exit code for
> > Docs> "vanished source files", and you can write a driver script to
> accept
> > Docs> this exit code as a non-error case.
>
> I also am running into this issue and can reproduce it very reliably. For
> me, however, it happens even when doing the "fast backup" like so:
> pg_start_backup('whatever', true)...my traffic is more write-heavy than
> linas's tho, so that might have something to do with it. Yesterday it
> reliably errored out on pg_clog every time, but today it is
> failing sporadically on pg_subtrans (which seems to be past where the
> pg_clog error was)....the only thing that has changed is that I've changed
> the log level to debug1....I wouldn't think that could be related though.
> I've linked the requested pg_controldata and debug1 logs for both errors.
> Both links contain the output from pg_start_backup, rsync, pg_stop_backup,
> pg_controldata, and then the postgres debug1 log produced from a subsequent
> startup attempt.
>
> pg_clog: http://pastebin.com/mTfdcjwH
> pg_subtrans: http://pastebin.com/qAXEHAQt
>
> Any workarounds would be very appreciated.....would copying clog+subtrans
> before or after the rest of the data directory (or something like that) make
> any difference?
>
> Thanks!
>

From:	Chris Redekop <chris(at)replicon(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-17 21:30:49
Message-ID:	CAC2SuRJPNqPe7Ga8LT6Q-vOgn05BmiH=y6F_dfiVu5u3NfOs=w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Well, on the other hand maybe there is something wrong with the data.
Here's the test/steps I just did -
1. I do the pg_basebackup when the master is under load, hot slave now will
not start up but warm slave will.
2. I start a warm slave and let it catch up to current
3. On the slave I change 'hot_standby=on' and do a 'service postgresql
restart'
4. The postgres fails to restart with the same error.
5. I turn hot_standby back off and postgres starts back up fine as a warm
slave
6. I then turn off the load, the slave is all caught up, master and slave
are both sitting idle
7. I, again, change 'hot_standby=on' and do a service restart
8. Again it fails, with the same error, even though there is no longer any
load.
9. I repeat this warmstart/hotstart cycle a couple more times until to my
surprise, instead of failing, it successfully starts up as a hot standby
(this is after maybe 5 minutes or so of sitting idle)

So...given that it continued to fail even after the load had been turned of,
that makes me believe that the data which was copied over was invalid in
some way. And when a checkpoint/logrotation/somethingelse occurred when not
under load it cleared itself up....I'm shooting in the dark here

Anyone have any suggestions/ideas/things to try?

On Mon, Oct 17, 2011 at 2:13 PM, Chris Redekop <chris(at)replicon(dot)com> wrote:

> I can confirm that both the pg_clog and pg_subtrans errors do occur when
> using pg_basebackup instead of rsync. The data itself seems to be fine
> because using the exact same data I can start up a warm standby no problem,
> it is just the hot standby that will not start up.
>
>
> On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop <chris(at)replicon(dot)com> wrote:
>
>> > > Linas, could you capture the output of pg_controldata *and* increase
>> the
>> > > log level to DEBUG1 on the standby? We should then see nextXid value
>> of
>> > > the checkpoint the recovery is starting from.
>> >
>> > I'll try to do that whenever I'm in that territory again...
>> Incidentally,
>> > recently there was a lot of unrelated-to-this-post work to polish things
>> up
>> > for a talk being given at PGWest 2011 Today :)
>> >
>> > > I also checked what rsync does when a file vanishes after rsync
>> computed the
>> > > file list, but before it is sent. rsync 3.0.7 on OSX, at least,
>> complains
>> > > loudly, and doesn't sync the file. It BTW also exits non-zero, with a
>> special
>> > > exit code for precisely that failure case.
>> >
>> > To be precise, my script has logic to accept the exit code 24, just as
>> > stated in PG manual:
>> >
>> > Docs> For example, some versions of rsync return a separate exit code
>> for
>> > Docs> "vanished source files", and you can write a driver script to
>> accept
>> > Docs> this exit code as a non-error case.
>>
>> I also am running into this issue and can reproduce it very reliably. For
>> me, however, it happens even when doing the "fast backup" like so:
>> pg_start_backup('whatever', true)...my traffic is more write-heavy than
>> linas's tho, so that might have something to do with it. Yesterday it
>> reliably errored out on pg_clog every time, but today it is
>> failing sporadically on pg_subtrans (which seems to be past where the
>> pg_clog error was)....the only thing that has changed is that I've changed
>> the log level to debug1....I wouldn't think that could be related though.
>> I've linked the requested pg_controldata and debug1 logs for both errors.
>> Both links contain the output from pg_start_backup, rsync, pg_stop_backup,
>> pg_controldata, and then the postgres debug1 log produced from a subsequent
>> startup attempt.
>>
>> pg_clog: http://pastebin.com/mTfdcjwH
>> pg_subtrans: http://pastebin.com/qAXEHAQt
>>
>> Any workarounds would be very appreciated.....would copying clog+subtrans
>> before or after the rest of the data directory (or something like that) make
>> any difference?
>>
>> Thanks!
>>
>
>

From:	Daniel Farina <daniel(at)heroku(dot)com>
To:	Chris Redekop <chris(at)replicon(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-23 20:48:06
Message-ID:	CAAZKuFbUSxOb8x80k9-fD9CsXBRmzTNWutqcDVGaTvwFTvURiQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 17, 2011 at 11:30 PM, Chris Redekop <chris(at)replicon(dot)com> wrote:
> Well, on the other hand maybe there is something wrong with the data.
> Here's the test/steps I just did -
> 1. I do the pg_basebackup when the master is under load, hot slave now will
> not start up but warm slave will.
> 2. I start a warm slave and let it catch up to current
> 3. On the slave I change 'hot_standby=on' and do a 'service postgresql
> restart'
> 4. The postgres fails to restart with the same error.
> 5. I turn hot_standby back off and postgres starts back up fine as a warm
> slave
> 6. I then turn off the load, the slave is all caught up, master and slave
> are both sitting idle
> 7. I, again, change 'hot_standby=on' and do a service restart
> 8. Again it fails, with the same error, even though there is no longer any
> load.
> 9. I repeat this warmstart/hotstart cycle a couple more times until to my
> surprise, instead of failing, it successfully starts up as a hot standby
> (this is after maybe 5 minutes or so of sitting idle)
> So...given that it continued to fail even after the load had been turned of,
> that makes me believe that the data which was copied over was invalid in
> some way. And when a checkpoint/logrotation/somethingelse occurred when not
> under load it cleared itself up....I'm shooting in the dark here
> Anyone have any suggestions/ideas/things to try?

Having digged at this a little -- but not too much -- the problem
seems to be that postgres is reading the commit logs way, way too
early, that is to say, before it has played enough WAL to be
'consistent' (the WAL between pg_start and pg_stop backup). I have
not been able to reproduce this problem (I think) after the message
from postgres suggesting it has reached a consistent state; at that
time I am able to go into hot-standby mode.

The message is like: "consistent recovery state reached at %X/%X".
(this is the errmsg)

It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
hot-standby path functionality) to be called before that code is
executed, but it is anyway right now. I'm not sure if this oversight
is simply an oversight, or indicative of a misplaced assumption
somewhere. Basically, my thoughts for a fix are to suppress
hot_standby = on (in spirit) before the consistent recovery state is
reached.

--
fdr

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Daniel Farina <daniel(at)heroku(dot)com>
Cc:	Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-23 22:39:01
Message-ID:	CA+U5nM+pTpT6eWrHD57y-X_MqpicPMguJPXJVvcq1nG1Rid80Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Oct 23, 2011 at 9:48 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> On Mon, Oct 17, 2011 at 11:30 PM, Chris Redekop <chris(at)replicon(dot)com> wrote:
>> Well, on the other hand maybe there is something wrong with the data.
>> Here's the test/steps I just did -
>> 1. I do the pg_basebackup when the master is under load, hot slave now will
>> not start up but warm slave will.
>> 2. I start a warm slave and let it catch up to current
>> 3. On the slave I change 'hot_standby=on' and do a 'service postgresql
>> restart'
>> 4. The postgres fails to restart with the same error.
>> 5. I turn hot_standby back off and postgres starts back up fine as a warm
>> slave
>> 6. I then turn off the load, the slave is all caught up, master and slave
>> are both sitting idle
>> 7. I, again, change 'hot_standby=on' and do a service restart
>> 8. Again it fails, with the same error, even though there is no longer any
>> load.
>> 9. I repeat this warmstart/hotstart cycle a couple more times until to my
>> surprise, instead of failing, it successfully starts up as a hot standby
>> (this is after maybe 5 minutes or so of sitting idle)
>> So...given that it continued to fail even after the load had been turned of,
>> that makes me believe that the data which was copied over was invalid in
>> some way. And when a checkpoint/logrotation/somethingelse occurred when not
>> under load it cleared itself up....I'm shooting in the dark here
>> Anyone have any suggestions/ideas/things to try?
>
> Having digged at this a little -- but not too much -- the problem
> seems to be that postgres is reading the commit logs way, way too
> early, that is to say, before it has played enough WAL to be
> 'consistent' (the WAL between pg_start and pg_stop backup). I have
> not been able to reproduce this problem (I think) after the message
> from postgres suggesting it has reached a consistent state; at that
> time I am able to go into hot-standby mode.
>
> The message is like: "consistent recovery state reached at %X/%X".
> (this is the errmsg)
>
> It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
> hot-standby path functionality) to be called before that code is
> executed, but it is anyway right now. I'm not sure if this oversight
> is simply an oversight, or indicative of a misplaced assumption
> somewhere. Basically, my thoughts for a fix are to suppress
> hot_standby = on (in spirit) before the consistent recovery state is
> reached.

Not sure about that, but I'll look at where this comes from.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Chris Redekop <chris(at)replicon(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-23 23:27:40
Message-ID:	CA+U5nMLvMp8ikF+9HYaJEtqUQyJkP5WHBEqZqbBswn6KDwCx1Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Oct 16, 2011 at 2:33 AM, Chris Redekop <chris(at)replicon(dot)com> wrote:

> pg_subtrans: http://pastebin.com/qAXEHAQt

I confirm this as a HS issue and will investigate from here.

FATAL: could not access status of transaction 21110784
which, in pg_subtrans, is the first xid on a new subtrans page. So we
have missed zeroing a page.

pg_control shows ... Latest checkpoint's oldestActiveXID: 21116666
which shows quite clearly that the pg_control file is later than it should be.

Chris, can you rearrange the backup so you copy the pg_control file as
the first act after the pg_start_backup?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Daniel Farina <daniel(at)heroku(dot)com>
Cc:	Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-23 23:33:59
Message-ID:	CA+U5nM+37AyVi19Y8Ww5XO=KyPnbu01o4o6o_6Xe02EVdJnvPw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Oct 23, 2011 at 9:48 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:

> Having digged at this a little -- but not too much -- the problem
> seems to be that postgres is reading the commit logs way, way too
> early, that is to say, before it has played enough WAL to be
> 'consistent' (the WAL between pg_start and pg_stop backup). I have
> not been able to reproduce this problem (I think) after the message
> from postgres suggesting it has reached a consistent state; at that
> time I am able to go into hot-standby mode.

The WAL appears too early because the other control info is later than
it should be.

So this is approx backwards and nothing related to consistent state,
but thanks for drawing my attention to this.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-24 06:05:01
Message-ID:	86FA4083-5CD9-4AAF-BCD6-0226420DAA45@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct24, 2011, at 01:27 , Simon Riggs wrote:
> FATAL: could not access status of transaction 21110784
> which, in pg_subtrans, is the first xid on a new subtrans page. So we
> have missed zeroing a page.
>
> pg_control shows ... Latest checkpoint's oldestActiveXID: 21116666
> which shows quite clearly that the pg_control file is later than it should be.

But shouldn't pg_control be largely irrelevant in a hot backup scenario? Most
(all?) of the information contained therein should be overwritten with the
contents of the checkpoint referenced by the backup label, shouldn't it?

best regards,
Florian Pflug

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Daniel Farina <daniel(at)heroku(dot)com>
Cc:	Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-24 06:13:52
Message-ID:	213AEFBF-479C-4F9E-B0CA-24AAB2C9F4DD@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct23, 2011, at 22:48 , Daniel Farina wrote:
> It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
> hot-standby path functionality) to be called before that code is
> executed, but it is anyway right now.

I think the idea is to check that the CLOG part which recovery *won't*
overwrite is consistent (or rather, given the simplicity of the check,
at least accessible)

Heikki said the following somewhere else in this thread when I suggested
something similar to your proposal:

>> There are pretty clear rules on what state clog can be in. When you launch postmaster in a standby:
>>
>> * Any clog preceding the nextXid from the checkpoint record we start recovery from, must either be valid, or the clog file must be missing altogether (which can happen when it was vacuumed away while the backup in progress - if the clog is still needed at the end of backup it must not be missing, of course).
>> * Any clog following nextXid can be garbled or missing.
>>
>> Recovery will overwrite any clog after nextXid from the WAL, but not the clog before it.

I think Simon's theory that we're starting recovery from the wrong place,
i.e. should start with an earlier WAL location, is probably correct. The
question is, why?

best regards,
Florian Pflug

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-25 07:03:31
Message-ID:	CA+U5nMJksETN=sJEmu2FqhWNv2Uv63e_9T5OrZEGmviNnWEC5Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 24, 2011 at 7:13 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:

> I think Simon's theory that we're starting recovery from the wrong place,
> i.e. should start with an earlier WAL location, is probably correct. The
> question is, why?

Err, that's not what I said and I don't mean that. Having said that,
what I said about pg_control being invalid would imply that, so is
wrong also.

We are starting recovery at the right place but we are initialising
the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
being derived later than it should be, which can cause problems if
this then means that whole pages are unitialised in subtrans. The bug
only shows up if you do enough transactions (2048 is always enough) to
move to the next subtrans page between the redo pointer and the
checkpoint record while at the same time we do not have a long running
transaction that spans those two points. That's just enough to happen
reasonably frequently on busy systems and yet just enough to have
slipped through testing.

We must either

1. During CreateCheckpoint() we should derive oldestActiveXid before
we derive the redo location

2. Change the way subtrans pages are initialized during recovery so we
don't rely on oldestActiveXid

I need to think some more before a decision on this in my own mind,
but I lean towards doing (1) as a longer term fix and doing (2) as a
short term fix for existing releases. I expect to have a fix later
today.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-25 09:13:14
Message-ID:	CA+U5nMJfnaay7p_rRzrvWLTyO-aKNEh414czr3jyXT8vhHiJVw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> We are starting recovery at the right place but we are initialising
> the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
> being derived later than it should be, which can cause problems if
> this then means that whole pages are unitialised in subtrans. The bug
> only shows up if you do enough transactions (2048 is always enough) to
> move to the next subtrans page between the redo pointer and the
> checkpoint record while at the same time we do not have a long running
> transaction that spans those two points. That's just enough to happen
> reasonably frequently on busy systems and yet just enough to have
> slipped through testing.
>
> We must either
>
> 1. During CreateCheckpoint() we should derive oldestActiveXid before
> we derive the redo location
>
> 2. Change the way subtrans pages are initialized during recovery so we
> don't rely on oldestActiveXid
>
> I need to think some more before a decision on this in my own mind,
> but I lean towards doing (1) as a longer term fix and doing (2) as a
> short term fix for existing releases. I expect to have a fix later
> today.

(1) looks the best way forwards in all cases.

Patch attached. Will be backpatched to 9.0

I think it is possible to avoid taking XidGenLock during
GetRunningTransactions() now, but I haven't included that change in
this patch.

Any other comments before commit?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
oldestActiveXid_fixed.v1.patch	application/octet-stream	4.9 KB

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-25 11:39:39
Message-ID:	1A94CE1A-C3E5-4923-B677-623D41BC273F@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct25, 2011, at 11:13 , Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> We are starting recovery at the right place but we are initialising
>> the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
>> being derived later than it should be, which can cause problems if
>> this then means that whole pages are unitialised in subtrans. The bug
>> only shows up if you do enough transactions (2048 is always enough) to
>> move to the next subtrans page between the redo pointer and the
>> checkpoint record while at the same time we do not have a long running
>> transaction that spans those two points. That's just enough to happen
>> reasonably frequently on busy systems and yet just enough to have
>> slipped through testing.
>>
>> We must either
>>
>> 1. During CreateCheckpoint() we should derive oldestActiveXid before
>> we derive the redo location

> (1) looks the best way forwards in all cases.

Let me see if I understand this

The probem seems to be that we currently derive oldestActiveXid end the end of
the checkpoint, just before writing the checkpoint record. Since we use
oldestActiveXid to initialize SUBTRANS, this is wrong. Records written before
that checkpoint record (but after the REDO location, of course) may very well
contain XIDs earlier than that wrongly derived oldestActiveXID, and if attempt
to touch these XID's SUBTRANS state, we error out.

Your patch seems sensible, because the checkpoint "logically" occurs at the
REDO location not the checkpoint's location, so we ought to log an oldestActiveXID
corresponding to that location.

What I don't understand is how this affects the CLOG. How does oldestActiveXID
factor into CLOG initialization?

best regards,
Florian Pflug

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-25 12:51:51
Message-ID:	CA+U5nMKUUoA8kRG=itfsO5nZuE3x_KDJz78EaUN3_FKmq-uMJA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:

> What I don't understand is how this affects the CLOG. How does oldestActiveXID
> factor into CLOG initialization?

It is an entirely different error.

Chris' clog error was caused by a file read error. The file was
opened, we did a seek within the file and then the call to read()
failed to return a complete page from the file.

The xid shown is 22811359, which is the nextxid in the control file.

So StartupClog() must have failed trying to read the clog page from disk.

That isn't a Hot Standby problem, a recovery problem nor is it certain
its a PostgreSQL problem.

OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
files during recovery, so maybe we can think of a way to make recovery
cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Chris Redekop <chris(at)replicon(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-25 21:06:14
Message-ID:	CAC2SuRLssGbdMBVwCwoQTJ9jt8L9oEvVPai++dR3d6EuTznw7g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Chris, can you rearrange the backup so you copy the pg_control file as
> the first act after the pg_start_backup?

I tried this and it doesn't seem to make any difference. I also tried the
patch and I can no longer reproduce the subtrans error, however instead it
now it starts up, but never gets to the point where it'll accept
connections. It starts up but if I try to do anything I always get "FATAL:
the database system is starting up"...even if the load is removed from the
primary, the standby still never finishes "starting up". Attached below is
a log of one of these startup attempts. In my testing with the patch
applied approx 3 in 10 attempts start up successfully, 7 in 10 attempts go
into the "db is starting up" state....the pg_clog error is still there, but
seems much harder to reproduce now....I've seen it only once since applying
the patch (out of probably 50 or 60 under-load startup attempts). It does
seem to be "moody" like that tho....it will be very difficult to reproduce
for a while, and then it will happen damn-near every time for a
while...weirdness

On a bit of a side note, I've been thinking of changing my scripts so that
they perform an initial rsync prior to doing the
startbackup-rsync-stopbackup just so that the second rsync will be
faster....so that the backup is in progress for a shorter period of time, as
while it is running it will stop other standbys from starting up....this
shouldn't cause any issues eh?

2011-10-25 13:43:24.035 MDT [15072]: [1-1] LOG: database system was
interrupted; last known up at 2011-10-25 13:43:11 MDT
2011-10-25 13:43:24.035 MDT [15072]: [2-1] LOG: creating missing WAL
directory "pg_xlog/archive_status"
2011-10-25 13:43:24.037 MDT [15072]: [3-1] LOG: entering standby mode
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION 2/CF000000
2011-10-25 13:43:24.041 MDT [15073]: [1-1] LOG: streaming replication
successfully connected to primary
2011-10-25 13:43:24.177 MDT [15092]: [1-1] FATAL: the database system is
starting up
2011-10-25 13:43:24.781 MDT [15072]: [4-1] DEBUG: checkpoint record is at
2/CF81A478
2011-10-25 13:43:24.781 MDT [15072]: [5-1] DEBUG: redo record is at
2/CF000020; shutdown FALSE
2011-10-25 13:43:24.781 MDT [15072]: [6-1] DEBUG: next transaction ID:
0/4634700; next OID: 1188228
2011-10-25 13:43:24.781 MDT [15072]: [7-1] DEBUG: next MultiXactId: 839;
next MultiXactOffset: 1686
2011-10-25 13:43:24.781 MDT [15072]: [8-1] DEBUG: oldest unfrozen
transaction ID: 1669, in database 1
2011-10-25 13:43:24.781 MDT [15072]: [9-1] DEBUG: transaction ID wrap limit
is 2147485316, limited by database with OID 1
2011-10-25 13:43:24.783 MDT [15072]: [10-1] DEBUG: resetting unlogged
relations: cleanup 1 init 0
2011-10-25 13:43:24.791 MDT [15072]: [11-1] DEBUG: initializing for hot
standby
2011-10-25 13:43:24.791 MDT [15072]: [12-1] LOG: consistent recovery state
reached at 2/CF81A4D0
2011-10-25 13:43:24.791 MDT [15072]: [13-1] LOG: redo starts at 2/CF000020
2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG: consistent state delayed
because recovery snapshot incomplete
2011-10-25 13:43:25.019 MDT [15072]: [15-1] CONTEXT: xlog redo running
xacts:
nextXid 4634700 latestCompletedXid 4634698 oldestRunningXid 4634336; 130
xacts:
4634336 4634337 4634338 4634339 4634340 4634341 4634342 4634343 4634344
4634345
4634346 4634347 4634348 4634349 4634350 4634351 4634352 4634353 4634354
4634355
4634356 4634357 4634358 4634359 4634360 4634361 4634362 4634363 4634364
4634365
4634366 4634367 4634368 4634369 4634370 4634371 4634515 4634516 4634517
4634518
4634519 4634520 4634521 4634522 4634523 4634524 4634525 4634526 4634527
4634528
4634529 4634530 4634531 4634532 4634533 4634534 4634535 4634536 4634537
4634538
4634539 4634540 4634541 4634542 4634543 4634385 4634386 4634387 4634388
4634389
4634390 4634391 4634392 4634393 4634394 4634395 4634396 4634397 4634398
4634399
4634400 4634401 4634402 4634403 4634404 4634405 4634406 4634407 4634408
4634409
4634410 4634411 4634412 4634413 4634414 4634415 4634416 4634417 4634418
4634419
4634420 4634579 4634580 4634581 4634582 4634583 4634584 4634585 4634586
4634587
4634588 4634589 4634590 4634591 4634592 4634593 4634594 4634595 4634596
4634597
4634598 4634599 4634600 4634601 4634602 4634603 4634604 4634605 4634606
4634607;
subxid ovf
2011-10-25 13:43:25.240 MDT [15130]: [1-1] FATAL: the database system is
starting up
DEBUG: standby "sync_rep_test" has now caught up with primary
2011-10-25 13:43:26.304 MDT [15167]: [1-1] FATAL: the database system is
starting up
2011-10-25 13:43:27.366 MDT [15204]: [1-1] FATAL: the database system is
starting up
2011-10-25 13:43:28.426 MDT [15241]: [1-1] FATAL: the database system is
starting up
2011-10-25 13:43:29.461 MDT [15275]: [1-1] FATAL: the database system is
starting up
and so on...

On Tue, Oct 25, 2011 at 6:51 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>
> > What I don't understand is how this affects the CLOG. How does
> oldestActiveXID
> > factor into CLOG initialization?
>
> It is an entirely different error.
>
> Chris' clog error was caused by a file read error. The file was
> opened, we did a seek within the file and then the call to read()
> failed to return a complete page from the file.
>
> The xid shown is 22811359, which is the nextxid in the control file.
>
> So StartupClog() must have failed trying to read the clog page from disk.
>
> That isn't a Hot Standby problem, a recovery problem nor is it certain
> its a PostgreSQL problem.
>
> OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
> files during recovery, so maybe we can think of a way to make recovery
> cope with a SLRU_READ_FAILED error gracefully also. Any ideas?
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

From:	Chris Redekop <chris(at)replicon(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 01:24:50
Message-ID:	CAC2SuRK4U59RE3VQDdwtEZS96bZr=T_5V7XLG1G++xwY3qVFLQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>
> That isn't a Hot Standby problem, a recovery problem nor is it certain
> its a PostgreSQL problem.
>
Do you have any theories on this that I could help investigate? It happens
even when using pg_basebackup and it persists until another sync is
performed, so the files must be in some state that that it can't recover
from....without understanding the internals just viewing from an
outside perspective, I don't really see how this could not be a PostgreSQL
problem....

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Chris Redekop <chris(at)replicon(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 08:45:32
Message-ID:	CA+U5nMK+Ek4Ef9Kjit9KuaxY=Z2=Oojy=yRDGQXLTQnNA15rvQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 25, 2011 at 10:06 PM, Chris Redekop <chris(at)replicon(dot)com> wrote:
>> Chris, can you rearrange the backup so you copy the pg_control file as
>> the first act after the pg_start_backup?
> I tried this and it doesn't seem to make any difference.

It won't, that was a poor initial diagnosis on my part.

> I also tried the
> patch and I can no longer reproduce the subtrans error

Good

>, however instead it
> now it starts up, but never gets to the point where it'll accept
> connections. It starts up but if I try to do anything I always get "FATAL:
> the database system is starting up"...even if the load is removed from the
> primary, the standby still never finishes "starting up".
...
> 2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG: consistent state delayed
> because recovery snapshot incomplete
...

This is a different problem and has already been reported by one of
your colleagues in a separate thread, and answered in detail by me
there. There is no bug related to this error message.

From here, it looks like the published fixes the originally reported problem.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 11:16:51
Message-ID:	6C3D7EDA-573E-46DC-9047-5FEB92876DA8@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct25, 2011, at 14:51 , Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>
>> What I don't understand is how this affects the CLOG. How does oldestActiveXID
>> factor into CLOG initialization?
>
> It is an entirely different error.

Ah, OK. I assumed that you believe the wrong oldestActiveXID computation
solved both the SUBTRANS-related *and* the CLOG-related errors, since you
said "We are starting recovery at the right place but we are initialising
the clog and subtrans incorrectly" at the start of the mail.

> Chris' clog error was caused by a file read error. The file was
> opened, we did a seek within the file and then the call to read()
> failed to return a complete page from the file.
>
> The xid shown is 22811359, which is the nextxid in the control file.
>
> So StartupClog() must have failed trying to read the clog page from disk.

Yep.

> That isn't a Hot Standby problem, a recovery problem nor is it certain
> its a PostgreSQL problem.

It's very likely that it's a PostgreSQL problem, though. It's probably
not a pilot error since it happens even for backups taken with pg_basebackup(),
so the only explanation other than a PostgreSQL bug is broken hardware or
a pretty serious kernel/filesystem bug.

> OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
> files during recovery, so maybe we can think of a way to make recovery
> cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

As long as we don't understand how the CLOG-related errors happen in
the first place, I think it's a bad idea to silence them.

best regards,
Florian Pflug

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 11:26:54
Message-ID:	E89DA5E7-6905-4800-9C7D-11B48FC42E02@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct25, 2011, at 13:39 , Florian Pflug wrote:
> On Oct25, 2011, at 11:13 , Simon Riggs wrote:
>> On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> We are starting recovery at the right place but we are initialising
>>> the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
>>> being derived later than it should be, which can cause problems if
>>> this then means that whole pages are unitialised in subtrans. The bug
>>> only shows up if you do enough transactions (2048 is always enough) to
>>> move to the next subtrans page between the redo pointer and the
>>> checkpoint record while at the same time we do not have a long running
>>> transaction that spans those two points. That's just enough to happen
>>> reasonably frequently on busy systems and yet just enough to have
>>> slipped through testing.
>>>
>>> We must either
>>>
>>> 1. During CreateCheckpoint() we should derive oldestActiveXid before
>>> we derive the redo location
>
>> (1) looks the best way forwards in all cases.
>
> Let me see if I understand this
>
> The probem seems to be that we currently derive oldestActiveXid end the end of
> the checkpoint, just before writing the checkpoint record. Since we use
> oldestActiveXid to initialize SUBTRANS, this is wrong. Records written before
> that checkpoint record (but after the REDO location, of course) may very well
> contain XIDs earlier than that wrongly derived oldestActiveXID, and if attempt
> to touch these XID's SUBTRANS state, we error out.
>
> Your patch seems sensible, because the checkpoint "logically" occurs at the
> REDO location not the checkpoint's location, so we ought to log an oldestActiveXID
> corresponding to that location.

Thinking about this some more (and tracing through the code), I realized that
things are a bit more complicated.

What we actually need to ensure, I think, is that the XID we pass to StartupSUBTRANS()
is earlier than any top-level XID in XLOG_XACT_ASSIGNMENT records. Which, at first
glance, implies that we ought to use the nextId at the *beginning* of the checkpoint
for SUBTRANS initialization. At second glace, however, that'd be wrong, because
backends emit XLOG_XACT_ASSIGNMENT only every PGPROC_MAX_CACHED_SUBXIDS sub-xid
assignment. Thus, an XLOG_XACT_ASSIGNMENT written *after* the checkpoint has started
may contain sub-XIDs which were assigned *before* the checkpoint has started.

Using oldestActiveXID works around that because we guarantee that sub-XIDs are always
larger than their parent XIDs and because only active transactions can produce
XLOG_XACT_ASSIGNMENT records.

So your patch is fine, but I think the reasoning about why oldestActiveXID is
the correct value for StartupSUBTRANS deserves an explanation somewhere.

best regards,
Florian Pflug

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 11:43:07
Message-ID:	CA+U5nMJLDZ7VP1O3Jdde+p+9-HxuEi8qG+v5+LuoZGSiOvhkxA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 26, 2011 at 12:16 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:

>> Chris' clog error was caused by a file read error. The file was
>> opened, we did a seek within the file and then the call to read()
>> failed to return a complete page from the file.
>>
>> The xid shown is 22811359, which is the nextxid in the control file.
>>
>> So StartupClog() must have failed trying to read the clog page from disk.
>
> Yep.
>
>> That isn't a Hot Standby problem, a recovery problem nor is it certain
>> its a PostgreSQL problem.
>
> It's very likely that it's a PostgreSQL problem, though. It's probably
> not a pilot error since it happens even for backups taken with pg_basebackup(),
> so the only explanation other than a PostgreSQL bug is broken hardware or
> a pretty serious kernel/filesystem bug.

The way forwards here is for someone to show the clog file that causes
the error and find out why the call to read() fails.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 11:54:16
Message-ID:	CAC_2qU_eCUYMY6O5CgAK0pYugd=moApirdAt8G_G2sYKs0DfOQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 26, 2011 at 7:43 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

>> It's very likely that it's a PostgreSQL problem, though. It's probably
>> not a pilot error since it happens even for backups taken with pg_basebackup(),
>> so the only explanation other than a PostgreSQL bug is broken hardware or
>> a pretty serious kernel/filesystem bug.
>
> The way forwards here is for someone to show the clog file that causes
> the error and find out why the call to read() fails.

Sorry, I thought the problem was obvious. Either that, of I've
completely missed something in these threads... I'll admit to not
following this one very closely anymore...

When the backup started, the clog was small. So on the "recovering
instance", the clog is small. PostgreSQL is supposed to be able to
deal with any file as it was when the backup starts.

When the backup is stopped, clog is big. But that file was copied
after the backup was started, not after the backup finished. So its
size is only guarenteed to be as big as it was when the backup
started. Recovery is responsible for extending it as it was extended
during the backup period on the master.

The read fails because their is no data at the location it's trying to
read from, because clog hasn't been extended yet by recovery.

a.
--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 13:12:27
Message-ID:	CA+U5nML+fJgz-NdZrGX+R3m6JKnT1JC=v0iXkkSAsojnsqqPoQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:

> The read fails because their is no data at the location it's trying to
> read from, because clog hasn't been extended yet by recovery.

You don't actually know that, though I agree it seems a reasonable
guess and was my first thought also.

The error is very specifically referring to 22811359, which is the
nextxid from pg_control and updated by checkpoint.

22811359 is mid-way through a clog page, so prior xids will already
have been allocated, pages extended and then those pages fsyncd before
the end of pg_start_backup(). So it shouldn't be possible for that
page to be absent from the base backup, unless the base backup was
taken without a preceding checkpoint, which seems is not the case from
the script output.

Note that if you are correct, then the solution is to extend clog,
which Florian disagrees with as a solution.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 13:57:12
Message-ID:	FD57812A-6A9B-44F3-8CDF-3CD2F8D9918E@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct26, 2011, at 15:12 , Simon Riggs wrote:
> On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:
>
>> The read fails because their is no data at the location it's trying to
>> read from, because clog hasn't been extended yet by recovery.
>
> You don't actually know that, though I agree it seems a reasonable
> guess and was my first thought also.

The actual error message also supports that theory. Here's the relevant
snippet from the OP's log (Found in CA9FD2FE(dot)1D8D2%linas(dot)virbalas(at)continuent(dot)com)

2011-09-21 13:41:05 CEST FATAL: could not access status of transaction 1188673
2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at offset 32768: Success.

Note that it says "Success" at the end of the second log entry. That
can only happen, I think, if we're trying to read the page adjacent to
the last page in the file. The seek would be successfull, and the subsequent
read() would indicate EOF by returning zero bytes. None of the calls would
set errno. If there was a real IO error, read() would set errno, and if the
page wasn't adjacent to the last page in the file, seek() would set errno.
In both cases we'd see the corresponding error messag, not "Success".

> The error is very specifically referring to 22811359, which is the
> nextxid from pg_control and updated by checkpoint.

Where does that XID come from? The reference to that XID in the archives
that I can find is in your message
CA+U5nMKUUoA8kRG=itfsO5nZuE3x_KDJz78EaUN3_FKmq-uMJA(at)mail(dot)gmail(dot)com

> 22811359 is mid-way through a clog page, so prior xids will already
> have been allocated, pages extended and then those pages fsyncd before
> the end of pg_start_backup(). So it shouldn't be possible for that
> page to be absent from the base backup, unless the base backup was
> taken without a preceding checkpoint, which seems is not the case from
> the script output.

Or unless the nextId we store in the checkpoint is for some reason higher
than it should be. Or unless nextId somehow gets mangled during recovery.
Or unless there's some interaction between VACUUM and CHECKPOINTS that
we're overlooking...

> Note that if you are correct, then the solution is to extend clog,
> which Florian disagrees with as a solution.

That's not what I said. As you said, the CLOG page corresponding to nextId
*should* always be accessible at the start of recovery (Unless whole file
has been removed by VACUUM, that is). So we shouldn't need to extends CLOG.
Yet the error suggest that the CLOG is, in fact, too short. What I said
is that we shouldn't apply any fix (for the CLOG problem) before we understand
the reason for that apparent contradiction.

Doing it nevertheless to get rid of this seems dangerous. What happens, for
example, to the CLOG state of transactions earlier than the checkpoint's
nextId? There COMMIT record may very well lie before the checkpoint's REDO
pointer, so the CLOG we copied better contained their correct state. Yet if
it does, then why isn't the nextId's CLOG page accessible?

best regards,
Florian Pflug

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 14:18:06
Message-ID:	CAC_2qU_kyy3yFKJ+twu0oMf+LKGh1f0nDn4ySTs6DyG3GK9vBg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 26, 2011 at 9:57 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Oct26, 2011, at 15:12 , Simon Riggs wrote:
>> On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:
>>
>>> The read fails because their is no data at the location it's trying to
>>> read from, because clog hasn't been extended yet by recovery.
>>
>> You don't actually know that, though I agree it seems a reasonable
>> guess and was my first thought also.
>
> The actual error message also supports that theory. Here's the relevant
> snippet from the OP's log (Found in CA9FD2FE(dot)1D8D2%linas(dot)virbalas(at)continuent(dot)com)
>
> 2011-09-21 13:41:05 CEST FATAL: could not access status of transaction 1188673
> 2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at offset 32768: Success.
>
> Note that it says "Success" at the end of the second log entry. That
> can only happen, I think, if we're trying to read the page adjacent to
> the last page in the file. The seek would be successfull, and the subsequent
> read() would indicate EOF by returning zero bytes. None of the calls would
> set errno. If there was a real IO error, read() would set errno, and if the
> page wasn't adjacent to the last page in the file, seek() would set errno.
> In both cases we'd see the corresponding error messag, not "Success".

And even more pointedly, in the original go around on this:
http://article.gmane.org/gmane.comp.db.postgresql.devel.general/174056

He reported that clog/0000 after pg_start_backup call:
-rw------- 1 postgres postgres 8192 Sep 23 14:31 0000

Changed during the rsync phase to this:
-rw------- 1 postgres postgres 16384 Sep 23 14:33 0000

But on the slave, of course, it was copied before it was extend so it
was the original size (that's ok, that's the point of recovery after
the backup):
-rw------- 1 postgres postgres 8192 Sep 23 14:31 0000

With the error:
2011-09-23 14:33:46 CEST FATAL: could not access status of transaction 37206
2011-09-23 14:33:46 CEST DETAIL: Could not read from file
"pg_clog/0000" at offset 8192: Success.

And that error happens *before* recovery even can get attempted.

And that if he copied the "recent" clog/0000 from the master, it did start up.

And I think they also reported that if they didn't run hot standby,
but just normal recovery into a new master, it didn't have the problem
either, i.e. without hotstandby, recovery ran, properly extended the
clog, and then ran as a new master fine.

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 14:47:10
Message-ID:	12E895D5-C6D0-4E71-B9C3-85E1BC14E6B8@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct26, 2011, at 15:57 , Florian Pflug wrote:
> As you said, the CLOG page corresponding to nextId
> *should* always be accessible at the start of recovery (Unless whole file
> has been removed by VACUUM, that is). So we shouldn't need to extends CLOG.
> Yet the error suggest that the CLOG is, in fact, too short. What I said
> is that we shouldn't apply any fix (for the CLOG problem) before we understand
> the reason for that apparent contradiction.

Ha! I think I've got a working theory.

In CreateCheckPoint(), we determine the nextId that'll go into the checkpoint
record, and then call CheckPointGuts() which does the actual writing and fsyncing.
So far, that fine. If a transaction ID is assigned before we compute the
checkpoint's nextXid, we'll extend the CLOG accordingly, and CheckPointGuts() will
make sure the new CLOG page goes to disk.

But, if wal_level = hot_standby, we also call LogStandbySnapshot() in
CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be
fine too, except that LogStandbySnapshot() re-assigned the *current* value of
ShmemVariableCache->nextXid to the checkpoint's nextXid field.

Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but
before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint
whose CLOG page hasn't necessarily made it to the disk yet. The longer CheckPointGuts()
takes to finish it's work the more likely it becomes (assuming that CLOG writing
and syncing doesn't happen at the very end). This fits the OP's observation ob the
problem vanishing when pg_start_backup() does an immediate checkpoint.

I dunno how to this fix, though, since I don't really understand why
LogStandbySnapshot() needs to modify the checkpoint's nextXid.Simon, is there some
documentation on what assumptions the hot standby code makes about the various XID
fields included in a checkpoint?

best regards,
Florian Pflug

From:	Chris Redekop <chris(at)replicon(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 14:51:30
Message-ID:	CAC2SuRKPq8rkhXXNa3=6oyUBD-nsMeNJFUY0nAMRz3D-+ReWiA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>
> The way forwards here is for someone to show the clog file that causes
> the error and find out why the call to read() fails.

Well let me hook ya up :) Here's the log, controldata, and erroring clog
attached

2011-10-26 08:44:15.419 MDT [1544]: [1-1] LOG: database system was
interrupted; last known up at 2011-10-26 08:43:54 MDT
2011-10-26 08:44:15.419 MDT [1544]: [2-1] LOG: creating missing WAL
directory "pg_xlog/archive_status"
2011-10-26 08:44:15.421 MDT [1544]: [3-1] LOG: entering standby mode
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION 6/CE000000
2011-10-26 08:44:15.425 MDT [1545]: [1-1] LOG: streaming replication
successfully connected to primary
2011-10-26 08:44:16.246 MDT [1547]: [1-1] FATAL: the database system is
starting up
2011-10-26 08:44:17.501 MDT [1544]: [4-1] DEBUG: checkpoint record is at
6/D0BC36A0
2011-10-26 08:44:17.501 MDT [1544]: [5-1] DEBUG: redo record is at
6/CE0010E0; shutdown FALSE
2011-10-26 08:44:17.501 MDT [1544]: [6-1] DEBUG: next transaction ID:
0/10846655; next OID: 2826628
2011-10-26 08:44:17.501 MDT [1544]: [7-1] DEBUG: next MultiXactId: 1741;
next MultiXactOffset: 3498
2011-10-26 08:44:17.501 MDT [1544]: [8-1] DEBUG: oldest unfrozen
transaction ID: 1669, in database 1
2011-10-26 08:44:17.501 MDT [1544]: [9-1] DEBUG: transaction ID wrap limit
is 2147485316, limited by database with OID 1
2011-10-26 08:44:17.742 MDT [1544]: [10-1] DEBUG: resetting unlogged
relations: cleanup 1 init 0
2011-10-26 08:44:17.751 MDT [1544]: [11-1] DEBUG: initializing for hot
standby
2011-10-26 08:44:17.751 MDT [1544]: [12-1] FATAL: could not access status
of transaction 10846655
2011-10-26 08:44:17.751 MDT [1544]: [13-1] DETAIL: Could not read from file
"pg_clog/000A" at offset 90112: Success.
2011-10-26 08:44:17.751 MDT [1537]: [2-1] LOG: startup process (PID 1544)
exited with exit code 1
2011-10-26 08:44:17.751 MDT [1537]: [3-1] LOG: aborting startup due to
startup process failure
2011-10-26 08:44:17.843 MDT [1543]: [1-1] DEBUG: logger shutting down

pg_control version number: 903
Catalog version number: 201105231
Database system identifier: 5667259861501982685
Database cluster state: in production
pg_control last modified: Wed 26 Oct 2011 08:43:54 AM MDT
Latest checkpoint location: 6/D0BC36A0
Prior checkpoint location: 6/877B9950
Latest checkpoint's REDO location: 6/CE0010E0
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/10846655
Latest checkpoint's NextOID: 2826628
Latest checkpoint's NextMultiXactId: 1741
Latest checkpoint's NextMultiOffset: 3498
Latest checkpoint's oldestXID: 1669
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 10845042
Time of latest checkpoint: Wed 26 Oct 2011 08:43:44 AM MDT
Minimum recovery ending location: 0/0
Backup start location: 0/0
Current wal_level setting: hot_standby
Current max_connections setting: 600
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 256
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value

Attachment	Content-Type	Size
000A	application/octet-stream	88.0 KB

From:	Chris Redekop <chris(at)replicon(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 15:36:56
Message-ID:	CAC2SuRL8hjcSpWdyVAQpStJ6bw6FyCGUPKkesmCgMX-a_JHmTQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> And I think they also reported that if they didn't run hot standby,
> but just normal recovery into a new master, it didn't have the problem
> either, i.e. without hotstandby, recovery ran, properly extended the
> clog, and then ran as a new master fine.

Yes this is correct...attempting to start as hotstandby will produce the
pg_clog error repeatedly and then without changing anything else, just
turning hot standby off it will start up successfully.

> This fits the OP's observation ob the
> problem vanishing when pg_start_backup() does an immediate checkpoint.

Note that this is *not* the behaviour I'm seeing....it's possible it happens
more frequently without the immediate checkpoint, but I am seeing it happen
even with the immediate checkpoint.

> This is a different problem and has already been reported by one of
> your colleagues in a separate thread, and answered in detail by me
> there. There is no bug related to this error message.

Excellent...I will continue this discussion in that thread.

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Chris Redekop <chris(at)replicon(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 15:59:59
Message-ID:	01A2E400-AC30-48D5-8DD7-9F789D736CE7@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct26, 2011, at 17:36 , Chris Redekop wrote:
> > And I think they also reported that if they didn't run hot standby,
> > but just normal recovery into a new master, it didn't have the problem
> > either, i.e. without hotstandby, recovery ran, properly extended the
> > clog, and then ran as a new master fine.
>
> Yes this is correct...attempting to start as hotstandby will produce the
> pg_clog error repeatedly and then without changing anything else, just
> turning hot standby off it will start up successfully.

Yup, because with hot standby disabled (on the client side), StartupCLOG()
happens after recovery has completed. That, at the very least, makes the
problem very unlikely to occur in the non-hot-standby case. I'm not sure
it's completely impossible, though.

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how

1) We start a checkpoint, and get as far as LogStandbySnapshot()
2) A backend does AssignTransactionId, and gets as far as GetTransactionoId().
The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot () advances the
checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG
extension makes it to the disk, and before any trace of the XID assigned
in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.

> > This fits the OP's observation ob the
> > problem vanishing when pg_start_backup() does an immediate checkpoint.
>
> Note that this is *not* the behaviour I'm seeing....it's possible it happens
> more frequently without the immediate checkpoint, but I am seeing it happen
> even with the immediate checkpoint.

Yeah, I should have said "of the problem's likelihood decreasing" instead
of "vanishing". The point is, the longer the checkpoint takes, the higher
the chance the nextId is advanced far enough to require a CLOG extension.

That alone isn't enough to trigger the error - the CLOG extension must also
*not* make it to the disk before the checkpoint completes - but it's
a required precondition for the error to occur.

best regards,
Florian Pflug

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 16:08:45
Message-ID:	CA+U5nMLyW=S1DTBhswOirf_EZX06epvtyUZN364BGPJBDH_tiA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Oct26, 2011, at 15:57 , Florian Pflug wrote:
>> As you said, the CLOG page corresponding to nextId
>> *should* always be accessible at the start of recovery (Unless whole file
>> has been removed by VACUUM, that is). So we shouldn't need to extends CLOG.
>> Yet the error suggest that the CLOG is, in fact, too short. What I said
>> is that we shouldn't apply any fix (for the CLOG problem) before we understand
>> the reason for that apparent contradiction.
>
> Ha! I think I've got a working theory.
>
> In CreateCheckPoint(), we determine the nextId that'll go into the checkpoint
> record, and then call CheckPointGuts() which does the actual writing and fsyncing.
> So far, that fine. If a transaction ID is assigned before we compute the
> checkpoint's nextXid, we'll extend the CLOG accordingly, and CheckPointGuts() will
> make sure the new CLOG page goes to disk.
>
> But, if wal_level = hot_standby, we also call LogStandbySnapshot() in
> CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be
> fine too, except that LogStandbySnapshot() re-assigned the *current* value of
> ShmemVariableCache->nextXid to the checkpoint's nextXid field.
>
> Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but
> before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint
> whose CLOG page hasn't necessarily made it to the disk yet. The longer CheckPointGuts()
> takes to finish it's work the more likely it becomes (assuming that CLOG writing
> and syncing doesn't happen at the very end). This fits the OP's observation ob the
> problem vanishing when pg_start_backup() does an immediate checkpoint.

This is the correct explanation. I've just come back into Wifi range,
so I was just writing to you with this explanation but your original
point that nextxid must be wrong deserves credit. OTOH I was just
waiting to find out what the reason for the physical read was, rather
than guessing.

Notice that the nextxid value isn't wrong, its just not the correct
value to use for starting clog.

As it turns out the correct fix is actually just to skip StartupClog()
until the end of recovery because it does nothing useful when executed
at that time. When I wrote the original code I remember thinking that
StartupClog() is superfluous at that point.

Brewing a patch now.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 16:16:07
Message-ID:	CA+U5nMKgdgSdao_iNt4VK2MMSZpA=rQ5OUkXctB3SR-9=c_cSw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> Brewing a patch now.

Latest thinking... confirmations or other error reports please.

This fixes both the subtrans and clog bugs in one patch.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
oldestActiveXid_fixed.v2.patch	application/octet-stream	6.2 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 17:09:34
Message-ID:	CA+U5nMLAMuWyk4gRnKrP6ChyTecmJgcRSG2rzu9MBsh_BCZrKg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> Brewing a patch now.
>
> Latest thinking... confirmations or other error reports please.
>
> This fixes both the subtrans and clog bugs in one patch.

I'll be looking to commit that tomorrow afternoon as two separate
patches with appropriate credits.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Chris Redekop <chris(at)replicon(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 19:27:23
Message-ID:	CAC2SuRKfzZ1RhMrYceZ-UxeHcJ=eSLQnX83f=EiS6JbfuKswtw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

FYI I have given this patch a good test and can now no longer reproduce
either the subtrans nor the clog error. Thanks guys!

On Wed, Oct 26, 2011 at 11:09 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
> > On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
> >
> >> Brewing a patch now.
> >
> > Latest thinking... confirmations or other error reports please.
> >
> > This fixes both the subtrans and clog bugs in one patch.
>
> I'll be looking to commit that tomorrow afternoon as two separate
> patches with appropriate credits.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-26 23:29:49
Message-ID:	5EDA27FA-963F-413D-9849-874072F317E5@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct26, 2011, at 18:08 , Simon Riggs wrote:
> On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> On Oct26, 2011, at 15:57 , Florian Pflug wrote:
>> Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but
>> before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint
>> whose CLOG page hasn't necessarily made it to the disk yet. The longer CheckPointGuts()
>> takes to finish it's work the more likely it becomes (assuming that CLOG writing
>> and syncing doesn't happen at the very end). This fits the OP's observation ob the
>> problem vanishing when pg_start_backup() does an immediate checkpoint.
>
> As it turns out the correct fix is actually just to skip StartupClog()
> until the end of recovery because it does nothing useful when executed
> at that time. When I wrote the original code I remember thinking that
> StartupClog() is superfluous at that point.

Hm, that fixes the problem in the hot standby case, but as I said in my
reply to Chris Redekop, normal crash recovery is also at risk (although
the probability of hitting the bug is much smaller there). Here's my
reasoning from that other mail:

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.

Quite aside from that concern, I think it's probably not a good idea
for the nextXid value of a checkpoint to depend on whether wal_level
was set to hot_standby or not. Our recovery code is already quite complex
and hard to test, and this just adds one more combination that has to
be thought about while coding and that needs to be tested.

My suggestion is to fix the CLOG problem in that same way that you fixed
the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
CheckPointGuts().

Here's what I image CreateCheckPoint() should look like:

1) LogStandbySnapshot() and fill out oldestActiveXid
2) Fill out REDO
3) Wait for concurrent commits
4) Fill out nextXid and the other fields
5) CheckPointGuts()
6) Rest

It's then no longer necessary for LogStandbySnapshot() do modify
the nextXid, since we fill out nextXid after LogStandbySnapshot() and
will thus derive a higher value than LogStandbySnapshot() would have.

We could then also fold GetOldestActiveTransactionId() back into
your proposed LogStandbySnapshot() and thus don't need two ProcArray
traversals.

best regards,
Florian Pflug

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 01:32:31
Message-ID:	CA+TgmoaJw9TV8CTP0t9R62=JFUb5ye811Wm4sK1M=3LOb+PYsA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> Brewing a patch now.
>
> Latest thinking... confirmations or other error reports please.
>
> This fixes both the subtrans and clog bugs in one patch.

I don't see the point of changing StartupCLOG() to be an empty
function and adding a new function TrimCLOG() that does everything
StartupCLOG() used to do. Seems simpler to just move the calls to
StartupCLOG() wherever they need to be - i.e. remove the one that
happens before WAL replay, and extricate the one at end-of-recovery
from the if block which currently contains it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 03:36:19
Message-ID:	18333.1319686579@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> This fixes both the subtrans and clog bugs in one patch.

> I don't see the point of changing StartupCLOG() to be an empty
> function and adding a new function TrimCLOG() that does everything
> StartupCLOG() used to do.

+1 ... I found that overly cute also.

regards, tom lane

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 06:57:41
Message-ID:	4EA900E5.9070905@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 27.10.2011 02:29, Florian Pflug wrote:
> Per my theory about the cause of the problem in my other mail, I think you
> might see StartupCLOG failures even during crash recovery, provided that
> wal_level was set to hot_standby when the primary crashed. Here's how
>
> 1) We start a checkpoint, and get as far as LogStandbySnapshot()
> 2) A backend does AssignTransactionId, and gets as far as GetTransactionoId().
> The assigned XID requires CLOG extension.
> 3) The checkpoint continues, and LogStandbySnapshot () advances the
> checkpoint's nextXid to the XID assigned in (2).
> 4) We crash after writing the checkpoint record, but before the CLOG
> extension makes it to the disk, and before any trace of the XID assigned
> in (2) makes it to the xlog.
>
> Then StartupCLOG() would fail at the end of recovery, because we'd end up
> with a nextXid whose corresponding CLOG page doesn't exist.

No, clog extension is WAL-logged while holding the XidGenLock. At step
3, LogStandbySnapshot() would block until the clog-extension record is
written to WAL, so crash recovery would see and replay that record
before calling StartupCLOG().

That can happen during hot standby, though, because StartupCLOG() is
called earlier.

> My suggestion is to fix the CLOG problem in that same way that you fixed
> the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
> CheckPointGuts().
>
> Here's what I image CreateCheckPoint() should look like:
>
> 1) LogStandbySnapshot() and fill out oldestActiveXid
> 2) Fill out REDO
> 3) Wait for concurrent commits
> 4) Fill out nextXid and the other fields
> 5) CheckPointGuts()
> 6) Rest
>
> It's then no longer necessary for LogStandbySnapshot() do modify
> the nextXid, since we fill out nextXid after LogStandbySnapshot() and
> will thus derive a higher value than LogStandbySnapshot() would have.

Hmm, I don't think that fully fixes the problem. Even if you're certain
that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might
decide to truncate it away again while the checkpoint is running.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 08:18:06
Message-ID:	4EA913BE.2080102@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 27.10.2011 09:57, Heikki Linnakangas wrote:
>> My suggestion is to fix the CLOG problem in that same way that you fixed
>> the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
>> CheckPointGuts().
>>
>> Here's what I image CreateCheckPoint() should look like:
>>
>> 1) LogStandbySnapshot() and fill out oldestActiveXid
>> 2) Fill out REDO
>> 3) Wait for concurrent commits
>> 4) Fill out nextXid and the other fields
>> 5) CheckPointGuts()
>> 6) Rest
>>
>> It's then no longer necessary for LogStandbySnapshot() do modify
>> the nextXid, since we fill out nextXid after LogStandbySnapshot() and
>> will thus derive a higher value than LogStandbySnapshot() would have.
>
> Hmm, I don't think that fully fixes the problem. Even if you're certain
> that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might
> decide to truncate it away again while the checkpoint is running.

Oh, scratch that. During recovery, we merrily treat missing slru files
as they were filled with zeros.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 09:37:48
Message-ID:	CA+U5nMKx-N5Qmn6FwpvSoLhTqDp70U+tWJ59=t=--KUH3uhP7g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> This fixes both the subtrans and clog bugs in one patch.
>
>> I don't see the point of changing StartupCLOG() to be an empty
>> function and adding a new function TrimCLOG() that does everything
>> StartupCLOG() used to do.
>
> +1 ... I found that overly cute also.

It would have been even easier to move StartupCLOG() later, but then
we'd need a big comment explaining why CLOG starts up at one point and
subtrans starts up at another point, since that is very confusing way
of doing things. I wrote it that way first and it definitely looks
strange.

It's much easier to understand that StartupCLOG() is actually a no-op
and that we need to trim the clog at the end of recovery in all cases.

The patch isn't meant to be cute, just a better of way of expressing
what needs to be done, so I think the patch should stay that way.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 11:36:37
Message-ID:	CA+TgmoYAF1wKOdpoAtz29bou-Sw3rheQepM=rDi4G3P=P3Lk5Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>> This fixes both the subtrans and clog bugs in one patch.
>>
>>> I don't see the point of changing StartupCLOG() to be an empty
>>> function and adding a new function TrimCLOG() that does everything
>>> StartupCLOG() used to do.
>>
>> +1 ... I found that overly cute also.
>
> It would have been even easier to move StartupCLOG() later, but then
> we'd need a big comment explaining why CLOG starts up at one point and
> subtrans starts up at another point, since that is very confusing way
> of doing things. I wrote it that way first and it definitely looks
> strange.
>
> It's much easier to understand that StartupCLOG() is actually a no-op
> and that we need to trim the clog at the end of recovery in all cases.

If it's a no-op, why have it at all? I know we have a bunch of places
in the code where we have empty stubs where there used to be
initialization or cleanup code, but I've never found that particularly
good style. If something no longer requires initialization in a
certain place, I think we should nuke the whole function.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 12:03:23
Message-ID:	AAE7BBFB-641C-482C-A8D2-0D61E2DF3B09@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct27, 2011, at 08:57 , Heikki Linnakangas wrote:
> On 27.10.2011 02:29, Florian Pflug wrote:
>> Per my theory about the cause of the problem in my other mail, I think you
>> might see StartupCLOG failures even during crash recovery, provided that
>> wal_level was set to hot_standby when the primary crashed. Here's how
>>
>> 1) We start a checkpoint, and get as far as LogStandbySnapshot()
>> 2) A backend does AssignTransactionId, and gets as far as GetTransactionoId().
>> The assigned XID requires CLOG extension.
>> 3) The checkpoint continues, and LogStandbySnapshot () advances the
>> checkpoint's nextXid to the XID assigned in (2).
>> 4) We crash after writing the checkpoint record, but before the CLOG
>> extension makes it to the disk, and before any trace of the XID assigned
>> in (2) makes it to the xlog.
>>
>> Then StartupCLOG() would fail at the end of recovery, because we'd end up
>> with a nextXid whose corresponding CLOG page doesn't exist.
>
> No, clog extension is WAL-logged while holding the XidGenLock. At step 3,
> LogStandbySnapshot() would block until the clog-extension record is written
> to WAL, so crash recovery would see and replay that record before calling
> StartupCLOG().

Hm, true. But it still seems wrong for LogStandbySnapshot() to modify the
checkpoint's nextXid, and even more wrong to do that only if wal_mode =
hot_standby. Plus, I think it's a smart idea to verify that the required
parts of the CLOG are available at the start of recovery. Because if they're
missing, the data on the standby *will* be corrupted. Is there any argument
against doiing LogStandbySnapshot() earlier (i.e., at the time oldestActiveXid
is computed)?

best regards,
Florian Pflug

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 13:36:54
Message-ID:	CA+U5nMLBnRU_KJCrMME5eRG930E=k65yFbj6R+1LXMtZ7nippw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>>> This fixes both the subtrans and clog bugs in one patch.
>>>
>>>> I don't see the point of changing StartupCLOG() to be an empty
>>>> function and adding a new function TrimCLOG() that does everything
>>>> StartupCLOG() used to do.
>>>
>>> +1 ... I found that overly cute also.
>>
>> It would have been even easier to move StartupCLOG() later, but then
>> we'd need a big comment explaining why CLOG starts up at one point and
>> subtrans starts up at another point, since that is very confusing way
>> of doing things. I wrote it that way first and it definitely looks
>> strange.
>>
>> It's much easier to understand that StartupCLOG() is actually a no-op
>> and that we need to trim the clog at the end of recovery in all cases.
>
> If it's a no-op, why have it at all? I know we have a bunch of places
> in the code where we have empty stubs where there used to be
> initialization or cleanup code, but I've never found that particularly
> good style. If something no longer requires initialization in a
> certain place, I think we should nuke the whole function.

It is a no-op for exactly the same reason other similar functions are
no-ops - it used to do something but now does not.

Anyone seeing StartupSubtrans and StartupMultiXact but no StartupClog
will immediately ask "why?".
IMHO it's easier to have an obviously named function than a comment -
its less invasive for a backpatch as well.

I'm following current code style. If you wish to change that, feel
free to change this and all other locations that do this. Until then,
doing this makes most sense and follows current coding style.

If I had done it the way you suggest, I don't doubt someone would say
in about 6 months "Which idiot removed StartupClog()?".

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 13:51:00
Message-ID:	CA+U5nM+7nXEceaFQ9rpKEP45TV0ZzdEfuqdnkaWfBcB4bu_=cA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:

> Per my theory about the cause of the problem in my other mail, I think you
> might see StartupCLOG failures even during crash recovery, provided that
> wal_level was set to hot_standby when the primary crashed. Here's how
>
> 1) We start a checkpoint, and get as far as LogStandbySnapshot()
> 2) A backend does AssignTransactionId, and gets as far as GetTransactionoId().
> The assigned XID requires CLOG extension.
> 3) The checkpoint continues, and LogStandbySnapshot () advances the
> checkpoint's nextXid to the XID assigned in (2).
> 4) We crash after writing the checkpoint record, but before the CLOG
> extension makes it to the disk, and before any trace of the XID assigned
> in (2) makes it to the xlog.
>
> Then StartupCLOG() would fail at the end of recovery, because we'd end up
> with a nextXid whose corresponding CLOG page doesn't exist.

Clog extension holds XidGenLock, as does LogStandbySnapshot, which
specifically excludes the above scenario.

> Quite aside from that concern, I think it's probably not a good idea
> for the nextXid value of a checkpoint to depend on whether wal_level
> was set to hot_standby or not. Our recovery code is already quite complex
> and hard to test, and this just adds one more combination that has to
> be thought about while coding and that needs to be tested.
>
> My suggestion is to fix the CLOG problem in that same way that you fixed
> the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
> CheckPointGuts().
>
> Here's what I image CreateCheckPoint() should look like:
>
> 1) LogStandbySnapshot() and fill out oldestActiveXid
> 2) Fill out REDO
> 3) Wait for concurrent commits
> 4) Fill out nextXid and the other fields
> 5) CheckPointGuts()
> 6) Rest
>
> It's then no longer necessary for LogStandbySnapshot() do modify
> the nextXid, since we fill out nextXid after LogStandbySnapshot() and
> will thus derive a higher value than LogStandbySnapshot() would have.
>
> We could then also fold GetOldestActiveTransactionId() back into
> your proposed LogStandbySnapshot() and thus don't need two ProcArray
> traversals.

I think you make a good case for doing this.

However, I'm concerned that moving LogStandbySnapshot() in a backpatch
seems more risky than it's worth. We could easily introduce a new bug
into what we would all agree is a complex piece of code. Minimal
change seems best in this case.

And also, 2 ProcArray traversals is not a problem there.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 14:03:13
Message-ID:	E90F5C5E-C043-4702-B0F5-AFAE9CD1B244@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct27, 2011, at 15:51 , Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> Here's what I image CreateCheckPoint() should look like:
>>
>> 1) LogStandbySnapshot() and fill out oldestActiveXid
>> 2) Fill out REDO
>> 3) Wait for concurrent commits
>> 4) Fill out nextXid and the other fields
>> 5) CheckPointGuts()
>> 6) Rest
>>
>> It's then no longer necessary for LogStandbySnapshot() do modify
>> the nextXid, since we fill out nextXid after LogStandbySnapshot() and
>> will thus derive a higher value than LogStandbySnapshot() would have.
>>
>> We could then also fold GetOldestActiveTransactionId() back into
>> your proposed LogStandbySnapshot() and thus don't need two ProcArray
>> traversals.
>
> I think you make a good case for doing this.
>
> However, I'm concerned that moving LogStandbySnapshot() in a backpatch
> seems more risky than it's worth. We could easily introduce a new bug
> into what we would all agree is a complex piece of code. Minimal
> change seems best in this case.

OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
Your proposed patch changes that, which also carries a risk since something
could depend on these values being in sync. Especially since both the logged
snapshot and oldestActiveXid influence the snapshot tracking on the slave.

But since you wrote most of that code, your judgement about the relative
risks of these two approaches obviously out-weights mine.

best regards,
Florian Pflug

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 14:13:13
Message-ID:	27827.1319724793@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> It's much easier to understand that StartupCLOG() is actually a no-op
>>> and that we need to trim the clog at the end of recovery in all cases.

>> If it's a no-op, why have it at all?

> It is a no-op for exactly the same reason other similar functions are
> no-ops - it used to do something but now does not.

> Anyone seeing StartupSubtrans and StartupMultiXact but no StartupClog
> will immediately ask "why?".

I think it's a good point that StartupCLog doesn't exist in a vacuum
but should be parallel to the init functions for the other SLRU modules.
So at this point I think I agree with Simon's approach. However, the
obvious next question is whether those other modules don't need to be
changed also, and if not why not.

Another issue is that if StartupCLog is left as a no-op, what will
happen if someone mistakenly tries to access clog before the trim
function is called? It would be a good idea to make sure that such
a thing results in an easily-identifiable failure.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 14:30:06
Message-ID:	CA+U5nML80NWYAYt0=-ioA1dYSX5++b=JODp9nu0tv4DKDsaM3g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:

>> I think you make a good case for doing this.
>>
>> However, I'm concerned that moving LogStandbySnapshot() in a backpatch
>> seems more risky than it's worth. We could easily introduce a new bug
>> into what we would all agree is a complex piece of code. Minimal
>> change seems best in this case.
>
> OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
> Your proposed patch changes that, which also carries a risk since something
> could depend on these values being in sync. Especially since both the logged
> snapshot and oldestActiveXid influence the snapshot tracking on the slave.
>
> But since you wrote most of that code, your judgement about the relative
> risks of these two approaches obviously out-weights mine.

We must move oldestActiveXid since that is the source of a bug. There
is no need to move LogStandbySnapshot(), so I am suggesting we don't
do that for the backpatch. I was going to implement it the way you
suggest in HEAD, since I agree that is a cleaner way.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 15:25:25
Message-ID:	CA+U5nMKgGZu-esbTiOL5che6f5ENZPdZSpScaEEmRx_9J2Y02A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2011 at 3:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> However, the
> obvious next question is whether those other modules don't need to be
> changed also, and if not why not.

Good point.

StartupSubtrans() is also changed by this patch, since it will be
supplied with an earlier initialisation value.

StartupMultiXact() didn't need changing, I thought, but I will review further.

> Another issue is that if StartupCLog is left as a no-op, what will
> happen if someone mistakenly tries to access clog before the trim
> function is called? It would be a good idea to make sure that such
> a thing results in an easily-identifiable failure.

The old StartupCLOG() didn't do anything that was essential to using
the clog, which is why its a no-op.

You can still use the clog, just with zero startup.

Maybe setting the current page should go in at startup, will think.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-10-27 19:23:16
Message-ID:	9B958DED-5A9B-4A1F-AC5C-7D225120ED9D@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct27, 2011, at 16:30 , Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>
>>> I think you make a good case for doing this.
>>>
>>> However, I'm concerned that moving LogStandbySnapshot() in a backpatch
>>> seems more risky than it's worth. We could easily introduce a new bug
>>> into what we would all agree is a complex piece of code. Minimal
>>> change seems best in this case.
>>
>> OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
>> Your proposed patch changes that, which also carries a risk since something
>> could depend on these values being in sync. Especially since both the logged
>> snapshot and oldestActiveXid influence the snapshot tracking on the slave.
>>
>> But since you wrote most of that code, your judgement about the relative
>> risks of these two approaches obviously out-weights mine.
>
> We must move oldestActiveXid since that is the source of a bug. There
> is no need to move LogStandbySnapshot(), so I am suggesting we don't
> do that for the backpatch. I was going to implement it the way you
> suggest in HEAD, since I agree that is a cleaner way.

Sound good.

best regards,
Florian Pflug

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-11-01 15:33:20
Message-ID:	CA+U5nMKWyZ1T8=RzJBnhX4DUsombf0cWD9g71+RuPBE-dqZ4xQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 27, 2011 at 4:25 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> StartupMultiXact() didn't need changing, I thought, but I will review further.

Good suggestion.

On review, StartupMultiXact() could also suffer similar error to the
clog failure. This was caused *because* MultiXact is not maintained by
recovery, which I had thought meant it was protected from such
failure.

Revised patch attached.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
slru_startup.v3.patch	application/octet-stream	3.5 KB

From:	Chris Redekop <chris(at)replicon(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-11-02 02:40:27
Message-ID:	CAC2SuRJJ09rWdZdtKnJgoqDPdB__-=Nty=xAc=a4j8W0CgEE+w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

looks like the v3 patch re-introduces the pg_subtrans issue...

On Tue, Nov 1, 2011 at 9:33 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On Thu, Oct 27, 2011 at 4:25 PM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
>
> > StartupMultiXact() didn't need changing, I thought, but I will review
> further.
>
> Good suggestion.
>
> On review, StartupMultiXact() could also suffer similar error to the
> clog failure. This was caused *because* MultiXact is not maintained by
> recovery, which I had thought meant it was protected from such
> failure.
>
> Revised patch attached.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Chris Redekop <chris(at)replicon(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-11-02 07:34:31
Message-ID:	CA+U5nMJhwd+t9GeyiuOUVdvXkMOaiJWzu5G5N507UEp9PnW=gQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Nov 2, 2011 at 2:40 AM, Chris Redekop <chris(at)replicon(dot)com> wrote:

> looks like the v3 patch re-introduces the pg_subtrans issue...

No, I just separated the patches to be clearer about the individual changes.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Chris Redekop <chris(at)replicon(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-11-02 11:56:46
Message-ID:	CA+U5nM+_TT=qcasb3Vi=nriA6H9+EBTaj+rGWsxtR=SvF2aw-Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Nov 2, 2011 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, Nov 2, 2011 at 2:40 AM, Chris Redekop <chris(at)replicon(dot)com> wrote:
>
>> looks like the v3 patch re-introduces the pg_subtrans issue...
>
> No, I just separated the patches to be clearer about the individual changes.

3 bug fixes committed and back patched.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Chris Redekop <chris(at)replicon(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-11-02 15:29:56
Message-ID:	CAC2SuRLuk=gzH06f=U4KwcgAZYC689rUFAWtoKDRtaZJYZX3qw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

okay, sorry I'm a little confused then. Should I be able to apply both the
v2 patch as well as the v3 patch? or is it expected that I'd have to
manually do the merge?

On Wed, Nov 2, 2011 at 1:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On Wed, Nov 2, 2011 at 2:40 AM, Chris Redekop <chris(at)replicon(dot)com> wrote:
>
> > looks like the v3 patch re-introduces the pg_subtrans issue...
>
> No, I just separated the patches to be clearer about the individual
> changes.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

From:	Greg Smith <greg(at)2ndQuadrant(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org, linas(dot)virbalas(at)continuent(dot)com, chris(at)replicon(dot)com
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-11-08 21:30:15
Message-ID:	4EB99F67.3090806@2ndQuadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I was curious if anyone running into these problems has gotten a chance
to test the 3 fixes committed here. It sounded like Linas even had a
repeatable test case?

For easier reference the commits are:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2f55c535e1f026929cf20855b3790d3632062d42

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ff8451aa14c8513e429cbef09ddc72e79da366a5

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=656bba95af3950b26e9e97d86d29787d89e2b423

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us