Quick Links

Re: Hot Backup with rsync fails at pg_clog if under load

Lists:	pgsql-hackers

From:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
To:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc:	"daniel(at)heroku(dot)com" <daniel(at)heroku(dot)com>
Subject:	Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-21 14:44:30
Message-ID:	CA9FD2FE.1D8D2%linas.virbalas@continuent.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

* Context *

I'm observing problems with provisioning a standby from the master by
following a basic and documented "Making a Base Backup" [1] procedure with
rsync if, in the mean time, heavy load is applied on the master.

After searching the archives, the only more discussed and similar issue I
found hit was by Daniel Farina in a thread "hot backups: am I doing it
wrong, or do we have a problem with pg_clog?" [2], but, it seems, the issue
was discarded because of a non-standard backup procedure Deniel used.
However, I'm observing the same error with a simple procedure, hence this
message.

* Details *

Procedure:

1. Start load generator on the master (WAL archiving enabled).
2. Prepare a Streaming Replication standby (accepting WAL files too):
2.1. pg_switch_xlog() on the master;
2.2. pg_start_backup(Obackup_under_load¹) on the master (this will take a
while as master is loaded up);
2.3. rsync data/global/pg_control to the standby;
2.4. rsync all other data/ (without pg_xlog) to the standby;
2.5. pg_stop_backup() on the master;
2.6. Wait to receive all WAL files, generated during the backup, on the
standby;
2.6. Start the standby PG instance.

The last step will, usually, fail with a similar error:

2011-09-21 13:41:05 CEST LOG: database system was interrupted; last known
up at 2011-09-21 13:40:50 CEST
Restoring 00000014.history
mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000014.history': No such file
or directory
Restoring 00000013.history
2011-09-21 13:41:05 CEST LOG: restored log file "00000013.history" from
archive
2011-09-21 13:41:05 CEST LOG: entering standby mode
Restoring 0000001300000006000000DC
2011-09-21 13:41:05 CEST LOG: restored log file "0000001300000006000000DC"
from archive
Restoring 0000001300000006000000DB
2011-09-21 13:41:05 CEST LOG: restored log file "0000001300000006000000DB"
from archive
2011-09-21 13:41:05 CEST FATAL: could not access status of transaction
1188673
2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at
offset 32768: Success.
2011-09-21 13:41:05 CEST LOG: startup process (PID 13819) exited with exit
code 1
2011-09-21 13:41:05 CEST LOG: aborting startup due to startup process
failure

The procedure works very reliably if there is little or no load on the
master, but fails very often with the pg_clog error when load generator (few
thousands of SELECTs, ~60 INSERTs, ~60 DELETEs and ~60 UPDATES per second)
is started up.

I assumed that a file system backup taken during pg_start_backup and
pg_stop_backup is guaranteed to be consistent and that missing pieces will
be taken from the WAL files, generated & shipped during the backup, but is
it really?

Is this procedure missing some steps? Or maybe this a known issue?

Thank you,
Linas

[1] http://www.postgresql.org/docs/current/static/continuous-archiving.html
[2] http://archives.postgresql.org/pgsql-hackers/2011-04/msg01132.php

From:	Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	linas(dot)virbalas(at)continuent(dot)com
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-21 16:22:22
Message-ID:	4E7A0F3E.4050204@timbira.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 21-09-2011 11:44, Linas Virbalas wrote:

[This question doesn't belong to -hackers. Please post it in -general or -admin]

> 2.2. pg_start_backup(Obackup_under_load¹) on the master (this will take a
> while as master is loaded up);
No. if you use pg_start_backup('foo', true) it will be fast. Check the manual.

> 2.3. rsync data/global/pg_control to the standby;
Why are you doing this? If ...

> 2.4. rsync all other data/ (without pg_xlog) to the standby;
you will copy it again or no? Don't understand your point.

> 2.5. pg_stop_backup() on the master;
> 2.6. Wait to receive all WAL files, generated during the backup, on the
> standby;
> 2.6. Start the standby PG instance.
>
> The last step will, usually, fail with a similar error:
>
The problem could be that the minimum recovery point (step 2.3) is different
from the end of rsync if you are under load.

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "daniel(at)heroku(dot)com" <daniel(at)heroku(dot)com>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-21 16:34:24
Message-ID:	F363F283-B85C-47E0-AB8E-F69572C1738B@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sep21, 2011, at 16:44 , Linas Virbalas wrote:
> After searching the archives, the only more discussed and similar issue I
> found hit was by Daniel Farina in a thread "hot backups: am I doing it
> wrong, or do we have a problem with pg_clog?" [2], but, it seems, the issue
> was discarded because of a non-standard backup procedure Deniel used.

That's not the way I read that thread. In fact, Robert Haas confirmed that
Daniel's backup procedure was sound in theory. The open question was whether the
error occurred because of a Bug in Daniel's backup code or postgresql's restore
code. The thread then petered out without that question being answered.

> Procedure:
>
> 1. Start load generator on the master (WAL archiving enabled).
> 2. Prepare a Streaming Replication standby (accepting WAL files too):
> 2.1. pg_switch_xlog() on the master;
> 2.2. pg_start_backup(Obackup_under_load¹) on the master (this will take a
> while as master is loaded up);
> 2.3. rsync data/global/pg_control to the standby;
> 2.4. rsync all other data/ (without pg_xlog) to the standby;
> 2.5. pg_stop_backup() on the master;
> 2.6. Wait to receive all WAL files, generated during the backup, on the
> standby;
> 2.6. Start the standby PG instance.

Looks good. (2.1) and (2.3) seem redundant (as Euler already noticed),
but shouldn't cause any errors.

Could you provide us with the exact rsync version and parameters you use?

> The last step will, usually, fail with a similar error:
>
> 2011-09-21 13:41:05 CEST LOG: database system was interrupted; last known
> up at 2011-09-21 13:40:50 CEST
> Restoring 00000014.history
> mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000014.history': No such file
> or directory
> Restoring 00000013.history
> 2011-09-21 13:41:05 CEST LOG: restored log file "00000013.history" from
> archive
> 2011-09-21 13:41:05 CEST LOG: entering standby mode
> Restoring 0000001300000006000000DC
> 2011-09-21 13:41:05 CEST LOG: restored log file "0000001300000006000000DC"
> from archive
> Restoring 0000001300000006000000DB
> 2011-09-21 13:41:05 CEST LOG: restored log file "0000001300000006000000DB"
> from archive
> 2011-09-21 13:41:05 CEST FATAL: could not access status of transaction
> 1188673
> 2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at
> offset 32768: Success.

Whats the size of the file (pg_clog/0001) on both the master and the slave?

best regards,
Florian Pflug

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, linas(dot)virbalas(at)continuent(dot)com
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-21 17:23:49
Message-ID:	CA+TgmoYBP-hxL+b3dzFq=9NRLkMMAKAxCn5NXjggask+_Ka5bw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 21, 2011 at 12:22 PM, Euler Taveira de Oliveira
<euler(at)timbira(dot)com> wrote:
> [This question doesn't belong to -hackers. Please post it in -general or
> -admin]

-hackers or -bugs seems appropriate to me; I think this is a bug.

>> 2.2. pg_start_backup(Obackup_under_loadš) on the master (this will take a
>> while as master is loaded up);
>
> No. if you use pg_start_backup('foo', true) it will be fast. Check the
> manual.

If the server is sufficiently heavily loaded that a checkpoint takes a
nontrivial amount of time, the OP is correct that this will be not
fast, regardless of whether you choose to force an immediate
checkpoint.

>> 2.3. rsync data/global/pg_control to the standby;
>
> Why are you doing this? If ...
>
>> 2.4. rsync all other data/ (without pg_xlog) to the standby;
>
> you will copy it again or no? Don't understand your point.

His point is that exercising the bug depends on doing the copying in a
certain order. Any order of copying the data theoretically ought to
be OK, as long as it's all between starting the backup and stopping
the backup, but apparently it isn't.

> The problem could be that the minimum recovery point (step 2.3) is different
> from the end of rsync if you are under load.

It seems pretty clear that some relevant chunk of WAL isn't getting
replayed, but it's not at all clear to me why not. It seems like it
would be useful to compare the LSN returned by pg_start_backup() with
the location at which replay begins when you fire up the clone.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-22 14:24:50
Message-ID:	CAA11FE2.1DDE2%linas.virbalas@continuent.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>>> 2.2. pg_start_backup(Obackup_under_loadš) on the master (this will take a
>>> while as master is loaded up);
>>
>> No. if you use pg_start_backup('foo', true) it will be fast. Check the
>> manual.
>
> If the server is sufficiently heavily loaded that a checkpoint takes a
> nontrivial amount of time, the OP is correct that this will be not
> fast, regardless of whether you choose to force an immediate
> checkpoint.

In order to check more cases, I have changed the procedure to force an
immediate checkpoint, i.e. pg_start_backup('backup_under_load', true). With
the same load generator running, pg_start_backup returned almost
instantaneously compared to how long it took previously.

Most importantly, after doing this change, I cannot reproduce the pg_clog
error message anymore. In other words, with immediate checkpoint hot backup
succeeds under this load!

>>> 2.3. rsync data/global/pg_control to the standby;
>>
>> Why are you doing this? If ...
>>
>>> 2.4. rsync all other data/ (without pg_xlog) to the standby;
>>
>> you will copy it again or no? Don't understand your point.
>
> His point is that exercising the bug depends on doing the copying in a
> certain order. Any order of copying the data theoretically ought to
> be OK, as long as it's all between starting the backup and stopping
> the backup, but apparently it isn't.

Please note that in the past I was able to reproduce the same pg_clog error
even with taking a singular rsync of the whole data/ folder (i.e. without
splitting it into two steps).

>> The problem could be that the minimum recovery point (step 2.3) is different
>> from the end of rsync if you are under load.

Do you have ideas why does the Hot Backup operation with
pg_start_backup('backup_under_load', true) succeed while
pg_start_backup('backup_under_load') fails under the same load?

Originally, I was using pg_start_backup('backup_under_load') in order not to
clog the master server during the I/O required for the checkpoint. Of
course, now, it seems, this should be sacrificed for the sake of a
successful backup under load.

> It seems pretty clear that some relevant chunk of WAL isn't getting
> replayed, but it's not at all clear to me why not. It seems like it
> would be useful to compare the LSN returned by pg_start_backup() with

If needed, I could do that, if I had the exact procedure... Currently,
during the start of the backup I take the following information:

pg_xlogfile_name(pg_start_backup(...))

> the location at which replay begins when you fire up the clone.

As you have seen in my original message, in the pg_log I get only the
restored WAL file names after starting up the standby. Can I tune the
postgresql.conf to include the location at which replay begins in the log?

> Could you provide us with the exact rsync version and parameters you use?

rsync -azv
version 2.6.8 protocol version 29

--
Sincerely,
Linas Virbalas
http://flyingclusters.blogspot.com/

From:	Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-22 15:59:32
Message-ID:	4E7B5B64.3070206@timbira.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 22-09-2011 11:24, Linas Virbalas wrote:
> In order to check more cases, I have changed the procedure to force an
> immediate checkpoint, i.e. pg_start_backup('backup_under_load', true). With
> the same load generator running, pg_start_backup returned almost
> instantaneously compared to how long it took previously.
>
> Most importantly, after doing this change, I cannot reproduce the pg_clog
> error message anymore. In other words, with immediate checkpoint hot backup
> succeeds under this load!
>
Interesting. I remembered someone reporting this same problem but it was not
reproducible by some of us.

> Do you have ideas why does the Hot Backup operation with
> pg_start_backup('backup_under_load', true) succeed while
> pg_start_backup('backup_under_load') fails under the same load?
>
I don't but if you show us the output of the steps above...

> If needed, I could do that, if I had the exact procedure... Currently,
> during the start of the backup I take the following information:
>
Just show us the output of pg_start_backup and part of the standby log with
the following message 'redo starts at' and the subsequent messages up to the
failure.

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-22 17:28:46
Message-ID:	CA+TgmoY8hcGnQCkEzUWhtKLpge5jFSZVzTpdFru2Q7CwQ_uFrA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2011/9/22 Euler Taveira de Oliveira <euler(at)timbira(dot)com>:
> On 22-09-2011 11:24, Linas Virbalas wrote:
>>
>> In order to check more cases, I have changed the procedure to force an
>> immediate checkpoint, i.e. pg_start_backup('backup_under_load', true).
>> With
>> the same load generator running, pg_start_backup returned almost
>> instantaneously compared to how long it took previously.
>>
>> Most importantly, after doing this change, I cannot reproduce the pg_clog
>> error message anymore. In other words, with immediate checkpoint hot
>> backup
>> succeeds under this load!
>>
> Interesting. I remembered someone reporting this same problem but it was not
> reproducible by some of us.

So maybe there's some action that has to happen between the time the
redo pointer is set and the time the checkpoint is WAL-logged to
tickle the bug. Like... CLOG extension, maybe?

*grep grep grep*

OK, so ExtendCLOG() just zeroes the page in memory, writes the WAL
record, and calls it good. All the interesting stuff is done while
holding CLogControlLock. So, at checkpoint time, we'd better make
sure to flush those pages out to disk before writing the checkpoint
record. Otherwise, the redo pointer might advance past the
CLOG-extension record before the corresponding page hits the disk.
That's the job of CheckPointCLOG(), which is called from
CheckPointGuts(), which is called just from CreateCheckPoint() just
after setting the redo pointer. Now, there is some funny business
with the locking here as we're writing the dirty pages
(CheckPointCLOG() calls SimpleLruFlush()). We release and reacquire
the control lock many times. But I don't see how that can cause a
problem, because it's all being done after the redo pointer has
already been said. We could end up having buffers get dirtied again
after they are flushed, but that shouldn't matter either as long as
each buffer is written out at least once. And if the write fails we
throw an error. So I don't see any holes there.

Anybody else have an idea?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
To:	Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 08:02:19
Message-ID:	CAA217BB.1E2F3%linas.virbalas@continuent.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 9/22/11 6:59 PM, "Euler Taveira de Oliveira" <euler(at)timbira(dot)com> wrote:

>> If needed, I could do that, if I had the exact procedure... Currently,
>> during the start of the backup I take the following information:
>>
> Just show us the output of pg_start_backup and part of the standby log with
> the following message 'redo starts at' and the subsequent messages up to the
> failure.

Unfortunately, it's impossible, because the error message "Could not read
from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
aborted) before the turn for "redo starts at" message arrives.

For comparison purposes, here is the log from the original message:

2011-09-21 13:41:05 CEST LOG: database system was interrupted; last known
up at 2011-09-21 13:40:50 CEST
Restoring 00000014.history
mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000014.history': No such file
or directory
Restoring 00000013.history
2011-09-21 13:41:05 CEST LOG: restored log file "00000013.history" from
archive
! 2011-09-21 13:41:05 CEST LOG: entering standby mode
Restoring 0000001300000006000000DC
2011-09-21 13:41:05 CEST LOG: restored log file "0000001300000006000000DC"
from archive
Restoring 0000001300000006000000DB
2011-09-21 13:41:05 CEST LOG: restored log file "0000001300000006000000DB"
from archive
2011-09-21 13:41:05 CEST FATAL: could not access status of transaction
1188673
2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at
offset 32768: Success.
2011-09-21 13:41:05 CEST LOG: startup process (PID 13819) exited with exit
code 1
2011-09-21 13:41:05 CEST LOG: aborting startup due to startup process
Failure

As you can see, there is no "redo starts at" message. If we compare this to
a healthy standby startup, we can see that the "pg_clog" error appears after
the "entering standby mode", but before the "redo starts at", hence the
latter is not reached. Healthy log example:

2011-09-23 09:52:31 CEST LOG: database system was interrupted; last known
up at 2011-09-23 09:52:15 CEST
Restoring 00000007.history
mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000007.history': No such file
or directory
Restoring 00000006.history
2011-09-23 09:52:31 CEST LOG: restored log file "00000006.history" from
archive
2011-09-23 09:52:31 CEST LOG: entering standby mode
Restoring 000000060000000000000065
2011-09-23 09:52:31 CEST LOG: restored log file "000000060000000000000065"
from archive
2011-09-23 09:52:31 CEST LOG: redo starts at 0/65000020
2011-09-23 09:52:31 CEST LOG: consistent recovery state reached at
0/66000000
2011-09-23 09:52:31 CEST LOG: database system is ready to accept read only
connections
Restoring 000000060000000000000066
mv: cannot stat `/opt/PostgreSQL/9.1/archive/000000060000000000000066': No
such file or directory
Restoring 00000007.history
mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000007.history': No such file
or directory
2011-09-23 09:52:31 CEST LOG: streaming replication successfully connected
to primary

--
Best regards,
Linas Virbalas
http://flyingclusters.blogspot.com/

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
Cc:	Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 08:41:28
Message-ID:	4E7C4638.1060103@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 23.09.2011 11:02, Linas Virbalas wrote:
> On 9/22/11 6:59 PM, "Euler Taveira de Oliveira"<euler(at)timbira(dot)com> wrote:
>
>>> If needed, I could do that, if I had the exact procedure... Currently,
>>> during the start of the backup I take the following information:
>>>
>> Just show us the output of pg_start_backup and part of the standby log with
>> the following message 'redo starts at' and the subsequent messages up to the
>> failure.
>
> Unfortunately, it's impossible, because the error message "Could not read
> from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
> aborted) before the turn for "redo starts at" message arrives.

It looks to me that pg_clog/0001 exists, but it shorter than recovery
expects. Which shouldn't happen, of course, because the start-backup
checkpoint should flush all the clog that's needed by recovery to disk
before the backup procedure begins to them.

Can you do "ls -l pg_clog" in the master and the backup, and
pg_controldata on the backup dir, and post the results, please?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 08:48:08
Message-ID:	647BE58F-ED68-497C-8122-237E6327C075@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sep23, 2011, at 10:41 , Heikki Linnakangas wrote:
> On 23.09.2011 11:02, Linas Virbalas wrote:
>> On 9/22/11 6:59 PM, "Euler Taveira de Oliveira"<euler(at)timbira(dot)com> wrote:
>>
>>>> If needed, I could do that, if I had the exact procedure... Currently,
>>>> during the start of the backup I take the following information:
>>>>
>>> Just show us the output of pg_start_backup and part of the standby log with
>>> the following message 'redo starts at' and the subsequent messages up to the
>>> failure.
>>
>> Unfortunately, it's impossible, because the error message "Could not read
>> from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
>> aborted) before the turn for "redo starts at" message arrives.
>
> It looks to me that pg_clog/0001 exists, but it shorter than recovery expects. Which shouldn't happen, of course, because the start-backup checkpoint should flush all the clog that's needed by recovery to disk before the backup procedure begins to them.

Yeah. What confuses me though is that we fail while trying to read from the clog. When do we do that during normal (non-standby) recovery?

One other possibility would be that the problem is somehow triggered by vacuum running while the start-backup checkpoint is commencing. That'd explain why the problem goes away with immediate checkpoints, because those make the windows mach smaller. But I haven't got a concrete theory of whats happening..

best regards,
Florian Pflug

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 09:05:36
Message-ID:	4E7C4BE0.4060701@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 23.09.2011 11:48, Florian Pflug wrote:
> On Sep23, 2011, at 10:41 , Heikki Linnakangas wrote:
>> On 23.09.2011 11:02, Linas Virbalas wrote:
>>> On 9/22/11 6:59 PM, "Euler Taveira de Oliveira"<euler(at)timbira(dot)com> wrote:
>>>
>>>>> If needed, I could do that, if I had the exact procedure... Currently,
>>>>> during the start of the backup I take the following information:
>>>>>
>>>> Just show us the output of pg_start_backup and part of the standby log with
>>>> the following message 'redo starts at' and the subsequent messages up to the
>>>> failure.
>>>
>>> Unfortunately, it's impossible, because the error message "Could not read
>>> from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
>>> aborted) before the turn for "redo starts at" message arrives.
>>
>> It looks to me that pg_clog/0001 exists, but it shorter than recovery expects. Which shouldn't happen, of course, because the start-backup checkpoint should flush all the clog that's needed by recovery to disk before the backup procedure begins to them.
>
> Yeah. What confuses me though is that we fail while trying to read from the clog. When do we do that during normal (non-standby) recovery?

I believe the OP is setting up a standby. He didn't say if it's a hot
standby, though. That changes the startup sequence a bit.

At the end of recovery, we read the "last" clog page, containing the
next XID that will be assigned to a transaction, and zero out the future
part of that clog file (StartupCLOG). In hot standby, we do that
earlier, after reading the checkpoint record but before we start
replaying records.

> One other possibility would be that the problem is somehow triggered by vacuum running while the start-backup checkpoint is commencing. That'd explain why the problem goes away with immediate checkpoints, because those make the windows mach smaller. But I haven't got a concrete theory of whats happening..

Hmm, that's a good theory. The logic in SimpleLruPhysicakReadPage()
tolerates non-existent files, because the old clog files might've been
truncated away by a vacuum after the backup started. That's OK, because
they will be recreated, and later deleted again, during the WAL replay.
But what if something like this happens:

0. pg_start_backup().
1. rsync starts. It gets a list of all files. It notes that pg_clog/0001
exists.
2. vacuum runs, and deletes pg_clog/0001
3. rsync starts to copy pg_clog/0001. Oops, it's gone. Instead of
leaving it out altogether, it includes it as an empty file.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 12:47:20
Message-ID:	CAA25A88.1E7B5%linas.virbalas@continuent.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 9/23/11 12:05 PM, "Heikki Linnakangas"
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

>>> It looks to me that pg_clog/0001 exists, but it shorter than recovery
>>> expects. Which shouldn't happen, of course, because the start-backup
>>> checkpoint should flush all the clog that's needed by recovery to disk
>>> before the backup procedure begins to them.
>>
>> Yeah. What confuses me though is that we fail while trying to read from the
>> clog. When do we do that during normal (non-standby) recovery?
>
> I believe the OP is setting up a standby. He didn't say if it's a hot
> standby, though. That changes the startup sequence a bit.

I confirm that it is a Hot Standby with Streaming Replication.

>> One other possibility would be that the problem is somehow triggered by
>> vacuum running while the start-backup checkpoint is commencing. That'd
>> explain why the problem goes away with immediate checkpoints, because those
>> make the windows mach smaller. But I haven't got a concrete theory of whats
>> happening..
>
> Hmm, that's a good theory. The logic in SimpleLruPhysicakReadPage()
> tolerates non-existent files, because the old clog files might've been
> truncated away by a vacuum after the backup started. That's OK, because
> they will be recreated, and later deleted again, during the WAL replay.
> But what if something like this happens:
>
> 0. pg_start_backup().
> 1. rsync starts. It gets a list of all files. It notes that pg_clog/0001
> exists.
> 2. vacuum runs, and deletes pg_clog/0001
> 3. rsync starts to copy pg_clog/0001. Oops, it's gone. Instead of
> leaving it out altogether, it includes it as an empty file.

I cannot confirm that. I have tried this scenario one more time and was
observing how pg_clog/ looked on the master and standby. Here's an overview:

1. pg_start_backup('base_backup') - waiting...

2. Checking the size of pg_clog periodically on the master:

-bash-3.2$ ls -l pg_clog
total 8
-rw------- 1 postgres postgres 8192 Sep 23 14:31 0000
-bash-3.2$ ls -l pg_clog
total 8
-rw------- 1 postgres postgres 8192 Sep 23 14:31 0000
-bash-3.2$ ls -l pg_clog
total 16

3. Somewhere just before (1) step releases and we get into rsync phase, the
pg_clog size changes:

-rw------- 1 postgres postgres 16384 Sep 23 14:33 0000
-bash-3.2$ ls -l pg_clog
total 16
-rw------- 1 postgres postgres 16384 Sep 23 14:33 0000
-bash-3.2$ ls -l pg_clog
total 16
-rw------- 1 postgres postgres 16384 Sep 23 14:33 0000

4. The rsync does transfer the file:

...
INFO | jvm 1 | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,881 INFO
management.script.ScriptPlugin base/16394/16405_fsm
INFO | jvm 1 | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,881 INFO
management.script.ScriptPlugin base/16394/pg_internal.init
INFO | jvm 1 | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/
INFO | jvm 1 | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/12691
INFO | jvm 1 | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/12696
INFO | jvm 1 | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/12697
INFO | jvm 1 | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/pg_internal.init
INFO | jvm 1 | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,887 INFO
management.script.ScriptPlugin pg_clog/0000
INFO | jvm 1 | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,887 INFO
management.script.ScriptPlugin pg_multixact/offsets/0000
INFO | jvm 1 | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,887 INFO
management.script.ScriptPlugin pg_notify/
...

But on the standby its size is the old one (thus, it seems, that the size
changed after the rsync transfer and before the pg_stop_backup() was
called):

ls -l pg_clog/
total 8
-rw------- 1 postgres postgres 8192 Sep 23 14:31 0000

5. Hence, the server doesn't start and pg_log complains:

2011-09-23 14:33:45 CEST LOG: streaming replication successfully connected
to primary
2011-09-23 14:33:46 CEST FATAL: the database system is starting up
2011-09-23 14:33:46 CEST FATAL: the database system is starting up
2011-09-23 14:33:46 CEST FATAL: the database system is starting up
2011-09-23 14:33:46 CEST FATAL: the database system is starting up
2011-09-23 14:33:46 CEST FATAL: the database system is starting up
2011-09-23 14:33:46 CEST FATAL: the database system is starting up
2011-09-23 14:33:46 CEST FATAL: the database system is starting up
2011-09-23 14:33:46 CEST FATAL: could not access status of transaction
37206
2011-09-23 14:33:46 CEST DETAIL: Could not read from file "pg_clog/0000" at
offset 8192: Success.

6. Now, if I do something that, of course, should never be done, and copy
this 0000 file from master to the standby soon enough (without even
starting/stopping backup), the standby starts up successfully.

--
Hope this helps,
Linas Virbalas
http://flyingclusters.blogspot.com/

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 12:54:54
Message-ID:	CA+TgmobVnjvtpmFO3uzh+6HsCmCwsxAhze0LDEQzrO+sUwvyYQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 23, 2011 at 8:47 AM, Linas Virbalas
<linas(dot)virbalas(at)continuent(dot)com> wrote:
> But on the standby its size is the old one (thus, it seems, that the size
> changed after the rsync transfer and before the pg_stop_backup() was
> called):

Now that seems pretty weird - I don't think that file should ever shrink.

Are you sure that the rsync isn't starting until after
pg_start_backup() completes? Because if you were starting it just a
tiny bit early, that would explain the observed symptoms perfectly, I
think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 15:36:26
Message-ID:	7642135A793F9C41B8DCC83FF7DC7CB1FA494C65@EXVMBX003-1.exch003intermedia.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>> But on the standby its size is the old one (thus, it seems, that the size
>> changed after the rsync transfer and before the pg_stop_backup() was
>> called):

> Now that seems pretty weird - I don't think that file should ever shrink.

It seems, I was not clear in my last example. The pg_clog file doesn't shrink. On the contrary, when rsync kicks in it is still small, but after it completes and somewhere around the pg_stop_backup() call, the pg_clog file grows on the master. Hence, it is left smaller on the standby.

> Are you sure that the rsync isn't starting until after
> pg_start_backup() completes?

100% - the procedure is scripted in a single threaded application, so rsync is started only after pg_start_backup(...) returns.

> Because if you were starting it just a
> tiny bit early, that would explain the observed symptoms perfectly, I
> think.

I agree, but that is not the case.

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 15:43:49
Message-ID:	CAC_2qU-Km5FW9qXDEduv7K0S7tUTk==+tp=ZjdrFbpdbWZc-2w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

>> Unfortunately, it's impossible, because the error message "Could not read
>> from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
>> aborted) before the turn for "redo starts at" message arrives.
>
> It looks to me that pg_clog/0001 exists, but it shorter than recovery
> expects. Which shouldn't happen, of course, because the start-backup
> checkpoint should flush all the clog that's needed by recovery to disk
> before the backup procedure begins to them.

I think the point here is that recover *never starts*. Something in
the standby startup is looking for a value in a clog block that
recovery hadn't had a chance to replay (produce) yet.

So the standby is looking into the data directory *before* recovery
has had a chance to run, and based on that,goes to look for something
in clog page that wasn't guarenteed to exists at the start of the
backup period, and bombing out before recovery has a chance to start
replaying WAL and write the new clog page.

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 15:58:59
Message-ID:	1316793461-sup-7812@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Linas Virbalas's message of vie sep 23 09:47:20 -0300 2011:
> On 9/23/11 12:05 PM, "Heikki Linnakangas"
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> But on the standby its size is the old one (thus, it seems, that the size
> changed after the rsync transfer and before the pg_stop_backup() was
> called):
>
> ls -l pg_clog/
> total 8
> -rw------- 1 postgres postgres 8192 Sep 23 14:31 0000

Sounds like rsync is caching the file size at the start of the run, and
then copying that many bytes, ignoring the growth that occurred after it
started.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 16:03:13
Message-ID:	CABUevEzH3q=q8XMZ+murfoyE=pvWko6UqO9t8JkfqVNJ2QpVjg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sep 23, 2011 5:59 PM, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
wrote:
>
>
> Excerpts from Linas Virbalas's message of vie sep 23 09:47:20 -0300 2011:
> > On 9/23/11 12:05 PM, "Heikki Linnakangas"
> > <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
> > But on the standby its size is the old one (thus, it seems, that the
size
> > changed after the rsync transfer and before the pg_stop_backup() was
> > called):
> >
> > ls -l pg_clog/
> > total 8
> > -rw------- 1 postgres postgres 8192 Sep 23 14:31 0000
>
> Sounds like rsync is caching the file size at the start of the run, and
> then copying that many bytes, ignoring the growth that occurred after it
> started.
>

That pretty much matches what Daniel does when he got the same failure case.

Is this not allowed? Shouldn't wal replay fix this?

/Magnus

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, "daniel(at)heroku(dot)com" <daniel(at)heroku(dot)com>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 16:10:39
Message-ID:	201109231810.39927.andres@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On Wednesday 21 Sep 2011 16:44:30 Linas Virbalas wrote:
> 2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001"
> at offset 32768: Success.
Any chance you can attach gdb to the startup process and provide a backtrace
from the place where this message is printed?

Greetings,

Andres

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 16:38:27
Message-ID:	4E7CB603.2060200@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 23.09.2011 19:03, Magnus Hagander wrote:
> On Sep 23, 2011 5:59 PM, "Alvaro Herrera"<alvherre(at)commandprompt(dot)com>
> wrote:
>>
>>
>> Excerpts from Linas Virbalas's message of vie sep 23 09:47:20 -0300 2011:
>>> On 9/23/11 12:05 PM, "Heikki Linnakangas"
>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>
>>> But on the standby its size is the old one (thus, it seems, that the
> size
>>> changed after the rsync transfer and before the pg_stop_backup() was
>>> called):
>>>
>>> ls -l pg_clog/
>>> total 8
>>> -rw------- 1 postgres postgres 8192 Sep 23 14:31 0000
>>
>> Sounds like rsync is caching the file size at the start of the run, and
>> then copying that many bytes, ignoring the growth that occurred after it
>> started.
>
> That pretty much matches what Daniel does when he got the same failure case.
>
> Is this not allowed? Shouldn't wal replay fix this?

That's OK. The effect is the same as if rsync had copied the file at the
start.

What's not OK is to store the file as empty or truncated file in the
backup, when the file is deleted from pg_clog while rsync is running.
The file must have length >= the length the file had when backup
started, or the file must be omitted from the backup altogether.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 16:45:16
Message-ID:	CA+TgmoYoWD5HOk=__On2_LMe8qa6yn9dAtOuxzn54Wfgx+iVig@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 23, 2011 at 11:43 AM, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:
> On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
>>> Unfortunately, it's impossible, because the error message "Could not read
>>> from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
>>> aborted) before the turn for "redo starts at" message arrives.
>>
>> It looks to me that pg_clog/0001 exists, but it shorter than recovery
>> expects. Which shouldn't happen, of course, because the start-backup
>> checkpoint should flush all the clog that's needed by recovery to disk
>> before the backup procedure begins to them.
>
> I think the point here is that recover *never starts*. Something in
> the standby startup is looking for a value in a clog block that
> recovery hadn't had a chance to replay (produce) yet.

Ah. I think you are right - Heikki made the same point. Maybe some
of the stuff that happens just after this comment:

/*
* Initialize for Hot Standby, if enabled. We won't let backends in
* yet, not until we've reached the min recovery point specified in
* control file and we've established a recovery snapshot from a
* running-xacts WAL record.
*/

...actually needs to be postponed until after we've reached consistency?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 16:47:08
Message-ID:	612DC4A7-1938-404B-84CE-9F5AD49016F4@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sep23, 2011, at 18:03 , Magnus Hagander wrote:
> On Sep 23, 2011 5:59 PM, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com> wrote:
> > Sounds like rsync is caching the file size at the start of the run, and
> > then copying that many bytes, ignoring the growth that occurred after it
> > started.
>
> That pretty much matches what Daniel does when he got the same failure case.
>
> Is this not allowed? Shouldn't wal replay fix this?

I don't see how it could be forbidden. ISTM that we absolutely *have* to be
able to deal with each byte of a file's date, including it's meta-data, being in
any state it was between the time pg_start_backup() returned and pg_stop_backup()
was issued. With the individual states being in no way synchronized.
(OK, in reality restricting this to individual 512-byte blocks is probably enough,
but still...).

This, I think, is also the reasons we need to force full_page_writes to on
during a hot backup. If a page was modified at all between pg_start_backup() and
pg_stop_backup(), we essentially have to assume it's totally garbled.

best regards,
Florian Pflug

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 16:49:18
Message-ID:	9C1909AF-D319-49BF-B3DA-5C3C30B5C849@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sep23, 2011, at 18:45 , Robert Haas wrote:
> Ah. I think you are right - Heikki made the same point. Maybe some
> of the stuff that happens just after this comment:
>
> /*
> * Initialize for Hot Standby, if enabled. We won't let backends in
> * yet, not until we've reached the min recovery point specified in
> * control file and we've established a recovery snapshot from a
> * running-xacts WAL record.
> */
>
>
> ...actually needs to be postponed until after we've reached consistency?

I came the the same conclusion. It seems the before we've reached consistency,
we shouldn't attempt to read anything because the data can be pretty arbitrarily
garbled.

best regards,
Florian Pflug

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 16:58:30
Message-ID:	4E7CBAB6.6010205@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 23.09.2011 19:49, Florian Pflug wrote:
> On Sep23, 2011, at 18:45 , Robert Haas wrote:
>> Ah. I think you are right - Heikki made the same point. Maybe some
>> of the stuff that happens just after this comment:
>>
>> /*
>> * Initialize for Hot Standby, if enabled. We won't let backends in
>> * yet, not until we've reached the min recovery point specified in
>> * control file and we've established a recovery snapshot from a
>> * running-xacts WAL record.
>> */
>>
>>
>> ...actually needs to be postponed until after we've reached consistency?
>
> I came the the same conclusion. It seems the before we've reached consistency,
> we shouldn't attempt to read anything because the data can be pretty arbitrarily
> garbled.

There are pretty clear rules on what state clog can be in. When you
launch postmaster in a standby:

* Any clog preceding the nextXid from the checkpoint record we start
recovery from, must either be valid, or the clog file must be missing
altogether (which can happen when it was vacuumed away while the backup
in progress - if the clog is still needed at the end of backup it must
not be missing, of course).
* Any clog following nextXid can be garbled or missing.

Recovery will overwrite any clog after nextXid from the WAL, but not the
clog before it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-23 19:10:01
Message-ID:	CA+TgmobkDYTts3jtA8SaDT3=N6ow_UDV_trR30F=o959nZiJ-g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 23, 2011 at 12:58 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> There are pretty clear rules on what state clog can be in. When you launch
> postmaster in a standby:
>
> * Any clog preceding the nextXid from the checkpoint record we start
> recovery from, must either be valid, or the clog file must be missing
> altogether (which can happen when it was vacuumed away while the backup in
> progress - if the clog is still needed at the end of backup it must not be
> missing, of course).
> * Any clog following nextXid can be garbled or missing.
>
> Recovery will overwrite any clog after nextXid from the WAL, but not the
> clog before it.

So the actual error message in the last test was:

2011-09-21 13:41:05 CEST FATAL: could not access status of transaction 1188673

...but we can't tell if that was before or after nextXid, which seems
like it would be useful to know.

If Linas can rerun his experiment, but also capture the output of
pg_controldata before firing up the standby for the first time, then
we'd able to see that information.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Daniel Farina <daniel(at)heroku(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-25 00:15:36
Message-ID:	CAAZKuFZBKhyxT9G+LHet=L0mGNLsWOPGZs3E4jLMGYQS9i9YpQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 23, 2011 at 9:45 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Sep 23, 2011 at 11:43 AM, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:
>> On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>
>>>> Unfortunately, it's impossible, because the error message "Could not read
>>>> from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
>>>> aborted) before the turn for "redo starts at" message arrives.
>>>
>>> It looks to me that pg_clog/0001 exists, but it shorter than recovery
>>> expects. Which shouldn't happen, of course, because the start-backup
>>> checkpoint should flush all the clog that's needed by recovery to disk
>>> before the backup procedure begins to them.
>>
>> I think the point here is that recover *never starts*. Something in
>> the standby startup is looking for a value in a clog block that
>> recovery hadn't had a chance to replay (produce) yet.
>
> Ah. I think you are right - Heikki made the same point. Maybe some
> of the stuff that happens just after this comment:
>
> /*
> * Initialize for Hot Standby, if enabled. We won't let backends in
> * yet, not until we've reached the min recovery point specified in
> * control file and we've established a recovery snapshot from a
> * running-xacts WAL record.
> */
>
>
> ...actually needs to be postponed until after we've reached consistency?

We have a number of backups that are like this, and the problem is
entirely reproducible for those. We always get around it by disabling
hot standby for a while (until consistency is reached) I poked at
xlog.c a bit, and to me seems entirely likely that StartupCLOG is
being called early -- way too early, or at least parts of it.
Presumably(?) it is being called so early in the hot standby path so
that the status of transactions can be known for the purposes of
querying, but it's happening before consistency is reached, ergo not
many invariants (outside of checkpointed things like pg_controldata)
are likely to hold...such as clog being the right size to locate the
transaction status of a page.

Anyway, sorry for dropping the ball on pushing that one; we've been
using this workaround for a while after taking a look at the mechanism
and deciding it was probably not a problem (except for a sound night's
sleep). We've now seen this dozens of times.

--
fdr

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-27 22:53:10
Message-ID:	E75A2E54-8E6E-4ADE-B6D8-E1FEEDFEF3A6@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sep23, 2011, at 21:10 , Robert Haas wrote:
> So the actual error message in the last test was:
>
> 2011-09-21 13:41:05 CEST FATAL: could not access status of transaction 1188673
>
> ...but we can't tell if that was before or after nextXid, which seems
> like it would be useful to know.
>
> If Linas can rerun his experiment, but also capture the output of
> pg_controldata before firing up the standby for the first time, then
> we'd able to see that information.

Hm, wouldn't pg_controldata quite certainly show a nextId beyond the clog
if copied after pg_clog/*?

Linas, could you capture the output of pg_controldata *and* increase the
log level to DEBUG1 on the standby? We should then see nextXid value of
the checkpoint the recovery is starting from.

FWIW, I've had a few more theories about what's going on, but none survived
after looking at the code. My first guess was that there maybe are circumstances
under which the nextId from the control file, instead of the one from the
pre-backup checkpoint, ends up becoming the standby's nextXid. But there doesn't
seem to be a way for that to happen.

My next theory was that something increments nextIdx before StartupCLOG().
The only possible candidate seems to be PrescanPreparedTransactions(), which
does increment nextXid if it's smaller than some sub-xid of a prepared xact.
But we only call that before StartupCLOG() if we're starting from a
shutdown checkpoint, which shouldn't be the case for the OP.

I also checked what rsync does when a file vanishes after rsync computed the
file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
exit code for precisely that failure case.

best regards,
Florian Pflug

From:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-29 15:44:59
Message-ID:	CAAA6D2B.2769E%linas.virbalas@continuent.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Linas, could you capture the output of pg_controldata *and* increase the
> log level to DEBUG1 on the standby? We should then see nextXid value of
> the checkpoint the recovery is starting from.

I'll try to do that whenever I'm in that territory again... Incidentally,
recently there was a lot of unrelated-to-this-post work to polish things up
for a talk being given at PGWest 2011 Today :)

> I also checked what rsync does when a file vanishes after rsync computed the
> file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
> loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
> exit code for precisely that failure case.

To be precise, my script has logic to accept the exit code 24, just as
stated in PG manual:

Docs> For example, some versions of rsync return a separate exit code for
Docs> "vanished source files", and you can write a driver script to accept
Docs> this exit code as a non-error case.

--
Sincerely,
Linas Virbalas
http://flyingclusters.blogspot.com/

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Backup with rsync fails at pg_clog if under load
Date:	2011-09-29 18:22:19
Message-ID:	193B7D25-BD69-4C14-8E67-8CEA82888B65@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sep29, 2011, at 17:44 , Linas Virbalas wrote:
>> I also checked what rsync does when a file vanishes after rsync computed the
>> file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
>> loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
>> exit code for precisely that failure case.
>
> To be precise, my script has logic to accept the exit code 24, just as
> stated in PG manual:
>
> Docs> For example, some versions of rsync return a separate exit code for
> Docs> "vanished source files", and you can write a driver script to accept
> Docs> this exit code as a non-error case.

Oh, cool. I was about to suggest that we add something along these lines
to the docs - guess I should RTFM from time to time ;-)

best regards,
Florian Pflug

>
> --
> Sincerely,
> Linas Virbalas
> http://flyingclusters.blogspot.com/
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers