Re: Serious problem: media recovery fails after system or PostgreSQL crash

Lists: pgsql-hackers
From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
To: "MauMau" <maumau307(at)gmail(dot)com>,pgsql-hackers(at)postgresql(dot)org
Subject: Re: Serious problem: media recovery fails after system or PostgreSQL crash
Date: 2012-12-06 16:52:46
Message-ID: 20121206165246.142840@gmx.com

MauMau wrote:

> [Problem]
> I'm using PostgreSQL 9.1.6 on Linux. I encountered a serious
> problem that media recovery failed showing the following message:
>
> FATAL: archive file "000000010000008000000028" has wrong size:
> 7340032 instead of 16777216
>
> I'm using normal cp command to archive WAL files. That is:
>
>  archive_command = '/path/to/my_script.sh "%p" "/backup/archive_log/%f"'
>
> <<my_script.sh>>
> --------------------------------------------------
> #!/bin/sh
> some processing...
> cp "$1" "$2"
> other processing...
> --------------------------------------------------
>
>
> The media recovery was triggered by power failure. The disk drive
> that stored $PGDATA failed after a power failure. So I replaced
> the failed disk, and performed media recovery by creating
> recovery.conf and running pg_ctl start. However, pg_ctl failed
> with the above error message.

If you are attempting a PITR-style recovery and you want to include
WAL entries from the partially-copied file, pad a copy of it with
NUL bytes to the expected length.
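
A minimal sketch of that padding step, assuming a POSIX shell and a
`head` that supports `-c` (GNU and BSD versions do). The function name
`pad_wal_copy` is hypothetical, and the default size of 16777216 bytes
is simply the expected size reported in the FATAL message above:

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): copy a partially archived
# WAL segment and pad the copy with NUL bytes up to the expected size.
# Usage: pad_wal_copy SRC DST [SEGMENT_SIZE_IN_BYTES]
pad_wal_copy() {
    src=$1
    dst=$2
    seg_size=${3:-16777216}   # assumed default: size named in the FATAL message

    # Work on a copy so the archived original stays untouched.
    cp "$src" "$dst" || return 1

    # Arithmetic expansion strips the whitespace some wc versions emit.
    cur_size=$(( $(wc -c < "$dst") ))

    if [ "$cur_size" -gt "$seg_size" ]; then
        echo "pad_wal_copy: $src is larger than $seg_size bytes" >&2
        return 1
    fi

    pad=$(( seg_size - cur_size ))
    if [ "$pad" -gt 0 ]; then
        # Append exactly $pad NUL bytes read from /dev/zero.
        head -c "$pad" /dev/zero >> "$dst" || return 1
    fi
}
```

It could be called as `pad_wal_copy /backup/archive_log/000000010000008000000028 /some/restore/dir/000000010000008000000028`,
where the destination directory is a placeholder. Padding only satisfies
the size check; it cannot bring back WAL records that were never copied.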

-Kevin


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Kevin Grittner" <kgrittn(at)mail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serious problem: media recovery fails after system or PostgreSQL crash
Date: 2012-12-06 22:33:29
Message-ID: BBE4CD52C6514703A9AB604AB6655757@maumau

From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
> If you are attempting a PITR-style recovery and you want to include
> WAL entries from the partially-copied file, pad a copy of it with
> NUL bytes to the expected length.

I'm afraid this is unacceptably difficult, or almost impossible, for many PG
users. How would you do the following?

1. Identify the file type (WAL segment, backup history file, timeline
history file) and its expected size in the archive_command script.
archive_command has to handle these three types of files. Embedding file
name logic (e.g. WAL is 000000010000000200000003) in archive_command is a
bad idea, because the file naming might change in a future PG release.

2. Append NUL bytes to the file in the archive_command shell script or batch
file. In particular, I have no idea how to do this on Windows, and I have
some PG systems running on Windows. This would compromise the ease of use of
PostgreSQL.

So I believe PG should handle the problem, not the archive_command.

Regards
MauMau