Re: pg_standby -l might destory the archived file

Lists: pgsql-hackers
From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_standby -l might destory the archived file
Date: 2009-06-01 05:47:36
Message-ID: 3f0b79eb0905312247u3cc3db16hdd20e6b36b86ac71@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

pg_standby can use ln command to restore an archived file,
which might destroy the archived file as follows.

1) pg_standby creates the symlink to the archived file '102'
2) '102' is applied
3) the next file '103' doesn't exist and the trigger file is created
4) '102' is re-fetched
5) at the end of recovery, the symlink to '102' is rename to '202',
but it still points '102'
6) after recovery, '202' is recycled (rename to '208', which still
points '102')
7) '208' is written new xlog records over
--> the archived file '102' comes down!

One simple solution to fix this problem is copying the content
of the symlink (ie. the archived file itself) and deleting it instead
of renaming it at the end of recovery.

Thought?

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-01 14:41:56
Message-ID: 17191.1243867316@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> pg_standby can use ln command to restore an archived file,
> which might destroy the archived file as follows.

Does it matter? pg_standby's source area wouldn't normally be an
"archive" in the real sense of the word, it's just a temporary staging
area between master and slave. (If it were being used as a real
archive, keeping it on the same disk as the live slave seems pretty
foolish anyway, so the case wouldn't arise.)

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-01 14:55:53
Message-ID: 4A23EBF9.6000800@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>> pg_standby can use ln command to restore an archived file,
>> which might destroy the archived file as follows.
>
> Does it matter? pg_standby's source area wouldn't normally be an
> "archive" in the real sense of the word, it's just a temporary staging
> area between master and slave. (If it were being used as a real
> archive, keeping it on the same disk as the live slave seems pretty
> foolish anyway, so the case wouldn't arise.)

It seems perfectly sane to source pg_standby directly from the archive
to me. And we're talking about symbolic linking, so the archive
directory might well be on an NFS mount.

(I haven't looked at the issue yet)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-01 15:27:36
Message-ID: 20090601152736.GL15213@yugib.highrise.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> [090601 10:56]:
> Tom Lane wrote:
>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>> pg_standby can use ln command to restore an archived file,
>>> which might destroy the archived file as follows.
>>
>> Does it matter? pg_standby's source area wouldn't normally be an
>> "archive" in the real sense of the word, it's just a temporary staging
>> area between master and slave. (If it were being used as a real
>> archive, keeping it on the same disk as the live slave seems pretty
>> foolish anyway, so the case wouldn't arise.)
>
> It seems perfectly sane to source pg_standby directly from the archive
> to me. And we're talking about symbolic linking, so the archive
> directory might well be on an NFS mount.

I would expect that any archive directly available would at least be RO
to the postgres slave... But....

Something like this would stop the "symlink" being renamed... Not portable, but probably portable
across platforms that have symlinks...
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1b575e2..cba3f7a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3042,13 +3042,23 @@ RemoveOldXlogFiles(uint32 log, uint32 seg, XLogRecPtr endptr)
{
if (XLogArchiveCheckDone(xlde->d_name))
{
+ struct stat stat_buf;
snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlde->d_name);

/*
* Before deleting the file, see if it can be recycled as a
* future log segment.
+ * If it's a symlink, we don't recycle it
*/
- if (InstallXLogFileSegment(&endlogId, &endlogSeg, path,
+ if ( (stat(path, &stat_buf) == 0) && S_ISLNK(stat_buf.st_mode))
+ {
+ ereport(DEBUG2,
+ (errmsg("removing transaction log symlink \"%s\"",
+ xlde->d_name)));
+ unlink(path);
+ CheckpointStats.ckpt_segs_removed++;
+ }
+ else if (InstallXLogFileSegment(&endlogId, &endlogSeg, path,
true, &max_advance,
true))
{

You can make a smaller patch if your not interested in the DEBUG2 message
saying that it deleted a symlink...

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-01 19:03:43
Message-ID: 4A24260F.5020902@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Aidan Van Dyk wrote:
> * Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> [090601 10:56]:
>> Tom Lane wrote:
>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>> pg_standby can use ln command to restore an archived file,
>>>> which might destroy the archived file as follows.
>>> Does it matter? pg_standby's source area wouldn't normally be an
>>> "archive" in the real sense of the word, it's just a temporary staging
>>> area between master and slave. (If it were being used as a real
>>> archive, keeping it on the same disk as the live slave seems pretty
>>> foolish anyway, so the case wouldn't arise.)
>> It seems perfectly sane to source pg_standby directly from the archive
>> to me. And we're talking about symbolic linking, so the archive
>> directory might well be on an NFS mount.
>
> I would expect that any archive directly available would at least be RO
> to the postgres slave... But....

Me too.

I wonder if we should just remove the symlink option from pg_standby.
Does anyone use it? Is there a meaningful performance difference?

> Something like this would stop the "symlink" being renamed... Not portable, but probably portable
> across platforms that have symlinks...
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c

That seems reasonable as well.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-01 19:18:28
Message-ID: 22255.1243883908@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> I wonder if we should just remove the symlink option from pg_standby.

I was considering suggesting that too...

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-02 00:39:05
Message-ID: 3f0b79eb0906011739i28361675q4e88c0648feb34e3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On Mon, Jun 1, 2009 at 11:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>> pg_standby can use ln command to restore an archived file,
>> which might destroy the archived file as follows.
>
> Does it matter?  pg_standby's source area wouldn't normally be an
> "archive" in the real sense of the word, it's just a temporary staging
> area between master and slave.

If so, it might be deleted after triggering the warm-standby. But, we cannot
remove it because the latest xlog file which is required for normal recovery
might exist in it. This is another undesirable scenario. Is this problem?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-02 01:00:43
Message-ID: 29796.1243904443@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> If so, it might be deleted after triggering the warm-standby. But, we cannot
> remove it because the latest xlog file which is required for normal recovery
> might exist in it. This is another undesirable scenario. Is this problem?

What recovery? In the problem case you're positing, the slave server
has executed a checkpoint and come up live. It's never going to be
interested in the old xlog again.

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-02 01:14:13
Message-ID: 3f0b79eb0906011814v1fa01606xe23fdba694fd0ed4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On Tue, Jun 2, 2009 at 10:00 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>> If so, it might be deleted after triggering the warm-standby. But, we cannot
>> remove it because the latest xlog file which is required for normal recovery
>> might exist in it. This is another undesirable scenario. Is this problem?
>
> What recovery?  In the problem case you're positing, the slave server
> has executed a checkpoint and come up live.  It's never going to be
> interested in the old xlog again.

Yes, the old xlog itself is not used again. But, the *old file* might
be recycled
and used later. The case that I'm looking at is that the symlink to a temporary
area is recycled. Am I missing something?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-02 01:21:43
Message-ID: 256.1243905703@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> Yes, the old xlog itself is not used again. But, the *old file* might
> be recycled and used later. The case that I'm looking at is that the
> symlink to a temporary area is recycled. Am I missing something?

Actually, I think the right fix for that would be to add defenses to
xlog.c to not try to "recycle" a file that is a symlink. No good could
possibly come of that.

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-02 02:39:36
Message-ID: 3f0b79eb0906011939x59d63cf7hbde1f55a5a03a5f4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On Tue, Jun 2, 2009 at 10:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>> Yes, the old xlog itself is not used again. But, the *old file* might
>> be recycled and used later. The case that I'm looking at is that the
>> symlink to a temporary area is recycled. Am I missing something?
>
> Actually, I think the right fix for that would be to add defenses to
> xlog.c to not try to "recycle" a file that is a symlink.

OK, I tweaked Aidan's patch. Thanks Aidan!
http://archives.postgresql.org/message-id/20090601152736.GL15213@yugib.highrise.ca

Changes are:
- use lstat instead of stat
- add #if HAVE_WORKING_LINK and #endif code

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
not_recycle_symlink.patch text/x-patch 1.3 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-02 06:40:02
Message-ID: 4A24C942.2090708@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao wrote:
> On Tue, Jun 2, 2009 at 10:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>> Yes, the old xlog itself is not used again. But, the *old file* might
>>> be recycled and used later. The case that I'm looking at is that the
>>> symlink to a temporary area is recycled. Am I missing something?
>> Actually, I think the right fix for that would be to add defenses to
>> xlog.c to not try to "recycle" a file that is a symlink.
>
> OK, I tweaked Aidan's patch. Thanks Aidan!
> http://archives.postgresql.org/message-id/20090601152736.GL15213@yugib.highrise.ca
>
> Changes are:
> - use lstat instead of stat
> - add #if HAVE_WORKING_LINK and #endif code

Committed. I left out the "#ifdef HAVE_WORKING_LINK" and used S_ISREG()
instead of S_ISLNK. We use lstat + S_ISREG elsewhere too, so there
should be no portability issues.

I backpatched to 8.3, since that's when pg_standby was added. Arguably
earlier versions should've been changed too, as pg_standby works with
earlier versions, but I decided to not rock the boat as this only
affects the pg_standby -l mode.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-02 06:45:14
Message-ID: 3f0b79eb0906012345h11bf905fy9073db5099f9cba9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On Tue, Jun 2, 2009 at 3:40 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Fujii Masao wrote:
>>
>> On Tue, Jun 2, 2009 at 10:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>
>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>>
>>>> Yes, the old xlog itself is not used again. But, the *old file* might
>>>> be recycled and used later. The case that I'm looking at is that the
>>>> symlink to a temporary area is recycled. Am I missing something?
>>>
>>> Actually, I think the right fix for that would be to add defenses to
>>> xlog.c to not try to "recycle" a file that is a symlink.
>>
>> OK, I tweaked Aidan's patch. Thanks Aidan!
>>
>> http://archives.postgresql.org/message-id/20090601152736.GL15213@yugib.highrise.ca
>>
>> Changes are:
>> - use lstat instead of stat
>> - add #if HAVE_WORKING_LINK and #endif code
>
> Committed. I left out the "#ifdef HAVE_WORKING_LINK" and used S_ISREG()
> instead of S_ISLNK. We use lstat + S_ISREG elsewhere too, so there should be
> no portability issues.

Thanks a lot!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-02 16:13:10
Message-ID: 1243959190.23910.81.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Mon, 2009-06-01 at 14:47 +0900, Fujii Masao wrote:

> pg_standby can use ln command to restore an archived file,
> which might destroy the archived file as follows.
>
> 1) pg_standby creates the symlink to the archived file '102'
> 2) '102' is applied
> 3) the next file '103' doesn't exist and the trigger file is created
> 4) '102' is re-fetched
> 5) at the end of recovery, the symlink to '102' is rename to '202',
> but it still points '102'
> 6) after recovery, '202' is recycled (rename to '208', which still
> points '102')
> 7) '208' is written new xlog records over
> --> the archived file '102' comes down!
>
> One simple solution to fix this problem...

err...I don't see *any* problem at all, since pg_standby does not do
step (1) in the way you say and therefore never does step (5). Any links
created are explicitly deleted in all cases at the end of recovery.

General comment on thread: What's going on with all these "fixes"?
Anybody reading the commit log and/or weekly news is going to get fairly
worried for no reason at all. For that reason I ask for longer
consideration and wider discussion before committing something - it
would certainly avoid lengthy post commit discussion as has occurred
twice recently. I see no reason for such haste on these "fixes". If
there's a need for haste, ship them to your customers directly, please
don't scare other people's.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-02 16:41:21
Message-ID: 14428.1243960881@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> err...I don't see *any* problem at all, since pg_standby does not do
> step (1) in the way you say and therefore never does step (5). Any links
> created are explicitly deleted in all cases at the end of recovery.

That's a good point; don't we recover files under names like
RECOVERYXLOG, not under names that could possibly conflict with regular
WAL files?

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-02 16:43:08
Message-ID: 4A25569C.8070906@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Mon, 2009-06-01 at 14:47 +0900, Fujii Masao wrote:
>
>> pg_standby can use ln command to restore an archived file,
>> which might destroy the archived file as follows.
>>
>> 1) pg_standby creates the symlink to the archived file '102'
>> 2) '102' is applied
>> 3) the next file '103' doesn't exist and the trigger file is created
>> 4) '102' is re-fetched
>> 5) at the end of recovery, the symlink to '102' is rename to '202',
>> but it still points '102'
>> 6) after recovery, '202' is recycled (rename to '208', which still
>> points '102')
>> 7) '208' is written new xlog records over
>> --> the archived file '102' comes down!
>>
>> One simple solution to fix this problem...
>
> err...I don't see *any* problem at all, since pg_standby does not do
> step (1) in the way you say and therefore never does step (5). Any links
> created are explicitly deleted in all cases at the end of recovery.

I don't know how you came to that conclusion, but Fujii-sans description
seems accurate to me, and I can reproduce the behavior on my laptop.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-02 17:18:46
Message-ID: 4A255EF6.5050001@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> err...I don't see *any* problem at all, since pg_standby does not do
>> step (1) in the way you say and therefore never does step (5). Any links
>> created are explicitly deleted in all cases at the end of recovery.
>
> That's a good point; don't we recover files under names like
> RECOVERYXLOG, not under names that could possibly conflict with regular
> WAL files?

Yes. But we rename RECOVERYXLOG to 000000010000000000000057 or similar
at the end of recovery, in exitArchiveRecovery().

Thinking about this some more, I think we should've changed
exitArchiveRecovery() rather than RemoveOldXlogFiles(): it would be more
robust if exitArchiveRecovery() always copied the last WAL file rather
than just renamed it. It doesn't seem safe to rely on the file the
symlink points to to be valid after recovery is finished, and we might
write to it before it's recycled, so the current fix isn't complete.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-02 18:54:21
Message-ID: 16657.1243968861@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> That's a good point; don't we recover files under names like
>> RECOVERYXLOG, not under names that could possibly conflict with regular
>> WAL files?

> Yes. But we rename RECOVERYXLOG to 000000010000000000000057 or similar
> at the end of recovery, in exitArchiveRecovery().

> Thinking about this some more, I think we should've changed
> exitArchiveRecovery() rather than RemoveOldXlogFiles(): it would be more
> robust if exitArchiveRecovery() always copied the last WAL file rather
> than just renamed it. It doesn't seem safe to rely on the file the
> symlink points to to be valid after recovery is finished, and we might
> write to it before it's recycled, so the current fix isn't complete.

Hmm. I think really the reason it's coded that way is that we assumed
the recovery command would be physically copying the file from someplace
else. pg_standby is violating the backend's expectations by using a
symlink. And I really doubt that the technique is saving anything, since
the data has to be read in from the archive location anyway.

I'm leaning back to the position that pg_standby's -l option is simply a
bad idea and should be removed.

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-03 06:20:26
Message-ID: 3f0b79eb0906022320y73fdae21gea56aa216d9d0628@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On Wed, Jun 3, 2009 at 3:54 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> That's a good point; don't we recover files under names like
>>> RECOVERYXLOG, not under names that could possibly conflict with regular
>>> WAL files?
>
>> Yes. But we rename RECOVERYXLOG to 000000010000000000000057 or similar
>> at the end of recovery, in exitArchiveRecovery().
>
>> Thinking about this some more, I think we should've changed
>> exitArchiveRecovery() rather than RemoveOldXlogFiles(): it would be more
>> robust if exitArchiveRecovery() always copied the last WAL file rather
>> than just renamed it. It doesn't seem safe to rely on the file the
>> symlink points to to be valid after recovery is finished, and we might
>> write to it before it's recycled, so the current fix isn't complete.
>
> Hmm.  I think really the reason it's coded that way is that we assumed
> the recovery command would be physically copying the file from someplace
> else.  pg_standby is violating the backend's expectations by using a
> symlink.  And I really doubt that the technique is saving anything, since
> the data has to be read in from the archive location anyway.
>
> I'm leaning back to the position that pg_standby's -l option is simply a
> bad idea and should be removed.

I'm OK with this. And, we should document the assumption for
restore_command? Otherwise, some users might wrongly use
'ln' command to restore archived files.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-04 08:53:05
Message-ID: 1244105585.23910.153.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Tue, 2009-06-02 at 14:54 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > Tom Lane wrote:
> >> That's a good point; don't we recover files under names like
> >> RECOVERYXLOG, not under names that could possibly conflict with regular
> >> WAL files?
>
> > Yes. But we rename RECOVERYXLOG to 000000010000000000000057 or similar
> > at the end of recovery, in exitArchiveRecovery().
>
> > Thinking about this some more, I think we should've changed
> > exitArchiveRecovery() rather than RemoveOldXlogFiles(): it would be more
> > robust if exitArchiveRecovery() always copied the last WAL file rather
> > than just renamed it. It doesn't seem safe to rely on the file the
> > symlink points to to be valid after recovery is finished, and we might
> > write to it before it's recycled, so the current fix isn't complete.
>
> Hmm. I think really the reason it's coded that way is that we assumed
> the recovery command would be physically copying the file from someplace
> else. pg_standby is violating the backend's expectations by using a
> symlink. And I really doubt that the technique is saving anything, since
> the data has to be read in from the archive location anyway.
>
> I'm leaning back to the position that pg_standby's -l option is simply a
> bad idea and should be removed.

ISTM we didn't clearly state what the recovery_command should do either
way. Even if you remove the pg_standby option that will not fix the
problem for people who have written their own script, or existing users
of pg_standby. The safe way is to do as Heikki suggests and copy the
final file into place and I would add that we must then fsync it also.
That should be back-patched possibly as far as 8.0. Documenting a change
is not nearly enough.

Removing -l is a separate discussion. If there was a consensus against
it, I would suggest that we deprecate the option, so that it does
nothing.

As an aside, I would be also much more comfortable if there was an
option to not recycle the WAL files at all, as a safe mode for error
checking at least. The question is: why do we need to zero fill the file
first anyway? We could save the 8 bytes for the prev pointer on every
record if we added enough zeroes onto every WAL write to zap any
pre-existing header data, causing recovery to fail.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-04 09:01:28
Message-ID: 1244106088.23910.159.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Tue, 2009-06-02 at 19:43 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Mon, 2009-06-01 at 14:47 +0900, Fujii Masao wrote:
> >
> >> pg_standby can use ln command to restore an archived file,
> >> which might destroy the archived file as follows.
> >>
> >> 1) pg_standby creates the symlink to the archived file '102'
> >> 2) '102' is applied
> >> 3) the next file '103' doesn't exist and the trigger file is created
> >> 4) '102' is re-fetched
> >> 5) at the end of recovery, the symlink to '102' is rename to '202',
> >> but it still points '102'
> >> 6) after recovery, '202' is recycled (rename to '208', which still
> >> points '102')
> >> 7) '208' is written new xlog records over
> >> --> the archived file '102' comes down!
> >>
> >> One simple solution to fix this problem...
> >
> > err...I don't see *any* problem at all, since pg_standby does not do
> > step (1) in the way you say and therefore never does step (5). Any links
> > created are explicitly deleted in all cases at the end of recovery.
>
> I don't know how you came to that conclusion, but Fujii-sans description
> seems accurate to me, and I can reproduce the behavior on my laptop.

I accept there is a bug, though the initial description did not appear
to reflect the way the code works.

Thank you for not making further changes while we wait to discuss.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support