Re: Point in Time Recovery

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Point in Time Recovery
Date: 2004-07-20 00:19:58
Message-ID: 1669.1090282798@sss.pgh.pa.us

Bruce and I had another phone chat about the problems that can ensue
if you restore a tar backup that contains old (incompletely filled)
versions of WAL segment files. While the current code will ignore them
during the recovery-from-archive run, leaving them lying around seems
awfully dangerous. One nasty possibility is that the archiving
mechanism will pick up these files and overwrite good copies in the
archive area with the obsolete ones from the backup :-(.

Bruce earlier proposed that we simply "rm pg_xlog/*" at the start of
a recovery-from-archive run, but as I said I'm scared to death of code
that does such a thing automatically. In particular this would make it
impossible to handle scenarios where you want to do a PITR recovery but
you need to use some recent WAL segments that didn't make it into your
archive yet. (Maybe you could get around this by forcibly transferring
such segments into the archive, but that seems like a bad idea for
incomplete segments.)

It would really be best for the DBA to make sure that the starting
condition for the recovery run does not have any obsolete segment files
in pg_xlog. He could do this either by setting up his backup policy so
that pg_xlog isn't included in the tar backup in the first place, or by
manually removing the included files just after restoring the backup,
before he tries to start the recovery run.

Of course the objection to that is "what if the DBA forgets to do it?"

The idea that we came to on the phone was for the postmaster, when it
enters recovery mode because a recovery.conf file exists, to look in
pg_xlog for existing segment files and refuse to start if any are there
--- *unless* the user has put a special, non-default overriding flag
into recovery.conf. Call it "use_unarchived_files" or something like
that. We'd have to provide good documentation and an extensive HINT of
course, but basically the DBA would have two choices when he gets this
refusal to start:

1. Remove all the segment files in pg_xlog. (This would be the right
thing to do if he knows they all came off the backup.)

2. Verify that pg_xlog contains only segment files that are newer than
what's stored in the WAL archive, and then set the override flag in
recovery.conf. In this case the DBA is taking responsibility for
leaving only segment files that are good to use.
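
The startup check proposed above could be sketched roughly as follows. This is only an illustration in Python, not the postmaster's actual code: the function name check_recovery_start is hypothetical, "use_unarchived_files" is the placeholder flag name floated in the message, and the sketch assumes segment files can be recognized by their 24-hex-digit names.

```python
import re
from pathlib import Path

# Assumption: WAL segment files are recognized by a 24-hex-digit file name.
SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def check_recovery_start(data_dir, use_unarchived_files=False):
    """Return the segment files found in pg_xlog, or refuse to proceed.

    Mirrors the proposal: any existing segment file blocks recovery
    unless the DBA has explicitly set the override flag.
    """
    xlog_dir = Path(data_dir) / "pg_xlog"
    segments = sorted(
        p.name
        for p in xlog_dir.iterdir()
        if p.is_file() and SEGMENT_NAME.fullmatch(p.name)
    )
    if segments and not use_unarchived_files:
        raise RuntimeError(
            "pg_xlog contains %d segment file(s); remove them, or set "
            "use_unarchived_files after verifying they are newer than "
            "the archive" % len(segments)
        )
    return segments
```

With the flag left at its default, a single leftover segment is enough to stop the run, which is the "what if the DBA forgets" case the proposal is meant to catch.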

One interesting point is that with such a policy, we could use locally
available WAL segments in preference to pulling the same segments from
archive, which would be at least marginally more efficient, and seems
logically cleaner anyway.

In particular it seems that this would be a useful arrangement in cases
where you have questionable WAL segments --- you're not sure if they're
good or not. Rather than having to push questionable data into your WAL
archive, you can leave it local, try a recovery run, and see if you like
the resulting state. If not, it's a lot easier to do-over when you have
not corrupted your archive area.

Comments? Better ideas?

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Point in Time Recovery
Date: 2004-07-20 01:36:00
Message-ID: 40FC7700.7060001@familyhealth.com.au

I've got a PITR set up here that's happily scp'ing WAL files across to
another machine. However, the NIC in the machine is currently stuffed,
so it gets like 50k/s :) What happens in general if you are always
generating WAL data faster than it can be copied off?

Also, does the archive dir just basically keep filling up forever? How
do I know when I can prune some files? Anything older than the last
full backup?

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Point in Time Recovery
Date: 2004-07-20 02:08:12
Message-ID: 2564.1090289292@sss.pgh.pa.us

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> I've got a PITR set up here that's happily scp'ing WAL files across to
> another machine. However, the NIC in the machine is currently stuffed,
> so it gets like 50k/s :) What happens in general if you are always
> generating WAL data faster than it can be copied off?

If you keep falling further and further behind, eventually your pg_xlog
directory will fill the space available on its disk, and I think at that
point PG will panic and shut down because it can't create any more xlog
segments.
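
To put rough numbers on "falling behind", here is a back-of-the-envelope sketch in Python. It assumes the default 16 MB segment size, reads "50k/s" as 50 KiB/s, and uses a simplified model in which unarchived WAL accumulates at the difference between the generation rate and the copy rate. The function names are made up for this illustration.

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (16 MB)


def seconds_per_segment(copy_rate):
    """Time to copy one full segment at copy_rate bytes per second."""
    return SEGMENT_BYTES / copy_rate


def seconds_until_full(free_bytes, wal_rate, copy_rate):
    """Seconds until the backlog consumes free_bytes in pg_xlog.

    Returns None when archiving keeps up with WAL generation.
    Simplification: the backlog grows at (wal_rate - copy_rate).
    """
    backlog_rate = wal_rate - copy_rate
    if backlog_rate <= 0:
        return None
    return free_bytes / backlog_rate
```

At an assumed 50 KiB/s, seconds_per_segment(50 * 1024) comes to roughly five and a half minutes per segment, so any sustained load that fills segments faster than that will eventually exhaust the disk.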

> Also, does the archive dir just basically keep filling up forever? How
> do I know when I can prune some files? Anything older than the last
> full backup?

Anything older than the starting checkpoint of the last full backup that
you might want to restore to. We need to adjust the backup procedure so
that the starting segment number for a backup is more readily visible;
see recent discussions about logging that explicitly in some fashion.
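
As a sketch of that pruning rule, assuming you already know the name of the backup's starting segment (which, as noted, is not yet easy to find): the Python below lists archived segments that sort strictly before it. It assumes 24-hex-digit segment names and only considers files on the same timeline (the first eight digits) as the starting segment; prunable_segments is a hypothetical helper, not an existing tool.

```python
import re

# Assumption: archived WAL segments have 24-hex-digit file names.
SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def prunable_segments(archived, backup_start_segment):
    """Archived segments older than the backup's starting segment.

    Fixed-width hex names on one timeline sort in WAL order, so a
    plain string comparison is enough here. Files on other timelines
    and anything that is not a segment file are left alone.
    """
    timeline = backup_start_segment[:8]
    return sorted(
        name
        for name in archived
        if SEGMENT_NAME.fullmatch(name)
        and name[:8] == timeline
        and name < backup_start_segment
    )
```

The starting segment itself is kept, since recovery from that backup needs it. Pass the starting segment of the oldest backup you might still want to restore, not just the most recent one.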

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Point in Time Recovery
Date: 2004-07-20 02:19:16
Message-ID: 40FC8124.1070107@familyhealth.com.au

> If you keep falling further and further behind, eventually your pg_xlog
> directory will fill the space available on its disk, and I think at that
> point PG will panic and shut down because it can't create any more xlog
> segments.

Hang on, are you supposed to MOVE or COPY away WAL segments?

Chris


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Point in Time Recovery
Date: 2004-07-20 02:22:50
Message-ID: 200407200222.i6K2MoS01269@candle.pha.pa.us

Christopher Kings-Lynne wrote:
> > If you keep falling further and further behind, eventually your pg_xlog
> > directory will fill the space available on its disk, and I think at that
> > point PG will panic and shut down because it can't create any more xlog
> > segments.
>
> Hang on, are you supposed to MOVE or COPY away WAL segments?

Copy. pg will delete them once they are archived.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Point in Time Recovery
Date: 2004-07-20 02:23:44
Message-ID: 2763.1090290224@sss.pgh.pa.us

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
>> If you keep falling further and further behind, eventually your pg_xlog
>> directory will fill the space available on its disk, and I think at that
>> point PG will panic and shut down because it can't create any more xlog
>> segments.

> Hang on, are you supposed to MOVE or COPY away WAL segments?

COPY. The checkpoint code will then delete or recycle the segment file,
as appropriate.

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Point in Time Recovery
Date: 2004-07-20 02:39:16
Message-ID: 40FC85D4.1080509@familyhealth.com.au

>>Hang on, are you supposed to MOVE or COPY away WAL segments?
>
> COPY. The checkpoint code will then delete or recycle the segment file,
> as appropriate.

So what happens if you just move it? Postgres breaks?

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Point in Time Recovery
Date: 2004-07-20 02:54:11
Message-ID: 3013.1090292051@sss.pgh.pa.us

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
>>> Hang on, are you supposed to MOVE or COPY away WAL segments?
>>
>> COPY. The checkpoint code will then delete or recycle the segment file,
>> as appropriate.

> So what happens if you just move it? Postgres breaks?

I don't think so, but it seems like a much less robust way to do things.
What happens if you have a failure partway through? For instance, the
archive machine dies and loses recent data right after you've rm'd the
source file. The recommended COPY procedure at least provides some
breathing room between when you install the data on the archive and when
the original file is removed.
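
A copy-style archive step might look something like the Python sketch below. It is an illustration under stated assumptions, not a recommended script: archive_segment is a hypothetical helper, the ".tmp" suffix is arbitrary, and the archive directory is assumed to be reachable as a local path (for example a network mount). It never touches the source file, and it refuses to overwrite a segment that is already in the archive.

```python
import os
import shutil


def archive_segment(src_path, archive_dir):
    """Copy one WAL segment into archive_dir; return True on success.

    The source file is left in place, so PostgreSQL decides when to
    delete or recycle it after the archive step has reported success.
    """
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        return False  # never clobber a segment that is already archived
    tmp = dest + ".tmp"
    with open(src_path, "rb") as src, open(tmp, "wb") as out:
        shutil.copyfileobj(src, out)
        out.flush()
        os.fsync(out.fileno())  # make sure the copy has reached the disk
    os.replace(tmp, dest)  # expose the file only once it is complete
    return True
```

A wrapper invoked from archive_command would pass the path substituted for %p and exit nonzero when this returns False, so that a failed or refused copy is not reported as a successful archive.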

It's not like you save any effort by using a MOVE anyway. You're not
going to have the archive on the same machine as the database (or if you
are, you ain't gonna be *my* DBA ...)

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Point in Time Recovery
Date: 2004-07-20 03:37:07
Message-ID: 40FC9363.3060200@familyhealth.com.au

> I don't think so, but it seems like a much less robust way to do things.
> What happens if you have a failure partway through? For instance, the
> archive machine dies and loses recent data right after you've rm'd the
> source file. The recommended COPY procedure at least provides some
> breathing room between when you install the data on the archive and when
> the original file is removed.

Well, I tried it in 'cross your fingers' mode and it works, at least:

archive_command = 'rm %p'

:)

Chris


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Point in Time Recovery
Date: 2004-07-28 16:19:50
Message-ID: 200407281619.i6SGJoG27790@candle.pha.pa.us


Oh, here is something else we need to add --- a GUC to control whether
pg_xlog is clean on recovery start.
