Problem with PITR recovery

Lists: pgsql-hackers
From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Problem with PITR recovery
Date: 2005-04-16 05:11:06
Message-ID: 200504160511.j3G5B6x29421@candle.pha.pa.us

I had a problem using PITR recovery just now. If I do:

SELECT pg_start_backup('label');
do my tar
SELECT pg_stop_backup();

and then stop the server, delete /data, restore from the tar, delete the
files in pg_xlog, and set recovery.conf to restore, recovery fails. I think
this is because no pg_xlog file was archived after the tar was taken.

The problem is that we don't archive the partially written xlog file,
and in this case that xlog file contains the information needed to make
the tar file consistent.

Is this a known problem? Do we document this? If so, I can't find it.

I am concerned about folks cleaning out their archive directory after
the pg_stop_backup() not realizing they need that last xlog file to make
the tar valid.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-16 16:00:28
Message-ID: 13133.1113667228@sss.pgh.pa.us

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> The problem is that we don't archive the partially written xlog file,
> and in this case that xlog file contains the information needed to make
> the tar file consistent.

> Is this a known problem? Do we document this? If so, I can't find it.

Yes, and yes. You did not follow the procedure:

http://www.postgresql.org/docs/8.0/static/backup-online.html#BACKUP-PITR-RECOVERY

In particular, step 2 says:

: ... you need at the least to copy the contents of the pg_xlog
: subdirectory of the cluster data directory, as it may contain logs which
: were not archived before the system went down.

Possibly this needs to be highlighted a little better.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-17 03:06:17
Message-ID: 200504170306.j3H36Hr01998@candle.pha.pa.us

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > The problem is that we don't archive the partially written xlog file,
> > and in this case that xlog file contains the information needed to make
> > the tar file consistent.
>
> > Is this a known problem? Do we document this? If so, I can't find it.
>
> Yes, and yes. You did not follow the procedure:
>
> http://www.postgresql.org/docs/8.0/static/backup-online.html#BACKUP-PITR-RECOVERY
>
> In particular, step 2 says:
>
> : ... you need at the least to copy the contents of the pg_xlog
> : subdirectory of the cluster data directory, as it may contain logs which
> : were not archived before the system went down.
>
> Possibly this needs to be highlighted a little better.

I figured that part of the goal of PITR was that you could recover from
just the tar backup and archived WAL files --- using the pg_xlog
contents is nice, but not something we can require.

I understood the last missing WAL log would cause missing information,
but not that it would make the tar backup unusable.

It would be nice if we could force a new WAL file on pg_stop_backup()
and archive the WAL file needed to match the tar file. How hard would
that be?

I see in the docs:

To make use of this backup, you will need to keep around all the WAL
segment files generated at or after the starting time of the backup. To
aid you in doing this, the pg_stop_backup function creates a backup
history file that is immediately stored into the WAL archive area. This
file is named after the first WAL segment file that you need to have to
make use of the backup. For example, if the starting WAL file is
0000000100001234000055CD the backup history file will be named something
like 0000000100001234000055CD.007C9330.backup. (The second part of this
file name stands for an exact position within the WAL file, and can
ordinarily be ignored.) Once you have safely archived the backup dump
file, you can delete all archived WAL segments with names numerically
preceding this one.

I am not clear on what the "backup dump file" is? I assume it means
0000000100001234000055CD. It is called "WAL segment file" above. I
will rename that phrase to match the above terminology. Patch attached
and applied.
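
The naming rule quoted from the docs above can be illustrated with a small sketch. This is only an illustration, not anything shipped with PostgreSQL: the helper name parse_history_name is hypothetical, and it assumes the name layout shown in the docs example (a 24-character WAL segment name, an 8-character hex position, and a .backup suffix).

```python
import re

# Assumed layout, taken from the docs example quoted above:
#   <24 hex chars: WAL segment name>.<8 hex chars: position>.backup
_HISTORY_NAME = re.compile(r"^([0-9A-F]{24})\.([0-9A-F]{8})\.backup$")


def parse_history_name(name):
    """Split a backup history file name into (segment_name, position).

    Hypothetical helper for illustration only. The segment name is the
    first WAL segment needed to use the backup; the position is the
    offset within that segment, returned as an integer.
    """
    match = _HISTORY_NAME.match(name)
    if match is None:
        raise ValueError("not a backup history file name: %r" % name)
    segment, position_hex = match.groups()
    return segment, int(position_hex, 16)


if __name__ == "__main__":
    print(parse_history_name("0000000100001234000055CD.007C9330.backup"))
```

The first element of the result is the WAL segment file that the paragraph above says the history file is named after.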


Attachment Content-Type Size
unknown_filename text/plain 1.5 KB

From: Ragnar Hafstað <gnari(at)simnet(dot)is>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-17 10:25:45
Message-ID: 1113733545.31618.32.camel@localhost.localdomain

On Sat, 2005-04-16 at 23:06 -0400, Bruce Momjian wrote:
[about backup procedure with PITR documentation]

> I see in the docs:
>
> To make use of this backup, you will need to keep around all the WAL
> segment files generated at or after the starting time of the backup. To
> aid you in doing this, the pg_stop_backup function creates a backup
> history file that is immediately stored into the WAL archive area. This
> file is named after the first WAL segment file that you need to have to
> make use of the backup. For example, if the starting WAL file is
> 0000000100001234000055CD the backup history file will be named something
> like 0000000100001234000055CD.007C9330.backup. (The second part of this
> file name stands for an exact position within the WAL file, and can
> ordinarily be ignored.) Once you have safely archived the backup dump
> file, you can delete all archived WAL segments with names numerically
> preceding this one.
>
> I am not clear on what the "backup dump file" is? I assume it means
> 0000000100001234000055CD. It is called "WAL segment file" above. I
> will rename that phrase to match the above terminology. Patch attached
> and applied.

Doesn't it refer to the backup file itself (the tar file of the data
directory) ?
You do not want to start deleting WAL segments until that one is safely
archived.

gnari


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Ragnar Hafstað <gnari(at)simnet(dot)is>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-17 13:38:14
Message-ID: 200504171338.j3HDcEE23341@candle.pha.pa.us

Ragnar Hafstað wrote:
> On Sat, 2005-04-16 at 23:06 -0400, Bruce Momjian wrote:
> [about backup procedure with PITR documentation]
>
> > I see in the docs:
> >
> > To make use of this backup, you will need to keep around all the WAL
> > segment files generated at or after the starting time of the backup. To
> > aid you in doing this, the pg_stop_backup function creates a backup
> > history file that is immediately stored into the WAL archive area. This
> > file is named after the first WAL segment file that you need to have to
> > make use of the backup. For example, if the starting WAL file is
> > 0000000100001234000055CD the backup history file will be named something
> > like 0000000100001234000055CD.007C9330.backup. (The second part of this
> > file name stands for an exact position within the WAL file, and can
> > ordinarily be ignored.) Once you have safely archived the backup dump
> > file, you can delete all archived WAL segments with names numerically
> > preceding this one.
> >
> > I am not clear on what the "backup dump file" is? I assume it means
> > 0000000100001234000055CD. It is called "WAL segment file" above. I
> > will rename that phrase to match the above terminology. Patch attached
> > and applied.
>
> Doesn't it refer to the backup file itself (the tar file of the data
> directory) ?

No. That is what I thought it meant on first reading, but looking
closer it is referring to the numbered file, and the tar file has no
specific number.

> You do not want to start deleting WAL segments until that one is safely
> archived.

Right, but the point of the paragraph is that you need the WAL file that
goes with the backup history file number.



From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 01:38:26
Message-ID: 200504180138.j3I1cQ827153@candle.pha.pa.us

Bruce Momjian wrote:
> I figured that part of the goal of PITR was that you could recover from
> just the tar backup and archived WAL files --- using the pg_xlog
> contents is nice, but not something we can require.
>
> I understood the last missing WAL log would cause missing information,
> but not that it would make the tar backup unusable.
>
> It would be nice if we could force a new WAL file on pg_stop_backup()
> and archive the WAL file needed to match the tar file. How hard would
> that be?
>
> I see in the docs:
>
> To make use of this backup, you will need to keep around all the WAL
> segment files generated at or after the starting time of the backup. To
> aid you in doing this, the pg_stop_backup function creates a backup
> history file that is immediately stored into the WAL archive area. This
> file is named after the first WAL segment file that you need to have to
> make use of the backup. For example, if the starting WAL file is
> 0000000100001234000055CD the backup history file will be named something
> like 0000000100001234000055CD.007C9330.backup. (The second part of this
> file name stands for an exact position within the WAL file, and can
> ordinarily be ignored.) Once you have safely archived the backup dump
> file, you can delete all archived WAL segments with names numerically
> preceding this one.
>
> I am not clear on what the "backup dump file" is? I assume it means
> 0000000100001234000055CD. It is called "WAL segment file" above. I
> will rename that phrase to match the above terminology. Patch attached
> and applied.

I found that the docs mentioned above are inaccurate because they state
you only need the WAL segment used at the start of the file system
backup, while you really need all the WAL segments used _during_ the
backup before you can safely delete the older WAL segments. Here is
updated text I have applied to HEAD and 8.0.X:

Once you have safely archived the WAL segment files used during the file
system backup (as specified in the backup history file), you can delete
all archived WAL segments with names numerically less. Keep in mind that
only completed WAL segment files are archived, so there will be a delay
between running pg_stop_backup and the archiving of all WAL segment
files needed to make the file system backup consistent.
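
As a rough sketch of the rule in the updated text: old segments become deletable only after the segments used during the backup have reached the archive. The function name deletable_segments and its arguments are hypothetical, the start and stop segment names are assumed to come from the backup history file, and timelines are ignored. It relies on WAL segment names being fixed-width uppercase hex, so plain string comparison matches numeric order.

```python
import re

# A plain WAL segment name: 24 uppercase hex characters, no suffix.
_SEGMENT = re.compile(r"^[0-9A-F]{24}$")


def deletable_segments(archive_names, start_segment, stop_segment):
    """Return archived segment names that sort before start_segment.

    Hypothetical helper for illustration only. Nothing is reported as
    deletable until both the first and the last segment used during the
    backup are present in the archive listing; intermediate segments are
    not checked here.
    """
    archived = {n for n in archive_names if _SEGMENT.match(n)}
    if start_segment not in archived or stop_segment not in archived:
        # The new backup is not yet usable, so the old segments are
        # still needed to recover from the previous backup.
        return []
    return sorted(n for n in archived if n < start_segment)
```

Returning an empty list while the stop segment is missing mirrors the delay the text warns about between pg_stop_backup and the archiving of the last needed segment.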



From: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 03:25:53
Message-ID: 1113794753.7212.35.camel@jeff


I could still use a little clarification. It seems sort of like there is
an extra step, like:

(1) start archiving
(2) pg_start_backup()
(3) copy PGDATA directory with tar
(4) pg_stop_backup()
(5) ??

And the text you have at
http://candle.pha.pa.us/main/writings/pgsql/sgml/backup-online.html

says: "To make use of this backup, you will need to keep around all the
WAL segment files generated during and after the file system backup.".

How long after? Wouldn't you be keeping the WAL segments afterward
anyway by archiving?

I've tested and been able to recover using PITR before, but I'd like a
little clarification on the steps to make absolutely sure that the base
backup I have is viable.

Can you sort of run through the failure case again, and how to prevent
it?

Regards,
Jeff Davis


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 04:20:40
Message-ID: 200504180420.j3I4KeS21339@candle.pha.pa.us

Jeff Davis wrote:
>
> I could still use a little clarification. It seems sort of like there is
> an extra step, like:
>
> (1) start archiving
> (2) pg_start_backup()
> (3) copy PGDATA directory with tar
> (4) pg_stop_backup()
> (5) ??
>
> And the text you have at
> http://candle.pha.pa.us/main/writings/pgsql/sgml/backup-online.html
>
> says: "To make use of this backup, you will need to keep around all the
> WAL segment files generated during and after the file system backup.".
>
> How long after? Wouldn't you be keeping the WAL segments afterward
> anyway by archiving?
>
> I've tested and been able to recover using PITR before, but I'd like a
> little clarification on the steps to make absolutely sure that the base
> backup I have is viable.
>
> Can you sort of run through the failure case again, and how to prevent
> it?

The failure case in the original docs is that you do your
pg_stop_backup(), and then delete all the WAL files before the *.backup
file that was just created. However, you do not have a valid tar backup
until you have archived all the WAL files used from the *.backup WAL
file up to the WAL file that was active at pg_stop_backup(), which is
mentioned in the *.backup file. If you went and deleted your old WAL
files anyway, without waiting for those other WAL files to be archived,
and your disk drive crashed, you wouldn't have a tar backup you could
use, and you would have deleted the old WAL files you needed to
recover your previous tar backup.
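
A minimal sketch of that check, under a loudly stated assumption: it supposes the *.backup file contains lines of the form "START WAL LOCATION: ... (file <segment>)" and "STOP WAL LOCATION: ... (file <segment>)". That layout is an assumption here and should be compared against a real history file before relying on it. The function names are hypothetical.

```python
import re

# Assumed line layout inside the backup history file (verify locally):
#   START WAL LOCATION: 0/9000020 (file 000000010000000000000009)
#   STOP WAL LOCATION: 0/A0000E8 (file 00000001000000000000000A)
_LOCATION = re.compile(
    r"^(START|STOP) WAL LOCATION: \S+ \(file ([0-9A-F]{24})\)$",
    re.MULTILINE,
)


def backup_segment_range(history_text):
    """Return (start_segment, stop_segment) named in the history text."""
    found = dict(_LOCATION.findall(history_text))
    if "START" not in found or "STOP" not in found:
        raise ValueError("START/STOP WAL LOCATION lines not found")
    return found["START"], found["STOP"]


def backup_is_usable(history_text, archive_names):
    """True once both end points of the backup's WAL range are archived.

    Only the first and last segments are checked; a stricter check
    would also confirm every segment in between.
    """
    start, stop = backup_segment_range(history_text)
    archived = set(archive_names)
    return start in archived and stop in archived
```

Until a check like this succeeds, the WAL files belonging to the previous tar backup are the only way to recover, which is why deleting them early is the failure case.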

Is there something in the current wording that needs clarification?



From: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 06:28:53
Message-ID: 1113805733.10921.33.camel@jeff

On Mon, 2005-04-18 at 00:20 -0400, Bruce Momjian wrote:
> Jeff Davis wrote:
> >
> > Can you sort of run through the failure case again, and how to prevent
> > it?
>
> The failure case in the original docs is that you do your
> pg_stop_backup(), and then delete all the WAL files before the *.backup
> file that was just created. However, you do not have a valid tar backup
> until you have archived all the WAL files used from the *.backup WAL
> file up to the WAL file that was active at pg_stop_backup(), which is
> mentioned in the *.backup file. If you went and deleted your old WAL
> files anyway, without waiting for those other WAL files to be archived,
> and your disk drive crashed, you wouldn't have a tar backup you could
> use, and you had deleted the old WAL files you would have needed to
> recover your previous tar backup.
>
> Is there something in the current wording that needs clarification?
>

So, as I understand it: everything works great as long as everything has
been archived up to and including the WAL file that was active when you
did pg_stop_backup(). However, if you do pg_stop_backup() and
immediately delete PGDATA (before any WAL files are archived), the
backup may fail.

I think, to clear it up a little, you might add a step 5 before saying
"If this returns successfully, you're done.", so that people know for
sure that they get a good base backup. It actually seems like something
that maybe pg_stop_backup() should do in the future.

It's a little unclear how you tell which WAL segment was active during
pg_stop_backup(), but that shouldn't be a practical concern since you
can just manually archive them all.

Maybe step 5 could be something like:
(5) Make a copy of all WAL segments above XXXX.backup and store with the
base backup. When it's time to recover, if those WAL segments were not
properly archived, you need to have them available.

(probably needs rewording)

Regards,
Jeff Davis


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Jeff Davis <jdavis-pgsql(at)empires(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 07:39:02
Message-ID: Pine.GSO.4.62.0504181135010.16872@ra.sai.msu.su

On Mon, 18 Apr 2005, Bruce Momjian wrote:

> Jeff Davis wrote:
>>
>> I could still use a little clarification. It seems sort of like there is
>> an extra step, like:
>>
>> (1) start archiving
>> (2) pg_start_backup()
>> (3) copy PGDATA directory with tar
>> (4) pg_stop_backup()
>> (5) ??
>>
>> And the text you have at
>> http://candle.pha.pa.us/main/writings/pgsql/sgml/backup-online.html
>>
>> says: "To make use of this backup, you will need to keep around all the
>> WAL segment files generated during and after the file system backup.".
>>
>> How long after? Wouldn't you be keeping the WAL segments afterward
>> anyway by archiving?
>>
>> I've tested and been able to recover using PITR before, but I'd like a
>> little clarification on the steps to make absolutely sure that the base
>> backup I have is viable.
>>
>> Can you sort of run through the failure case again, and how to prevent
>> it?
>
> The failure case in the original docs is that you do your
> pg_stop_backup(), and then delete all the WAL files before the *.backup
> file that was just created. However, you do not have a valid tar backup
> until you have archived all the WAL files used from the *.backup WAL
> file up to the WAL file that was active at pg_stop_backup(), which is
> mentioned in the *.backup file. If you went and deleted your old WAL
> files anyway, without waiting for those other WAL files to be archived,
> and your disk drive crashed, you wouldn't have a tar backup you could
> use, and you had deleted the old WAL files you would have needed to
> recover your previous tar backup.
>
> Is there something in the current wording that needs clarification?

I'd say it's not cool at all :) It's not what we all expected from PITR.
I recall now that Simon mentioned this and has it in his TODO.
The other thing I don't understand is what the problem is with generating
a WAL file on demand. Probably the TODO should mention this.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 13:22:14
Message-ID: 200504181322.j3IDMEX11712@candle.pha.pa.us

Jeff Davis wrote:
> On Mon, 2005-04-18 at 00:20 -0400, Bruce Momjian wrote:
> > Jeff Davis wrote:
> > >
> > > Can you sort of run through the failure case again, and how to prevent
> > > it?
> >
> > The failure case in the original docs is that you do your
> > pg_stop_backup(), and then delete all the WAL files before the *.backup
> > file that was just created. However, you do not have a valid tar backup
> > until you have archived all the WAL files used from the *.backup WAL
> > file up to the WAL file that was active at pg_stop_backup(), which is
> > mentioned in the *.backup file. If you went and deleted your old WAL
> > files anyway, without waiting for those other WAL files to be archived,
> > and your disk drive crashed, you wouldn't have a tar backup you could
> > use, and you had deleted the old WAL files you would have needed to
> > recover your previous tar backup.
> >
> > Is there something in the current wording that needs clarification?
> >
>
> So, as I understand it: everything works great as long as everything has
> been archived up to and including the WAL file that was active when you
> did pg_stop_backup(). However, if you do pg_stop_backup() and
> immediately delete PGDATA (before any WAL files are archived), the
> backup may fail.

Right, and that is the issue that wasn't documented before, and I was
even unclear about it myself when testing initially.

> I think, to clear it up a little, you might add a step 5 before saying
> "If this returns successfully, you're done.", so that people know for

I see your point. New text is:

4 Again connect to the database as a superuser, and issue the command

SELECT pg_stop_backup();

This should return successfully.

5 Once the WAL segment files used during the backup are archived as
part of normal database activity, you are done.
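
One way to act on step 5 is to poll the archive area until the last segment used during the backup shows up. The sketch below is an illustration only: wait_for_segment, the archive directory, and the timeout values are all hypothetical, and it assumes the archive is a plain directory that the archive_command copies files into. Since segments are archived only when full, the wait can be arbitrarily long on an idle server, so the timeout matters.

```python
import os
import time


def wait_for_segment(archive_dir, segment, timeout=60.0, poll=1.0):
    """Poll archive_dir until a file named segment exists.

    Hypothetical helper for illustration only. Returns True if the file
    appeared before the timeout expired, otherwise False.
    """
    path = os.path.join(archive_dir, segment)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A caller would pass the stop segment named in the backup history file and treat a False result as "base backup not yet usable on its own".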

> sure that they get a good base backup. It actually seems like something
> that maybe pg_stop_backup() should do in the future.

Yes, I added that to the TODO list:

* Force archiving of partially-full WAL files when pg_stop_backup() is
called or the server is stopped

Doing this will allow administrators to know more easily when the
archive contains all the files needed for point-in-time recovery.

> It's a little unclear how you tell which WAL segment was active during
> pg_stop_backup(), but that shouldn't be a practical concern since you
> can just manually archive them all.

We do have this sentence:

Once you have safely archived the WAL segment files used during the file
system backup (as specified in the backup history file), you can delete
all archived WAL segments with names numerically less.

The information is actually in the *.backup file. I think that is the
only way to know.

And you can't manually copy the WAL files to the archive because they
aren't full and the recommended archive_command will fail if those files
are already in the archive. You could copy them off somewhere else, I
suppose.

> Maybe step 5 could be something like:
> (5) Make a copy of all WAL segments above XXXX.backup and store with the
> base backup. When it's time to recover, if those WAL segments were not
> properly archived, you need to have them available.

Again, that doesn't work because of the "no overwrite" behavior of the
archive_command.
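
The "no overwrite" behavior can be sketched as follows. This is a Python stand-in written for illustration, not the archive_command from the docs; the function name archive_segment and the directory layout are hypothetical. It shows why a partial segment copied in by hand would later make the real archiving attempt for the same file name fail.

```python
import os
import shutil


def archive_segment(src_path, archive_dir):
    """Copy src_path into archive_dir, refusing to overwrite.

    Hypothetical helper for illustration only. Raises FileExistsError
    if a file with the same name is already in the archive.
    """
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    with open(src_path, "rb") as src:
        # O_EXCL makes creation fail if dest already exists.
        fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "wb") as out:
            shutil.copyfileobj(src, out)
    return dest
```

Copying the partial segments to a separate location, as suggested above, avoids colliding with the name the archiver will use later.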



From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Jeff Davis <jdavis-pgsql(at)empires(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 14:06:33
Message-ID: 200504181406.j3IE6XF19161@candle.pha.pa.us

Oleg Bartunov wrote:
> > Is there something in the current wording that needs clarification?
>
> I'd say it's not cool at all :) It's not what we all expected from PITR.
> I recall now that Simon mentioned this and has it in his TODO.
> The other thing I don't understand is what the problem is with generating
> a WAL file on demand. Probably the TODO should mention this.

Yes, we have TODO items for that and I added another one yesterday.



From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 14:48:59
Message-ID: 87vf6kkun8.fsf@stark.xeocode.com

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:

> I see your point. New text is:
>
> 4 Again connect to the database as a superuser, and issue the command
>
> SELECT pg_stop_backup();
>
> This should return successfully.
>
> 5 Once the WAL segment files used during the backup are archived as
> part of normal database activity, you are done.
>
> > sure that they get a good base backup. It actually seems like something
> > that maybe pg_stop_backup() should do in the future.
>
> Yes, I added that to the TODO list:
>
> * Force archiving of partially-full WAL files when pg_stop_backup() is
> called or the server is stopped

You could even make pg_stop_backup() hang until that's complete.

--
greg


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 15:08:30
Message-ID: 200504181508.j3IF8UB29044@candle.pha.pa.us

Greg Stark wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
> > I see your point. New text is:
> >
> > 4 Again connect to the database as a superuser, and issue the command
> >
> > SELECT pg_stop_backup();
> >
> > This should return successfully.
> >
> > 5 Once the WAL segment files used during the backup are archived as
> > part of normal database activity, you are done.
> >
> > > sure that they get a good base backup. It actually seems like something
> > > that maybe pg_stop_backup() should do in the future.
> >
> > Yes, I added that to the TODO list:
> >
> > * Force archiving of partially-full WAL files when pg_stop_backup() is
> > called or the server is stopped
>
> You could even make pg_stop_backup() hang until that's complete.

You mean don't force the archive copy but just have pg_stop_backup()
hang until the files fill? Yea, we could do that, but there is no way
to know how long the hang might take.



From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 15:29:38
Message-ID: 87k6n0ksrh.fsf@stark.xeocode.com

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:

> You mean don't force the archive copy but just have pg_stop_backup()
> hang until the files fill? Yea, we could do that, but there is no way
> to know how long the hang might take.

Actually I meant both.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Ragnar Hafstað <gnari(at)simnet(dot)is>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 16:04:09
Message-ID: 22321.1113840249@sss.pgh.pa.us

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Ragnar Hafstað wrote:
>> On Sat, 2005-04-16 at 23:06 -0400, Bruce Momjian wrote:
>>> I am not clear on what the "backup dump file" is? I assume it means
>>> 0000000100001234000055CD. It is called "WAL segment file" above. I
>>> will rename that phrase to match the above terminology. Patch attached
>>> and applied.
>>
>> Doesn't it refer to the backup file itself (the tar file of the data
>> directory) ?

> No. That is what I thought it meant on first reading, but looking
> closer it is referring to the numbered file, and the tar file has no
> specific number.

Yes, that is exactly what it meant, and your patch has destroyed the
meaning.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ragnar Hafstað <gnari(at)simnet(dot)is>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 16:59:54
Message-ID: 200504181659.j3IGxs226388@candle.pha.pa.us
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Ragnar Hafstað wrote:
> >> On Sat, 2005-04-16 at 23:06 -0400, Bruce Momjian wrote:
> >>> I am not clear on what the "backup dump file" is? I assume it means
> >>> 0000000100001234000055CD. It is called "WAL segment file" above. I
> >>> will rename that phrase to match the above terminology. Patch attached
> >>> and applied.
> >>
> >> Doesn't it refer to the backup file itself (the tar file of the data
> >> directory) ?
>
> > No. That is what I thought it meant on first reading, but looking
> > closer it is referring to the numbered file, and the tar file has no
> > specific number.
>
> Yes, that is exactly what it meant, and your patch has destroyed the
> meaning.

The sentence was:

Once you have safely archived the backup dump file, you can delete all
archived WAL segments with names numerically preceding this one.

so you were saying:

Once you have safely archived the file system backup, you can delete all
archived WAL segments with names numerically preceding this one.

I guess I didn't see the connection between the file system backup and
the WAL files, when in fact you need the WAL files that go with the file
system backup to do the recovery. Do you have new suggested text?

The current text version is in CVS.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Ragnar Hafstað <gnari(at)simnet(dot)is>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 17:23:26
Message-ID: 23806.1113845006@sss.pgh.pa.us
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> I guess I didn't see the connection between the file system backup and
> the WAL files, when in fact you need the WAL files that go with the file
> system backup to do the recovery. Do you have new suggested text?

I think it probably needs to mention *both* the tar dump and the WAL
segment file(s). I can take a whack at it if you like.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ragnar Hafstað <gnari(at)simnet(dot)is>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 17:41:16
Message-ID: 200504181741.j3IHfGM03663@candle.pha.pa.us
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I guess I didn't see the connection between the file system backup and
> > the WAL files, when in fact you need the WAL files that go with the file
> > system backup to do the recovery. Do you have new suggested text?
>
> I think it probably needs to mention *both* the tar dump and the WAL
> segment file(s). I can take a whack at it if you like.

I modified the sentence to say:

Once you have safely archived the file system backup and the WAL segment
files used during the backup (as specified in the backup history file),
you can delete all archived WAL segments with names numerically less.

Feel free to whack it a second time.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ragnar Hafstað <gnari(at)simnet(dot)is>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-18 22:38:52
Message-ID: 1113863932.16721.2055.camel@localhost.localdomain
Lists: pgsql-hackers

On Mon, 2005-04-18 at 13:41 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > I guess I didn't see the connection between the file system backup and
> > > the WAL files, when in fact you need the WAL files that go with the file
> > > system backup to do the recovery. Do you have new suggested text?
> >
> > I think it probably needs to mention *both* the tar dump and the WAL
> > segment file(s). I can take a whack at it if you like.
>
> I modified the sentence to say:
>
> Once you have safely archived the file system backup and the WAL segment
> files used during the backup (as specified in the backup history file),
> you can delete all archived WAL segments with names numerically less.
>
> Feel free to whack it a second time.

whack...

...you can delete all archived WAL segments with names numerically
less.

but I'm not sure it's best practice to delete them at that point. I
would recommend that users keep at least the last 3 backups. So, I'd
prefer the wording

...all archived WAL segments with names numerically less will no longer
be needed as part of that backup set. You may delete them at that point,
though you should consider keeping more than one backup set to be
absolutely certain that you can recover your data.

Best Regards, Simon Riggs


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ragnar Hafstað <gnari(at)simnet(dot)is>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-19 01:39:25
Message-ID: 200504190139.j3J1dPG26995@candle.pha.pa.us
Lists: pgsql-hackers

Simon Riggs wrote:
> On Mon, 2005-04-18 at 13:41 -0400, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > > I guess I didn't see the connection between the file system backup and
> > > > the WAL files, when in fact you need the WAL files that go with the file
> > > > system backup to do the recovery. Do you have new suggested text?
> > >
> > > I think it probably needs to mention *both* the tar dump and the WAL
> > > segment file(s). I can take a whack at it if you like.
> >
> > I modified the sentence to say:
> >
> > Once you have safely archived the file system backup and the WAL segment
> > files used during the backup (as specified in the backup history file),
> > you can delete all archived WAL segments with names numerically less.
> >
> > Feel free to whack it a second time.
>
> whack...
>
> ...you can delete all archived WAL segments with names numerically
> less.
>
> but I'm not sure it's best practice to delete them at that point. I
> would recommend that users keep at least the last 3 backups. So, I'd
> prefer the wording
>
> ...all archived WAL segments with names numerically less will no longer
> be needed as part of that backup set. You may delete them at that point,
> though you should consider keeping more than one backup set to be
> absolutely certain that you can recover your data.

OK, new wording:

Once you have safely archived the file system backup and the WAL segment
files used during the backup (as specified in the backup history file),
all archived WAL segments with names numerically less are no longer
needed to recover the file system backup and may be deleted. However,
you should consider keeping several backup sets to be absolutely certain
that you can recover your data. Keep in mind that only completed WAL
segment files are archived, so there will be a delay between running
<function>pg_stop_backup</> and the archiving of all WAL segment files
needed to make the file system backup consistent.
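
As an illustration of the rule in the wording above, here is a minimal
Python sketch (not taken from the PostgreSQL sources) that lists which
archived segments sort before the first segment a base backup needs. It
assumes that segment names are exactly 24 uppercase hex digits, that the
first needed segment has been read by hand from the backup history file,
and that only segments on the same timeline (the first 8 digits) should
be considered. The function name and the sample archive listing are
hypothetical.

```python
import re

# Assumption: a WAL segment file name is exactly 24 uppercase hex digits.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def segments_older_than(archived_names, first_needed):
    """Return archived segment names that sort before first_needed.

    Fixed-width uppercase hex names order the same way as strings and as
    numbers, so a plain string comparison is enough. Names on a different
    timeline (first 8 digits) are skipped to stay on the safe side.
    """
    if not SEGMENT_NAME.match(first_needed):
        raise ValueError("not a WAL segment name: %r" % first_needed)
    timeline = first_needed[:8]
    return sorted(
        name for name in archived_names
        if SEGMENT_NAME.match(name)
        and name[:8] == timeline
        and name < first_needed
    )


# Hypothetical archive listing; the ".backup" entry is a backup history
# file and is never reported as a deletion candidate.
archive = [
    "0000000100001234000055CB",
    "0000000100001234000055CC",
    "0000000100001234000055CD",
    "0000000100001234000055CD.007C9330.backup",
    "0000000100001234000055CE",
]
candidates = segments_older_than(archive, "0000000100001234000055CD")
print(candidates)
```

The sketch only reports candidates; whether to delete them is a separate
decision, since older backup sets that are being kept still need their
own segments.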

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ragnar Hafstað <gnari(at)simnet(dot)is>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-19 04:55:33
Message-ID: Pine.GSO.4.62.0504190853030.4405@ra.sai.msu.su
Lists: pgsql-hackers

On Mon, 18 Apr 2005, Simon Riggs wrote:

> On Mon, 2005-04-18 at 13:41 -0400, Bruce Momjian wrote:
>> Tom Lane wrote:
>>> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>>>> I guess I didn't see the connection between the file system backup and
>>>> the WAL files, when in fact you need the WAL files that go with the file
>>>> system backup to do the recovery. Do you have new suggested text?
>>>
>>> I think it probably needs to mention *both* the tar dump and the WAL
>>> segment file(s). I can take a whack at it if you like.
>>
>> I modified the sentence to say:
>>
>> Once you have safely archived the file system backup and the WAL segment
>> files used during the backup (as specified in the backup history file),
>> you can delete all archived WAL segments with names numerically less.
>>
>> Feel free to whack it a second time.
>
> whack...
>
> ...you can delete all archived WAL segments with names numerically
> less.
>
> but I'm not sure it's best practice to delete them at that point. I
> would recommend that users keep at least the last 3 backups. So, I'd
> prefer the wording
>
> ...all archived WAL segments with names numerically less will no longer
> be needed as part of that backup set. You may delete them at that point,
> though you should consider keeping more than one backup set to be
> absolutely certain that you can recover your data.

I see that the clear and deterministic procedure of online backup as I imagined
earlier becomes fuzzy and blurred :) This is obviously not suited even
for my notebook.

>
> Best Regards, Simon Riggs
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ragnar Hafstað <gnari(at)simnet(dot)is>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-19 08:06:16
Message-ID: 1113897976.16721.2128.camel@localhost.localdomain
Lists: pgsql-hackers

On Tue, 2005-04-19 at 08:55 +0400, Oleg Bartunov wrote:
> On Mon, 18 Apr 2005, Simon Riggs wrote:
> > but I'm not sure it's best practice to delete them at that point. I
> > would recommend that users keep at least the last 3 backups. So, I'd
> > prefer the wording
> >
> > ...all archived WAL segments with names numerically less will no longer
> > be needed as part of that backup set. You may delete them at that point,
> > though you should consider keeping more than one backup set to be
> > absolutely certain that you can recover your data.
>
> I see that the clear and deterministic procedure of online backup as I imagined
> earlier becomes fuzzy and blurred :)

The process is involved and requires strictly observed administration
procedures, just as it does with other database systems. Each of them
has difficulties that need to be surmounted and require much thought to
implement. If PostgreSQL is the first DBMS on which you have attempted
to implement transactional archive recovery then you will definitely
find it hard, just as most Oracle and SQLServer DBAs don't understand
how their log recovery systems work either.

> This is obviously not suited even
> for my notebook.

That's a pretty silly comment, Oleg.

Since most laptops require portability as the main objective and that
usually requires or at least must frequently expect disconnection from
networks and other peripheral devices such as tape units, then no, the
PITR design isn't suitable in general for laptop use. If you use your
notebook as a production system with online archiving then PITR is
suitable.

PITR was designed to offer data protection for major production systems.
My experience was that these sites would have a reasonable stream of
transactions coming through, making the time between log file switches
somewhat predictable and usually every few minutes. The use case of a
very low transaction rate system was not considered fully since it was
felt that people in that situation would be less bothered to protect
their data with a rigorous backup procedure, leaving the issue we have
been discussing.

If you want recoverability, use PITR. If you choose not to use PITR,
that's fine. If you'd like to help make it better, that's fine too.

Best Regards, Simon Riggs


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ragnar Hafstað <gnari(at)simnet(dot)is>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-19 11:23:37
Message-ID: Pine.GSO.4.62.0504191449340.4405@ra.sai.msu.su
Lists: pgsql-hackers

On Tue, 19 Apr 2005, Simon Riggs wrote:

> On Tue, 2005-04-19 at 08:55 +0400, Oleg Bartunov wrote:
>> On Mon, 18 Apr 2005, Simon Riggs wrote:
>>> but I'm not sure it's best practice to delete them at that point. I
>>> would recommend that users keep at least the last 3 backups. So, I'd
>>> prefer the wording
>>>
>>> ...all archived WAL segments with names numerically less will no longer
>>> be needed as part of that backup set. You may delete them at that point,
>>> though you should consider keeping more than one backup set to be
>>> absolutely certain that you can recover your data.
>>
>> I see that the clear and deterministic procedure of online backup as I imagined
>> earlier becomes fuzzy and blurred :)
>
> The process is involved and requires strictly observed administration
> procedures, just as it does with other database systems. Each of them
> has difficulties that need to be surmounted and require much thought to
> implement. If PostgreSQL is the first DBMS on which you have attempted
> to implement transactional archive recovery then you will definitely
> find it hard, just as most Oracle and SQLServer DBAs don't understand
> how their log recovery systems work either.

This is not an argument! It's a shame we still don't understand whether we
really have reliable online backup or just hype with a lot of restrictions and
cautions. I'm not an experienced Oracle DBA, but I don't want to be a blind user.
I read the seminal papers about recovery and I thought I understood how
it should work in our system. I want to be 110% sure before claiming we're
ready to recommend it to our clients. I'm sure there are many experienced
DBAs who also don't understand what we have right now, especially after
this thread.

>
>> This is obviously not suited even
>> for my notebook.
>
> That's a pretty silly comment, Oleg.
>

Don't be silly, Simon. It was just my reaction!

> Since most laptops require portability as the main objective and that
> usually requires or at least must frequently expect disconnection from
> networks and other peripheral devices such as tape units, then no, the
> PITR design isn't suitable in general for laptop use. If you use your
> notebook as a production system with online archiving then PITR is
> suitable.
>
> PITR was designed to offer data protection for major production systems.
> My experience was that these sites would have a reasonable stream of
> transactions coming through, making the time between log file switches
> somewhat predictable and usually every few minutes. The use case of a
> very low transaction rate system was not considered fully since it was
> felt that people in that situation would be less bothered to protect
> their data with a rigorous backup procedure, leaving the issue we have
> been discussing.
>
> If you want recoverability, use PITR. If you choose not to use PITR,
> that's fine. If you'd like to help make it better, that's fine too.
>

These sentences are not fair, Simon. I understand your point, but I want
PostgreSQL to be applicable not just to major production systems.
You forget that before the production stage you have a lot of development and
testing. I don't want anything exotic, and I'm a bit surprised
by your reaction. I don't want to think about how difficult backup is in
Oracle and the other major DBMSs you're so experienced with! I'm a PostgreSQL
user, PostgreSQL is a rather transparent system, and I'd like to have an
understandable recovery process. Now I see all the limitations and cautions and
am waiting for improvements. Nobody is attacking you; I'm a bit disappointed, but
this is what we have.

> Best Regards, Simon Riggs
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ragnar Hafstað <gnari(at)simnet(dot)is>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-19 17:32:25
Message-ID: 1113931945.10921.90.camel@jeff
Lists: pgsql-hackers

On Tue, 2005-04-19 at 15:23 +0400, Oleg Bartunov wrote:

> This is not an argument! It's a shame we still don't understand whether we
> really have reliable online backup or just hype with a lot of restrictions and
> cautions. I'm not an experienced Oracle DBA, but I don't want to be a blind user.
> I read the seminal papers about recovery and I thought I understood how
> it should work in our system. I want to be 110% sure before claiming we're
> ready to recommend it to our clients. I'm sure there are many experienced
> DBAs who also don't understand what we have right now, especially after
> this thread.
>

Unless I misunderstand something, I think you're overreacting a bit. The
failure case is that the machine on which the database resides vaporizes
after you've done "pg_stop_backup()" but before the archiver archives
the WAL segments used during the backup procedure.

In practice, there are many reasons why that is not a major problem. For
example, PITR base backups are often going to be taken when the archiver
is already archiving WAL segments, and you already have a previous,
working base backup. You'd still be able to use that old base backup and
the newly archived WAL segments.

In general, it's just not realistic that you take a machine from having
no backups of any kind to running mission-critical transactions and
depending solely on the PITR backup, and then watch the server vaporize,
all in less time than it takes to archive a few WAL segments.

In almost all cases, the loss in data would be comparable to the loss
experienced by not having the last few WAL segments shipped, and PITR
never made a promise of keeping the transactions that never got
archived.
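
To make the window described above concrete, here is a minimal Python
sketch of one way an administrator's own backup script might hold off
declaring a base backup good until the last WAL segment it needs shows
up in the archive. This is not something pg_stop_backup() does; the
function name, the archive directory layout, and the polling approach
are all assumptions for illustration. The timeout is there because, as
noted earlier in the thread, there is no way to know how long the
current segment will take to fill.

```python
import os
import tempfile
import time


def wait_for_archived(archive_dir, segment_name, timeout=60.0, poll=1.0):
    """Poll archive_dir until segment_name exists or timeout elapses.

    Returns True if the file appeared, False on timeout. Caveat: a file
    that exists may still be mid-copy if the archive command does not
    write atomically; this sketch does not try to detect that.
    """
    deadline = time.monotonic() + timeout
    path = os.path.join(archive_dir, segment_name)
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Demo against a throwaway directory standing in for the archive.
with tempfile.TemporaryDirectory() as archive_dir:
    name = "0000000100001234000055CD"
    # Nothing archived yet: with a zero timeout this gives up at once.
    missing = wait_for_archived(archive_dir, name, timeout=0.0)
    # Simulate the archiver copying the completed segment.
    open(os.path.join(archive_dir, name), "w").close()
    present = wait_for_archived(archive_dir, name, timeout=0.0)
    print(missing, present)
```

Until such a check succeeds, the previous base backup and its archived
segments remain the ones to rely on.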

PITR works, and the developers are:
(1) Improving the current docs to make it absolutely clear how to make
100% assured backups.
(2) Making PITR easier to administer, probably for 8.1.
(3) Adding features to PITR, probably for 8.1.

If what I said above is incorrect, please correct me, because that means
that I'm one of the lost DBAs that Oleg is talking about.

Regards,
Jeff Davis


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ragnar Hafstað <gnari(at)simnet(dot)is>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-19 17:38:52
Message-ID: 200504191738.j3JHcqk05505@candle.pha.pa.us
Lists: pgsql-hackers

Jeff Davis wrote:
> Unless I misunderstand something, I think you're overreacting a bit. The
> failure case is that the machine on which the database resides vaporizes
> after you've done "pg_stop_backup()" but before the archiver archives
> the WAL segments used during the backup procedure.
>
> In practice, there are many reasons why that is not a major problem. For
> example, PITR base backups are often going to be taken when the archiver
> is already archiving WAL segments, and you already have a previous,
> working base backup. You'd still be able to use that old base backup and
> the newly archived WAL segments.
>
> In general, it's just not realistic that you take a machine from having
> no backups of any kind to running mission-critical transactions and
> depending solely on the PITR backup, and then watch the server vaporize,
> all in less time than it takes to archive a few WAL segments.
>
> In almost all cases, the loss in data would be comparable to the loss
> experienced by not having the last few WAL segments shipped, and PITR
> never made a promise of keeping the transactions that never got
> archived.
>
> PITR works, and the developers are:
> (1) Improving the current docs to make it absolutely clear how to make
> 100% assured backups.
> (2) Making PITR easier to administer, probably for 8.1.
> (3) Adding features to PITR, probably for 8.1.

You are right. The problem we really had was that the documentation
didn't mention the restrictions, and it said you could remove the old
archived WAL files once you did pg_stop_backup(). That has been
corrected and the new documentation will be in 8.0.3. I will mention
the PITR documentation clarification in the release notes for 8.0.3.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ragnar Hafstað <gnari(at)simnet(dot)is>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with PITR recovery
Date: 2005-04-19 18:03:56
Message-ID: Pine.GSO.4.62.0504192200100.28522@ra.sai.msu.su
Lists: pgsql-hackers

On Tue, 19 Apr 2005, Jeff Davis wrote:

>
> Unless I misunderstand something, I think you're overreacting a bit. The

You're right. It's all emotions :)

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83