Re: PITR Recovery Question

Lists: pgsql-adminpgsql-hackers
From: "Gnanakumar" <gnanam(at)zoniac(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: PITR Recovery Question
Date: 2010-06-03 13:37:07
Message-ID: 001201cb0321$df429d40$9dc7d7c0$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Hi,

My production server is running PostgreSQL v8.2.3 on CentOS release 5.2
(Final).

I've setup PITR in my production server. For some reason, after setting up
PITR, we're not able to manage and maintain it. Because of this our WAL
archive drive become full (100% use) approximately after 1 month.

PITR SETUP DETAILS
We've 2 drives. Primary drive (pgsql/data/ directory resides) is 400 GB and
secondary drive (WAL archive) is 30 GB. All WAL archives are written to
secondary drive.

Base backup taken on: Aug03, 2009
WAL archive drive become full (100% use) on: Sep05, 2009

Because this WAL archive drive has become full, all WAL archive segments to
be archived are accumulated into pg_xlog/ directory itself. Eventually, 9
months (as of today from Sep05, 2009) of WAL archives are residing in
pg_xlog/ directory.

My question is, in case if I would like to perform recovery process as it is
in this situation, will this work out? That is, I'm seeing/finding out
whether recovery process would perform successfully anywhere between the
date range Aug03, 2009 (my base backup date) and as of today - Jun03, 2009.
Reason I'm asking this is still all my WAL archives are residing in pg_xlog/
directory.

Experts advice/idea/suggestion on this appreciated.

Regards,
Gnanam


From: Florian Pflug <fgp(at)phlo(dot)org>
To: <gnanam(at)zoniac(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PITR Recovery Question
Date: 2010-06-03 15:19:30
Message-ID: 0CC7BE82-C693-46EF-AE02-2E4E524F5527@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Hi,

I'll try to answer your questions below, but in the future please post questions concerning the usage and administration of postgres to pgsql-general or pgsql-admin. This list focus is the development of new features and bugfixes.

On Jun 3, 2010, at 15:37 , Gnanakumar wrote:
> PITR SETUP DETAILS
> We've 2 drives. Primary drive (pgsql/data/ directory resides) is 400 GB and
> secondary drive (WAL archive) is 30 GB. All WAL archives are written to
> secondary drive.
>
> Base backup taken on: Aug03, 2009
> WAL archive drive become full (100% use) on: Sep05, 2009
>
> Because this WAL archive drive has become full, all WAL archive segments to
> be archived are accumulated into pg_xlog/ directory itself. Eventually, 9
> months (as of today from Sep05, 2009) of WAL archives are residing in
> pg_xlog/ directory.

This is by design. WAL logs are only removed from pg_xlog once they have been archived successfully. Since your archive_command fails due to the disk being full, they remain in pg_xlog. Once you enlarge the filesystem holding the WAL archive they should be copied and subsequently removed from pg_xlog.

Note that you'd usually take a new base backup once in a while to limit the number of WAL segments you need to retain. You can take a base backup while postgres is running by issuing pg_start_backup() before you start the copy and pg_stop_backup() after it finished. Apart from creating additional IO load, doing so won't interfere with normal query execution in any way.

> My question is, in case if I would like to perform recovery process as it is
> in this situation, will this work out? That is, I'm seeing/finding out
> whether recovery process would perform successfully anywhere between the
> date range Aug03, 2009 (my base backup date) and as of today - Jun03, 2009.
> Reason I'm asking this is still all my WAL archives are residing in pg_xlog/
> directory.

For PITR, you'll obviously need the WAL segment starting from the time your base backup started up until the point you want to recover to. If some of those WAL segments still reside in pg_xlog, you'll either need to teach your restore_command to fetch them from there. Note that you cannot recover "in reverse". To recover up to a certain point in time you always need to start from a base backup taken *before* that time.

best regards,
Florian Pflug


From: "Gnanakumar" <gnanam(at)zoniac(dot)com>
To: "'Florian Pflug'" <fgp(at)phlo(dot)org>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PITR Recovery Question
Date: 2010-06-04 05:05:45
Message-ID: 007d01cb03a3$99bd3e60$cd37bb20$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Hi Florian,

Sure. I'll post my future questions either in pgsql-general or pgsql-admin.

> If some of those WAL segments still reside in pg_xlog, you'll either need
to teach your restore_command to fetch them from there. Note that you cannot
recover "in reverse".

My pg_xlog/ and walarchive/ directory locations are
"/usr/local/pgsql/data/pg_xlog" and "/mnt/pitr/walarchive" respectively.

If my normal restore command is: restore_command='cp
"/mnt/pitr/walarchive/%f "%p"', how should I instruct restore command to
fetch? Should I just replace this with something like restore_command='cp
/usr/local/pgsql/data/pg_xlog/%f "%p"'. Also you have mentioned that we
cannot recover "in reverse", what I understand from this is that even though
if I replace the restore command pointing to pg_xlog/ directory, this will
not work out in this situation? Is my understanding right?

-----Original Message-----
From: Florian Pflug [mailto:fgp(at)phlo(dot)org]
Sent: Thursday, June 03, 2010 8:50 PM
To: gnanam(at)zoniac(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] PITR Recovery Question

Hi,

I'll try to answer your questions below, but in the future please post
questions concerning the usage and administration of postgres to
pgsql-general or pgsql-admin. This list focus is the development of new
features and bugfixes.

On Jun 3, 2010, at 15:37 , Gnanakumar wrote:
> PITR SETUP DETAILS
> We've 2 drives. Primary drive (pgsql/data/ directory resides) is 400 GB
and
> secondary drive (WAL archive) is 30 GB. All WAL archives are written to
> secondary drive.
>
> Base backup taken on: Aug03, 2009
> WAL archive drive become full (100% use) on: Sep05, 2009
>
> Because this WAL archive drive has become full, all WAL archive segments
to
> be archived are accumulated into pg_xlog/ directory itself. Eventually, 9
> months (as of today from Sep05, 2009) of WAL archives are residing in
> pg_xlog/ directory.

This is by design. WAL logs are only removed from pg_xlog once they have
been archived successfully. Since your archive_command fails due to the disk
being full, they remain in pg_xlog. Once you enlarge the filesystem holding
the WAL archive they should be copied and subsequently removed from pg_xlog.

Note that you'd usually take a new base backup once in a while to limit the
number of WAL segments you need to retain. You can take a base backup while
postgres is running by issuing pg_start_backup() before you start the copy
and pg_stop_backup() after it finished. Apart from creating additional IO
load, doing so won't interfere with normal query execution in any way.

> My question is, in case if I would like to perform recovery process as it
is
> in this situation, will this work out? That is, I'm seeing/finding out
> whether recovery process would perform successfully anywhere between the
> date range Aug03, 2009 (my base backup date) and as of today - Jun03,
2009.
> Reason I'm asking this is still all my WAL archives are residing in
pg_xlog/
> directory.

For PITR, you'll obviously need the WAL segment starting from the time your
base backup started up until the point you want to recover to. If some of
those WAL segments still reside in pg_xlog, you'll either need to teach your
restore_command to fetch them from there. Note that you cannot recover "in
reverse". To recover up to a certain point in time you always need to start
from a base backup taken *before* that time.

best regards,
Florian Pflug


From: Florian Pflug <fgp(at)phlo(dot)org>
To: <gnanam(at)zoniac(dot)com>
Cc: PostgreSQL-development hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PITR Recovery Question
Date: 2010-06-04 10:59:39
Message-ID: 2059001F-1297-4989-81C4-1212E10403FB@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

On Jun 4, 2010, at 7:05 , Gnanakumar wrote:
>> If some of those WAL segments still reside in pg_xlog, you'll either need
> to teach your restore_command to fetch them from there. Note that you cannot
> recover "in reverse".
>
> My pg_xlog/ and walarchive/ directory locations are
> "/usr/local/pgsql/data/pg_xlog" and "/mnt/pitr/walarchive" respectively.
>
> If my normal restore command is: restore_command='cp
> "/mnt/pitr/walarchive/%f "%p"', how should I instruct restore command to
> fetch? Should I just replace this with something like restore_command='cp
> /usr/local/pgsql/data/pg_xlog/%f "%p"'. Also you have mentioned that we
> cannot recover "in reverse", what I understand from this is that even though
> if I replace the restore command pointing to pg_xlog/ directory, this will
> not work out in this situation? Is my understanding right?

If you point it at a cluster's own pg_xlog directory, it won't work.

You might want to re-ead the section on the recovery process in the PTITR documentation, at
http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html#BACKUP-PITR-RECOVERY

If you have further questions, please take this discussion to pgsql-general.

best regards,
Florian Pflug


From: "Gnanakumar" <gnanam(at)zoniac(dot)com>
To: <pgsql-admin(at)postgresql(dot)org>
Cc: <fgp(at)phlo(dot)org>
Subject: PITR Recovery Question
Date: 2010-06-04 11:54:14
Message-ID: 00ca01cb03dc$a9ffe780$fdffb680$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Hi Florian,

I'm moving this discussion to pgsql-admin. To give a picture of my original
question, it is given below, so that other users in this mailing list will
understand my original problem statement.

> If you point it at a cluster's own pg_xlog directory, it won't work.

> You might want to re-ead the section on the recovery process in the PTITR
documentation, at
http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html#BAC
KUP-PITR-RECOVERY

OK.

My pg_xlog/ and walarchive/ directory locations are
"/usr/local/pgsql/data/pg_xlog" and "/mnt/pitr/walarchive" respectively.

In case, if I decide to clean the old WAL archives and set right PITR from
today onwards by taking base backup, so that I can start managing and
maintaining atleast from now onwards, what is the correct way/method of
removing files from pg_xlog/, pg_xlog/archive_status/ and
/mnt/pitr/walarchive/ directories?

1. How do I clean up WAL archives that are accumulated in pg_xlog/
directory? Does it harm the database in anyway if I remove WAL segments
manually?
2. Few important command outputs:

[root(at)dbserver data]# pwd
/usr/local/pgsql/data
[root(at)dbserver data]# ls -l pg_xlog |wc -l
14438
[root(at)dbserver data]# ls -l pg_xlog/archive_status |wc -l
14436
[root(at)dbserver data]# ls -l /mnt/zoniacpitr/walarchive | wc -l
1783

NOTE: I can see all the files under pg_xlog/archive_status/ directory having
suffixed with "00*.ready".

-----Original Message-----
From: Gnanakumar [mailto:gnanam(at)zoniac(dot)com]
Sent: Thursday, June 03, 2010 7:07 PM
To: pgsql-hackers(at)postgresql(dot)org
Subject: PITR Recovery Question

Hi,

My production server is running PostgreSQL v8.2.3 on CentOS release 5.2
(Final).

I've setup PITR in my production server. For some reason, after setting up
PITR, we're not able to manage and maintain it. Because of this our WAL
archive drive become full (100% use) approximately after 1 month.

PITR SETUP DETAILS
We've 2 drives. Primary drive (pgsql/data/ directory resides) is 400 GB and
secondary drive (WAL archive) is 30 GB. All WAL archives are written to
secondary drive.

Base backup taken on: Aug03, 2009
WAL archive drive become full (100% use) on: Sep05, 2009

Because this WAL archive drive has become full, all WAL archive segments to
be archived are accumulated into pg_xlog/ directory itself. Eventually, 9
months (as of today from Sep05, 2009) of WAL archives are residing in
pg_xlog/ directory.

My question is, in case if I would like to perform recovery process as it is
in this situation, will this work out? That is, I'm seeing/finding out
whether recovery process would perform successfully anywhere between the
date range Aug03, 2009 (my base backup date) and as of today - Jun03, 2009.
Reason I'm asking this is still all my WAL archives are residing in pg_xlog/
directory.

Experts advice/idea/suggestion on this appreciated.

Regards,
Gnanam


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-admin(at)postgresql(dot)org>,<gnanam(at)zoniac(dot)com>
Cc: <fgp(at)phlo(dot)org>
Subject: Re: PITR Recovery Question
Date: 2010-06-04 15:02:53
Message-ID: 4C08CF4D0200002500031F44@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

"Gnanakumar" <gnanam(at)zoniac(dot)com> wrote:

> My pg_xlog/ and walarchive/ directory locations are
> "/usr/local/pgsql/data/pg_xlog" and "/mnt/pitr/walarchive"
> respectively.
>
> In case, if I decide to clean the old WAL archives and set right
> PITR from today onwards by taking base backup, so that I can start
> managing and maintaining atleast from now onwards, what is the
> correct way/method of removing files from pg_xlog/,
> pg_xlog/archive_status/ and /mnt/pitr/walarchive/ directories?

It is generally unsafe to delete any WAL files from pg_xlog. If
they are there because your archive command has been failing, you
need to turn off archiving or (probably more convenient) allow the
archive script to return success until things clear. One trick
would be to temporarily change your archive_command to 'true',
delete all files from your archive, and then change the command
back. Doing that without exposing yourself to a period where you
have no backup might be tricky, though.

If the only problem with the archive command is that the archive fs
is full, I would copy the contents of the archive directory to tape
or whatever medium you have for long-term storage, delete the
contents, and let archive succeed. The pg_xlog directory will
eventually clear, and then I would get a fresh PITR base backup
(following all the documented steps for doing so). You really want
to see WAL files flowing to your archive location before you start
the process of getting a new base backup.

If there's some other reason that the archive command has been
failing, what is it?

-Kevin


From: Florian Pflug <fgp(at)phlo(dot)org>
To: <gnanam(at)zoniac(dot)com>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PITR Recovery Question
Date: 2010-06-05 00:00:39
Message-ID: F993CF5B-1C46-4936-99AB-7ED9AAF7B3F8@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

On Jun 4, 2010, at 13:54 , Gnanakumar wrote:
> In case, if I decide to clean the old WAL archives and set right PITR from
> today onwards by taking base backup, so that I can start managing and
> maintaining atleast from now onwards, what is the correct way/method of
> removing files from pg_xlog/, pg_xlog/archive_status/ and
> /mnt/pitr/walarchive/ directories?

You should *never*, I repeat, *never* remove files from pg_xlog manually unless you know *exactly* what you're doing!

If you remove those, and postgres crashes, you *will* lose your whole database!

About the same holds for pg_xlog/archive_status. Removing files from there won't prevent postgres from starting, but it *will* interfere with the WAL archiving process, and may lead to unusable base backups!

Postgres will recycle old WAL segments automatically, once they're no longer needed for crash recovery *and* after they've been archived successfully (if archiving is enabled). Once a WAL segment is recycled, it's corresponding archive_status/ file(s) will be removed too.

The correct way to clean out pg_xlog therefore is to either disable WAL archiving, or to make sure your archive_command succeeds eventually.

A common way to use PITR is the following.

1) You setup an archive_command.
2) You take a base backup. That also triggers the creation of a backup history file (<number1>.<number2>.backup) in the archive directory.
3) You remove old base backups. You probably want to keep more than one, though, just in case. How long you retain your base backups determines
how far back in time you'll be able to go in the event of a disaster. The furthest point you can go back to is the time pg_stop_backup() was called while taking the oldest base backup you retained.
4) You remove all WAL segments that predate the remaining base backups. For that, you find the backup history file in the archive directory that corresponds to the oldest remaining base backup and then remove all WAL segments whose name is numerically smaller than the <number1> from that backup history file. Keeping older WAL segments buys you nothing - WAL files without a base backup that *predates* them are worthless.
5) Your archive directory will now slowly grow. At some point the cost of storing all those segments will out-weight the cost of taking a new base backup. Whether that happens after an hour, a day, a week or a year depends on the size of your database vs. the amount of modifications it receives. However, at some point or another you will reach that cutoff point, and that is when you continue with (2).

http://www.postgresql.org/docs/8.2/static/continuous-archiving.html explains all of this pretty exhaustively.

best regards,
Florian Pflug


From: "Gnanakumar" <gnanam(at)zoniac(dot)com>
To: "'Kevin Grittner'" <Kevin(dot)Grittner(at)wicourts(dot)gov>, <pgsql-admin(at)postgresql(dot)org>
Cc: <fgp(at)phlo(dot)org>
Subject: Re: PITR Recovery Question
Date: 2010-06-05 05:21:16
Message-ID: 00cb01cb046e$ee54f550$cafedff0$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Hi Kevin,

> It is generally unsafe to delete any WAL files from pg_xlog. If
> they are there because your archive command has been failing, you
> need to turn off archiving or (probably more convenient) allow the
> archive script to return success until things clear. One trick
> would be to temporarily change your archive_command to 'true',
> delete all files from your archive, and then change the command
> back. Doing that without exposing yourself to a period where you
> have no backup might be tricky, though.

I'm trying to see and understand your view point, but I couldn't able to get
this particular step clearly: "One trick would be to temporarily change
your archive_command to 'true', delete all files from your archive, and then
change the command back ". Can you please clarify and explain on this?

When you say *temporarily changing archive_command to 'true' *, do you mean
enabling/disabling of WAL archiving here? Per documentation, "If this is an
empty string (the default), WAL archiving is disabled.". And when you say
"change the command back", I understood it as *disabling*. Is my
understanding correct?

> If the only problem with the archive command is that the archive fs
> is full, I would copy the contents of the archive directory to tape
> or whatever medium you have for long-term storage, delete the
> contents, and let archive succeed. The pg_xlog directory will
> eventually clear, and then I would get a fresh PITR base backup
> (following all the documented steps for doing so). You really want
> to see WAL files flowing to your archive location before you start
> the process of getting a new base backup.

Yes, probably I should go ahead and proceed as you suggested above, that is
allowing archive script to run successfully until things are completely
clear.

> If there's some other reason that the archive command has been
> failing, what is it?

No other reason. It was failing only because my WAL archive drive was full.

Regards,
Gnanam


From: "Gnanakumar" <gnanam(at)zoniac(dot)com>
To: "'Florian Pflug'" <fgp(at)phlo(dot)org>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PITR Recovery Question
Date: 2010-06-05 07:05:37
Message-ID: 00cc01cb047d$82079830$8616c890$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Hi Florian,

Thanks for your valuable suggestion and a detailed step on common way to use
PITR. Things are very clear now except that I've some other question in
connection to this.

> The correct way to clean out pg_xlog therefore is to either disable WAL
archiving, or to make sure your archive_command succeeds eventually.

Probably I would go with the 2nd option, that is allowing archive command to
run successfully until things are completely clear.

But this question is for my understanding: In case if I decide to go with
1st option, that is disable WAL archiving for a while, will it completely
clean out files from pg_xlog/ and pg_xlog/archive_status/ directories, so
that I can start the PITR by taking base backup by enabling WAL archiving
later?

> A common way to use PITR is the following.

> 4) You remove all WAL segments that predate the remaining base backups.
For that, you find the backup history file in the archive directory that
corresponds to the oldest remaining base backup and then remove all WAL
segments whose name is numerically smaller than the <number1> from that
backup history file. Keeping older WAL segments buys you nothing - WAL files
without a base backup that *predates* them are worthless.

Can you share with me any automated shell script that takes care of this
removal automatically? Or can you share any systematic way (steps) of doing
things if I want to do this manually?

Regards,
Gnanam


From: Florian Pflug <fgp(at)phlo(dot)org>
To: <gnanam(at)zoniac(dot)com>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PITR Recovery Question
Date: 2010-06-06 12:41:21
Message-ID: 7B83DD06-04B8-4BE5-8573-3CDD48568F1D@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

On Jun 5, 2010, at 9:05 , Gnanakumar wrote:
> Thanks for your valuable suggestion and a detailed step on common way to use
> PITR. Things are very clear now except that I've some other question in
> connection to this.
>
>> The correct way to clean out pg_xlog therefore is to either disable WAL archiving, or to make sure your archive_command succeeds eventually.
>
> Probably I would go with the 2nd option, that is allowing archive command to
> run successfully until things are completely clear.
>
> But this question is for my understanding: In case if I decide to go with
> 1st option, that is disable WAL archiving for a while, will it completely
> clean out files from pg_xlog/ and pg_xlog/archive_status/ directories, so
> that I can start the PITR by taking base backup by enabling WAL archiving
> later?

If you disable WAL archiving by setting archive_command to 'true', it'll surely clean out the files, since postgresql will actually believe it archived them successfully. I not sure what happens if you set archive_command to '' - that might disable the archiving process completely, and hence prevent the cleanup.

>> A common way to use PITR is the following.
>
>> 4) You remove all WAL segments that predate the remaining base backups. For that, you find the backup history file in the archive directory that corresponds to the oldest remaining base backup and then remove all WAL segments whose name is numerically smaller than the <number1> from that backup history file. Keeping older WAL segments buys you nothing - WAL files without a base backup that *predates* them are worthless.
>
> Can you share with me any automated shell script that takes care of this
> removal automatically? Or can you share any systematic way (steps) of doing
> things if I want to do this manually?

Sorry, I don't have a script for this at hand. But a quick search through the pgsql-admin archive brings up this post, which contains such a script.
http://archives.postgresql.org/pgsql-admin/2006-03/msg00337.php

best regards,
Florian Pflug


From: Samuel Stearns <SStearns(at)internode(dot)com(dot)au>
To: "gnanam(at)zoniac(dot)com" <gnanam(at)zoniac(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PITR Recovery Question
Date: 2010-06-06 23:23:03
Message-ID: 68B59BEDCD36854AADBDF17E91B2937A01A5937796@EXCHMAIL.staff.internode.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Here's a script to make your backup and rsync it to a remote destination:

#!/bin/bash
echo "checkpoint"
echo "CHECKPOINT;" | /local/pkg/bin/psql template1
echo "start backup"
echo "SELECT pg_start_backup('cisoradr:/cis/pgsql/katana7/backup');" | /local/pkg/bin/psql template1
echo "rsync"
/local/pkg/bin/rsync --delete -azxH /local/app/postgres/data pgsql(at)cisoradr-ext:/cis/pgsql/katana7/backup/.
echo "Stop backup"
echo "SELECT pg_stop_backup();" | /local/pkg/bin/psql template1

Sam

-----Original Message-----
From: pgsql-admin-owner(at)postgresql(dot)org [mailto:pgsql-admin-owner(at)postgresql(dot)org] On Behalf Of Florian Pflug
Sent: Sunday, 6 June 2010 10:11 PM
To: gnanam(at)zoniac(dot)com
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: [ADMIN] PITR Recovery Question

On Jun 5, 2010, at 9:05 , Gnanakumar wrote:
> Thanks for your valuable suggestion and a detailed step on common way to use
> PITR. Things are very clear now except that I've some other question in
> connection to this.
>
>> The correct way to clean out pg_xlog therefore is to either disable WAL archiving, or to make sure your archive_command succeeds eventually.
>
> Probably I would go with the 2nd option, that is allowing archive command to
> run successfully until things are completely clear.
>
> But this question is for my understanding: In case if I decide to go with
> 1st option, that is disable WAL archiving for a while, will it completely
> clean out files from pg_xlog/ and pg_xlog/archive_status/ directories, so
> that I can start the PITR by taking base backup by enabling WAL archiving
> later?

If you disable WAL archiving by setting archive_command to 'true', it'll surely clean out the files, since postgresql will actually believe it archived them successfully. I not sure what happens if you set archive_command to '' - that might disable the archiving process completely, and hence prevent the cleanup.

>> A common way to use PITR is the following.
>
>> 4) You remove all WAL segments that predate the remaining base backups. For that, you find the backup history file in the archive directory that corresponds to the oldest remaining base backup and then remove all WAL segments whose name is numerically smaller than the <number1> from that backup history file. Keeping older WAL segments buys you nothing - WAL files without a base backup that *predates* them are worthless.
>
> Can you share with me any automated shell script that takes care of this
> removal automatically? Or can you share any systematic way (steps) of doing
> things if I want to do this manually?

Sorry, I don't have a script for this at hand. But a quick search through the pgsql-admin archive brings up this post, which contains such a script.
http://archives.postgresql.org/pgsql-admin/2006-03/msg00337.php

best regards,
Florian Pflug


From: "Gnanakumar" <gnanam(at)zoniac(dot)com>
To: "'Florian Pflug'" <fgp(at)phlo(dot)org>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PITR Recovery Question
Date: 2010-06-08 05:29:09
Message-ID: 000101cb06cb$87e57240$97b056c0$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Hi Florian,

Thanks for the clarification and a link to a post on automated script.

On Jun 5, 2010, at 9:05 , Gnanakumar wrote:
> Thanks for your valuable suggestion and a detailed step on common way to
use
> PITR. Things are very clear now except that I've some other question in
> connection to this.
>
>> The correct way to clean out pg_xlog therefore is to either disable WAL
archiving, or to make sure your archive_command succeeds eventually.
>
> Probably I would go with the 2nd option, that is allowing archive command
to
> run successfully until things are completely clear.
>
> But this question is for my understanding: In case if I decide to go with
> 1st option, that is disable WAL archiving for a while, will it completely
> clean out files from pg_xlog/ and pg_xlog/archive_status/ directories, so
> that I can start the PITR by taking base backup by enabling WAL archiving
> later?

If you disable WAL archiving by setting archive_command to 'true', it'll
surely clean out the files, since postgresql will actually believe it
archived them successfully. I not sure what happens if you set
archive_command to '' - that might disable the archiving process completely,
and hence prevent the cleanup.

>> A common way to use PITR is the following.
>
>> 4) You remove all WAL segments that predate the remaining base backups.
For that, you find the backup history file in the archive directory that
corresponds to the oldest remaining base backup and then remove all WAL
segments whose name is numerically smaller than the <number1> from that
backup history file. Keeping older WAL segments buys you nothing - WAL files
without a base backup that *predates* them are worthless.
>
> Can you share with me any automated shell script that takes care of this
> removal automatically? Or can you share any systematic way (steps) of
doing
> things if I want to do this manually?

Sorry, I don't have a script for this at hand. But a quick search through
the pgsql-admin archive brings up this post, which contains such a script.
http://archives.postgresql.org/pgsql-admin/2006-03/msg00337.php

best regards,
Florian Pflug


From: "Gnanakumar" <gnanam(at)zoniac(dot)com>
To: "'Samuel Stearns'" <SStearns(at)internode(dot)com(dot)au>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PITR Recovery Question
Date: 2010-06-08 05:29:17
Message-ID: 000201cb06cb$8b068860$a1139920$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Hi Sam,

Thank your sharing this script.

> Here's a script to make your backup and rsync it to a remote destination:

> #!/bin/bash
> echo "checkpoint"
> echo "CHECKPOINT;" | /local/pkg/bin/psql template1
> echo "start backup"
> echo "SELECT pg_start_backup('cisoradr:/cis/pgsql/katana7/backup');" |
/local/pkg/bin/psql template1
> echo "rsync"
> /local/pkg/bin/rsync --delete -azxH /local/app/postgres/data
pgsql(at)cisoradr-ext:/cis/pgsql/katana7/backup/.
> echo "Stop backup"
> echo "SELECT pg_stop_backup();" | /local/pkg/bin/psql template1


From: "Gnanakumar" <gnanam(at)zoniac(dot)com>
To: <pgsql-admin(at)postgresql(dot)org>
Subject: pg_ctl: server does not shut down
Date: 2010-10-25 07:36:30
Message-ID: 002d01cb7417$595f78d0$0c1e6a70$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Hi,

My production server is running PostgreSQL v8.2.3 on CentOS release 5.2
(Final).

I've setup PITR in my production server. For some reason, after setting up
PITR, we're not able to manage and maintain it.

PITR SETUP DETAILS
We've 2 drives. Primary drive (pgsql/data/ directory resides) is 400 GB and
secondary drive (WAL archive) is 300 GB. All WAL archives are written to
secondary drive.

Base backup taken on: Aug03, 2009
WAL archive drive become full (100% use) on: Jul01, 2010

Because this WAL archive drive has become full, all WAL archive segments to
be archived are accumulated into pg_xlog/ directory itself. Eventually,
almost 4 months (as of today from Jul01, 2010) of WAL archives are residing
in pg_xlog/ directory.

ISSUE
We then decided to stop PITR in our database. Steps which I followed are:
1. service postgresql stop
2. Detached (unmounted) 300 GB secondary drive (WAL archive)
3. Disabled WAL archiving by commenting (#) 2 postgresql.conf parameters:
"archive_command" and "archive_timeout".
4. service postgresql start

Now, the database has started without any issues. But when I tried to stop
database using 'service postgresql stop' within few minutes after it's been
started, it was not shutting down and responded with the following message
after few minutes:
"Stopping PostgreSQL: pg_ctl: server does not shut down"

I even checked serverlog for whether any errors/issues are reported, but I
don't find any. Also, when I tried to login to my database at this time, it
says:
psql mydb mydbuser
psql: FATAL: the database system is shutting down

but not actually shutting down ever. I'm clueless why it is happening so
and not shutting down ever.

My questions are:
1. What could be the root cause of PostgreSQL server not shutting
down ever?
2. Am I following a different/incorrect way of disabling PITR?
3. Considering this situation, now how do I safely disable PITR in
my database?

Experts advice/idea/suggestion on this are appreciated.

Regards,
Gnanam


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: gnanam(at)zoniac(dot)com
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: pg_ctl: server does not shut down
Date: 2010-10-25 15:22:46
Message-ID: 4CC5A0C6.8080901@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Gnanakumar wrote:
> 1. What could be the root cause of PostgreSQL server not shutting
> down ever?
> 2. Am I following a different/incorrect way of disabling PITR?
>

You can check if PITR is disabled by executing:

SHOW archive_command;

At a psql prompt. If that's empty, then you turned it off, and it isn't
involved in the shutdown issue you have anymore.

Normally when the server doesn't want to shut down, that's because of a
client that won't exit normally, not something in the main server. Take
a look at what processes are still running against the database and see
if there are clients attached after the fast shutdown attempt. If so,
those are your problem, not something to PITR.

If you know every client has been stopped or is doing nothing useful,
you can do an immediate shutdown of the server and kick everyone off.
You'll get messages about unclean shutdown, but no data should be lost.
The "service" scripts don't do that though. You'll need to do something
like this instead, presuming you start as root:

su postgres
pg_ctl -D /var/lib/pgsql/data -m immedate stop

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


From: "Gnanakumar" <gnanam(at)zoniac(dot)com>
To: "'Greg Smith'" <greg(at)2ndquadrant(dot)com>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: pg_ctl: server does not shut down
Date: 2010-10-26 05:32:06
Message-ID: 002701cb74cf$23607f50$6a217df0$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

> You can check if PITR is disabled by executing:
> SHOW archive_command;
> At a psql prompt. If that's empty, then you turned it off, and it isn't
involved in the shutdown issue you have anymore.

Yes, I could see empty value here.

> Take a look at what processes are still running against the database and
see if there are clients attached after the fast shutdown attempt.
> If so, those are your problem, not something to PITR.

Yes, I can see the following processes still running. I used the command 'ps
-ef |grep postgres' to list.
postgres 3376 1 0 07:20 ? 00:00:00
/mnt/database/pgsql/bin/postgres -D /usr/local/pgsql/data
postgres 3381 3376 0 07:20 ? 00:00:00 postgres: logger process

postgres 3383 3376 0 07:20 ? 00:00:00 postgres: writer process

But why logger and writer process are still running/showing up here for a
very long time after issuing 'service postgresql stop'. What could be the
problem? Also, am sure here that no other clients (database connections)
are connected at this time.


From: "Gnanakumar" <gnanam(at)zoniac(dot)com>
To: "'Greg Smith'" <greg(at)2ndquadrant(dot)com>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: pg_ctl: server does not shut down
Date: 2010-10-28 07:57:02
Message-ID: 004f01cb7675$b7118120$25348360$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

>> You can check if PITR is disabled by executing:
>> SHOW archive_command;
>> At a psql prompt. If that's empty, then you turned it off, and it isn't
>> involved in the shutdown issue you have anymore.

> Yes, I could see empty value here.

>> Take a look at what processes are still running against the database and
>> see if there are clients attached after the fast shutdown attempt.
>> If so, those are your problem, not something to PITR.

> Yes, I can see the following processes still running. I used the command
'ps
> -ef |grep postgres' to list.
> postgres 3376 1 0 07:20 ? 00:00:00
> /mnt/database/pgsql/bin/postgres -D /usr/local/pgsql/data
> postgres 3381 3376 0 07:20 ? 00:00:00 postgres: logger process
> postgres 3383 3376 0 07:20 ? 00:00:00 postgres: writer process

> But why logger and writer process are still running/showing up here for a
> very long time after issuing 'service postgresql stop'. What could be the
> problem? Also, am sure here that no other clients (database connections)
> are connected at this time.

I can CONSISTENTLY reproduce this problem (PITR walarchive drive is not
attached/mounted) and I still don't know the reason behind this issue:
1) If I disable PITR and start postgresql service, consistently server does
not shut down on 'service postgresql stop' command. I have tried and
checked this upto 4 times.
NOTE: After disabling PITR, on checking "SHOW archive_command" from psql,
it is showing empty value. But on 'ps -ef |grep postgres', it is always
showing up
logger and writer process running.

2) But if I enable back PITR and start postgresql service, consistently
server shuts down properly on 'service postgresql stop' command. Again I
have tried this cycle also upto 4 times.

When I mean enable/disable PITR, it is about enabling/disabling 2
configuration parameters: "archive_command" and "archive_timeout".

Since this is our Production server, experts assistance to solve this issue
are highly appreciated.