Re: FW: Setting up of PITR system.

From: "Rajesh Kumar Mallah" <mallah(dot)rajesh(at)gmail(dot)com>
To: andy(dot)shellam(at)mailnetwork(dot)co(dot)uk
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: FW: Setting up of PITR system.
Date: 2006-03-29 12:45:59
Message-ID: a97c77030603290445o29f96099xd3cc5f15cd4fb364@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On 3/28/06, Andy Shellam <andy(dot)shellam(at)mailnetwork(dot)co(dot)uk> wrote:
>
>
>
> we want to remove archived WAL log files which are no longer needed
> so that the size of wal_archive folder is under control.
>
> The DOCs say that filenames numerically less than the WAL record that
> pg_stop_backup()
> suggests can be removed. Will an alphabetical sorting be different from
> numerically sorted
> filename ? Sorting numerically is not easy as the filenames look like
> HEX numbers
> ut they are are so huge that they cant be stored in normal integers
> (32bits).
>
> The WAL files are all numerical – however the backup file has the checkpoint
> appended to it – eg. 00009012514000916.A0AC91.backup.
>
> You need to examine the contents of this file to find the earliest file
> needed (it's usually the one just before it – i.e. 00009012514000916 in
> this case) and any numerically later than the .backup file.
>
>
>
> Also you wouldn't particular need any heavy programming – I'm sure a simple
> shell script could be written in bash to pick out the correct files.

OK i am posting my full script [ its not heavy programming i guess :) ]
shall be grateful if you/someone could review it . (its well commented i think)
script also carries sample data.

it does following
1. takes base backup to a destined folder by rsync
2. waits for .backup file to arrive in archive folder
after pg_stop_bacup()
3. searches and removes unwanted archived log files.

I have run it many times in my server and it seems to
be working fine.

------------------------------------ BEGIN
--------------------------------------------
#!/bin/bash

# folder where base_backup is put
BACKUPFOLDER=/mnt/disk3/base_backups
today=`date +%d-%m-%Y-%H-%M-%S`
PSQL=/opt/usr/local/pgsql/bin/psql
RSYNC="/usr/bin/rsync -a"
PGDATADIR=/mnt/disk5/pgdatadir

# two table spaces.

TS1=/mnt/disk4/bigtables
TS2=/mnt/disk3/indexspace

# folder where *archived* logs are put.
WAL_ARCHIVE=/mnt/wal_archive

label=base_backup_${today}

echo "Executing pg_start_backup with label $label in server ... "

# get the checkpoint at which backup starts
# the .backup files seems to be bearing this string in it.

CP=`$PSQL -q -Upostgres -d template1 -c "SELECT
pg_start_backup('$label');" -P tuples_only -P format=unaligned`

echo "Begin CheckPoint is $CP" # this contain string like A/681D1214

if [ $? -ne 0 ]
then
echo "PSQL pg_start_backup failed"
exit 1;
fi
echo "pg_start_backup executed successfully"

echo "RSYNC begins.."

# rsync each of the folders to the backup folder.
for i in $TS1 $TS2 $PGDATADIR ;
do
echo "Syncing $i .. "
time $RSYNC $i $BACKUPFOLDER
echo "Done"
done

# fortunately rsync does *not* seems to be exitting with non zero exit code
# for expected file disappearances and modifications.
if [ $? -ne 0 ]
then
echo "RSYNC failed"
exit 1;
fi

echo "RSYNC Done successfully"

echo "Executing pg_stop_backup in server ... "
$PSQL -Upostgres template1 -c "SELECT pg_stop_backup();"
if [ $? -ne 0 ]
then
echo "PSQL pg_stop_backup failed"
exit 1;
fi
echo "pg_stop_backup done successfully"
TO_SEARCH="00${CP:4}" # $TO_SEARCH contains 1D1214

# now remove the unneeded files.

# strip off first 4 chars from CP and prefix 00 to the result.
# search the file that has the Checkpoint in its filename.
# it takes a while to come, so wait till it comes.

while true; do
REF_FILE=`ls -1 $WAL_ARCHIVE | grep $TO_SEARCH`
if [ ! $REF_FILE ]; then
echo "Waitng for file with $TO_SEARCH in $WAL_ARCHIVE ... "
else
break
fi
sleep 1
done

# REF_FILE is 000000010000000A00000068.001D1214.backup.bz2

# take only first 24 chars and store.
REF_FILE_NUM=${REF_FILE:0:24}

# REF_FILE_NUM is 000000010000000A00000068

echo "REF_FILE_NUM=$REF_FILE_NUM"

# iterate list of files in the WAL_ARCHIVE folder
for i in `ls -1 $WAL_ARCHIVE` ;
do
# $i is :000000010000000A0000005D.bz2 eg
# get first 24 chars in filename
FILE_NUM=${i:0:24}

# compare if the number is less than the reference
# here string comparison is being used.
if [[ $FILE_NUM < $REF_FILE_NUM ]]
then
echo "$FILE_NUM [ $i ] removed"
rm -f $WAL_ARCHIVE/$i
else
echo "$FILE_NUM [ $i ] not removed"
fi
done
------------------------------------ END
-----------------------------------------------

---------------------------------- REAL OUTPUT OF A RUN-----------------------
Executing pg_start_backup with label base_backup_29-03-2006-17-29-01
in server ...
Begin CheckPoint is A/681D1214
pg_start_backup executed successfully
RSYNC begins..
Syncing /mnt/disk4/bigtables ..

real 6m24.338s
user 1m12.831s
sys 0m55.295s
Done
Syncing /mnt/disk3/indexspace ..

real 5m45.245s
user 0m32.520s
sys 0m26.567s
Done
Syncing /mnt/disk5/pgdatadir ..
readlink pgdatadir/global/pgstat.tmp: No such file or directory
rsync error: some files could not be transferred (code 23) at main.c(620)

real 12m1.844s
user 1m43.698s
sys 1m24.486s
Done
RSYNC Done successfully
Executing pg_stop_backup in server ...
pg_stop_backup
----------------
A/6FA82D74
(1 row)

pg_stop_backup done successfully
Waitng for file with 001D1214 in /mnt/wal_archive ...
REF_FILE_NUM=000000010000000A00000068
000000010000000A0000005D [
000000010000000A0000005D.003C0B54.backup.bz2 ] removed
000000010000000A0000005D [ 000000010000000A0000005D.bz2 ] removed
000000010000000A0000005E [ 000000010000000A0000005E.bz2 ] removed
000000010000000A0000005F [ 000000010000000A0000005F.bz2 ] removed
000000010000000A00000060 [ 000000010000000A00000060.bz2 ] removed
000000010000000A00000061 [ 000000010000000A00000061.bz2 ] removed
000000010000000A00000062 [ 000000010000000A00000062.bz2 ] removed
000000010000000A00000063 [ 000000010000000A00000063.bz2 ] removed
000000010000000A00000064 [ 000000010000000A00000064.bz2 ] removed
000000010000000A00000065 [ 000000010000000A00000065.bz2 ] removed
000000010000000A00000066 [ 000000010000000A00000066.bz2 ] removed
000000010000000A00000067 [ 000000010000000A00000067.bz2 ] removed
000000010000000A00000068 [
000000010000000A00000068.001D1214.backup.bz2 ] not removed
000000010000000A00000068 [ 000000010000000A00000068.bz2 ] not removed
000000010000000A00000069 [ 000000010000000A00000069.bz2 ] not removed
000000010000000A0000006A [ 000000010000000A0000006A.bz2 ] not removed
000000010000000A0000006B [ 000000010000000A0000006B.bz2 ] not removed
000000010000000A0000006C [ 000000010000000A0000006C.bz2 ] not removed
000000010000000A0000006D [ 000000010000000A0000006D.bz2 ] not removed
000000010000000A0000006E [ 000000010000000A0000006E.bz2 ] not removed

-------------------------------
END-----------------------------------------------------

>
>
> Q2. We are attempting to automate the process of taking base backup and
> removal
> of the unneeded WAL files. Is there any reliable way of knowing the
> WAL file X from
> inside the shell script , such that files prior to X can be removed ?
>
> from a shell script we issue pg_stop_backup() by psql -c , it returns
> something
> which does not looks like a WAL file.
>
>
>
> Pg_stop_backup() returns the checkpoint record – something like 9/A0AC91 –
> this is purely a guess, but you might be able to find which backup file
> contains this checkpoint by taking the digits after the forward-slash in the
> checkpoint (i.e. A0AC91 in this case), then finding the filename that
> contains this – in my example it's 00009012514000916.A0AC91.backup,

I think its not correct , pg_start_backup() returns the checkpoint record
which becomes part of the backup filename.

and grep
> the file for the number after the text "Start WAL Location: " in this file –
> then remove anything numerically less.
>
>
>
> It'd be a very "bitty" process, but I'm certain it could be done – it would
> need heavy testing over a period of backups though to ensure the wrong files
> are not being deleted.
>
>
>
> Q3. tar exits with non zero status for the same reasons as mentioned in
> docs
> is there any better archiving tool for this purpose ? can we use cp
> -a ?
>
>
>
> The WAL archive command can be set to use either cp or mv – then why not
> have a regular cronjob that runs a shell script to add any new files to your
> tar archive every time a new file is detected in your WAL-archive directory?

Hmmm you got me wrong, am talking about taking the base_backup not
archival of WAL segments

>
>
>
> Regds
> Rajesh Kumar Mallah.
>
> !DSPAM:14,44293ed135042000516834!
>
>
> Regards
>
>
>
> Andy Shellam

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Rajesh Kumar Mallah 2006-03-29 12:58:51 Re: FW: Setting up of PITR system.
Previous Message Ola Sandbu 2006-03-29 12:38:27 Re: Bloated pg_shdepend_depender_index