Re: replication using WAL archives

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gaetano Mendola <mendola(at)bigfoot(dot)com>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-admin(at)postgresql(dot)org, iain(at)mst(dot)co(dot)jp
Subject: Re: replication using WAL archives
Date: 2005-06-05 00:29:02
Message-ID: 200506050029.j550T2W29877@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin


This thread has been added as a link on the TODO list under TODO.detail.

---------------------------------------------------------------------------

Gaetano Mendola wrote:
> Robert Treat wrote:
> > On Thu, 2004-10-21 at 02:44, Iain wrote:
> >
> >>Hi,
> >>
> >>I thought I read something about this in relation to v8, but I can't
> >>find any reference to it now... is it (or will it be) possible to do
> >>master-slave style database replication by transmitting log files to the
> >>standby server and having it process them?
> >>
> >
> >
> > I'm not certain if this is 8.0, but some folks have created a working
> > version against the 8.0 code that will do something like this. Search
> > the pgsql-hacker mail list archives for more information on it.
>
> I sent a post on hackers, I put it here:
>
> =======================================================================
> Hi all,
> I seen that Eric Kerin did the work suggested by Tom about
> how to use the PITR in order to have an hot spare postgres,
> writing a C program.
>
> I did the same writing 2 shell scripts, one of them perform
> the restore the other one deliver the partial filled wal and
> check if the postmaster is alive ( check if the pid process
> still exist ).
>
> With these two scripts I'm able to have an hot spare installation,
> and the spare one go alive when the first postmaster dies.
>
> How test it:
>
> 1) Master node:
> modify postgresql.conf using:
>
> ~ archive_command = 'cp %p /mnt/server/archivedir/%f'
>
> ~ launch postgres and perform a backup as doc
>
> ~ http://developer.postgresql.org/docs/postgres/backup-online.html
>
> suggest to do
>
> launch the script:
>
> partial_wal_deliver.sh <PID> /mnt/server/partialdir <pg_xlog path>
>
> ~ this script will delivery each 10 seconds the "current" wal file,
> ~ and touch the "alive" file in order to notify the spare node that
> ~ the master node is up and running
>
>
> 2) Spare node:
> create a recovery.conf with the line:
>
> ~ restore_command = 'restore.sh /mnt/server/archivedir/%f %p /mnt/server/partialdir'
>
> ~ replace the content of data directory with the backup performed at point 1,
> ~ remove any file present in the pg_xlog directory ( leaving there the archive_status
> ~ directory ) and remove the postmaster.pid file ( this is necessary if you are running
> ~ the spare postgres on the same hw ).
>
> ~ launch the postmaster, the restore will continue till the "alive" file present in the
> ~ /mnt/server/partialdir directory is not updated for 60 seconds ( you can modify this
> ~ values inside the restore.sh script ).
>
> Be sure that restore.sh and all directories involved are accessible
>
> Let me know.
>
>
> This is a first step, of course, as Eric Kerin did, is better port these script
> in C and make it more robust.
>
> Postgres can help this process, as suggested by Tom creating a pg_current_wal()
> or even better having two new GUC parameters: archive_current_wal_command and
> archive_current_wal_delay.
>
> I problem I discover during the tests is that if you shut down the spare node
> and the restore_command is still waiting for a file then the postmaster will never
> exit :-(
> ==========================================================================
>
> I hope that is clear.
>
>
>
> Regards
> Gaetano Mendola
>
>

> #!/bin/bash
>
>
> SOURCE=$1
> TARGET=$2
> PARTIAL=$3
>
> SIZE_EXPECTED=16777216 #bytes 16 MB
> DIED_TIME=60 #seconds
>
> function test_existence
> {
> if [ -f ${SOURCE} ]
> then
> COUNTER=0
>
> #I have to check if the file is begin copied
> #I assume that it will reach the right
> #size in a few seconds
>
> while [ $(stat -c '%s' ${SOURCE} ) -lt $SIZE_EXPECTED ]
> do
> sleep 1
> let COUNTER+=1
> if [ 20 -lt $COUNTER ]
> then
> exit 1 # BAILING OUT
> fi
> done
>
> cp $SOURCE $TARGET
> exit 0
> fi
> echo ${SOURCE}"> not found"
>
> #if is looking for a history file and not exist
> #I have suddenly exit
> echo $SOURCE | grep history > /dev/null 2>&1 && exit 1
> }
>
>
> while [ 1 ]
> do
>
> test_existence
>
> #CHECK IF THE MASTER IS ALIVE
> DELTA_TIME=$(( $( date +'%s' ) - $( stat -c '%Z' ${PARTIAL}/alive ) ))
> if [ $DIED_TIME -lt $DELTA_TIME ]
> then
> echo "Master is dead..."
> # Master is dead
> CURRENT_WAL=$( basename $SOURCE )
> echo "Partial: " ${PARTIAL}
> echo "Current wal: " ${CURRENT_WAL}
> echo "Target: " ${TARGET}
> cp ${PARTIAL}/${CURRENT_WAL}.partial ${TARGET} > /dev/null 2>&1 && exit 0
> exit 1
> fi
>
> sleep 1
>
> done

> #!/bin/bash
>
> PID=$1
> PARTIAL=$2
> PGXLOG=$3
>
> function copy_last_wal
> {
> FILE=$( ls -t1p $PGXLOG | grep -v / | head -1 )
>
> echo "Last Wal> " $FILE
>
> cp ${PGXLOG}/${FILE} ${PARTIAL}/${FILE}.tmp
> mv ${PARTIAL}/${FILE}.tmp ${PARTIAL}/${FILE}.partial
> find ${PARTIAL} -name *.partial | grep -v ${FILE} | xargs -i rm -fr {}
> }
>
>
> while [ 1 ]
> do
> ps --pid $PID > /dev/null 2>&1
> ALIVE=$?
>
> if [ "${ALIVE}" == "1" ]
> then
> #The process is dead
> echo "Process dead"
> copy_last_wal
> exit 1
> fi
>
> #The process still exist
> touch ${PARTIAL}/alive
> copy_last_wal
>
> sleep 10
> done

>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Talal Mahyoub Abdo 2005-06-05 08:45:37 Can you help me
Previous Message mery lee 2005-06-04 07:37:04 unsuscribe