pg 9.0, streaming replication, fail over and fail back strategies

From: "Kyle R(dot) Burton" <kyle(dot)burton(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: pg 9.0, streaming replication, fail over and fail back strategies
Date: 2010-08-09 22:10:47
Message-ID: AANLkTikY_w=mD=s+zE5memdBFWiYJWuMNrG422X-+LC-@mail.gmail.com
Lists: pgsql-general

Hello,

I'm new to the list and not even sure if this is the right place to be
posting this...

I've worked through the documentation for PostgreSQL 9.0 (beta2) and
have successfully set up a master and a hot standby slave configured
with streaming replication (and xlog shipping).  That configuration
seems to be updating the slave correctly: the slave accepts read
queries and shows up-to-date table data (based on testing by hand with
some DDL and insert queries).

Now that I have that successfully configured, I have manually
performed a fail over by stopping the master, moving a virtual IP
address (VIP) from the master to the slave, and touching the trigger
file on the slave.  This worked as expected, and the former slave
promoted itself to a full read/write master.
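
The "touch the trigger file" step can be scripted.  What follows is a
minimal sketch, not a tested failover tool: it assumes the standby's
recovery.conf sets trigger_file to the path passed in, and the
function name promote_standby is made up for illustration.  Moving
the VIP and stopping the old master are left out.

```python
from pathlib import Path


def promote_standby(trigger_file: str) -> bool:
    """Create the trigger file that the standby is watching for.

    The path must match the trigger_file setting in the standby's
    recovery.conf (an assumption of this sketch).  Returns True if this
    call created the file, False if the file was already present.
    """
    path = Path(trigger_file)
    if path.exists():
        # Promotion was already requested; do nothing.
        return False
    path.touch()
    return True
```

Returning False when the file already exists makes the call safe to
repeat from a failover script.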

I went through the process of failing back manually by dumping the
database on the slave, restoring it on the master, moving the VIP back
and renaming the recovery.done back to recovery.conf.  This took some
time and required several steps, but was also successful.

During the fail over, after I had moved the VIP from the master to
the slave, I had to restart (not just reload) the postgres daemon on
the slave to get it to start listening on the new IP address (it was
previously listening on another IP [10.x.x.y] on the same NIC [eth0]).
I have listen_addresses configured to listen on both an internal
address (10.x.x.y) and the VIP (10.x.x.z), but the interface on the
slave did not have the VIP at the time Postgres was started (so I'm
not all that surprised it didn't bind to that address on becoming the
master).
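
The bind behavior described above can be reproduced outside Postgres.
This is a minimal sketch assuming a typical Linux host where binding
to a specific address fails if no local interface carries that address
yet; the helper name can_bind is hypothetical and the sketch says
nothing about how Postgres itself handles the failure.

```python
import socket


def can_bind(address: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to `address` on this host.

    Port 0 asks the kernel for any free port, so the check does not
    collide with a running server.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
        except OSError:
            # Typically "cannot assign requested address" when the IP
            # is not configured on any local interface.
            return False
        return True
```

Running can_bind("10.x.x.z") with the real VIP filled in, before and
after the VIP is moved to the host, should show whether the address
was bindable at the time the server started.  A bind to the wildcard
address 0.0.0.0 is not tied to one specific IP.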

Is there any way to get PostgreSQL to bind to a new IP address and
interface without actually shutting it down?  If it could, would I
need to break all the current (read-only) client connections to get
them to reconnect and have the ability to write?  (Am I confused about
this?)

I've set up corosync (part of linux-ha) to manage the VIP, but so far
not to manage postgres itself.  I've set up postgres to be managed
manually (start and stop).

Now that the master+slave configuration is up and running again, I'm
looking for advice on how to monitor for faults.  I can fail over
manually, which is fine for now.  What aspects of the postgres system
should be monitored to watch for faults, and what kinds of faults
should lead to a fail over?  The machine crashing (OS/HW) is an
obvious one, which will be recognized by corosync, and I can script
the initiation of failover (including using IPMI to power down the
master).
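
As a starting point for the kind of monitoring asked about here, a
minimal sketch: a TCP probe of the Postgres port plus a rule that only
signals fail over after several consecutive failed probes.  The
function names and the threshold of 3 are assumptions chosen for
illustration.  A successful connect only shows that something accepts
connections on the port, not that queries succeed, so a fuller check
would likely also run a query (for example SELECT 1, or
pg_is_in_recovery() to confirm which role a node has) through a client
driver.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def should_fail_over(probe_results, threshold: int = 3) -> bool:
    """Return True when the most recent `threshold` probes all failed.

    `probe_results` is an ordered sequence of booleans, oldest first.
    Requiring consecutive failures avoids failing over on one lost probe.
    """
    recent = list(probe_results)[-threshold:]
    return len(recent) == threshold and not any(recent)
```

For example, should_fail_over([True, False, False, False]) is True,
while a history ending in a successful probe is not.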

Thank you for your time.

Kyle Burton

--
Twitter: @kyleburton
Blog: http://asymmetrical-view.com/
Fun: http://snapclean.me/
