Re: Warm standby failover mechanism

Lists: pgsql-general
From: Thom Brown <thombrown(at)gmail(dot)com>
To: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Warm standby failover mechanism
Date: 2009-02-24 16:55:45
Message-ID: bddc86150902240855g51ea8ff4t9378c9c6c3a69d13@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi all,

We're looking at setting up a warm-standby server using log shipping and
aren't too sure about how we should trigger failover. Is there a
commonly-used approach which is reliable enough to recommend? Looking at
the documentation, there doesn't seem to be any recommendation. I
preferrably don't want to use a witness server.

Also, what would you say is the best way to tell the failed primary server
that it is no longer the primary server?

Thanks

Thom


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Thom Brown <thombrown(at)gmail(dot)com>
Cc: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Warm standby failover mechanism
Date: 2009-02-24 18:39:49
Message-ID: 1235500789.16176.181.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Tue, 2009-02-24 at 16:55 +0000, Thom Brown wrote:

> We're looking at setting up a warm-standby server using log shipping
> and aren't too sure about how we should trigger failover. Is there a
> commonly-used approach which is reliable enough to recommend? Looking
> at the documentation, there doesn't seem to be any recommendation. I
> preferrably don't want to use a witness server.
>
> Also, what would you say is the best way to tell the failed primary
> server that it is no longer the primary server?

http://www.postgresql.org/docs/8.3/static/pgstandby.html

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Thom Brown <thombrown(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Warm standby failover mechanism
Date: 2009-02-25 09:26:26
Message-ID: bddc86150902250126y3cc463a4o3102e283ac659bb7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thanks for the link Simon, but this doesn't recommend any method for
triggering failover, or telling the primary that another server is now
primary.

We've set up a primary server in archive mode to continuously archive to an
NFS mount, and the standby server to continuously recovery from that
directory (although I'm not sure that's actually working... I've probably
overlooked something). The problem we face is working out how to tell the
standby server that it is the primary. Yes this can be done with a trigger
file in /tmp, but how would that automatically appear there? And when the
failed server actually restarted, or the Postgres service restarts, how do
we tell it that it is no longer the primary?

Thanks

Thom

2009/2/24 Simon Riggs <simon(at)2ndquadrant(dot)com>

>
> On Tue, 2009-02-24 at 16:55 +0000, Thom Brown wrote:
>
> > We're looking at setting up a warm-standby server using log shipping
> > and aren't too sure about how we should trigger failover. Is there a
> > commonly-used approach which is reliable enough to recommend? Looking
> > at the documentation, there doesn't seem to be any recommendation. I
> > preferrably don't want to use a witness server.
> >
> > Also, what would you say is the best way to tell the failed primary
> > server that it is no longer the primary server?
>
> http://www.postgresql.org/docs/8.3/static/pgstandby.html
>
> --
> Simon Riggs www.2ndQuadrant.com
> PostgreSQL Training, Services and Support
>
>


From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Warm standby failover mechanism
Date: 2009-02-25 09:35:02
Message-ID: 20090225093502.GD3820@merkur.hilbert.loc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Feb 25, 2009 at 09:26:26AM +0000, Thom Brown wrote:

> We've set up a primary server in archive mode to continuously archive to an
> NFS mount, and the standby server to continuously recovery from that
> directory (although I'm not sure that's actually working... I've probably
> overlooked something). The problem we face is working out how to tell the
> standby server that it is the primary. Yes this can be done with a trigger
> file in /tmp, but how would that automatically appear there?

Think backwards: How would it automatically *dis*appear from
there. Have the secondary create it and check for existance
at regular intervals. Have the primary delete it at slightly
shorter intervals.

When the secondary finds it again after the check interval
the primary didn't delete it. Wait some more for good
measure and/or try to ping the primary some other way...

Just my 2 cents,
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


From: Thom Brown <thombrown(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Warm standby failover mechanism
Date: 2009-02-25 09:48:13
Message-ID: bddc86150902250148t17acd5eeled757de0edb4ee92@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

2009/2/25 Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>

>
> Think backwards: How would it automatically *dis*appear from
> there. Have the secondary create it and check for existance
> at regular intervals. Have the primary delete it at slightly
> shorter intervals.
>
> When the secondary finds it again after the check interval
> the primary didn't delete it. Wait some more for good
> measure and/or try to ping the primary some other way...
>
>
Thanks for the suggestion. I'm wondering what the chances of such a
create/delete system would have of accidentally causing failover? Is it
possible to use the archive command parameter to firstly run the pg_standby
command, and follow it by a touch /tmp/pg.triggerfile by using &&?


From: Thom Brown <thombrown(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Warm standby failover mechanism
Date: 2009-02-25 13:54:25
Message-ID: bddc86150902250554n1b456248xf768a82b4b3ac461@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I'm still trying to work out how this file creation/deletion thing will
work. If I can tag a "&& touch /tmp/pg.trigger" command to the end of the
recovery command, how often will that be called? If I can't, I still need
to ensure that it is created and deleted before the recovery command is
called, otherwise it will see it before it is deleted and put itself online.

Has anyone got a practical example of what they've set up, or know what
others have set up?


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Thom Brown <thombrown(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Warm standby failover mechanism
Date: 2009-02-25 15:54:16
Message-ID: 1235577256.16842.6.camel@jd-laptop.pragmaticzealot.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 2009-02-25 at 13:54 +0000, Thom Brown wrote:
> I'm still trying to work out how this file creation/deletion thing
> will work. If I can tag a "&& touch /tmp/pg.trigger" command to the
> end of the recovery command, how often will that be called? If I
> can't, I still need to ensure that it is created and deleted before
> the recovery command is called, otherwise it will see it before it is
> deleted and put itself online.
>
> Has anyone got a practical example of what they've set up, or know
> what others have set up?

You are doing this the hard way. Grab PITRTTools.

https://projects.commandprompt.com/public/pitrtools

Just pull down the stable branch:

svn co
https://projects.commandprompt.com/public/pitrtools/repo/branches/1.2

Sincerely,

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997


From: Thom Brown <thombrown(at)gmail(dot)com>
To: jd(at)commandprompt(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Warm standby failover mechanism
Date: 2009-02-25 16:33:26
Message-ID: bddc86150902250833l2b3d835fr4bb0a4643177fcd7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

>
> You are doing this the hard way. Grab PITRTTools.
>
> https://projects.commandprompt.com/public/pitrtools
>
>
> I can't really dispute a recommendation from JD. I'll have to look into
that. It's a shame because we've spent ages trying to work out where we've
been going wrong in this whole process (we still haven't got it picking up
WALs from the archive).

Thanks!

Thom


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Thom Brown <thombrown(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Warm standby failover mechanism
Date: 2009-02-25 18:10:54
Message-ID: 1235585454.24423.12.camel@jd-laptop.pragmaticzealot.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 2009-02-25 at 16:33 +0000, Thom Brown wrote:
> You are doing this the hard way. Grab PITRTTools.
>
> https://projects.commandprompt.com/public/pitrtools
>
>
> I can't really dispute a recommendation from JD. I'll have to look
> into that. It's a shame because we've spent ages trying to work out
> where we've been going wrong in this whole process (we still haven't
> got it picking up WALs from the archive).

As a note, all PITRTools does is wrap around all the tools that you are
trying to make work. So you will still need pg_standby, rsync, ssh
etc...

I have updated the wiki to make it a bit more friendly.

https://projects.commandprompt.com/public/pitrtools/wiki

Sincerely,

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997


From: Thom Brown <thombrown(at)gmail(dot)com>
To: jd(at)commandprompt(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Warm standby failover mechanism
Date: 2009-02-25 21:39:35
Message-ID: bddc86150902251339t27992e1w454d25366bebe180@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> As a note, all PITRTools does is wrap around all the tools that you are
> trying to make work. So you will still need pg_standby, rsync, ssh
> etc...
>
> I have updated the wiki to make it a bit more friendly.
>
> https://projects.commandprompt.com/public/pitrtools/wiki
>
> I've given PITRTools a try, but I have been unsuccessful in utilising it.
I've configured both ini files, everything owned by user postgres, and upon
running "./cmd_archiver -C cmd_archiver.ini" I get:

NOTICE: check_config_func()
NOTICE: Performing standard archive
NOTICE: archive_func()
NOTICE: send_queue_func()
NOTICE: list_queue_func()
NOTICE: generate_slave_list_func()
NOTICE: Your slaves are: ['192.168.1.17']
Traceback (most recent call last):
File "../pitr_tools/cmd_archiver", line 343, in <module>
archive_func()
File "../pitr_tools/cmd_archiver", line 254, in archive_func
queue = send_queue_func()
File "../pitr_tools/cmd_archiver", line 219, in send_queue_func
for host in list_queue_func():
File "../pitr_tools/cmd_archiver", line 202, in list_queue_func
list_archives = os.listdir(queuedir)
OSError: [Errno 2] No such file or directory: '/var/lib/postgresql/archive/
192.168.1.17'

The only time that IP address appears in the INI file is on the slaves line,
so I'm not sure why it's trying to find a directory called that.


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Thom Brown <thombrown(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Warm standby failover mechanism
Date: 2009-02-25 21:51:45
Message-ID: 1235598705.24423.28.camel@jd-laptop.pragmaticzealot.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 2009-02-25 at 21:39 +0000, Thom Brown wrote:
>
> As a note, all PITRTools does is wrap around all the tools
> that you are
> trying to make work. So you will still need pg_standby, rsync,
> ssh
> etc...
>
> I have updated the wiki to make it a bit more friendly.
>
> https://projects.commandprompt.com/public/pitrtools/wiki
>
>
>
> I've given PITRTools a try, but I have been unsuccessful in utilising
> it. I've configured both ini files, everything owned by user
> postgres, and upon running "./cmd_archiver -C cmd_archiver.ini" I get:
>
> NOTICE: check_config_func()
> NOTICE: Performing standard archive
> NOTICE: archive_func()
> NOTICE: send_queue_func()
> NOTICE: list_queue_func()
> NOTICE: generate_slave_list_func()
> NOTICE: Your slaves are: ['192.168.1.17']
> Traceback (most recent call last):
> File "../pitr_tools/cmd_archiver", line 343, in <module>
> archive_func()
> File "../pitr_tools/cmd_archiver", line 254, in archive_func
> queue = send_queue_func()
> File "../pitr_tools/cmd_archiver", line 219, in send_queue_func
> for host in list_queue_func():
> File "../pitr_tools/cmd_archiver", line 202, in list_queue_func
> list_archives = os.listdir(queuedir)
> OSError: [Errno 2] No such file or directory:
> '/var/lib/postgresql/archive/192.168.1.17'
>
> The only time that IP address appears in the INI file is on the slaves
> line, so I'm not sure why it's trying to find a directory called that.

Looks like you didn't run cmd_archiver -C <config_file> -I

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997


From: Thom Brown <thombrown(at)gmail(dot)com>
To: jd(at)commandprompt(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Warm standby failover mechanism
Date: 2009-02-25 22:06:35
Message-ID: bddc86150902251406y22dd0291q20a137b12950bb94@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> Looks like you didn't run cmd_archiver -C <config_file> -I
>

Ahh, okay, that did something, which I think means it created a directory
named after the slave IP in the archive directory. I didn't see any mention
of that switch in the README file, unless it's mentioned elsewhere and I
missed it.

Now when I run "./cmd_archiver -C cmd_archiver.ini" I get:

rsync: link_stat "/var/lib/postgresql/pitr_tools/None" failed: No such file
or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(1058)
[sender=3.0.3]
rsync: link_stat "/var/lib/postgresql/pitr_tools/None" failed: No such file
or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(1058)
[sender=3.0.3]
FATAL: Unabled to rsync_transfer or queue_transfer
CRITICAL 5888


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Thom Brown <thombrown(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Warm standby failover mechanism
Date: 2009-02-25 22:10:47
Message-ID: 1235599847.24423.35.camel@jd-laptop.pragmaticzealot.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 2009-02-25 at 22:06 +0000, Thom Brown wrote:
>
> Looks like you didn't run cmd_archiver -C <config_file> -I
>
>
>
> Ahh, okay, that did something, which I think means it created a
> directory named after the slave IP in the archive directory.

Right that is the queue directory.

> Now when I run "./cmd_archiver -C cmd_archiver.ini" I get:
>
That command really shouldn't do anything but error. The whole point of
the archiver is to be placed in the archive_command option in the
postgresql.conf. E.g;

cmd_archive -C cmd_archive.ini -F %p

This really should be happening on the pitrtools list. Let's bounce over
there and resolve this.

Joshua D. Drake

>
--
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997