Re: Warm standby: 1 to N

From: Yar Tikhiy <yar(at)barnet(dot)com(dot)au>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Warm standby: 1 to N
Date: 2009-07-08 05:17:44
Message-ID: 76CC251E-8A15-41D5-B569-C06B3200B9EB@barnet.com.au
Lists: pgsql-general

On Tue, Jun 02, 2009 at 02:52:26PM -0400, Bruce Momjian wrote:
> Yaroslav Tykhiy wrote:
>> Hi All,
>>
>> Let's consider the following case: WAL segments from a master have
>> been shipped to N warm standby servers, and now the master fails.
>> Using this or that mechanism, one of the warm standbys takes over and
>> becomes the new master. Now the question is what to do with the
>> other
>> N-1 warm standbys. By the failure, all N warm standbys were the same
>> exact copies of the master. So at least in theory, the N-1 warm
>> standbys left can be fed with WAL segments from the new master. Do
>> you think it will work in practice? Are there any pitfalls?
>
> I think it should work.

Bruce, thanks a lot for the encouragement! I had a chance to go a
step further and fail over to a warm standby server without losing a
single transaction. Now I'm happy to share my experience with the
community.

The initial setup was as follows: Server A was the master, servers B
and C were warm standbys. The task was to fail over from A to B in a
controlled manner whilst keeping C running as a warm standby.
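
(For reference, B and C were each running in recovery mode with a
recovery.conf along these lines; the restore script name and its
behaviour are illustrative, not my literal setup:

restore_command = '/some/path/restore.sh "%f" "%p"'

where restore.sh copies a requested segment from the local WAL
shipping spool to the given destination and exits non-zero while the
segment hasn't arrived yet.)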

Both B and C were initially running with archive_command set as follows:

archive_command='/some/path/archive.sh "%p" "%f"'

where archive.sh contained just "exit 1", so that a real archive
script could later be atomically mv'ed into place without losing any
WAL segments. (Note that the archiver process queues segments and
keeps retrying for as long as the archive command exits with a
non-zero status.)
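
For illustration, the placeholder was trivial:

#!/bin/sh
# /some/path/archive.sh -- deliberately fail so that the archiver
# queues finished WAL segments and keeps retrying.
exit 1

The point of the atomic mv is that the real script is first written
under a temporary name on the same file system and only then renamed
over archive.sh, so the archiver never runs a half-written script.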

After making sure B and C were keeping up with A, the latter was shut
down. Then the last, incomplete WAL segment NNN was manually copied
from A (pg_controldata was useful to find its name) to B's WAL
shipping spool for the restore script to pick it up.
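
(In case it helps, deriving that segment name from pg_controldata
output can be scripted roughly like this -- an untested sketch
assuming the default 16 MB segment size, the 8.0-era two-part
location format, and a made-up data directory path:

#!/bin/sh
# Sketch: print the name of the WAL segment holding the shutdown
# checkpoint of the cluster in $PGDATA.
PGDATA=/var/db/pgsql/data
# e.g. "Latest checkpoint location:           0/4E000020"
LOC=$(pg_controldata "$PGDATA" | awk -F': *' '/^Latest checkpoint location/ {print $2}')
TLI=$(pg_controldata "$PGDATA" | awk -F': *' '/TimeLineID/ {print $2}')
XLOGID=${LOC%%/*}                  # high half of the WAL location
OFFSET=${LOC##*/}                  # byte offset within the 4 GB log
SEG=$(( 0x$OFFSET / 0x1000000 ))   # 16 MB per segment
printf '%08X%08X%08X\n' "$TLI" "$(( 0x$XLOGID ))" "$SEG"

After a clean shutdown the shutdown checkpoint is the last WAL
record, so the segment holding it is the incomplete one to copy.)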

B processed segment NNN and, upon reaching its logical end, exited
recovery mode. At this moment all the clients were switched over to
B. B, now the master, continued writing its transaction log to
segment NNN, filling it up and moving on to the next segment, NNN+1.

(On the one hand, it was quite unexpected that B didn't move on to a
new timeline upon exiting recovery mode. On the other hand, had it
done so, the whole trick would have been impossible. Please correct
me if I'm wrong. Just in case, the PostgreSQL version was 8.0.6.
Yes, it's ancient and sorely needs an upgrade.)

Now segment NNN was full and contained both the last transactions from
A and the first transactions from B. It was time to ship NNN from B
to C in order to bring C in line with B -- without disrupting C's
recovery mode. A real archive script was substituted for the dummy
script on B. At the next retry the script shipped segment NNN to C
and so the WAL shipping train got going B->C.
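
The real script amounted to a remote copy with an atomic rename on
C's side; something like this sketch, where the host name, spool path
and the use of scp/ssh are illustrative:

#!/bin/sh
# /some/path/archive.sh -- ship a finished WAL segment to C's spool.
# $1 = full path of the segment (%p), $2 = its file name (%f)
SPOOL=/var/db/pgsql/wal_spool
# Copy under a temporary name first, then rename, so that C's
# restore script never picks up a half-transferred segment.
scp "$1" "standby-c:$SPOOL/$2.tmp" &&
    ssh standby-c "mv '$SPOOL/$2.tmp' '$SPOOL/$2'"
# A non-zero exit status makes the archiver keep the segment and
# retry later.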

A possible pitfall to watch out for is this: if the WAL shipping spool
is shared between B and C, e.g. NFS-based, just copying segment NNN
to it will make not only B but also C exit recovery mode. To avoid
that, at least in theory, segment NNN can be copied directly into B's
pg_xlog, and then B's restore command needs to be signalled to return
a non-zero status. According to the manual, the recovery process is
supposed to look in pg_xlog as a last resort if the restore command
returns an error status. However, I didn't try that, as I had
separate, local WAL spools on B and C.
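
(Were the spool shared, the signalling could be as crude as a flag
file checked by the restore script -- purely hypothetical, with
made-up paths:

#!/bin/sh
# Hypothetical restore script for B: $1 = file name (%f), $2 = the
# destination path (%p).  While the flag file exists, fail so that
# recovery falls back to reading the segment from B's own pg_xlog.
test -e /tmp/wal_restore_pause && exit 1
SPOOL=/var/db/pgsql/wal_spool
test -f "$SPOOL/$1" || exit 1
cp "$SPOOL/$1" "$2"

A missing segment also yields a non-zero exit, which is the normal
"not arrived yet" case for a warm standby.)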

Hoping all this stuff helps somebody...

Yar
