Re: Missing docs for SR

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Missing docs for SR
Date: 2010-01-20 07:57:16
Message-ID: 3f0b79eb1001192357h5e0773eamd055c88f56a3c928@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 20, 2010 at 7:30 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> So, here's a must-fix item for SR for release: we need adequate docs.
> I'm happy to write these but *I* need to understand the answers first.

Thanks a lot!

> The current docs and wiki page do not explain:
>
> * How (technically) the slave listens for LSNs

(I may not have understood your question correctly, but) the LSN is sent
from the master together with the WAL records. The protocol that SR
uses is documented on the following page:
http://developer.postgresql.org/pgdocs/postgres/protocol-replication.html

> * Does the walreceiver need the archive (via archive_command) copies of
> the WAL files after it's caught up with the master?

No. Archived WAL files are not required by the slave even before
it has caught up with the master.

When the slave is started from a base backup:

1. The startup process tries to perform a normal archive recovery. If
restore_command is not supplied in recovery.conf, only the WAL
files in pg_xlog are replayed. So restore_command is optional for SR.

2. When the startup process finds an invalid record (this includes
getting ENOENT for the next WAL file), it asks the postmaster to start
the walreceiver process.

3. The walreceiver connects to the master and requests the WAL following
the LSN of that invalid record. Then the WAL records are shipped
continuously to the walreceiver, and written to the slave's disk.

Meanwhile, the startup process waits until the next record has been
written by the walreceiver, then reads and applies it. The startup
process repeats this stop-and-go recovery.
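
To make the three steps and the stop-and-go loop concrete, here is a toy
model in Python. It is only a sketch of the control flow described above,
not the actual server code: LSNs are simplified to consecutive integers,
pg_xlog and the master's WAL are plain dicts, and the function names
(replay_local, stream_from, recover) are made up for illustration.

```python
def replay_local(pg_xlog, start_lsn):
    """Step 1: replay records already present in the slave's pg_xlog.

    Returns the first LSN that could not be read (the "invalid record")
    and the list of records applied so far.
    """
    lsn = start_lsn
    applied = []
    while lsn in pg_xlog:
        applied.append(pg_xlog[lsn])
        lsn += 1
    return lsn, applied


def stream_from(master_wal, from_lsn):
    """Step 3, walreceiver side: receive records starting at from_lsn."""
    for lsn in sorted(master_wal):
        if lsn >= from_lsn:
            yield lsn, master_wal[lsn]


def recover(pg_xlog, master_wal, start_lsn):
    """Run local replay, then switch to streaming at the first gap."""
    next_lsn, applied = replay_local(pg_xlog, start_lsn)
    # Step 2: an invalid/missing record was found at next_lsn, so the
    # walreceiver is started and asks the master for WAL from that LSN.
    for lsn, record in stream_from(master_wal, next_lsn):
        pg_xlog[lsn] = record          # walreceiver writes to the slave's disk
        applied.append(pg_xlog[lsn])   # startup process reads and applies it
    return applied
```

In this model the startup process never reads a record before the
walreceiver has written it, which is the waiting behavior described
above reduced to a single loop.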

If you use an old base backup for the slave, some of the WAL files the
slave needs may no longer exist in the master's pg_xlog, i.e., the master
might be unable to ship them. In that case, if you set a restore_command
that accesses the master's archive, the missing files can be applied on
the slave in step 1.
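
The old-base-backup case can be sketched the same way. The following is
a hypothetical helper, not anything in the server: WAL segments are
numbered with plain integers, plan_recovery is an invented name, and it
assumes the missing segments are not already in the slave's own pg_xlog.

```python
def plan_recovery(first_needed, master_oldest, has_restore_command):
    """Decide where each WAL segment comes from.

    first_needed:  first segment the slave needs after the base backup
    master_oldest: oldest segment still present in the master's pg_xlog
    Returns (segments to fetch from the archive, segment to stream from).
    """
    # Segments older than what the master still keeps cannot be shipped
    # by SR; they have to come from the archive via restore_command.
    from_archive = list(range(first_needed, master_oldest))
    if from_archive and not has_restore_command:
        raise RuntimeError(
            "segments %r are only in the archive; restore_command needed"
            % from_archive
        )
    stream_start = max(first_needed, master_oldest)
    return from_archive, stream_start
```

With a fresh base backup the archive list is empty and restore_command
can be left out; with an old one, the gap is filled from the archive
before streaming takes over.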

> I've tried to dig this information out of the wiki and mailing list
> archives and can't quite figure it out.  Is there a tech doc which was
> not posted anywhere public, or do I need to just RTFC?

Nope. If you have any questions, please feel free to get back to me.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
