Cascade replication

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Cascade replication
Date: 2011-06-14 05:08:26
Message-ID: BANLkTi=wZAR5DN28ZEyU6295+453_OeSAQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 25, 2011 at 2:01 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> I'd like to propose a cascade replication feature (i.e., allowing a standby to
> accept replication connections from another standby) for 9.2. This feature is
> useful for reducing the overhead on the master, since it lets us decrease the
> number of standbys connecting directly to the master.
>
> I attached the WIP patch, which changes walsender so that it can start
> replication even during recovery. The walsender then attempts to send all WAL
> that has already been fsync'd to the standby's disk (i.e., it sends WAL up to
> whichever is later of the receive location and the replay location). When the
> standby is promoted, all walsenders in that standby exit, because the timeline
> mismatch prevents them from continuing replication.
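The send-limit rule and the exit-on-promotion rule above can be sketched roughly as follows. This is a minimal illustration, not code from the patch: the type WalPos (WAL positions flattened to one 64-bit offset) and the helpers cascade_send_limit() and walsender_must_exit() are hypothetical names introduced only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified WAL position: one 64-bit byte offset (assumption of this sketch). */
typedef uint64_t WalPos;

/*
 * Hypothetical helper: the furthest WAL position a cascading walsender may
 * send. WAL up to the receive location and WAL up to the replay location are
 * both already on the standby's disk, so the later of the two is the limit.
 */
static WalPos cascade_send_limit(WalPos receive_pos, WalPos replay_pos)
{
    return receive_pos > replay_pos ? receive_pos : replay_pos;
}

/*
 * Hypothetical check: promotion switches the standby to a new timeline, so a
 * walsender that was streaming the old timeline cannot continue and exits.
 */
static bool walsender_must_exit(uint32_t streaming_tli, uint32_t current_tli)
{
    assert(streaming_tli != 0 && current_tli != 0); /* timeline IDs start at 1 */
    return streaming_tli != current_tli;
}
```

Taking the maximum matters when WAL was restored from the archive and replayed while walreceiver received nothing new: the replay location is then ahead of the receive location.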
>
> A standby must not accept a replication connection from itself. Otherwise,
> since no new WAL data would ever appear in that standby, replication could not
> advance any further. As a safeguard against this, I introduced a new ID that
> identifies each instance. The walsender sends that ID as the fourth field of
> its reply to IDENTIFY_SYSTEM, and the walreceiver then checks whether the IDs
> of the two servers are the same. If they are, the standby is connecting to
> itself, so the walreceiver emits an ERROR.
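The existing database system identifier cannot serve as this safeguard, because a standby built from a base backup shares its master's system identifier; hence the separate per-instance ID. The walreceiver-side check might look roughly like this. It is a sketch under assumptions: the struct IdentifySystemReply, the function check_not_self_connection(), and the error text are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical view of the four-field IDENTIFY_SYSTEM reply (all text). */
typedef struct IdentifySystemReply {
    const char *systemid;    /* database system identifier */
    const char *timeline;    /* current timeline ID */
    const char *xlogpos;     /* current WAL position */
    const char *instance_id; /* new per-instance ID (fourth field) */
} IdentifySystemReply;

/*
 * Hypothetical walreceiver-side check: returns false and reports an error
 * when the upstream server sends the same instance ID as the local server,
 * i.e. the standby is trying to replicate from itself.
 */
static bool check_not_self_connection(const IdentifySystemReply *reply,
                                      const char *local_instance_id)
{
    assert(reply != NULL && local_instance_id != NULL);
    if (strcmp(reply->instance_id, local_instance_id) == 0) {
        fprintf(stderr, "ERROR: standby cannot replicate from itself\n");
        return false;
    }
    return true;
}
```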
>
> One remaining problem I'll have to tackle is this: even while walreceiver is
> not running (i.e., while the startup process is retrieving WAL files from the
> archive), the cascading walsender should continue to send new WAL data. This
> means that the walsender should send the WAL files restored from the archive.
> The problem is that such a restored WAL file is always named "RECOVERYXLOG",
> and currently walsender cannot handle a WAL file with that name.
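The fixed name "RECOVERYXLOG" tells a walsender nothing about which segment the file holds, whereas a real segment file name encodes the timeline, log ID and segment number as three 8-digit hexadecimal fields. A minimal sketch of building such a name follows; wal_file_name() and WAL_FNAME_LEN are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

#define WAL_FNAME_LEN 24 /* three 8-digit hex fields, without the NUL */

/*
 * Hypothetical helper: formats a WAL segment file name as
 * timeline + log ID + segment, each as 8 upper-case hex digits.
 */
static void wal_file_name(char *buf, size_t buflen,
                          uint32_t tli, uint32_t log, uint32_t seg)
{
    assert(buflen > WAL_FNAME_LEN);
    snprintf(buf, buflen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32, tli, log, seg);
    assert(strlen(buf) == WAL_FNAME_LEN);
}
```

For example, timeline 1, log ID 0, segment 3 gives "000000010000000000000003", which a walsender can map back to a WAL position.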
>
> To address this problem, I'm thinking of making the startup process restore
> each WAL file under its real name instead of "RECOVERYXLOG". Then, as on the
> master, the walsender can read and send the restored WAL files. However, a
> required WAL file could be recycled before it has been sent, so we might need
> to make the wal_keep_segments setting effective on the standby as well.
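Why wal_keep_segments would matter on the standby can be shown with a deliberately simplified retention model. Assume, for this sketch only, that segment numbers form one increasing counter and that the current segment plus the `keep` segments before it are always retained. The function segment_may_be_recycled() is hypothetical and ignores the checkpoint-driven details of real recycling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified retention rule: with wal_keep_segments = keep,
 * the current segment and the `keep` segments before it are retained;
 * anything older may be recycled.
 */
static bool segment_may_be_recycled(uint64_t seg, uint64_t current, uint32_t keep)
{
    assert(seg <= current);
    if (current < keep)
        return false; /* not enough history yet: everything is retained */
    return seg < current - keep;
}
```

Under this model, with current segment 10 and keep = 3, segments 7 to 10 survive; a cascading walsender that still needs segment 6 would find it gone, which is the failure a standby-side wal_keep_segments is meant to prevent.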

Done.

Updated patch attached.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: cascade_replication_v1.patch (text/x-patch, 43.2 KB)
