Cascade replication (WIP)

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Cascade replication (WIP)
Date: 2011-05-24 17:01:31
Message-ID: BANLkTi=eNxW9eLPxDcYQZ+ADRU8-16aUrQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

I'd like to propose a cascade replication feature (i.e., allowing a standby
to accept replication connections from another standby) for 9.2. This feature
reduces the load on the master, because it lets us decrease the number of
standbys that connect directly to the master.

I've attached a WIP patch that changes walsender so that it can start
replication even during recovery. The walsender then attempts to send all WAL
that has already been fsync'd to the standby's disk (i.e., WAL up to the
greater of the receive location and the replay location). When the standby is
promoted, all walsenders on that standby exit, because the timeline mismatch
prevents them from continuing replication.
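
To make the "greater of the two locations" rule concrete, here is a minimal
Python model of it. This is only a sketch and not the patch's C code: the
function names are hypothetical, and WAL locations are assumed to be written
in the usual "XLOGID/XRECOFF" hexadecimal text form.

```python
def parse_lsn(text):
    """Turn a WAL location such as '0/3000060' into a comparable tuple."""
    xlogid, xrecoff = text.split("/")
    return (int(xlogid, 16), int(xrecoff, 16))


def cascade_send_limit(receive_lsn, replay_lsn):
    """Return whichever location is further ahead.

    Hypothetical helper: a cascading walsender would send WAL up to this
    point, since everything before it is already on the standby's disk.
    """
    return max((receive_lsn, replay_lsn), key=parse_lsn)
```

During streaming the receive location is normally ahead, while during archive
recovery the replay location can be ahead, which is why both are compared.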

A standby must not accept a replication connection from itself. Otherwise,
since no new WAL data would ever appear on that standby, replication could
not advance. As a safeguard against this, I introduced a new ID that
identifies each instance. The walsender sends that ID as the fourth field of
the IDENTIFY_SYSTEM reply, and walreceiver checks whether the IDs of the two
servers match. If they match, the standby is connecting to itself, so
walreceiver emits an ERROR.
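
The check described above could be modelled roughly as follows. This is a
Python sketch under stated assumptions, not the code in the patch: the reply
is assumed to be a sequence whose fourth element is the new instance ID, and
the names SelfConnectionError and check_upstream_identity are hypothetical.

```python
class SelfConnectionError(Exception):
    """Raised when a standby discovers that its upstream is itself."""


def check_upstream_identity(identify_system_reply, local_instance_id):
    """Reject a connection whose upstream reports our own instance ID.

    Assumed reply layout: the first three fields are whatever
    IDENTIFY_SYSTEM already returns, and the fourth is the instance ID.
    """
    if len(identify_system_reply) < 4:
        raise ValueError("IDENTIFY_SYSTEM reply has no instance ID field")
    upstream_instance_id = identify_system_reply[3]
    if upstream_instance_id == local_instance_id:
        raise SelfConnectionError("standby is connecting to itself")
    return upstream_instance_id
```

A walreceiver-like caller would run this once, right after IDENTIFY_SYSTEM
and before starting to stream, so that the mistake is reported immediately
instead of showing up as replication that never advances.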

One remaining problem that I'll have to tackle: even while walreceiver is not
running (i.e., the startup process is retrieving WAL files from the archive),
the cascading walsender should keep sending new WAL data. This means that the
walsender must send WAL files restored from the archive. The problem is that
such a restored WAL file is always named "RECOVERYXLOG", and for now
walsender cannot handle a WAL file with that name.

To address this, I'm thinking of making the startup process restore each WAL
file under its real name instead of "RECOVERYXLOG". Then, as on the master,
the walsender can read and send the restored WAL file. However, a required
WAL file could be recycled before it is sent, so we might need to honor the
wal_keep_segments setting on the standby as well.
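
For reference, the "real name" of a WAL file is 24 hexadecimal digits: the
timeline ID, the log ID, and the segment number, 8 digits each. The Python
sketch below derives it from a WAL location. It assumes the default 16 MB
segment size and the two-part "XLOGID/XRECOFF" location format; the function
name is hypothetical and this is not the server's code.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size (16 MB)


def wal_file_name(timeline, lsn):
    """Name of the WAL segment file that contains the given location."""
    xlogid_text, xrecoff_text = lsn.split("/")
    xlogid = int(xlogid_text, 16)
    # The segment number is the offset within the log ID, in whole segments.
    segment = int(xrecoff_text, 16) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, xlogid, segment)
```

With restored files stored under names like these, a walsender on the standby
could locate the segment for any location the same way it does on the master.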

Comments? Objections?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
cascade_replication_v0.patch text/x-diff 18.7 KB