Re: Some problems about cascading replication

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some problems about cascading replication
Date: 2011-08-16 14:56:13
Message-ID: 4E4A850D.1000202@enterprisedb.com
Lists: pgsql-hackers

On 16.08.2011 16:25, Simon Riggs wrote:
> On Tue, Aug 16, 2011 at 9:55 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>> When I tested the PITR on git master with max_wal_senders > 0,
>> I found that the following inappropriate log message was always
>> output even though cascading replication is not in progress. Attached
>> patch fixes this problem.
>>
>> LOG: terminating all walsender processes to force cascaded
>> standby(s) to update timeline and reconnect
>>
>> When making the patch, I found another problem with cascading
>> replication: when promoting a cascading standby, postmaster sends
>> SIGUSR2 to any cascading walsenders to kill them. But there is a
>> corner-case where such a walsender fails to receive SIGUSR2 and
>> survives a standby promotion unexpectedly. This happens when
>> postmaster sends SIGUSR2 before the walsender marks itself as
>> a WAL sender, because postmaster sends SIGUSR2 to only the
>> processes marked as a WAL sender.
>>
>> To avoid the corner-case, I changed walsender so that it checks
>> whether recovery is in progress or not again after marking itself
>> as a WAL sender. If recovery is not in progress even though the
>> walsender is a cascading one, it does the same thing as the SIGUSR2
>> signal handler does, and then exits later. Attached patch also includes
>> this fix.
>
> Looks like valid problems and appropriate fixes to me. Will commit.

I think there's a race condition here. If a walsender is just starting
up, it might not have registered itself as a walsender yet. That race
has actually been there since before this patch to suppress the log message.
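
To make the window concrete, here is a small deterministic sketch of the ordering in question. It is a toy Python model, not PostgreSQL code: the class and function names (Walsender, Postmaster, finish_startup, survives_promotion) are made up for illustration, and it assumes that promotion ends recovery before the signal pass over the children runs. The recheck flag stands in for the proposed fix of checking the recovery state again after the walsender has marked itself.

```python
from dataclasses import dataclass, field


@dataclass
class Walsender:
    """Toy stand-in for a cascading walsender (hypothetical, not PostgreSQL code)."""
    marked: bool = False   # has it marked itself as a WAL sender yet?
    exiting: bool = False  # set once it has decided to shut down

    def on_sigusr2(self) -> None:
        # Models the effect of the SIGUSR2 handler: arrange to exit.
        self.exiting = True


@dataclass
class Postmaster:
    in_recovery: bool = True
    children: list = field(default_factory=list)

    def promote(self) -> None:
        # Assumption of this model: recovery ends first, then SIGUSR2 goes
        # only to children that are already marked as WAL senders.
        self.in_recovery = False
        for child in self.children:
            if child.marked:
                child.on_sigusr2()


def finish_startup(pm: Postmaster, ws: Walsender, recheck: bool) -> None:
    ws.marked = True
    # Proposed fix: after marking, check the recovery state again and, if
    # recovery has already ended, do what the signal handler would have done.
    if recheck and not pm.in_recovery:
        ws.on_sigusr2()


def survives_promotion(recheck: bool) -> bool:
    pm = Postmaster()
    ws = Walsender()
    pm.children.append(ws)  # started during recovery, so it is cascading
    pm.promote()            # signal pass runs before ws has marked itself
    finish_startup(pm, ws, recheck)
    return not ws.exiting
```

In this model, survives_promotion(False) is True (the walsender misses the signal and outlives the promotion), while survives_promotion(True) is False (the recheck catches it). Whether the real code has the same ordering guarantees is exactly what would need checking.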

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
