Re: WIP: Failover Slots

From: Oleksii Kliukin <alexk(at)hintbits(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: WIP: Failover Slots
Date: 2016-02-22 15:39:57
Message-ID: FA68178E-F0D1-47F6-9791-8A3E2136C119@hintbits.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

> On 16 Feb 2016, at 09:11, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>
>
>
> Revision attached. There was a file missing from the patch too.
>

All attached patches apply normally. I only took a look at first 2, but also tried to run the Patroni with the modified version to check whether the basic replication works.

What it’s doing is calling pg_basebackup first to initialize the replica, and that actually failed with:

_basebackup: unexpected termination of replication stream: ERROR: requested WAL segment 000000010000000000000000 has already been removed

The segment name definitely looks bogus to me.

The actual command causing the failure was an attempt to clone the replica using pg_basebackup, turning on xlog streaming:

pg_basebackup --pgdata data/postgres1 --xlog-method=stream --dbname="host=localhost port=5432 user=replicator”

I checked the same command against the git master without the patches applied and could not reproduce this problem there.

On the code level, I have no comments on 0001, it’s well documented and I have no questions about the approach, although I might be not too knowledgable to judge the specifics of the implementation.

On the 0002, there are a few rough edges:

slots.c:294
elog(LOG, "persistency is %i", (int)slot->data.persistency);

Should be changed to DEBUG?

slot.c:468
Why did you drop “void" as a parameter type of ReplicationSlotDropAcquired?

walsender.c: 1509 at PhysicalConfirmReceivedLocation

I’ve noticed a comment stating that we don’t need to call ReplicationSlotSave(), but that pre-dated the WAL-logging of replication slot changes. Don’t we need to call it now, the same way it’s done for the logical slots in logical.c:at LogicalConfirmReceivedLocation?

Kind regards,
--
Oleksii

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-02-22 15:57:55 Re: checkpointer continuous flushing - V18
Previous Message Alvaro Herrera 2016-02-22 15:29:33 Re: Writing new unit tests with PostgresNode