Re: After switching primary server while using replication slot.

Lists: pgsql-hackers
From: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: After switching primary server while using replication slot.
Date: 2014-08-18 14:16:45
Message-ID: CAD21AoDWkwwPmhWJY90UjO-4u1W5KK4NAY-9ewpU7f-8mgSmJA@mail.gmail.com

Hi all,

After switching the primary server while using replication slots, a
standby server may be unable to connect to the new primary server.
Imagine a primary server with two ASYNC standby servers, each using
its own replication slot.
One standby (A) applies WAL without problems, but the other standby (B)
has stopped after connecting to the primary server
(or WAL sending to it is heavily delayed).

In this situation, standby (B) receives no WAL segment files while it
is stopped.
The primary server cannot remove WAL segments that have not yet been
received by all standbys, so it has to keep them.
Standby (A), however, performs its own checkpoints and can then
recycle WAL segments.
As a result, the number of WAL segments differs between servers:
standby (A) holds fewer WAL files than the primary server.
If the primary server crashes and standby (A) is promoted to primary,
we can try to connect standby (B) to standby (A) as a new standby server.
But that will fail, because standby (A) might no longer have the WAL
segment files that standby (B) requires.
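
The scenario above can be illustrated with a toy model. This is only a
sketch: WAL segments are plain integers, and every name and number in it
is made up for illustration rather than taken from PostgreSQL internals.

```python
# Toy model of the failover scenario; WAL segments are plain integers.
# All names and numbers are illustrative, not PostgreSQL internals.

def retained_segments(newest, oldest_needed):
    """Segments a server still has on disk: oldest_needed..newest inclusive."""
    return set(range(oldest_needed, newest + 1))

newest_segment = 20       # latest segment written by the primary
standby_a_position = 20   # standby (A) is fully caught up
standby_b_position = 5    # standby (B) stopped while still needing segment 5

# The primary keeps everything back to the slowest replication slot.
primary_wal = retained_segments(
    newest_segment, min(standby_a_position, standby_b_position))

# Standby (A) has no slot for (B); after its own checkpoint it keeps only
# what it needs itself (here the last 3 segments, an arbitrary choice).
standby_a_wal = retained_segments(newest_segment, standby_a_position - 2)

def can_resume(upstream_wal, needed_segment):
    """A standby can resume only if the upstream still has its next segment."""
    return needed_segment in upstream_wal

print(can_resume(primary_wal, standby_b_position))    # old primary: True
print(can_resume(standby_a_wal, standby_b_position))  # promoted (A): False
```

In this model standby (B) could still catch up from the old primary, but
not from the promoted standby (A).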

To resolve this, I think we should make the master server notify
all standby servers about the removal of WAL segments, and have the
standby servers recycle WAL segment files based on that information.

Thoughts?

--
Regards,

-------
Sawada Masahiko


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: After switching primary server while using replication slot.
Date: 2014-08-19 10:25:24
Message-ID: CAHGQGwGf=6rn7_Z6rCZ-H3Nz6VWp-ppXmsg6D41T2yM602m7XA@mail.gmail.com

On Mon, Aug 18, 2014 at 11:16 PM, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Hi all,
>
> After switching the primary server while using replication slots, a
> standby server may be unable to connect to the new primary server.
> Imagine a primary server with two ASYNC standby servers, each using
> its own replication slot.
> One standby (A) applies WAL without problems, but the other standby (B)
> has stopped after connecting to the primary server
> (or WAL sending to it is heavily delayed).
>
> In this situation, standby (B) receives no WAL segment files while it
> is stopped.
> The primary server cannot remove WAL segments that have not yet been
> received by all standbys, so it has to keep them.
> Standby (A), however, performs its own checkpoints and can then
> recycle WAL segments.
> As a result, the number of WAL segments differs between servers:
> standby (A) holds fewer WAL files than the primary server.
> If the primary server crashes and standby (A) is promoted to primary,
> we can try to connect standby (B) to standby (A) as a new standby server.
> But that will fail, because standby (A) might no longer have the WAL
> segment files that standby (B) requires.

This sounds like a valid concern.

> To resolve this, I think we should make the master server notify
> all standby servers about the removal of WAL segments, and have the
> standby servers recycle WAL segment files based on that information.
>
> Thoughts?

How does the server recycle WAL files after it's promoted from the
standby to master?
Does it do so as it likes? If so, your approach would not be enough.

The approach prevents unexpected removal of WAL files while the standby
is running. But after the standby is promoted to master, it might recycle
needed WAL files immediately. So another standby may still fail to retrieve
the required WAL file after the promotion.

ISTM that, in order to address this, we might need to log all the replication
slot activities and replicate them to the standby. I'm not sure whether this
breaks the design of replication slots, though.

Regards,

--
Fujii Masao


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: After switching primary server while using replication slot.
Date: 2014-08-20 17:14:30
Message-ID: CA+TgmoYA7A0JJd_Nc+aode8yzePiVn6vwQy7ZHFLTQmrGGyyVQ@mail.gmail.com

On Tue, Aug 19, 2014 at 6:25 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> How does the server recycle WAL files after it's promoted from the
> standby to master?
> Does it do so as it likes? If so, your approach would not be enough.
>
> The approach prevents unexpected removal of WAL files while the standby
> is running. But after the standby is promoted to master, it might recycle
> needed WAL files immediately. So another standby may still fail to retrieve
> the required WAL file after the promotion.
>
> ISTM that, in order to address this, we might need to log all the replication
> slot activities and replicate them to the standby. I'm not sure whether this
> breaks the design of replication slots, though.

Yuck.

I believe that the reason why replication slots are not currently
replicated is because we had the idea that the standby could have
slots that don't exist on the master, for cascading replication. I'm
not sure that works yet, but I think Andres definitely had it in mind
in the original design.

It seems to me that if every machine needs to keep not only the WAL it
requires for itself, but also the WAL that any other machine in the
replication hierarchy might need, that pretty much sucks. Suppose
you have a master with 10 standbys, and each standby has 10 cascaded
standbys. If one of those standbys goes down, do we really want all
100 other machines to keep copies of all the WAL? That seems rather
unfortunate, since it's likely that only a few of those many standbys
are machines to which we would consider failing over.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: After switching primary server while using replication slot.
Date: 2014-08-22 14:29:12
Message-ID: 20140822142912.GQ17406@alap3.anarazel.de

Hi,

On 2014-08-20 13:14:30 -0400, Robert Haas wrote:
> On Tue, Aug 19, 2014 at 6:25 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > On Mon, Aug 18, 2014 at 11:16 PM, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >> To resolve this, I think we should make the master server notify
> >> all standby servers about the removal of WAL segments, and have the
> >> standby servers recycle WAL segment files based on that information.

I think that'll end up being really horrible, at least if done in an
obligatory fashion. In a cascaded setup it's really sensible to only
retain WAL on the intermediate nodes. Consider e.g. a setup - rather
common these days actually - where there's a master somewhere and then a
cascading standby on each continent feeding off to further nodes on that
continent. You don't want to retain WAL on the nodes on each continent (or
on the primary) just because one node somewhere is down for maintenance.

If you really want something like this we should probably add the
infrastructure for one standby to maintain a replication slot on another
standby server. So, if you have a setup like:

      A
     / \
    /   \
   B     C
  / \   / \
 .. .. .. ..

B and C can coordinate that they keep enough WAL for each other. You can
actually easily write an external tool for that today. Just create a
replication slot on B for C and the other way round and have a tool
update them once a minute or so.
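
The planning step of such a tool might look roughly like the sketch
below. It is not an existing utility: the function names, the node
labels and the input dictionaries are all hypothetical, and how the
positions are read from each server and how a slot is actually moved
are deliberately left out.

```python
# Sketch of the decision logic for an external "cross-slot" tool.
# Database access is left out; only the planning step is shown.
# All function and variable names are hypothetical.

def parse_lsn(text):
    """Convert an LSN such as '0/3000060' to an integer byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def format_lsn(value):
    """Convert an integer byte position back to 'X/Y' notation."""
    return "%X/%X" % (value >> 32, value & 0xFFFFFFFF)

def plan_slot_updates(replay_positions, current_slots):
    """Return {(host, peer): lsn} for slots that should move forward.

    replay_positions: {node: lsn} as reported by each standby.
    current_slots: {(host, peer): lsn}, the slot kept on host for peer.
    A slot only moves forward, and never past its peer's own position.
    """
    updates = {}
    for (host, peer), slot_lsn in current_slots.items():
        target = parse_lsn(replay_positions[peer])
        if target > parse_lsn(slot_lsn):
            updates[(host, peer)] = format_lsn(target)
    return updates

# B keeps a slot for C and C keeps a slot for B.
positions = {"B": "0/5000000", "C": "0/3000060"}
slots = {("B", "C"): "0/2000000", ("C", "B"): "0/5000000"}
print(plan_slot_updates(positions, slots))
```

Run once a minute or so, this keeps each slot at most one interval behind
its peer, so whichever of B and C is promoted still holds the WAL the
other one needs.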

I'm not sure we want that built in.

> >> Thoughts?
> >
> > How does the server recycle WAL files after it's promoted from the
> > standby to master?
> > Does it do so as it likes? If so, your approach would not be enough.
> >
> > The approach prevents unexpected removal of WAL files while the standby
> > is running. But after the standby is promoted to master, it might recycle
> > needed WAL files immediately. So another standby may still fail to retrieve
> > the required WAL file after the promotion.
> >
> > ISTM that, in order to address this, we might need to log all the replication
> > slot activities and replicate them to the standby. I'm not sure whether this
> > breaks the design of replication slots, though.

Yes, that'd break it. You can't WAL log anything on a standby, and
replication slots can be modified on standbys.

> I believe that the reason why replication slots are not currently
> replicated is because we had the idea that the standby could have
> slots that don't exist on the master, for cascading replication. I'm
> not sure that works yet, but I think Andres definitely had it in mind
> in the original design.

That works. And it's absolutely required for adding logical decoding on
standbys (I've a prototype patch for it...).

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: After switching primary server while using replication slot.
Date: 2014-08-27 12:03:45
Message-ID: CAHGQGwHoMtUQRv8BQuoFqijP+DQA2Z7ErZ=6L6J4G+zVDENnWA@mail.gmail.com

On Fri, Aug 22, 2014 at 11:29 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Yes, that'd break it. You can't WAL log anything on a standby, and
> replication slots can be modified on standbys.

So, for now, the solution to the problem Sawada reported is probably to
set wal_keep_segments on the standby high enough.
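
As a rough way to pick a value, one can estimate how much WAL is
generated while a standby is away. The sketch below assumes the default
16 MB segment size; the helper name, the safety factor and the example
numbers are my own assumptions, not measured values.

```python
import math

SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (16 MB)

def keep_segments_for(outage_seconds, wal_bytes_per_second, safety_factor=1.5):
    """Rough number of segments to keep so that a standby can be away for
    outage_seconds at the given WAL generation rate (hypothetical helper)."""
    needed = outage_seconds * wal_bytes_per_second * safety_factor
    return math.ceil(needed / SEGMENT_BYTES)

# Example: tolerate a 2-hour outage at 1 MB/s of WAL.
print(keep_segments_for(2 * 3600, 1024 * 1024))  # 675
```

Unlike a replication slot, this is a fixed budget: a standby that stays
away longer than estimated can still fall off the end of the retained WAL.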

Regards,

--
Fujii Masao