From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Cc: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: logical replication and PANIC during shutdown checkpoint in publisher |
Date: | 2017-04-26 01:47:46 |
Message-ID: | CAB7nPqRB4BPj7p8hqLFceV8fNTMyX2Fa7a+oqn1Pyr3gi_rvgg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Apr 26, 2017 at 3:17 AM, Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> On 4/21/17 00:11, Michael Paquier wrote:
>> Hmm. I have been actually looking at this solution and I am having
>> doubts regarding its robustness. In short this would need to be
>> roughly a two-step process:
>> - In PostmasterStateMachine(), SIGUSR2 is sent to the checkpoint to
>> make it call ShutdownXLOG(). Prior doing that, a first signal should
>> be sent to all the WAL senders with
>> SignalSomeChildren(BACKEND_TYPE_WALSND). SIGUSR2 or SIGINT could be
>> used.
>> - At reception of this signal, all WAL senders switch to a stopping
>> state, refusing commands that can generate WAL.
>> - Checkpointer looks at the state of all WAL senders, looping with a
>> sleep call of a couple of ms, refusing to launch the shutdown
>> checkpoint as long as all WAL senders have not switched to the
>> stopping state.
>> - In reaper(), once checkpointer is confirmed as stopped, signal again
>> the WAL senders, and tell them to perform the last loop.
>
> Yeah that looks like a reasonable approach.
>
> I'm not sure why in your patch you process got_SIGUSR2 in
> WalSndErrorCleanup() instead of in the main loop.
Yes I was hesitating about this one when hacking it. Thinking an extra
time, the similar check in StartReplication() should also not use
got_SIGUSR2 to give the WAL sender a chance to do more work while the
shutdown checkpoint is running as it could take minutes.
Attached is an updated patch to reflect that.
--
Michael
Attachment | Content-Type | Size |
---|---|---|
walsender-chkpt-v2.patch | application/octet-stream | 12.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2017-04-26 01:56:58 | Re: PG 10 release notes |
Previous Message | Andres Freund | 2017-04-26 01:40:08 | Re: PG 10 release notes |