checkpointer code behaving strangely on postmaster -T

Lists: pgsql-hackers
From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: checkpointer code behaving strangely on postmaster -T
Date: 2012-03-30 16:31:56
Message-ID: 1333124720-sup-6193@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I noticed while doing some tests that the checkpointer process does not
recover very nicely after a backend crashes under postmaster -T (after
all processes have been kill -CONTd, of course, and postmaster told to
shutdown via Ctrl-C on its console). For some reason it seems to get
stuck on a loop doing sleep(0.5s) In other case I caught it trying to
do a checkpoint, but it was progressing a single page each time and then
sleeping. In that condition, the checkpoint took a very long time to
finish.

Pressing Ctrl-C in the postmaster console at that point does not have
any effect.

--
Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer code behaving strangely on postmaster -T
Date: 2012-05-10 06:27:32
Message-ID: 20840.1336631252@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> I noticed while doing some tests that the checkpointer process does not
> recover very nicely after a backend crashes under postmaster -T (after
> all processes have been kill -CONTd, of course, and postmaster told to
> shutdown via Ctrl-C on its console). For some reason it seems to get
> stuck on a loop doing sleep(0.5s) In other case I caught it trying to
> do a checkpoint, but it was progressing a single page each time and then
> sleeping. In that condition, the checkpoint took a very long time to
> finish.

Is this still a problem as of HEAD? I think I've fixed some issues in
the checkpointer's outer loop logic, but not sure if what you saw is
still there.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer code behaving strangely on postmaster -T
Date: 2012-05-10 15:04:54
Message-ID: 1336659379-sup-7447@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Excerpts from Tom Lane's message of jue may 10 02:27:32 -0400 2012:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> > I noticed while doing some tests that the checkpointer process does not
> > recover very nicely after a backend crashes under postmaster -T (after
> > all processes have been kill -CONTd, of course, and postmaster told to
> > shutdown via Ctrl-C on its console). For some reason it seems to get
> > stuck on a loop doing sleep(0.5s) In other case I caught it trying to
> > do a checkpoint, but it was progressing a single page each time and then
> > sleeping. In that condition, the checkpoint took a very long time to
> > finish.
>
> Is this still a problem as of HEAD? I think I've fixed some issues in
> the checkpointer's outer loop logic, but not sure if what you saw is
> still there.

Yep, it's still there as far as I can tell. A backtrace from the
checkpointer shows it's waiting on the latch.

It seems to me that the bug is in the postmaster state machine rather
than checkpointer itself. After a few false starts, this seems to fix
it:

--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2136,6 +2136,8 @@ pmdie(SIGNAL_ARGS)
signal_child(WalWriterPID, SIGTERM);
if (BgWriterPID != 0)
signal_child(BgWriterPID, SIGTERM);
+ if (FatalError && CheckpointerPID != 0)
+ signal_child(CheckpointerPID, SIGUSR2);

/*
* If we're in recovery, we can't kill the startup process
@@ -2178,6 +2180,8 @@ pmdie(SIGNAL_ARGS)
signal_child(WalReceiverPID, SIGTERM);
if (BgWriterPID != 0)
signal_child(BgWriterPID, SIGTERM);
+ if (FatalError && CheckpointerPID != 0)
+ signal_child(CheckpointerPID, SIGUSR2);
if (pmState == PM_RECOVERY)
{
/* only checkpointer is active in this state */

Note that since checkpointer can only be running after we enter
FatalError when the -T (send SIGSTOP instead of SIGQUIT) switch is used,
this bug doesn't seem to affect normal usage. (I'm not sure SIGUSR2 is
the most appropriate signal to send at this time -- since we're in
FatalError, probably SIGQUIT is better suited.)

One good thing is that when I patched postmaster in a different way
(which I later realized to be bogus), I caused it to die with an
assertion while checkpointer was still running; the debug output let me
know that checkpointer went away immediately.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer code behaving strangely on postmaster -T
Date: 2012-05-10 15:14:01
Message-ID: 994.1336662841@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of jue may 10 02:27:32 -0400 2012:
>> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> I noticed while doing some tests that the checkpointer process does not
> recover very nicely after a backend crashes under postmaster -T

> It seems to me that the bug is in the postmaster state machine rather
> than checkpointer itself. After a few false starts, this seems to fix
> it:

> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -2136,6 +2136,8 @@ pmdie(SIGNAL_ARGS)
> signal_child(WalWriterPID, SIGTERM);
> if (BgWriterPID != 0)
> signal_child(BgWriterPID, SIGTERM);
> + if (FatalError && CheckpointerPID != 0)
> + signal_child(CheckpointerPID, SIGUSR2);

Surely we do not want the checkpointer doing a shutdown checkpoint here.
If we need it to die immediately, SIGQUIT is the way. If we want a
shutdown checkpoint, that has to wait till after everything else is
known dead. So while I agree this may be a state machine bug, that
doesn't look like a good fix.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer code behaving strangely on postmaster -T
Date: 2012-05-11 10:05:29
Message-ID: CA+U5nMLtBrNSwY7espyJMkFLTD_7Bdesd84MCR0MDaExT+2dWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10 May 2012 16:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Excerpts from Tom Lane's message of jue may 10 02:27:32 -0400 2012:
>>> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
>> I noticed while doing some tests that the checkpointer process does not
>> recover very nicely after a backend crashes under postmaster -T
>
>> It seems to me that the bug is in the postmaster state machine rather
>> than checkpointer itself.  After a few false starts, this seems to fix
>> it:
>
>> --- a/src/backend/postmaster/postmaster.c
>> +++ b/src/backend/postmaster/postmaster.c
>> @@ -2136,6 +2136,8 @@ pmdie(SIGNAL_ARGS)
>>                     signal_child(WalWriterPID, SIGTERM);
>>                 if (BgWriterPID != 0)
>>                     signal_child(BgWriterPID, SIGTERM);
>> +               if (FatalError && CheckpointerPID != 0)
>> +                   signal_child(CheckpointerPID, SIGUSR2);
>
> Surely we do not want the checkpointer doing a shutdown checkpoint here.
> If we need it to die immediately, SIGQUIT is the way.  If we want a
> shutdown checkpoint, that has to wait till after everything else is
> known dead.  So while I agree this may be a state machine bug, that
> doesn't look like a good fix.

Is this now fixed? You've made a few changes so I'm confused. Thanks.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer code behaving strangely on postmaster -T
Date: 2012-05-11 20:50:01
Message-ID: 15684.1336769401@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of jue may 10 02:27:32 -0400 2012:
>> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
>>> I noticed while doing some tests that the checkpointer process does not
>>> recover very nicely after a backend crashes under postmaster -T (after
>>> all processes have been kill -CONTd, of course, and postmaster told to
>>> shutdown via Ctrl-C on its console). For some reason it seems to get
>>> stuck on a loop doing sleep(0.5s) In other case I caught it trying to
>>> do a checkpoint, but it was progressing a single page each time and then
>>> sleeping. In that condition, the checkpoint took a very long time to
>>> finish.

>> Is this still a problem as of HEAD? I think I've fixed some issues in
>> the checkpointer's outer loop logic, but not sure if what you saw is
>> still there.

> Yep, it's still there as far as I can tell. A backtrace from the
> checkpointer shows it's waiting on the latch.

I'm confused about what you did here and whether this isn't just pilot
error. If you run with -T then the postmaster will just SIGSTOP the
remaining child processes, but then it will sit and wait for them to
die, since the state machine expects them to react as though they'd been
sent SIGQUIT. If you SIGCONT any of them then that process will resume,
totally ignorant that it's supposed to die. So "kill -CONTd, of course"
makes no sense to me. I tried killing one child with -KILL, then
sending SIGINT to the postmaster, then killing the remaining
already-stopped children, and the postmaster did exit as expected after
the last child died.

So I don't see any bug here. And, after closer inspection, your
previous proposed patch is quite bogus because checkpointer is not
supposed to stop yet when the other processes are being terminated
normally.

Possibly it'd be useful to teach the postmaster more thoroughly about
SIGSTOP and have a way for it to really kill the remaining children
after you've finished investigating their state. But frankly this
is the first time I've heard of anybody using that feature at all;
I always thought it was a vestigial hangover from days when the kernel
was too stupid to write separate core dump files for each backend.
I'd rather remove SendStop than add more complexity there.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer code behaving strangely on postmaster -T
Date: 2012-05-11 21:19:13
Message-ID: 1336770275-sup-7739@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Excerpts from Tom Lane's message of vie may 11 16:50:01 -0400 2012:

> > Yep, it's still there as far as I can tell. A backtrace from the
> > checkpointer shows it's waiting on the latch.
>
> I'm confused about what you did here and whether this isn't just pilot
> error. If you run with -T then the postmaster will just SIGSTOP the
> remaining child processes, but then it will sit and wait for them to
> die, since the state machine expects them to react as though they'd been
> sent SIGQUIT.

The sequence of events is:
postmaster -T
crash a backend
SIGINT postmaster
SIGCONT all child processes

My expectation is that postmaster should exit normally after this. What
happens instead is that all processes exit, except checkpointer. And in
fact, postmaster is now in PM_WAIT_BACKENDS state, so sending SIGINT a
second time will not shutdown checkpointer either.

Maybe we can consider this to be just pilot error, but then why do all
other processes exit normally? To me it just seems an oversight in
checkpointer shutdown handling in conjuction with -T.

> If you SIGCONT any of them then that process will resume,
> totally ignorant that it's supposed to die. So "kill -CONTd, of course"
> makes no sense to me. I tried killing one child with -KILL, then
> sending SIGINT to the postmaster, then killing the remaining
> already-stopped children, and the postmaster did exit as expected after
> the last child died.

Uhm, after you SIGINTd postmaster didn't it shutdown all children? That
would be odd.

> So I don't see any bug here. And, after closer inspection, your
> previous proposed patch is quite bogus because checkpointer is not
> supposed to stop yet when the other processes are being terminated
> normally.

Well, it does send the signal only when FatalError is set. So it should
only affect -T behavior.

> Possibly it'd be useful to teach the postmaster more thoroughly about
> SIGSTOP and have a way for it to really kill the remaining children
> after you've finished investigating their state. But frankly this
> is the first time I've heard of anybody using that feature at all;
> I always thought it was a vestigial hangover from days when the kernel
> was too stupid to write separate core dump files for each backend.
> I'd rather remove SendStop than add more complexity there.

Hah. I've used it a few times, but I can see that multiple core files
are okay. Maybe you're right and we should just remove it.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer code behaving strangely on postmaster -T
Date: 2012-05-11 21:44:50
Message-ID: 17504.1336772690@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of vie may 11 16:50:01 -0400 2012:
>> I'm confused about what you did here and whether this isn't just pilot
>> error.

> The sequence of events is:
> postmaster -T
> crash a backend
> SIGINT postmaster
> SIGCONT all child processes

> My expectation is that postmaster should exit normally after this.

Well, my expectation is that the postmaster should wait for the children
to finish dying, and then exit rather than respawn anything. It is not
on the postmaster's head to make them die anymore, because it already
(thinks it) sent them SIGQUIT. Using SIGCONT here is pilot error.

> Maybe we can consider this to be just pilot error, but then why do all
> other processes exit normally?

The reason for that is that the postmaster's SIGINT interrupt handler
(lines 2163ff) sent them SIGTERM, without bothering to notice that we'd
already sent them SIGQUIT/SIGSTOP; so once you CONT them they get the
SIGTERM and drop out normally. That handler knows it should not signal
the checkpointer yet, so the checkpointer doesn't get the memo. But the
lack of a FatalError check here is just a simplicity of implementation
thing; it should not be necessary to send any more signals once we are
in FatalError state. Besides, this behavior is all wrong for a crash
recovery scenario: there is no guarantee that shared memory is in good
enough condition for SIGTERM shutdown to work. And we *definitely*
don't want the checkpointer trying to write a shutdown checkpoint.

>> So I don't see any bug here. And, after closer inspection, your
>> previous proposed patch is quite bogus because checkpointer is not
>> supposed to stop yet when the other processes are being terminated
>> normally.

> Well, it does send the signal only when FatalError is set. So it should
> only affect -T behavior.

If FatalError is set, it should not be necessary to send any more
signals, period, because we already tried to kill every child. If we
need to defend against somebody using SIGSTOP/SIGCONT inappropriately,
it would take a lot more thought (and code) than this, and it would
still be extremely fragile because a SIGCONT'd backend is going to be
executing against possibly-corrupt shared memory.

regards, tom lane