Re: pgsql: Add test for postmaster crash restarts.

Lists: pgsql-committerspgsql-hackers
From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Add test for postmaster crash restarts.
Date: 2017-09-19 00:39:27
Message-ID: E1du6ZT-00043I-91@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Add test for postmaster crash restarts.

Given that I managed to break this... We probably should extend the
tests to also cover other sub-processes dying, but that's something
for later.

Author: Andres Freund
Discussion: https://postgr.es/m/20170917080752.rcmihzfmgbeuqjk2@alap3.anarazel.de

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/a1924a4ea29399111e5155532ca24c9c51d3c82d

Modified Files
--------------
src/test/recovery/t/013_crash_restart.pl | 192 +++++++++++++++++++++++++++++++
1 file changed, 192 insertions(+)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-09-19 03:55:35
Message-ID: 5250.1505793335@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> Add test for postmaster crash restarts.

Hm, calliphoridae doesn't like this.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-09-19 04:07:03
Message-ID: 8818AEBD-57F7-420D-8348-CDB426020BFE@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On September 18, 2017 8:55:35 PM PDT, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>Andres Freund <andres(at)anarazel(dot)de> writes:
>> Add test for postmaster crash restarts.
>
>Hm, calliphoridae doesn't like this.

Yea. Not clear to me why yet. The machine ran a number of instances with nearly the same config successfully. Can't imagine that copyparse makes a difference here. I suspect it's somehow load related... Ran a good number of iterations locally, didn't reproduce, even under high load. Think I'll add bit more error reporting.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-09-19 16:13:54
Message-ID: 17084.1505837634@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

I discovered that prairiedog has been hung up for many hours in the
013_crash_restart.pl. It looks to me like the explanation is that
the test has a race condition, because what I find in the postmaster
log is

2017-09-19 00:31:48.194 EDT [27839] [unknown] LOG: connection received: host=[local]
2017-09-19 00:31:48.203 EDT [27839] [unknown] LOG: connection authorized: user=buildfarm database=postgres
2017-09-19 00:31:48.218 EDT [27839] t/013_crash_restart.pl LOG: statement: CREATE TABLE alive(status text);
2017-09-19 00:31:48.266 EDT [27839] t/013_crash_restart.pl LOG: statement: INSERT INTO alive VALUES($$committed-before-sigquit$$);
2017-09-19 00:31:48.271 EDT [27839] t/013_crash_restart.pl LOG: statement: SELECT pg_backend_pid();
2017-09-19 00:31:48.278 EDT [27839] t/013_crash_restart.pl LOG: statement: BEGIN;
2017-09-19 00:31:48.280 EDT [27839] t/013_crash_restart.pl LOG: statement: INSERT INTO alive VALUES($$in-progress-before-sigquit$$) RETURNING status;
2017-09-19 00:31:48.292 EDT [27839] t/013_crash_restart.pl WARNING: terminating connection because of crash of another server process
2017-09-19 00:31:48.292 EDT [27839] t/013_crash_restart.pl DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted
shared memory.
2017-09-19 00:31:48.292 EDT [27839] t/013_crash_restart.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-09-19 00:31:48.299 EDT [27827] LOG: server process (PID 27839) exited with exit code 2
2017-09-19 00:31:48.299 EDT [27827] DETAIL: Failed process was running: INSERT INTO alive VALUES($$in-progress-before-sigquit$$) RETURNING status;
2017-09-19 00:31:48.300 EDT [27827] LOG: terminating any other active server processes
2017-09-19 00:31:48.307 EDT [27832] WARNING: terminating connection because of crash of another server process
2017-09-19 00:31:48.307 EDT [27832] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-09-19 00:31:48.307 EDT [27832] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-09-19 00:31:48.317 EDT [27827] LOG: all server processes terminated; reinitializing
2017-09-19 00:31:48.333 EDT [27840] LOG: database system was interrupted; last known up at 2017-09-19 00:31:47 EDT
2017-09-19 00:31:48.338 EDT [27840] LOG: database system was not properly shut down; automatic recovery in progress
2017-09-19 00:31:48.346 EDT [27840] LOG: redo starts at 0/15A89EC
2017-09-19 00:31:48.361 EDT [27840] LOG: invalid record length at 0/15C6D74: wanted 24, got 0
2017-09-19 00:31:48.362 EDT [27840] LOG: redo done at 0/15C6D50
2017-09-19 00:31:48.362 EDT [27840] LOG: last completed transaction was at log time 2017-09-19 00:31:48.270076-04
2017-09-19 00:31:48.474 EDT [27827] LOG: database system is ready to accept connections
2017-09-19 00:31:48.492 EDT [27847] [unknown] LOG: connection received: host=[local]
2017-09-19 00:31:48.499 EDT [27847] [unknown] LOG: connection authorized: user=buildfarm database=postgres
2017-09-19 00:31:48.578 EDT [27847] t/013_crash_restart.pl LOG: statement: SELECT pg_sleep(3600);

IOW, the "$monitor" instance of psql did not complete making its
connection until after the crash/restart cycle had occurred.
So we're just sitting there waiting for a crash report that won't
come. Which is another very serious deficiency in this test:
lacking any sort of timeout, it will just freeze indefinitely
if anything doesn't happen exactly the way it expects. From a
buildfarm owner's standpoint, that's pretty damn unfriendly.
It means having to manually unwedge your animals from time to time.

I'd like to ask you to revert this test, at least pending making
it a whole lot more bulletproof. We don't really need crash
recovery testing in the buildfarm IMO --- we hackers crash the
system plenty often enough to notice problems there.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-09-19 16:47:18
Message-ID: 20170919164718.66hcedq2rtlkntvf@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Hi,

On 2017-09-19 12:13:54 -0400, Tom Lane wrote:
> IOW, the "$monitor" instance of psql did not complete making its
> connection until after the crash/restart cycle had occurred.

That'd be easy enough to fix...

Just something like

$monitor_stdin .= q[
SELECT $$am-i-up$$;
];
$monitor->pump until $monitor_stdout =~ /am-i-up/;
$monitor_stdout = '';

> So we're just sitting there waiting for a crash report that won't
> come. Which is another very serious deficiency in this test:
> lacking any sort of timeout, it will just freeze indefinitely
> if anything doesn't happen exactly the way it expects. From a
> buildfarm owner's standpoint, that's pretty damn unfriendly.
> It means having to manually unwedge your animals from time to time.

Note that I just copied the code for that from another test - this is
isn't unique to this test. I agree that it'd be good to add a timeout to
those pump calls.

> I'd like to ask you to revert this test, at least pending making
> it a whole lot more bulletproof.

Hm. Ok. That seems like an overreaction to me - the failure rate isn't
actualy that high so far. I'm happy to add both timeouts and "earlier
startup" of the $monitor, but I'd prefer to do so in-tree - I'd run the
test through 100+ iterations locally, without any of this showing up.

> We don't really need crash recovery testing in the buildfarm IMO ---
> we hackers crash the system plenty often enough to notice problems
> there.

I for one don't exercise that kind of crash restarts, my development
scripts all work with restart_after_crash = false. What I find more
concerning however is coverage of EXEC_BACKEND, which has far fewer
developers actively running it constantly.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-09-19 16:53:28
Message-ID: 18695.1505840008@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2017-09-19 12:13:54 -0400, Tom Lane wrote:
>> IOW, the "$monitor" instance of psql did not complete making its
>> connection until after the crash/restart cycle had occurred.

> That'd be easy enough to fix...

> Just something like

> $monitor_stdin .= q[
> SELECT $$am-i-up$$;
> ];
> $monitor->pump until $monitor_stdout =~ /am-i-up/;
> $monitor_stdout = '';

Hm, do we care whether the sleep call has started? Maybe not.

>> I'd like to ask you to revert this test, at least pending making
>> it a whole lot more bulletproof.

> Hm. Ok. That seems like an overreaction to me - the failure rate isn't
> actualy that high so far.

It's not that it's high, it's that having to ask buildfarm owners to
manually unwedge their critters is not very cool. Maybe prairiedog
will be the only one that gets stuck, but I kinda doubt it.

> I'm happy to add both timeouts and "earlier
> startup" of the $monitor, but I'd prefer to do so in-tree - I'd run the
> test through 100+ iterations locally, without any of this showing up.

Well, please fix it ASAP, if you don't want to take it out pending
the fixes.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-09-19 16:58:26
Message-ID: D2FBABBF-F0E3-43FE-8EA4-E2431DD6EEBA@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On September 19, 2017 9:53:28 AM PDT, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>Well, please fix it ASAP, if you don't want to take it out pending
>the fixes.

Will as soon as I finished my morning coffee. Uncaffeinated, which my phone fittingly autocorrects to unvaccinated, commits aren't a good idea.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-09-19 17:42:32
Message-ID: 20170919174232.soc2skrqxxdtnh22@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2017-09-19 09:58:26 -0700, Andres Freund wrote:
>
>
> On September 19, 2017 9:53:28 AM PDT, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >Well, please fix it ASAP, if you don't want to take it out pending
> >the fixes.
>
> Will as soon as I finished my morning coffee. Uncaffeinated, which my phone fittingly autocorrects to unvaccinated, commits aren't a good idea.

Done. Survived ~100 cycles here locally, while running make -j16 -s
check-world in two parallel branches. But I have a fast disk, so it's
not comparable.

If the buildfarm doesn't complain about the use of IPC::Run's timeout
functionality, we should probably patch that into the other use of
IPC::Run as well, but especially into the other user of the pump() until
... scheme.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-09-28 12:21:37
Message-ID: 21486.1506601297@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> If the buildfarm doesn't complain about the use of IPC::Run's timeout
> functionality, we should probably patch that into the other use of
> IPC::Run as well, but especially into the other user of the pump() until
> ... scheme.

jacana hasn't passed this regression test yet, but at least that proves
the timeout stuff works ;-)

I think jacana's problem is simply that "kill QUIT" is never gonna work
on Windows, there being no such animal there. It looks like "kill KILL"
doesn't work either, which surprises me slightly more but not much.

regards, tom lane


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-09-30 22:21:33
Message-ID: 56b65da3-f7cc-494c-7952-ba5f6c804d7e@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers


[re-adding commiters which I inadvertently left off]

On 09/30/2017 06:10 PM, Andres Freund wrote:
>
>
>> I was just looking at this. Why aren't we using "pg_ctl kill" to
>> terminate the backend? That's supposed to be portable.
> Because pg_ctl can't do that for any process but postmaster, no? The
> test is supposed to find issues with backend death (and has
> defficiencies in error reporting already, and would have caught a bug
> I'd introduced previously).
>

No, I don't think so. That's not what the docs say. That's why you give
it a pid argument" "pg_ctl kill signal_name process_id"

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-09-30 22:27:12
Message-ID: 20170930222712.xgdgxnt4sjvig5mn@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2017-09-30 18:21:33 -0400, Andrew Dunstan wrote:
>
> [re-adding commiters which I inadvertently left off]
>
>
> On 09/30/2017 06:10 PM, Andres Freund wrote:
> >
> >
> >> I was just looking at this. Why aren't we using "pg_ctl kill" to
> >> terminate the backend? That's supposed to be portable.
> > Because pg_ctl can't do that for any process but postmaster, no? The
> > test is supposed to find issues with backend death (and has
> > defficiencies in error reporting already, and would have caught a bug
> > I'd introduced previously).

> No, I don't think so. That's not what the docs say. That's why you give
> it a pid argument" "pg_ctl kill signal_name process_id"

Oh, cool. Didn't know that one. So the answer is:
"Because Andres doesn't know squat.".

But even after fixing that, there unfortunately is:

static void
set_sig(char *signame)
{

#if 0
/* probably should NOT provide SIGKILL */
else if (strcmp(signame, "KILL") == 0)
sig = SIGKILL;
#endif

I'm unclear on what that provision is achieving? If you can kill with
pg_ctl you can do other nasty stuff too (like just use kill instead of
pg_ctl)?

Greetings,

Andres Freund


From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-09-30 22:44:24
Message-ID: 20170930224424.ud5ilchmclbl5y5n@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2017-09-30 15:27:12 -0700, Andres Freund wrote:
> On 2017-09-30 18:21:33 -0400, Andrew Dunstan wrote:
> >
> > [re-adding commiters which I inadvertently left off]
> >
> >
> > On 09/30/2017 06:10 PM, Andres Freund wrote:
> > >
> > >
> > >> I was just looking at this. Why aren't we using "pg_ctl kill" to
> > >> terminate the backend? That's supposed to be portable.
> > > Because pg_ctl can't do that for any process but postmaster, no? The
> > > test is supposed to find issues with backend death (and has
> > > defficiencies in error reporting already, and would have caught a bug
> > > I'd introduced previously).
>
> > No, I don't think so. That's not what the docs say. That's why you give
> > it a pid argument" "pg_ctl kill signal_name process_id"
>
> Oh, cool. Didn't know that one. So the answer is:
> "Because Andres doesn't know squat.".
>
> But even after fixing that, there unfortunately is:
>
> static void
> set_sig(char *signame)
> {
> …
> #if 0
> /* probably should NOT provide SIGKILL */
> else if (strcmp(signame, "KILL") == 0)
> sig = SIGKILL;
> #endif
>
> I'm unclear on what that provision is achieving? If you can kill with
> pg_ctl you can do other nasty stuff too (like just use kill instead of
> pg_ctl)?

Could you perhaps test whether windows likes things after the following
patch? I don't think the kill_kill guarantees are really needed here,
so we might even be able to allow this on msvc.

Greetings,

Andres Freund

Attachment Content-Type Size
windowsify.patch text/x-diff 3.6 KB

From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-10-01 02:28:39
Message-ID: 75777957-d16f-f799-d0ec-bc3d5b7c72de@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 09/30/2017 06:44 PM, Andres Freund wrote:
> On 2017-09-30 15:27:12 -0700, Andres Freund wrote:
>> On 2017-09-30 18:21:33 -0400, Andrew Dunstan wrote:
>>> [re-adding commiters which I inadvertently left off]
>>>
>>>
>>> On 09/30/2017 06:10 PM, Andres Freund wrote:
>>>>
>>>>> I was just looking at this. Why aren't we using "pg_ctl kill" to
>>>>> terminate the backend? That's supposed to be portable.
>>>> Because pg_ctl can't do that for any process but postmaster, no? The
>>>> test is supposed to find issues with backend death (and has
>>>> defficiencies in error reporting already, and would have caught a bug
>>>> I'd introduced previously).
>>> No, I don't think so. That's not what the docs say. That's why you give
>>> it a pid argument" "pg_ctl kill signal_name process_id"
>> Oh, cool. Didn't know that one. So the answer is:
>> "Because Andres doesn't know squat.".
>>
>> But even after fixing that, there unfortunately is:
>>
>> static void
>> set_sig(char *signame)
>> {
>> …
>> #if 0
>> /* probably should NOT provide SIGKILL */
>> else if (strcmp(signame, "KILL") == 0)
>> sig = SIGKILL;
>> #endif
>>
>> I'm unclear on what that provision is achieving? If you can kill with
>> pg_ctl you can do other nasty stuff too (like just use kill instead of
>> pg_ctl)?

I put it in when we rewrote pg_ctl in C many years ago, possibly out of
a superabundance of caution. I agree it's worth revisiting. I think the
idea was that there's a difference between an ordinary footgun and an
officially sanctioned footgun :-)

> Could you perhaps test whether windows likes things after the following
> patch? I don't think the kill_kill guarantees are really needed here,
> so we might even be able to allow this on msvc.
>

Haven't tested on MSVC but with this patch it passes on jacana (mingw).

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-10-01 02:32:14
Message-ID: 20171001023214.zgw3gs7mxjmlkqcd@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Hi,

On 2017-09-30 22:28:39 -0400, Andrew Dunstan wrote:
> >> But even after fixing that, there unfortunately is:
> >>
> >> static void
> >> set_sig(char *signame)
> >> {
> >> …
> >> #if 0
> >> /* probably should NOT provide SIGKILL */
> >> else if (strcmp(signame, "KILL") == 0)
> >> sig = SIGKILL;
> >> #endif
> >>
> >> I'm unclear on what that provision is achieving? If you can kill with
> >> pg_ctl you can do other nasty stuff too (like just use kill instead of
> >> pg_ctl)?
>
>
> I put it in when we rewrote pg_ctl in C many years ago, possibly out of
> a superabundance of caution. I agree it's worth revisiting. I think the
> idea was that there's a difference between an ordinary footgun and an
> officially sanctioned footgun :-)

Heh. I'm inclined to take it out. We could add a --use-the-force-luke
type parameter, but it doesn't seem worth it.

> Haven't tested on MSVC but with this patch it passes on jacana (mingw).

Yay! Thanks for testing.

Greetings,

Andres Freund


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.
Date: 2017-10-01 20:24:15
Message-ID: 45d42d41-6145-9be1-7261-84acf6d9e344@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 09/30/2017 10:32 PM, Andres Freund wrote:
> Hi,
>
> On 2017-09-30 22:28:39 -0400, Andrew Dunstan wrote:
>>>> But even after fixing that, there unfortunately is:
>>>>
>>>> static void
>>>> set_sig(char *signame)
>>>> {
>>>> …
>>>> #if 0
>>>> /* probably should NOT provide SIGKILL */
>>>> else if (strcmp(signame, "KILL") == 0)
>>>> sig = SIGKILL;
>>>> #endif
>>>>
>>>> I'm unclear on what that provision is achieving? If you can kill with
>>>> pg_ctl you can do other nasty stuff too (like just use kill instead of
>>>> pg_ctl)?
>>
>> I put it in when we rewrote pg_ctl in C many years ago, possibly out of
>> a superabundance of caution. I agree it's worth revisiting. I think the
>> idea was that there's a difference between an ordinary footgun and an
>> officially sanctioned footgun :-)
> Heh. I'm inclined to take it out. We could add a --use-the-force-luke
> type parameter, but it doesn't seem worth it.
>
>
>

I agree, but I think we need this discussed on -hackers. Does anyone
have an objection to allowing "pg_ctl kill KILL somepid"? As Andres
points out, in most places you can just call kill from the command line
anyway, so disallowing it is not really a security feature. Having it
would let us have portable crash restart tests.

cheers

andrew

--

Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: pgsql: Add test for postmaster crash restarts.
Date: 2017-10-01 20:32:24
Message-ID: c5341caf-4b75-dfb2-8bf8-449cf1859d65@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 09/30/2017 10:32 PM, Andres Freund wrote:
>> Haven't tested on MSVC but with this patch it passes on jacana (mingw).
> Yay! Thanks for testing.
>

I have now tested on bowerbird (MSVC) and it passes. This suggests that
we can run tests there in cases where we can use IPC::Run's finish()
instead of kill_kill(). Perhaps someone would like to look at the the
two other cases where we do that (recovery/t/011_crash_recovery.pl and
recovery/t/006_logical_decoding.pl) and see if they are amenable to this
treatment. It woould be nice to be able to run all the tests on all
platforms.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.
Date: 2017-10-01 20:42:44
Message-ID: 26945.1506890564@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
> On 09/30/2017 10:32 PM, Andres Freund wrote:
>> Heh. I'm inclined to take it out. We could add a --use-the-force-luke
>> type parameter, but it doesn't seem worth it.

> I agree, but I think we need this discussed on -hackers. Does anyone
> have an objection to allowing "pg_ctl kill KILL somepid"? As Andres
> points out, in most places you can just call kill from the command line
> anyway, so disallowing it is not really a security feature. Having it
> would let us have portable crash restart tests.

+1 for portable tests, but it still seems like something we don't want
to encourage users to use. What do you think of leaving it out of the
documentation?

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.
Date: 2017-10-01 20:48:54
Message-ID: 20171001204854.ph3l6qo7b55yred3@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2017-10-01 16:42:44 -0400, Tom Lane wrote:
> Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
> > On 09/30/2017 10:32 PM, Andres Freund wrote:
> >> Heh. I'm inclined to take it out. We could add a --use-the-force-luke
> >> type parameter, but it doesn't seem worth it.
>
> > I agree, but I think we need this discussed on -hackers. Does anyone
> > have an objection to allowing "pg_ctl kill KILL somepid"? As Andres
> > points out, in most places you can just call kill from the command line
> > anyway, so disallowing it is not really a security feature. Having it
> > would let us have portable crash restart tests.
>
> +1 for portable tests, but it still seems like something we don't want
> to encourage users to use. What do you think of leaving it out of the
> documentation?

As far as I can tell we've not documented the set of acceptable signals
anywhere but the source. I think we can just keep it that way?

Greetings,

Andres Freund


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.
Date: 2017-10-01 21:47:52
Message-ID: 070f98a2-9448-8f4c-23a8-34d1123be3af@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 10/01/2017 04:48 PM, Andres Freund wrote:
> On 2017-10-01 16:42:44 -0400, Tom Lane wrote:
>> Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
>>> On 09/30/2017 10:32 PM, Andres Freund wrote:
>>>> Heh. I'm inclined to take it out. We could add a --use-the-force-luke
>>>> type parameter, but it doesn't seem worth it.
>>> I agree, but I think we need this discussed on -hackers. Does anyone
>>> have an objection to allowing "pg_ctl kill KILL somepid"? As Andres
>>> points out, in most places you can just call kill from the command line
>>> anyway, so disallowing it is not really a security feature. Having it
>>> would let us have portable crash restart tests.
>> +1 for portable tests, but it still seems like something we don't want
>> to encourage users to use. What do you think of leaving it out of the
>> documentation?
> As far as I can tell we've not documented the set of acceptable signals
> anywhere but the source. I think we can just keep it that way?

As documented it's in the help text:

printf(_("\nAllowed signal names for kill:\n"));
printf("  ABRT HUP INT QUIT TERM USR1 USR2\n");

So we can leave it out of there. OTOH I'm not a huge fan of security by
obscurity. I guess this wouldn't be too bad a case.

cheers

andrew

--

Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.
Date: 2017-10-01 21:51:30
Message-ID: 20171001215130.keeiy4ibceyqgyjz@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2017-10-01 17:47:52 -0400, Andrew Dunstan wrote:
>
>
> On 10/01/2017 04:48 PM, Andres Freund wrote:
> > On 2017-10-01 16:42:44 -0400, Tom Lane wrote:
> >> Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
> >>> On 09/30/2017 10:32 PM, Andres Freund wrote:
> >>>> Heh. I'm inclined to take it out. We could add a --use-the-force-luke
> >>>> type parameter, but it doesn't seem worth it.
> >>> I agree, but I think we need this discussed on -hackers. Does anyone
> >>> have an objection to allowing "pg_ctl kill KILL somepid"? As Andres
> >>> points out, in most places you can just call kill from the command line
> >>> anyway, so disallowing it is not really a security feature. Having it
> >>> would let us have portable crash restart tests.
> >> +1 for portable tests, but it still seems like something we don't want
> >> to encourage users to use. What do you think of leaving it out of the
> >> documentation?
> > As far as I can tell we've not documented the set of acceptable signals
> > anywhere but the source. I think we can just keep it that way?
>
>
> As documented it's in the help text:
>
> printf(_("\nAllowed signal names for kill:\n"));
> printf("  ABRT HUP INT QUIT TERM USR1 USR2\n");

Oh, hm. I'd looked above.

> So we can leave it out of there. OTOH I'm not a huge fan of security by
> obscurity. I guess this wouldn't be too bad a case.

I'd personally include it, given that we already allow and document
ABRT. There's no meaningful difference between the two.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.
Date: 2017-10-01 22:01:27
Message-ID: 31003.1506895287@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2017-10-01 17:47:52 -0400, Andrew Dunstan wrote:
>> So we can leave it out of there. OTOH I'm not a huge fan of security by
>> obscurity. I guess this wouldn't be too bad a case.

> I'd personally include it, given that we already allow and document
> ABRT. There's no meaningful difference between the two.

True.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.
Date: 2017-10-01 22:14:47
Message-ID: 20171001221447.o6meenzwfny6xe7v@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2017-10-01 18:01:27 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2017-10-01 17:47:52 -0400, Andrew Dunstan wrote:
> >> So we can leave it out of there. OTOH I'm not a huge fan of security by
> >> obscurity. I guess this wouldn't be too bad a case.
>
> > I'd personally include it, given that we already allow and document
> > ABRT. There's no meaningful difference between the two.
>
> True.

I'll push it that way then. Adapting a help text later is fairly
harmless.

Greetings,

Andres Freund