Unreliable "pg_ctl -w start" again

Lists: pgsql-hackers
From: "MauMau" <maumau307(at)gmail(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Unreliable "pg_ctl -w start" again
Date: 2012-01-27 15:45:19
Message-ID: 996B1BE9112D45A48F5497E68D90BFFE@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

Last year, I asked for your opinions about how to fix the bug of unreliable
"pg_ctl -w start", as in the thread:

http://archives.postgresql.org/pgsql-hackers/2011-05/msg01407.php

The phenomenon was that "pg_ctl -w start" did not return for 60 seconds when
postgresql.conf contained a wrong parameter specification.

Recently, I've encountered another problem of "pg_ctl -w start", which I
cannot reliably avoid. I found the cause in pg_ctl.c. I'm willing to create
a patch, but I'm concerned about the correctness of the fix. I desire this
bug will be eliminated as soon as possible. I'd like to ask your opinions.

[Problem]
I use PostgreSQL 8.3.12 embedded in a packaged application. The problem
occurred on RHEL5 when the operating system was starting up. The packaged
application is started from /etc/init.d/myapp. That application internally
executes "pg_ctl -w start" and checks its return value. The application does
not start unless the return value is 0.

The problematic phenomenon is that "pg_ctl -w start" fails with return value
1 in only two seconds without waiting until 60 seconds pass. That is, -w did
not work. However, the database server started successfully.

The timeline was as follows:

18:09:45 the application executed "pg_ctl -w start"
18:09:47 "pg_ctl -w start" returned with 1

<PostgreSQL's server log (dates are intentionally eliminated)>
18:10:01 JST 22995 LOG: database system was interrupted;last known up at
2012-01-21 02:24:59 JST
18:10:32 JST 22995 LOG: database system was not properly shut down;
automatic recovery in progress
18:10:34 JST 22995 LOG: record with zero length at 0/23E35D4
18:10:34 JST 22995 LOG: redo is not required
18:11:38 JST 22893 LOG: database system is ready to accept connections
18:11:38 JST 23478 LOG: autovacuum launcher started

PostgreSQL took a long time to start. This is probably because the system
load was high with many processes booting up concurrently during OS boot.

[Cause]
The following part in do_start() of pg_ctl.c contains a bug:

if (old_pid != 0)
{
pg_usleep(1000000);
pid = get_pgpid();
if (pid == old_pid)
{
write_stderr(_("%s: could not start server\n"
"Examine the log output.\n"),
progname);
exit(1);
}
}

This part assumes that postmaster will overwrite postmaster.pid within a
second. This assumption is not correct under heavy load like OS startup.

In PostgreSQL 9.1, the wait processing is largely modified. However, the
same assumption seems to still remain, though the duration is 5 seconds. 5
seconds of wait is probably insufficient for my case. I think no fixed
duration is appropriate.

[Solution]
So, what is the reliable solution? The pipe-based one, which I proposed in
the past thread, would be reliable. However, that is not simple enough to
back-port to 8.3.

How about inserting postmaster_is_alive() as below? I know this is not
perfect, but this will work in most cases. I need some solution that
pratically helps.

if (old_pid != 0)
{
pg_usleep(1000000);
pid = get_pgpid();
if (pid == old_pid && postmaster_is_alive(pid))
{
write_stderr(_("%s: could not start server\n"
"Examine the log output.\n"),
progname);
exit(1);
}
}

Regards
MauMau


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "MauMau" <maumau307(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unreliable "pg_ctl -w start" again
Date: 2012-01-27 18:18:51
Message-ID: 11911.1327688331@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"MauMau" <maumau307(at)gmail(dot)com> writes:
> In PostgreSQL 9.1, the wait processing is largely modified. However, the
> same assumption seems to still remain, though the duration is 5 seconds. 5
> seconds of wait is probably insufficient for my case. I think no fixed
> duration is appropriate.

Well, feel free to increase that duration if you want. The reason it's
there is to not wait for a long time if the postmaster falls over
instantly at startup, but in a non-interactive situation you might not
care.

> How about inserting postmaster_is_alive() as below?

Looks like complete nonsense to me, if the goal is to behave sanely when
postmaster.pid hasn't been created yet. Where do you think get_pgpid
gets the PID from?

If you want to do something useful about this, the correct hint is
buried in start_postmaster():

/*
* Since there might be quotes to handle here, it is easier simply to pass
* everything to a shell to process them.
*
* XXX it would be better to fork and exec so that we would know the child
* postmaster's PID directly; then test_postmaster_connection could use
* the PID without having to rely on reading it back from the pidfile.
*/

If we had the postmaster's PID a priori, we could detect postmaster
death directly instead of having to make assumptions about how long
is reasonable to wait for the pidfile to appear. The problem is that
we don't want to write a complete replacement for the shell's command
line parser and I/O redirection logic. It doesn't look like a small
project.

(But maybe we could bypass that by doing a fork() and then having
the child exec() the shell, telling it to exec postmaster in turn?)

And of course Windows as usual makes things twice as hard, since we
couldn't make such a change unless start_postmaster could return the
proper PID in that case too.

regards, tom lane


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unreliable "pg_ctl -w start" again
Date: 2012-01-28 02:36:18
Message-ID: 4F0633450B7A4741B348040C8831EC90@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Well, feel free to increase that duration if you want. The reason it's
> there is to not wait for a long time if the postmaster falls over
> instantly at startup, but in a non-interactive situation you might not
> care.

Yes, just lengthening the wait duration causes unnecessary long wait when we
run pg_ctl interactively. Therefore, the current wait approach is is not
correct.

>> How about inserting postmaster_is_alive() as below?
>
> Looks like complete nonsense to me, if the goal is to behave sanely when
> postmaster.pid hasn't been created yet. Where do you think get_pgpid
> gets the PID from?

Yes, I understand that get_pgpid() gets the pid from postmaster.pid, which
may be the pid of the previous postmaster that did not stop cleanly.

I think my simple fix makes sense to solve the problem as follows. Could you
point out what might not be good?

1.The previous postmaster was terminated abruptly due to OS shutdown,
machine failure, etc. leaving postmaster.pid.
2.Run "pg_ctl -w start" to start new postmaster.
3.do_start() of pg_ctl reads the pid of previously running postmaster from
postmaster.pid. Say, let it be pid-1 (old_pid in code) here.

old_pid = get_pgpid();

4.Anyway, try to start postmaster by calling start_postmaster().
5.If postmaster.pid existed at step 3, it means either of:

(a) Previous postmaster did not stop cleanly and left postmaster.pid.
(b) Another postmaster is already running in the data directory (since
before running pg_ctl -w start this time.)

But we can't distinguish between them. Then, we read ostmaster.pid again to
judge the situation. Let it be pid-2 (pid in code).

if (old_pid != 0)
{
pg_usleep(1000000);
pid = get_pgpid();

6.If pid-1 != pid-2, it means that the situation (a) applies and the newly
started postmaster overwrote old postmaster.pid. Then, try to connect to
postmaster.

If pid-1 == pid-2, it means either of:

(a') Previous postmaster did not stop cleanly and left postmaster.pid. Newly
started postmaster will complete startup, but hasn't overwritten
postmaster.pid yet.
(b) Another postmaster is already running in the data directory (since
before running pg_ctl -w start this time.)

The current comparison logic cannot distinguish between them. In my problem
situation, situation a' happened, and pg_ctl mistakenly exited.

if (pid == old_pid)
{
write_stderr(_("%s: could not start server\n"
"Examine the log output.\n"),
progname);
exit(1);
}

7.To distinguish between a' and b, check if pid-1 is alive. If pid-1 is
alive, it means situation b. Otherwise, that is situation a'.

if (pid == old_pid && postmaster_is_alive(old_pid))

However, the pid of newly started postmaster might match the one of old
postmaster. To deal with that situation, it may be better to check the
modified timestamp of postmaster.pid in addition.

What do you think?

> If we had the postmaster's PID a priori, we could detect postmaster
> death directly instead of having to make assumptions about how long
> is reasonable to wait for the pidfile to appear. The problem is that
> we don't want to write a complete replacement for the shell's command
> line parser and I/O redirection logic. It doesn't look like a small
> project.

Yes, I understand this. I don't think we can replace shell's various work.

> (But maybe we could bypass that by doing a fork() and then having
> the child exec() the shell, telling it to exec postmaster in turn?)

Possibly. I hope this works. Then, we can pass unnamed pipe file descriptors
to postmaster via environment variables from the pg_ctl's forked child.

> And of course Windows as usual makes things twice as hard, since we
> couldn't make such a change unless start_postmaster could return the
> proper PID in that case too.

Well, we can make start_postmaster() return the pid of the newly created
postmaster. CreateProcess() sets the process handle in the structure passed
to it. We can pass the process handle to WaitForSingleObject8) to know
whether postmaster is alive.

Regards
MauMau


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unreliable "pg_ctl -w start" again
Date: 2012-01-28 02:36:41
Message-ID: C81D34C8264145748A657C0954AAE9BB@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Well, feel free to increase that duration if you want. The reason it's
> there is to not wait for a long time if the postmaster falls over
> instantly at startup, but in a non-interactive situation you might not
> care.

Yes, just lengthening the wait duration causes unnecessary long wait when we
run pg_ctl interactively. Therefore, the current wait approach is is not
correct.

>> How about inserting postmaster_is_alive() as below?
>
> Looks like complete nonsense to me, if the goal is to behave sanely when
> postmaster.pid hasn't been created yet. Where do you think get_pgpid
> gets the PID from?

Yes, I understand that get_pgpid() gets the pid from postmaster.pid, which
may be the pid of the previous postmaster that did not stop cleanly.

I think my simple fix makes sense to solve the problem as follows. Could you
point out what might not be good?

1.The previous postmaster was terminated abruptly due to OS shutdown,
machine failure, etc. leaving postmaster.pid.
2.Run "pg_ctl -w start" to start new postmaster.
3.do_start() of pg_ctl reads the pid of previously running postmaster from
postmaster.pid. Say, let it be pid-1 (old_pid in code) here.

old_pid = get_pgpid();

4.Anyway, try to start postmaster by calling start_postmaster().
5.If postmaster.pid existed at step 3, it means either of:

(a) Previous postmaster did not stop cleanly and left postmaster.pid.
(b) Another postmaster is already running in the data directory (since
before running pg_ctl -w start this time.)

But we can't distinguish between them. Then, we read ostmaster.pid again to
judge the situation. Let it be pid-2 (pid in code).

if (old_pid != 0)
{
pg_usleep(1000000);
pid = get_pgpid();

6.If pid-1 != pid-2, it means that the situation (a) applies and the newly
started postmaster overwrote old postmaster.pid. Then, try to connect to
postmaster.

If pid-1 == pid-2, it means either of:

(a') Previous postmaster did not stop cleanly and left postmaster.pid. Newly
started postmaster will complete startup, but hasn't overwritten
postmaster.pid yet.
(b) Another postmaster is already running in the data directory (since
before running pg_ctl -w start this time.)

The current comparison logic cannot distinguish between them. In my problem
situation, situation a' happened, and pg_ctl mistakenly exited.

if (pid == old_pid)
{
write_stderr(_("%s: could not start server\n"
"Examine the log output.\n"),
progname);
exit(1);
}

7.To distinguish between a' and b, check if pid-1 is alive. If pid-1 is
alive, it means situation b. Otherwise, that is situation a'.

if (pid == old_pid && postmaster_is_alive(old_pid))

However, the pid of newly started postmaster might match the one of old
postmaster. To deal with that situation, it may be better to check the
modified timestamp of postmaster.pid in addition.

What do you think?

> If we had the postmaster's PID a priori, we could detect postmaster
> death directly instead of having to make assumptions about how long
> is reasonable to wait for the pidfile to appear. The problem is that
> we don't want to write a complete replacement for the shell's command
> line parser and I/O redirection logic. It doesn't look like a small
> project.

Yes, I understand this. I don't think we can replace shell's various work.

> (But maybe we could bypass that by doing a fork() and then having
> the child exec() the shell, telling it to exec postmaster in turn?)

Possibly. I hope this works. Then, we can pass unnamed pipe file descriptors
to postmaster via environment variables from the pg_ctl's forked child.

> And of course Windows as usual makes things twice as hard, since we
> couldn't make such a change unless start_postmaster could return the
> proper PID in that case too.

Well, we can make start_postmaster() return the pid of the newly created
postmaster. CreateProcess() sets the process handle in the structure passed
to it. We can pass the process handle to WaitForSingleObject8) to know
whether postmaster is alive.

Regards
MauMau


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "MauMau" <maumau307(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unreliable "pg_ctl -w start" again
Date: 2012-01-28 02:51:26
Message-ID: 270.1327719086@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"MauMau" <maumau307(at)gmail(dot)com> writes:
> From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>> Looks like complete nonsense to me, if the goal is to behave sanely when
>> postmaster.pid hasn't been created yet. Where do you think get_pgpid
>> gets the PID from?

> Yes, I understand that get_pgpid() gets the pid from postmaster.pid, which
> may be the pid of the previous postmaster that did not stop cleanly.
> [ convoluted reasoning about what to do if that's the case ]

I don't see any point in worrying about that case when you can't handle
the basic case that the postmaster hasn't created postmaster.pid yet.
In any case, this does nothing at all to answer the question you posed,
which was how long is it reasonable to wait for the postmaster to
produce a new postmaster.pid file. We really need to know the PID of
the process we started in order to make any real improvement here.

regards, tom lane


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unreliable "pg_ctl -w start" again
Date: 2012-01-28 07:05:30
Message-ID: 1B2CBF77B74043E9B8DDAD04E333E7C3@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> I don't see any point in worrying about that case when you can't handle
> the basic case that the postmaster hasn't created postmaster.pid yet.
> In any case, this does nothing at all to answer the question you posed,
> which was how long is it reasonable to wait for the postmaster to
> produce a new postmaster.pid file. We really need to know the PID of
> the process we started in order to make any real improvement here.

I'm sorry, but I may be missing the point of discussion. Let me clarify my
point again.

1.To be honest, I wish a real solution that can eliminate the problem
completely, for example, by using SIGCHLD approach Tom-san proposed before,
or by using unnamed pipes I proposed last year. However, the real solution
seems complicated to back-port to past versions.

2.As you said, we need any real improvement, if not a perfect solution, that
can lower the possibility of the problem. Doing something which can somehow
mitigate the problem is better than doing nothing and leaving the problem.

3.How long is it reasonable to wait for the postmaster to produce a new
postmaster.pid file?
No duration is reasonable.

4.Then, what can we do to mitigate the problem in the past versions (8.3,
8.4, 9.0)?
My very simple fix works well in the cases where old postmaster.pid file
exists. But it does not make any improvement in the cases where old
postmaster.pid is not left (i.e., running pg_ctl after clean shutdown; a
basic case).
Am I missing anything? Does the fix result in degradation in some cases?

Otherwise, is Tom-san saying:

"We will be able to solve the remaining problems completely if we can get
the pid of postmaster started in pg_ctl. We thought that it was not easy to
get the pid during 9.1 development. However, we might be able to get it more
easily than we thought, such as by fork-execing and making the child process
exec shell to make the new shell exec postmaster. So let's pursue the real
solution here."

If so, I agree. But if we find the real solution difficult, I wish my
"non-perfect" but somewhat effective solution will be accepted.

Regards
MauMau