What is the best and easiest implementation to reliably wait for the completion of startup?

Lists: pgsql-hackers
From: "MauMau" <maumau307(at)gmail(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: What is the best and easiest implementation to reliably wait for the completion of startup?
Date: 2011-05-27 12:24:04
Message-ID: E04816E53DDC4D53A74A6A51BC99C0B5@maumau

Hello,

I've encountered a problem with PostgreSQL startup, and I can think of a
simple solution for it. However, since I don't yet know much about the
PostgreSQL implementation, I'd like to ask what the best and easiest
solution is. If it is easy enough for me to work on in my spare time at
home, I'm willing to implement the patch.

[problem]
I can't reliably wait for the completion of PostgreSQL startup. I want
pg_ctl to wait until the server completes startup and accepts connections.

Yes, pg_ctl has the "-w" and "-t wait_second" options. However, what
value should I specify for -t? I have to specify a long time, say 3600
seconds, in case startup processing takes long because of crash recovery or
archive recovery.

The bad thing is that pg_ctl continues to wait until the specified duration
passes, even if postgres fails to start. For example, it is naturally
desirable for pg_ctl to terminate when postgresql.conf contains a syntax
error.

[solution idea]
Use unnamed pipes for postmaster to notify pg_ctl of the completion of
startup. That is:

pg_ctl's steps:
1. create a pair of unnamed pipes.
2. start postgres.
3. read the pipe, waiting for a startup completion message from postmaster.

postmaster's steps:
1. inherit a pair of unnamed pipes from pg_ctl.
2. do startup processing.
3. write a startup completion message to the pipe, then close the pipe.
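
The two step lists above can be sketched roughly as follows. This is only an illustration in Python on a POSIX system, not pg_ctl or postmaster code: the message text READY, the stand-in child programs CHILD_OK and CHILD_FAIL, and the function name start_and_wait are all assumptions made up for the example. The point it shows is that the waiting side sees end-of-file as soon as the child exits, so a failed start is detected without any timeout.

```python
import os
import subprocess
import sys

READY = b"ready\n"  # hypothetical startup completion message

# Hypothetical stand-ins for the server process. The first one "starts up",
# writes the message to the inherited descriptor and closes it; the second
# one fails before it ever reports anything.
CHILD_OK = (
    "import os, sys; fd = int(sys.argv[1]); "
    "os.write(fd, b'ready\\n'); os.close(fd)"
)
CHILD_FAIL = "import sys; sys.exit(1)"


def start_and_wait(child_source):
    """Start a child and wait on a pipe until it reports startup or dies."""
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(
        [sys.executable, "-c", child_source, str(write_fd)],
        pass_fds=(write_fd,),  # let the child inherit the write end
    )
    # The parent must close its own copy of the write end; otherwise the
    # read below never sees end-of-file when the child goes away.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        message = pipe.read()  # returns once every write end is closed
    # The stand-in children exit right away; a real server would keep running.
    proc.wait()
    return message == READY
```

With these stand-ins, start_and_wait(CHILD_OK) returns True and start_and_wait(CHILD_FAIL) returns False. Whether descriptors survive a system() call, as asked below, is a separate question this sketch does not answer, since it starts the child directly.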

I'm wondering if this is correct and easy. One concern is whether postmaster
can inherit pipes through a system() call.

Please give me your ideas. Of course, I would be very happy if some
experienced community member could address this problem.

And finally, do you think this should be handled as a bug, or an improvement
in 9.2?

Regards
MauMau


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "MauMau" <maumau307(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is the best and easiest implementation to reliably wait for the completion of startup?
Date: 2011-05-27 14:34:53
Message-ID: 12493.1306506893@sss.pgh.pa.us
Lists: pgsql-hackers

"MauMau" <maumau307(at)gmail(dot)com> writes:
> The bad thing is that pg_ctl continues to wait until the specified duration
> passes, even if postgres fails to start. For example, it is naturally
> desirable for pg_ctl to terminate when postgresql.conf contains a syntax
> error.

Hmm, I thought we'd fixed this in the last go-round of pg_ctl wait
revisions, but testing proves it does not work desirably in HEAD:
not only does pg_ctl wait till its timeout elapses, but it then reports
"server started" even though the server didn't start. That's clearly a
bug :-(

I think your proposal of a pipe-based solution might be overkill though.
Seems like it would be sufficient for pg_ctl to give up if it doesn't
see the postmaster.pid file present within a couple of seconds of
postmaster startup. I don't really want to add logic to the postmaster
to have the sort of reporting protocol you propose, because not
everybody uses pg_ctl to start the postmaster. In any case, we need a
fix in 9.1 ...
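
A minimal sketch of the "give up if the pid file does not appear" idea, written in Python purely for illustration. The function name wait_for_pid_file, the 2-second default grace period, and the polling interval are assumptions for the example, not values taken from pg_ctl.

```python
import os
import time


def wait_for_pid_file(path, grace_seconds=2.0, poll_interval=0.1):
    """Return True if `path` appears within `grace_seconds`, else False."""
    deadline = time.monotonic() + grace_seconds
    while True:
        if os.path.exists(path):
            return True  # the server got far enough to write its pid file
        if time.monotonic() >= deadline:
            return False  # no pid file in time: assume startup failed
        time.sleep(poll_interval)
```

A caller would start the server, call this with the path of postmaster.pid in the data directory, and report failure immediately on False instead of waiting out the full -t timeout.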

regards, tom lane


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: What is the best and easiest implementation to reliably wait for the completion of startup?
Date: 2011-05-28 03:42:39
Message-ID: 45EA0477951744379296D472397BCDD4@maumau
Lists: pgsql-hackers

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> "MauMau" <maumau307(at)gmail(dot)com> writes:
>> The bad thing is that pg_ctl continues to wait until the specified
>> duration
>> passes, even if postgres fails to start. For example, it is naturally
>> desirable for pg_ctl to terminate when postgresql.conf contains a syntax
>> error.
>
> Hmm, I thought we'd fixed this in the last go-round of pg_ctl wait
> revisions, but testing proves it does not work desirably in HEAD:
> not only does pg_ctl wait till its timeout elapses, but it then reports
> "server started" even though the server didn't start. That's clearly a
> bug :-(
>
> I think your proposal of a pipe-based solution might be overkill though.
> Seems like it would be sufficient for pg_ctl to give up if it doesn't
> see the postmaster.pid file present within a couple of seconds of
> postmaster startup. I don't really want to add logic to the postmaster
> to have the sort of reporting protocol you propose, because not
> everybody uses pg_ctl to start the postmaster. In any case, we need a
> fix in 9.1 ...

Yes, I was a bit afraid the pipe-based fix might be overkill, too, so I was
wondering if there might be an easier solution.

"server started"... I missed it. That's certainly a bug, as you say.

I was also considering the postmaster.pid-based solution exactly as you
suggest, but that has a problem -- how many seconds do we assume for "a
couple of seconds"? If the system load is temporarily so high that
postmaster takes many seconds to create postmaster.pid, pg_ctl mistakenly
thinks that postmaster failed to start. I know this is a hypothetical, rare
case. I don't like touching the postmaster logic and complicating it, but
logical correctness needs to come first (Japanese users are very strict).

Another problem with the postmaster.pid-based solution arises after
postmaster crashes. When postmaster crashes, postmaster.pid is left behind.
If the pid in postmaster.pid has since been allocated to some non-postgres
process that is still running, pg_ctl misjudges that postmaster is starting
up and waits for a long time.
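
To make the stale-pid case concrete, here is a rough Python sketch for a POSIX system. It assumes the pid is the first line of the pid file; the helper names read_pid and pid_is_alive are hypothetical. A signal-0 probe can only say that some process with that pid exists, not that the process is postmaster.

```python
import os


def read_pid(path):
    """Read the pid from the first line of a pid file (assumed layout)."""
    with open(path) as f:
        return int(f.readline().strip())


def pid_is_alive(pid):
    """Return True if any process with this pid exists.

    Signal 0 performs the existence and permission checks without
    delivering a signal. It says nothing about which program owns the pid.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

If a leftover postmaster.pid holds a pid that has been reused by an unrelated process, pid_is_alive(read_pid(path)) still returns True, which is exactly the misjudgment described above.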

Regards
MauMau