[bug fix] "pg_ctl stop" times out when it should respond quickly

From: "MauMau" <maumau307(at)gmail(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: [bug fix] "pg_ctl stop" times out when it should respond quickly
Date: 2013-12-03 12:45:53
Message-ID: DF2AB03E91D547319F29A21458EA868E@maumau
Lists: pgsql-hackers

Hello,

I've encountered a small bug and fixed it. I suspect it affects all major
releases; I have seen it on 9.2 and 9.4devel. The patch is attached; please
consider committing it.

[Problem]
If I mistakenly set listen_addresses to an invalid value, say '-1', and
start the database server, it fails to start as shown below. In my
environment (RHEL6 on Intel64), postgres takes about 15 seconds to print
these messages. That part is OK.

[maumau(at)myhost pgdata]$ pg_ctl -w start
waiting for server to start........................LOG: could not translate host name "-1", service "5450" to address: Temporary failure in name resolution
WARNING: could not create listen socket for "-1"
FATAL: could not create any TCP/IP sockets
stopped waiting
pg_ctl: could not start server
Examine the log output.
[maumau(at)myhost pgdata]$

When I start the server without -w and then try to stop it, "pg_ctl stop"
waits for 60 seconds and then fails with a timeout. This is the problem I'm
reporting: I expected "pg_ctl stop" to respond quickly, with success or
failure depending on the timing.

[maumau(at)myhost pgdata]$ pg_ctl start
server starting
...(a few seconds later)
[maumau(at)myhost ~]$ pg_ctl stop
waiting for server to shut down.................................................
.............. failed
pg_ctl: server does not shut down
HINT: The "-m fast" option immediately disconnects sessions rather than waiting for session-initiated disconnection.
[maumau(at)myhost ~]$

[Cause]
The problem occurs in the sequence below:

1. postmaster creates $PGDATA/postmaster.pid.
2. postmaster tries to resolve the value of listen_addresses to IP
addresses. This takes about 15 seconds in my failure scenario.
3. During step 2, pg_ctl sends SIGTERM to postmaster.
4. postmaster terminates immediately without deleting
$PGDATA/postmaster.pid, because it has not set up its signal handlers yet.
5. "pg_ctl stop" waits in a loop until $PGDATA/postmaster.pid disappears.
Since the file never disappears, pg_ctl times out.
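The sequence above can be reproduced outside PostgreSQL with a small C sketch for a POSIX system. It models step 4 as described: the process receiving SIGTERM has no handler installed yet, so the signal's default action applies. `reproduce_stale_pidfile()` is a hypothetical name, and the 30-second `sleep()` merely stands in for the slow listen_addresses lookup; none of this is postmaster code. A forked child writes its PID to a pid file, the parent sends SIGTERM, and the function reports whether the file outlived the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical reproduction of steps 1-4. Returns 1 if the child was killed
 * by SIGTERM and the pid file it created is still on disk, 0 otherwise,
 * and -1 if fork() or waitpid() fails.
 */
static int
reproduce_stale_pidfile(const char *pidfile)
{
    /* Remove any leftover file so the poll below waits for this child's file. */
    unlink(pidfile);

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0)
    {
        /* No handler installed yet: SIGTERM keeps its default action. */
        sigset_t unblock;
        sigemptyset(&unblock);
        sigaddset(&unblock, SIGTERM);
        sigprocmask(SIG_UNBLOCK, &unblock, NULL);
        signal(SIGTERM, SIG_DFL);

        /* Step 1: create the pid file. */
        FILE *fp = fopen(pidfile, "w");
        if (fp == NULL)
            _exit(1);
        fprintf(fp, "%ld\n", (long) getpid());
        fclose(fp);

        /* Step 2: stand-in for the slow name resolution. */
        sleep(30);

        /* Only reached if no signal arrived: clean up as a normal exit would. */
        unlink(pidfile);
        _exit(0);
    }

    /* Wait (up to about 5 seconds) until the child has created its pid file. */
    struct timespec ten_ms = { 0, 10 * 1000 * 1000 };
    for (int i = 0; i < 500 && access(pidfile, F_OK) != 0; i++)
        nanosleep(&ten_ms, NULL);

    /* Step 3: the equivalent of "pg_ctl stop" sending SIGTERM. */
    kill(child, SIGTERM);

    int status;
    if (waitpid(child, &status, 0) != child)
        return -1;

    /* Step 4: the child died from the signal; did its pid file survive? */
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM
        && access(pidfile, F_OK) == 0;
}
```

Called with a scratch path such as /tmp/demo.pid, this should return 1: the process is gone but its pid file remains, which is exactly the state in which a loop that only watches for the file can do nothing but run until its timeout.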

[Fix]
Make pg_ctl check whether postmaster is still alive while it waits, because
postmaster might have crashed unexpectedly and left postmaster.pid behind.
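A minimal sketch of the kind of wait loop this fix describes, assuming a POSIX system. It is not the attached patch: `wait_for_stop()`, `read_pidfile()` and the `stop_result` codes are hypothetical names. Besides checking whether the pid file is gone, each iteration probes the PID on the file's first line with `kill(pid, 0)`, which sends no signal and only performs error checking; ESRCH means no such process exists, so there is no reason to wait out the full timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch (not pg_ctl's own API). */
typedef enum
{
    STOP_PIDFILE_GONE,   /* normal shutdown: the pid file was removed */
    STOP_PROCESS_GONE,   /* process vanished but left its pid file behind */
    STOP_TIMED_OUT       /* process still exists after the timeout */
} stop_result;

/* Return the PID on the first line of the pid file, or 0 if missing/unparsable. */
static long
read_pidfile(const char *path)
{
    FILE *fp = fopen(path, "r");
    long pid = 0;

    if (fp == NULL)
        return 0;
    if (fscanf(fp, "%ld", &pid) != 1)
        pid = 0;
    fclose(fp);
    return pid;
}

static stop_result
wait_for_stop(const char *pidfile, int timeout_secs)
{
    for (int i = 0; i < timeout_secs; i++)
    {
        long pid = read_pidfile(pidfile);

        /* Simplification: a missing or unparsable file counts as stopped. */
        if (pid <= 0)
            return STOP_PIDFILE_GONE;

        /* Signal 0 checks for existence only; ESRCH means no such process. */
        if (kill((pid_t) pid, 0) != 0 && errno == ESRCH)
        {
            /*
             * The process may have exited normally (removing the file) between
             * our read and kill(); re-read once so that is still a clean stop.
             */
            if (read_pidfile(pidfile) <= 0)
                return STOP_PIDFILE_GONE;
            return STOP_PROCESS_GONE;
        }
        sleep(1);
    }
    return STOP_TIMED_OUT;
}
```

One caveat of this design: `kill(pid, 0)` only tells whether *some* process currently has that PID, so a recycled PID is reported as alive, and the worst case is then the same timeout as before. A zombie (exited but not yet reaped) also still counts as existing, which does not affect a separate "pg_ctl stop" invocation, since it is never postmaster's parent.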

Regards
MauMau

Attachment Content-Type Size
pg_stop_fail.patch application/octet-stream 1.6 KB
