Buildfarm owners: check if your HEAD build is stuck

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Buildfarm owners: check if your HEAD build is stuck
Date: 2006-08-12 15:29:46
Message-ID: 27932.1155396586@sss.pgh.pa.us
Lists: pgsql-hackers

A number of the buildfarm machines have been failing HEAD builds
at the "make check" stage since last night, with complaints like
this one from emu:

================== pgsql.21911/src/test/regress/log/postmaster.log ===================
FATAL: lock file "/tmp/.s.PGSQL.55678.lock" already exists
HINT: Is another postmaster (PID 23692) using socket file "/tmp/.s.PGSQL.55678"?

What's happened is that the GUC patch that was in the tree for a few
hours broke postmaster startup on some machines (for as-yet-unidentified
reasons). The postmaster does actually start and establish its
lockfiles, but it never gets to the stage of being able to accept
connections.

After the buildfarm script rm -rf's the build tree, the postmaster
process is still there but "disembodied" (its executable file is
probably gone, for example, or at least in the state of zero remaining
directory links). But it's still got that socket file and lockfile
in /tmp, and this prevents another postmaster from starting with the
same port number.

If you've got this situation, you'll need to do a manual "kill" on the
PID mentioned in the lock file before things will start working again.
(pg_ctl won't work because it looks for the data directory
postmaster.pid file, which is long gone.) More generally you might want
to look through a ps listing for unexpected postgres-owned processes.
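
For anyone who would rather script the cleanup than do it by hand, here is
a minimal sketch in Python. It assumes a POSIX system and that the first
line of the socket lock file holds the postmaster's PID; the function names
are made up for illustration and are not part of the buildfarm script or of
PostgreSQL.

```python
import os
import signal


def read_lockfile_pid(lock_path):
    """Return the PID recorded on the first line of the lock file.

    Assumption: the first line is a bare integer PID.
    """
    with open(lock_path) as f:
        first_line = f.readline().strip()
    return int(first_line)


def process_exists(pid):
    """Report whether a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def kill_stale_postmaster(lock_path, sig=signal.SIGTERM):
    """Signal the process named in the lock file; return its PID or None."""
    pid = read_lockfile_pid(lock_path)
    if pid <= 0:
        # Refuse odd values: os.kill() with pid <= 0 targets process groups.
        raise ValueError("unexpected PID in lock file: %d" % pid)
    if not process_exists(pid):
        return None
    os.kill(pid, sig)
    return pid
```

With the lock file from the emu log this would be called as
kill_stale_postmaster("/tmp/.s.PGSQL.55678.lock"). It is worth confirming
with ps that the PID really is a leftover postmaster before signalling it,
since a PID can be reused by an unrelated process.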

I'm not sure whether there's anything much we can do to prevent such
problems in future. Maybe it'd be reasonable for pg_regress to do a
kill -9 on its postmaster child process if it gives up waiting for the
postmaster to accept connections.
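
To make the idea concrete, here is a rough sketch of that kill-on-timeout
pattern, written in Python rather than in pg_regress's C and not taken from
pg_regress itself. The function name, the is_ready callback, and the
timeout values are all assumptions for illustration.

```python
import subprocess
import time


def start_and_wait(cmd, is_ready, timeout=60.0, poll_interval=1.0):
    """Start cmd and wait until is_ready() returns true.

    Returns the Popen object on success. Returns None if the child exits
    early, or if the timeout expires, in which case the child is
    force-killed so that it cannot linger holding its lock files.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return None  # child exited on its own; nothing to kill
        if is_ready():
            return proc
        time.sleep(poll_interval)
    proc.kill()  # SIGKILL on POSIX, the equivalent of kill -9
    proc.wait()  # reap the child so no zombie is left behind
    return None
```

One tradeoff with kill -9 is that the child gets no chance to remove its
own socket and lock files, so this relies on the next postmaster noticing
that the PID recorded in the lock file is no longer running.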

regards, tom lane
