Lockfile restart failure is still there :-(

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Lockfile restart failure is still there :-(
Date:	2005-03-17 21:20:33
Message-ID:	27501.1111094433@sss.pgh.pa.us
Lists:	pgsql-hackers

Last fall I proposed a minor tweak to solve the problem of Postgres
not restarting after a system reboot, in cases where it looked at the
old lockfile and thought the old postmaster was still alive:
http://archives.postgresql.org/pgsql-hackers/2004-09/msg00935.php

However, it turns out the bug is still there. We eliminated one case:
the one where the PID shown in the lockfile now belongs to the
immediate parent process of the postmaster (i.e., the shell that's
spawning it). But the PID might belong to an older process, for instance a
root-owned "su" that spawned the immediate parent shell. I argued in
the above message that this wouldn't be a problem because the kill()
would fail against a non-postgres-owned process. But I evidently didn't
read the code quite carefully enough: as CreateLockFile() is written,
it considers an EPERM error from kill() to be reason to treat the
lockfile as valid.

I was thinking at the time, and still think, it is reasonable to treat
EPERM as being a safe rather than unsafe case. EPERM implies that the
process exists but does not belong to the postgres userid, and therefore
it could not possibly be a competing postmaster. We can assume that any
postmaster successfully started in a particular data directory belongs
to the userid that owns that directory, because (a) we check that we are
not root, and (b) we check that the data directory has no group or world
permissions; therefore if we were not of its owner's userid we'd not
be able to do anything in it.
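
To make the proposed rule concrete, here is a minimal sketch of the
liveness probe. It is not the actual CreateLockFile() code: the names
PidStatus, classify_kill_result() and probe_lockfile_pid() are
hypothetical, and treating a non-positive PID or our own PID as "not a
postmaster" is an assumption of the sketch. The point it illustrates is
only that ESRCH and EPERM both lead to treating the lockfile as stale.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
typedef enum
{
    PID_NOT_A_POSTMASTER,       /* lockfile can be treated as stale */
    PID_MAYBE_POSTMASTER        /* must refuse to start */
} PidStatus;

/* Interpret the outcome of kill(pid, 0) under the proposed rule. */
static PidStatus
classify_kill_result(int rc, int err)
{
    if (rc == 0)
        return PID_MAYBE_POSTMASTER;    /* exists and we may signal it */
    if (err == ESRCH)
        return PID_NOT_A_POSTMASTER;    /* no such process */
    if (err == EPERM)
        return PID_NOT_A_POSTMASTER;    /* exists, but under another userid */
    return PID_MAYBE_POSTMASTER;        /* unexpected errno: stay conservative */
}

/* Probe the PID read from an old lockfile (hypothetical helper). */
static PidStatus
probe_lockfile_pid(pid_t other_pid)
{
    int         rc;

    /* Assumption: a non-positive value is not a usable PID for this probe. */
    if (other_pid <= 0)
        return PID_NOT_A_POSTMASTER;

    /* Our own PID, or that of the shell spawning us, is not a competitor. */
    if (other_pid == getpid() || other_pid == getppid())
        return PID_NOT_A_POSTMASTER;

    rc = kill(other_pid, 0);    /* signal 0: existence/permission check only */
    return classify_kill_result(rc, errno);
}
```

Splitting out classify_kill_result() keeps the errno policy in one
place, so the EPERM decision can be examined without needing a
foreign-owned process to test against.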

Can anyone see any holes in this reasoning? Are there any cases where
an EPERM failure could occur against a process that is of our own userid?

I am strongly tempted to add a direct check in checkDataDir() that the
data directory actually does belong to our own uid, just for paranoia's
sake. Someone might decide that they could relax the permission check
("hey, why not let the dbadmin group have write permission on $PGDATA")
without realizing they'd be weakening the startup safety interlock.
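
A rough sketch of what such an ownership check might look like follows.
The function name datadir_owned_by_current_user() is hypothetical, and
comparing against geteuid() (rather than the real uid) is an assumption
of the sketch; it is not taken from the existing checkDataDir() code.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return true only if "datadir" exists, is a
 * directory, and is owned by the effective userid of this process.
 */
static bool
datadir_owned_by_current_user(const char *datadir)
{
    struct stat st;

    if (stat(datadir, &st) != 0)
        return false;           /* cannot inspect it: fail closed */

    if (!S_ISDIR(st.st_mode))
        return false;           /* not a directory at all */

    return st.st_uid == geteuid();
}
```

With a check along these lines in place, the "EPERM means it is not a
competing postmaster" argument would no longer depend solely on the
group/world permission test staying as strict as it is today.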

Comments?

regards, tom lane

Responses

Re: Lockfile restart failure is still there :-( at 2005-03-17 22:45:54 from Greg Stark
Re: Lockfile restart failure is still there :-( at 2005-03-17 23:00:04 from Andrew Dunstan
Re: Lockfile restart failure is still there :-( at 2005-03-18 03:56:06 from Tom Lane
