F_SETLK is looking worse and worse...

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: F_SETLK is looking worse and worse...
Date: 2000-11-29 00:16:28
Message-ID: 25154.975456988@sss.pgh.pa.us

While testing interlocking of multiple postmasters, I discovered that
the HAVE_FCNTL_SETLK interlock code we have in StreamServerPort()
does not work at all on HPUX 10.20. This platform has F_SETLK according
to configure, but:

1. The lock is never applied to a socket, because the open() on the
newly-created socket (at line 303 of pqcomm.c) fails with EOPNOTSUPP,
Operation not supported.

2. If a postmaster finds a socket file in its way, it is unable to
remove it despite the lack of any lock, because the open() at line
230 fails with EADDRINUSE, Address already in use.
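A small probe can check a given platform before anyone trusts configure's HAVE_FCNTL_SETLK result. This is a sketch, not the pqcomm.c code. The function name probe_socket_setlk and the scratch path are hypothetical. It binds a Unix-domain socket, then tries an open() plus fcntl(F_SETLK) sequence on the socket file, roughly mirroring the interlock described above. It returns the errno of the first step that fails. The exact error depends on the platform. The report above gives EOPNOTSUPP on HP-UX 10.20, and the Linux open(2) documentation lists ENXIO for a Unix-domain socket file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical probe: bind a Unix-domain socket at `path`, then try to
 * open() the resulting socket file and place a write lock on it with
 * fcntl(F_SETLK).
 * Returns 0 if the lock was granted, otherwise the errno of the first
 * step that failed (socket, bind, open, or fcntl). */
static int probe_socket_setlk(const char *path)
{
    struct sockaddr_un addr;
    int sock, fd, err = 0;

    sock = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock < 0)
        return errno;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path);                    /* clear a leftover from an earlier run */
    if (bind(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        err = errno;
        close(sock);
        return err;
    }

    /* HP-UX 10.20 reportedly fails here with EOPNOTSUPP;
     * Linux documents ENXIO for open() on a Unix-domain socket file. */
    fd = open(path, O_RDWR);
    if (fd < 0) {
        err = errno;
    } else {
        struct flock lck;
        memset(&lck, 0, sizeof(lck));
        lck.l_type = F_WRLCK;        /* l_start = l_len = 0: whole file */
        lck.l_whence = SEEK_SET;
        if (fcntl(fd, F_SETLK, &lck) < 0)
            err = errno;
        close(fd);
    }

    close(sock);
    unlink(path);
    return err;
}
```

Call it with a scratch path such as /tmp/.s.probe_setlk_test and print strerror() of a nonzero result to see which step refused. A return of 0 would mean the lock was actually granted on that platform.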

I have no idea whether the fcntl(F_SETLK) call would succeed if control
did get to it, but these results don't leave me very hopeful.

Between this and the already-known result that F_SETLK doesn't work on
sockets in shipping Linux kernels, I'm pretty unimpressed with the
usefulness of this interlock method.

We talked before about flushing the F_SETLK technique and using good
old interlock files containing PIDs, same method that we use for
interlocking the data directory. That is, if the socket file name is
/tmp/.s.PGSQL.5432, we'd create a plain file /tmp/.s.PGSQL.5432.lock
containing the owning process's PID. The code would insist on getting
this interlock file first, and if successful would just unconditionally
remove any existing socket file before doing the bind().
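Here is a minimal sketch of the proposed PID interlock. It assumes exclusive creation with O_CREAT | O_EXCL and uses a kill(pid, 0) liveness check to detect stale lockfiles. The helper names are hypothetical, and this is not presented as the code PostgreSQL adopted. The sketch glosses over one race. Its creator might be between open() and write() when a second process looks, so a lockfile that exists but does not yet contain a PID gets treated as stale.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `lockpath` exclusively and write our PID.
 * If the file already exists, read the owner's PID; if kill(pid, 0)
 * reports that process does not exist, treat the lockfile as stale,
 * remove it and retry once.
 * Returns 0 on success, -1 if the lock is held or an error occurs. */
static int acquire_pid_lockfile(const char *lockpath)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        int fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd >= 0) {
            char buf[32];
            int len = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
            if (write(fd, buf, len) != len) {
                close(fd);
                unlink(lockpath);
                return -1;
            }
            close(fd);
            return 0;
        }
        if (errno != EEXIST)
            return -1;

        /* Lockfile exists: read the recorded PID. */
        long pid = 0;
        FILE *fp = fopen(lockpath, "r");
        if (fp != NULL) {
            if (fscanf(fp, "%ld", &pid) != 1)
                pid = 0;
            fclose(fp);
        }

        /* kill(pid, 0) sends no signal; it only checks existence.
         * EPERM means the process exists but is someone else's. */
        if (pid > 0 && (kill((pid_t) pid, 0) == 0 || errno == EPERM))
            return -1;

        /* Simplification: a stale, empty or unreadable file is removed. */
        unlink(lockpath);
    }
    return -1;
}

/* With the interlock held, any existing socket file is presumed dead,
 * so it is removed unconditionally before bind(). */
static void remove_stale_socket(const char *sockpath)
{
    if (unlink(sockpath) < 0 && errno != ENOENT)
        perror(sockpath);
}
```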

I can only think of one scenario where this is worse than what we have
now: if someone is running a /tmp-directory-sweeper that is bright
enough not to remove socket files, it would still zap the interlock
file, thus potentially allowing a second postmaster to take over the
socket file. This doesn't seem like a mainstream problem though.

BTW, it also seems like a good idea to reorder the postmaster's
startup operations so that the data-directory lockfile is checked
before trying to acquire the port lockfile, instead of after. That
way, in the common scenario where you're trying to start a second
postmaster in the same directory + same port, it'd fail cleanly
even if /tmp/.s.PGSQL.5432.lock had disappeared.
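The following sketch shows the proposed ordering. take_lock() is a hypothetical and deliberately simplified stand-in for a real PID lockfile routine, with no stale-lock detection. The point is only the order. The data-directory lock is taken first. A second postmaster aimed at the same directory and port therefore fails there, even if the socket lockfile has vanished, and never touches the socket files.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Simplified, hypothetical stand-in for a PID lockfile routine:
 * succeeds only if the file did not already exist. */
static int take_lock(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    dprintf(fd, "%ld\n", (long) getpid());
    close(fd);
    return 0;
}

/* Hypothetical startup sequence: data-directory lockfile first,
 * socket lockfile second.
 * Returns 0 on success, 1 if the data directory is busy, 2 if the port is. */
static int acquire_startup_locks(const char *datadir_lock,
                                 const char *socket_lock)
{
    if (take_lock(datadir_lock) != 0)
        return 1;               /* same data directory: fail before the socket */
    if (take_lock(socket_lock) != 0) {
        unlink(datadir_lock);   /* release what we already took */
        return 2;
    }
    return 0;
}
```

If the socket lock fails, the data-directory lock is released, so a failed start leaves nothing behind.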

Comments?

regards, tom lane


From: Matthew Kirkwood <matthew(at)hairy(dot)beasts(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: F_SETLK is looking worse and worse...
Date: 2000-11-29 13:13:56
Message-ID: Pine.LNX.4.10.10011291312280.19733-100000@sphinx.mythic-beasts.com
Lists: pgsql-hackers

On Tue, 28 Nov 2000, Tom Lane wrote:

> That is, if the socket file name is /tmp/.s.PGSQL.5432, we'd create a
> plain file /tmp/.s.PGSQL.5432.lock

> I can only think of one scenario where this is worse than what we have
> now: if someone is running a /tmp-directory-sweeper that is bright
> enough not to remove socket files, it would still zap the interlock
> file, thus potentially allowing a second postmaster to take over the
> socket file. This doesn't seem like a mainstream problem though.

Surely the lock file could easily go somewhere other than
/tmp, since it won't be breaking existing setups?

Matthew.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Kirkwood <matthew(at)hairy(dot)beasts(dot)org>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: F_SETLK is looking worse and worse...
Date: 2000-11-29 15:55:36
Message-ID: 27454.975513336@sss.pgh.pa.us
Lists: pgsql-hackers

Matthew Kirkwood <matthew(at)hairy(dot)beasts(dot)org> writes:
> Surely the lock file could easily go somewhere other than
> /tmp, since it won't be breaking existing setups?

Such as where?

Given the fact that the recent UUNET patch allows the DBA to put the
socket files anywhere, it seems simplest to say that the lockfiles go
in the same directory as the socket files. Anything else is going to
be mighty confusing and probably unworkable. For example, it's not
a good idea to say we'll use a fixed directory for lockfiles regardless
of where the socket file is --- that would prevent people from starting
multiple postmasters with the same logical port number and different
socket directories, something that's really perfectly reasonable (at
least in UUNET's view of the world ;-)).
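Keeping the lockfile next to the socket reduces to a path computation. The helper below is a hypothetical sketch that assumes the socket directory and port are already known. It derives the socket name and appends .lock. Postmasters with the same port but different socket directories therefore get distinct lockfiles.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive the socket path and its lockfile path from
 * a socket directory and port. Returns 0 on success, -1 if either result
 * would be truncated. */
static int build_socket_paths(const char *sockdir, int port,
                              char *sockpath, size_t sockpath_len,
                              char *lockpath, size_t lockpath_len)
{
    int n = snprintf(sockpath, sockpath_len, "%s/.s.PGSQL.%d", sockdir, port);
    if (n < 0 || (size_t) n >= sockpath_len)
        return -1;
    n = snprintf(lockpath, lockpath_len, "%s.lock", sockpath);
    if (n < 0 || (size_t) n >= lockpath_len)
        return -1;
    return 0;
}
```

The socket path must also fit in sockaddr_un.sun_path, which is 108 bytes on Linux and 104 on the BSDs, so sizing the socket-path buffer to match is a natural choice.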

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: F_SETLK is looking worse and worse...
Date: 2000-11-29 16:37:31
Message-ID: Pine.LNX.4.21.0011291731510.796-100000@peter.localdomain
Lists: pgsql-hackers

Tom Lane writes:

> I can only think of one scenario where this is worse than what we have
> now: if someone is running a /tmp-directory-sweeper that is bright
> enough not to remove socket files, it would still zap the interlock
> file, thus potentially allowing a second postmaster to take over the
> socket file. This doesn't seem like a mainstream problem though.

Red Hat by default cleans out all files under /tmp and subdirectories that
haven't been accessed for 10 days. I assume other Linux distributions do
similar things. Red Hat's tmpwatch doesn't ever follow symlinks, though.
That means you could make /tmp/.s.PGSQL.5432.lock a symlink to
PGDATA/postmaster.pid. That might be a good idea in general, since it
establishes an easy-to-examine correspondence between data directory and
port number.

--
Peter Eisentraut peter_e(at)gmx(dot)net http://yi.org/peter-e/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: F_SETLK is looking worse and worse...
Date: 2000-11-29 16:53:13
Message-ID: 27690.975516793@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Red Hat by default cleans out all files under /tmp and subdirectories that
> haven't been accessed for 10 days. I assume other Linux distributions do
> similar things. Red Hat's tmpwatch doesn't ever follow symlinks, though.

Nor remove them?

> That means you could make /tmp/.s.PGSQL.5432.lock a symlink to
> PGDATA/postmaster.pid. That might be a good idea in general, since it
> establishes an easy-to-examine correspondence between data directory and
> port number.

I think this is a bad idea, because it assumes that the would-be
examiner (a) has read access to someone else's data directory, and
(b) has the same chroot setting as the someone else does (else the
symlink won't mean the same thing to both of them). UUNET was planning
to run postmasters chrooted into various subdirectories, IIRC, so
point (b) isn't hypothetical.

However, I have no objection to writing the value of DataDir into
the socket lockfile (along with the owner's PID) if that seems like
a worthwhile bit of info.
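If DataDir is recorded, a simple line-oriented layout is enough. The format below is hypothetical, as are the helper names: the PID goes on line 1 and the data directory on line 2. Per point (b) above, the stored path is as seen from the writer's own root. A reader in a different chroot should treat it as informational only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical two-line socket lockfile layout:
 *   line 1: PID of the owning postmaster
 *   line 2: its data directory (informational only) */
static int write_socket_lockfile(const char *lockpath, long pid,
                                 const char *datadir)
{
    FILE *fp = fopen(lockpath, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "%ld\n%s\n", pid, datadir);
    return fclose(fp) == 0 ? 0 : -1;
}

/* Reads both fields back; returns 0 on success, -1 if the file is
 * missing or malformed. */
static int read_socket_lockfile(const char *lockpath, long *pid,
                                char *datadir, size_t datadir_len)
{
    char line[32];
    FILE *fp = fopen(lockpath, "r");
    if (fp == NULL)
        return -1;
    int ok = fgets(line, sizeof(line), fp) != NULL &&
             sscanf(line, "%ld", pid) == 1 &&
             fgets(datadir, (int) datadir_len, fp) != NULL;
    fclose(fp);
    if (!ok)
        return -1;
    datadir[strcspn(datadir, "\n")] = '\0';   /* strip trailing newline */
    return 0;
}
```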

Would there be any value in having a postmaster re-read its own socket
lockfile every so often, to keep it looking active to /tmp sweepers?
Or is that too much of a kluge?
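The keep-alive could look like the sketch below, with one plainly labeled substitution. Instead of re-reading the file, it calls utime(path, NULL) to set the access and modification times to the current time. A plain read may not update st_atime on filesystems mounted noatime. The function names are assumptions, as are the rate-limit parameter and the idea of calling this from the postmaster's main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/* Hypothetical housekeeping hook for the postmaster's main loop.
 * Sets the lockfile's access and modification times to "now" with
 * utime(path, NULL), at most once per `interval` seconds.
 * `*last_touch` == 0 means "never touched yet".
 * Returns 1 if the file was touched, 0 if not yet due, -1 on error. */
static int touch_lockfile_if_due(const char *lockpath, time_t *last_touch,
                                 time_t interval)
{
    time_t now = time(NULL);

    if (*last_touch != 0 && now - *last_touch < interval)
        return 0;
    if (utime(lockpath, NULL) != 0) {
        perror(lockpath);
        return -1;
    }
    *last_touch = now;
    return 1;
}

/* Current access time of the lockfile, or (time_t) -1 on error. */
static time_t lockfile_atime(const char *lockpath)
{
    struct stat st;
    return stat(lockpath, &st) == 0 ? st.st_atime : (time_t) -1;
}
```

Touching the file about once an hour keeps it far inside a 10-day window, so the rate limit only bounds the syscall cost.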

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: F_SETLK is looking worse and worse...
Date: 2000-12-09 20:04:13
Message-ID: 200012092004.PAA24209@candle.pha.pa.us
Lists: pgsql-hackers

> However, I have no objection to writing the value of DataDir into
> the socket lockfile (along with the owner's PID) if that seems like
> a worthwhile bit of info.
>
> Would there be any value in having a postmaster re-read its own socket
> lockfile every so often, to keep it looking active to /tmp sweepers?
> Or is that too much of a kluge?

Removing 10-day-old files from /tmp seems pretty broken to me, and I
hate to code around broken stuff.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: F_SETLK is looking worse and worse...
Date: 2000-12-10 18:10:29
Message-ID: Pine.LNX.4.30.0012101903120.1095-100000@peter.localdomain
Lists: pgsql-hackers

Bruce Momjian writes:

> Removing 10-day-old files from /tmp seems pretty broken to me, and I
> hate to code around broken stuff.

(It's not 10-day-old files, it's files that have not been used for 10
days.)
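The distinction matters because the sweeper judges staleness by access time, not creation time. The model below is hypothetical and simplified, and it is not tmpwatch's source. It uses lstat() so nothing is followed and skips anything that is not a regular file. It flags a regular file whose last access is more than the given number of days before `now`. Under this model a socket file survives, but an untouched .lock file does not. That is the scenario raised at the start of the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Simplified, hypothetical model of an access-time-based /tmp sweeper.
 * lstat() examines the entry itself without following symlinks; only
 * regular files are candidates, so socket files and directories survive.
 * A regular file qualifies once it has gone more than `days` days
 * without being accessed, as of time `now`. */
static bool sweeper_would_remove(const char *path, time_t now, int days)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return false;                /* missing or unreadable: nothing to do */
    if (!S_ISREG(st.st_mode))
        return false;                /* e.g. the .s.PGSQL socket itself */
    return now - st.st_atime > (time_t) days * 24 * 60 * 60;
}
```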

But both the Linux file system standard and POSIX 2 have requirements
and/or recommendations that call for /tmp to be cleaned out once in a
while. If you don't like that, put your files elsewhere. We're not in a
position to dictate system administration procedures.

--
Peter Eisentraut peter_e(at)gmx(dot)net http://yi.org/peter-e/