Re: file-locking and postmaster.pid

Lists: pgsql-hackers
From: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
To: pgsql-hackers(at)postgresql(dot)org
Subject: file-locking and postmaster.pid
Date: 2006-05-23 15:23:16
Message-ID: 200605231723.16920.andreak@officenet.no

Hi all.

I've experienced several times that PG has died somehow and the postmaster.pid
file still exists 'cause PG hasn't had the ability to delete it upon proper
shutdown. Upon start-up, after such an incident, PG tells me another PG is
running and that I either have to shut down the other instance, or delete the
postmaster.pid file if there really isn't an instance running. This seems
totally unnecessary to me. Why doesn't PG use file-locking to tell if another
PG is running or not? If PG holds an exclusive-lock on the pid-file and the
process crashes, or shuts down, then the lock (which is process-based and
controlled by the kernel) will be removed and another PG which tries to start
up can detect that. Using the existence of the pid-file as the only evidence
gives too many false positives IMO.
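
The locking idea described above can be sketched in a few lines. This is an illustration only, not how PostgreSQL works: it assumes a Unix-like system and Python's `fcntl.flock`, and the helper name `try_exclusive_lock` and the file name `demo.pid` are made up for the example. The kernel drops the lock when the holder closes the descriptor or dies, so nothing stale is left behind.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Return an fd holding an exclusive lock on path, or None if it is already locked."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the holder.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


lock_path = os.path.join(tempfile.mkdtemp(), "demo.pid")
first = try_exclusive_lock(lock_path)    # plays the running server
second = try_exclusive_lock(lock_path)   # plays a second server starting up: refused
os.close(first)                          # closing the fd (or crashing) drops the lock
third = try_exclusive_lock(lock_path)    # the lock can now be taken again
```

Whether this behaves the same on every platform and filesystem is the portability question raised in the replies.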

I'm sure there's a good reason for having it the way it is, having so many
smart knowledgeable people working on this project. Could someone please
explain the rationale of the current solution to me?

--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Senior Software Developer / Manager
gpg public_key: http://dev.officenet.no/~andreak/public_key.asc
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Hoffsveien 17 | know how to do a thing and to watch |
PO. Box 425 Skøyen | somebody else doing it wrong, without |
0213 Oslo | comment. |
NORWAY | |
Phone : +47 22 13 01 00 | |
Direct: +47 22 13 10 03 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-23 15:54:37
Message-ID: 14889.1148399677@sss.pgh.pa.us

Andreas Joseph Krogh <andreak(at)officenet(dot)no> writes:
> I've experienced several times that PG has died somehow and the postmaster.pid
> file still exists 'cause PG hasn't had the ability to delete it upon proper
> shutdown. Upon start-up, after such an incident, PG tells me another PG is
> running and that I either have to shut down the other instance, or delete the
> postmaster.pid file if there really isn't an instance running. This seems
> totally unnecessary to me.

The postmaster does check to see whether the PID mentioned in the file
is still alive, so it's not that easy for the above to happen. If you
can provide details of a scenario where a failure is likely, we'd like
to know about it. Also, what PG version are you talking about?

> Why doesn't PG use file-locking to tell if another
> PG is running or not?

Portability.

regards, tom lane


From: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-23 16:10:27
Message-ID: 200605231810.27883.andreak@officenet.no

On Tuesday 23 May 2006 17:54, Tom Lane wrote:
> Andreas Joseph Krogh <andreak(at)officenet(dot)no> writes:
> > I've experienced several times that PG has died somehow and the
> > postmaster.pid file still exists 'cause PG hasn't had the ability to
> > delete it upon proper shutdown. Upon start-up, after such an incident,
> > PG tells me another PG is running and that I either have to shut down the
> > other instance, or delete the postmaster.pid file if there really isn't
> > an instance running. This seems totally unnecessary to me.
>
> The postmaster does check to see whether the PID mentioned in the file
> is still alive, so it's not that easy for the above to happen. If you
> can provide details of a scenario where a failure is likely, we'd like
> to know about it. Also, what PG version are you talking about?

I have experienced this with PG-8.1.3 and will provide details if I can make
it happen. Basically it has happened when I have had to "hard-reset" my
laptop due to some strange bugs in Linux which have made it hang.

> > Why doesn't PG use file-locking to tell if another
> > PG is running or not?
>
> Portability.

Ok.

--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Senior Software Developer / Manager
gpg public_key: http://dev.officenet.no/~andreak/public_key.asc
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Hoffsveien 17 | know how to do a thing and to watch |
PO. Box 425 Skøyen | somebody else doing it wrong, without |
0213 Oslo | comment. |
NORWAY | |
Phone : +47 22 13 01 00 | |
Direct: +47 22 13 10 03 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-23 16:25:48
Message-ID: 15187.1148401548@sss.pgh.pa.us

Andreas Joseph Krogh <andreak(at)officenet(dot)no> writes:
> On Tuesday 23 May 2006 17:54, Tom Lane wrote:
>> The postmaster does check to see whether the PID mentioned in the file
>> is still alive, so it's not that easy for the above to happen. If you
>> can provide details of a scenario where a failure is likely, we'd like
>> to know about it. Also, what PG version are you talking about?

> I have experienced this with PG-8.1.3 and will provide details if I can make
> it happen. Basically it has happened when I have had to "hard-reset" my
> laptop due to some strange bugs in Linux which have made it hang.

If you're talking about a postmaster that's auto-started during the boot
sequence, then there is a risk depending on what start script you use.
The problem is that depending on what else runs during the system
startup, the PID assigned to the postmaster might be the same as in the
last boot cycle, or it might be different by one or two counts. The
postmaster disregards a pidfile containing its own PID, or its parent
process' PID, or a PID not belonging to a postgres-owned process.
That covers most cases but if your start script does something like

su -l postgres -c "pg_ctl start ..."

then you have a situation where not only the parent process (pg_ctl)
but also the grandparent (a shell) is postgres-owned, and if the pidfile
PID happens to match the grandparent then you lose. Solution is to
either not use pg_ctl here, or write "exec pg_ctl start ...", so that
there's only one postgres-owned process besides the postmaster itself.
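
The rules listed in this message can be sketched as follows. This is a rough Python rendering of the description, not the postmaster's actual code: the function names are hypothetical, and "postgres-owned" is approximated by `kill(pid, 0)` failing with EPERM for another user's process (an approximation that does not hold when running as root).

```python
import os


def pid_is_live_and_ours(pid):
    """True if pid exists and we may signal it (signal 0 only probes, it sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:   # ESRCH: no such process
        return False
    except PermissionError:      # EPERM: exists, but belongs to another user
        return False
    return True


def pidfile_blocks_startup(recorded_pid, my_pid=None, parent_pid=None):
    """Decide whether the PID found in an old pidfile should prevent startup."""
    my_pid = os.getpid() if my_pid is None else my_pid
    parent_pid = os.getppid() if parent_pid is None else parent_pid
    if recorded_pid in (my_pid, parent_pid):
        return False             # leftover from an earlier boot that reused the PID
    return pid_is_live_and_ours(recorded_pid)


# The old PID was reassigned to the new postmaster itself: ignored.
same_pid_after_reboot = pidfile_blocks_startup(os.getpid())
# The old PID now belongs to some other live process of the same user, for
# example a grandparent shell: startup is refused. The fake my_pid/parent_pid
# values stand in for "neither me nor my parent".
grandparent_case = pidfile_blocks_startup(os.getpid(), my_pid=-1, parent_pid=-2)
```

The second case is the false positive described above; using `exec` removes the extra same-user process from the ancestry.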

Initscripts published by PGDG itself and by Red Hat have gotten this
right for a while, but I suspect the word has not propagated to all
distros.

regards, tom lane


From: Adis Nezirovic <adis(at)linux(dot)org(dot)ba>
To: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-23 16:35:11
Message-ID: 20060523163511.GA12309@hiigarah.team.ba

On Tue, May 23, 2006 at 05:23:16PM +0200, Andreas Joseph Krogh wrote:
> Hi all.
>
> I've experienced several times that PG has died somehow and the postmaster.pid
> file still exists 'cause PG hasn't had the ability to delete it upon proper
> shutdown. Upon start-up, after such an incident, PG tells me another PG is
> running and that I either have to shut down the other instance, or delete the
> postmaster.pid file if there really isn't an instance running. This seems
> totally unnecessary to me. Why doesn't PG use file-locking to tell if another
> PG is running or not? If PG holds an exclusive-lock on the pid-file and the
> process crashes, or shuts down, then the lock (which is process-based and
> controlled by the kernel) will be removed and another PG which tries to start
> up can detect that. Using the existence of the pid-file as the only evidence
> gives too many false positives IMO.

Well, maybe you could tweak the postgres startup script: add a check for the
postmaster (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'), and
delete the pid file on negative results.

i.e.

#!/bin/bash
# Look for a postmaster started from this exact path.
PID=$(pgrep -f /usr/bin/postmaster)

if [[ -n $PID ]]; then
    echo "'$PID'"
    # postgres is already running
else
    echo "Postmaster is not running"
    # delete stale PID file
fi


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Adis Nezirovic <adis(at)linux(dot)org(dot)ba>
Cc: Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-23 17:36:41
Message-ID: 16627.1148405801@sss.pgh.pa.us

Adis Nezirovic <adis(at)linux(dot)org(dot)ba> writes:
> Well, maybe you could tweak the postgres startup script: add a check for the
> postmaster (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'), and
> delete the pid file on negative results.

This is exactly what you should NOT do.

A start script that thinks it is smarter than the postmaster is almost
certainly wrong. It is certainly dangerous, too, because auto-deleting
that pidfile destroys the interlock against having two postmasters
running in the same data directory (which WILL corrupt your data,
quickly and irretrievably). All it takes to cause a problem is to
use the start script to start a postmaster, forgetting that you already
have one running ...
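
The danger can be seen in a simplified sketch. It assumes the interlock amounts to exclusive creation of the pidfile (`O_CREAT | O_EXCL`), which is a simplification rather than a description of PostgreSQL's real code, and the helper `create_pidfile` is hypothetical.

```python
import os
import tempfile


def create_pidfile(path):
    """Create path exclusively and record our PID; return False if it already exists."""
    try:
        # O_EXCL makes creation fail if the file is already there.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")
    return True


pidfile = os.path.join(tempfile.mkdtemp(), "postmaster.pid")
first_start = create_pidfile(pidfile)    # no file yet, so startup proceeds
second_start = create_pidfile(pidfile)   # refused: the interlock holds
os.remove(pidfile)                       # what an over-eager start script would do
third_start = create_pidfile(pidfile)    # succeeds even if the first server still runs
```

Once the file is gone, nothing in this scheme stops a second server from starting in the same directory.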

regards, tom lane


From: Adis Nezirovic <adis(at)linux(dot)org(dot)ba>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-23 18:05:45
Message-ID: 20060523180545.GA14466@hiigarah.team.ba

On Tue, May 23, 2006 at 01:36:41PM -0400, Tom Lane wrote:
> This is exactly what you should NOT do.
>
> A start script that thinks it is smarter than the postmaster is almost
> certainly wrong. It is certainly dangerous, too, because auto-deleting
> that pidfile destroys the interlock against having two postmasters
> running in the same data directory (which WILL corrupt your data,
> quickly and irretrievably). All it takes to cause a problem is to
> use the start script to start a postmaster, forgetting that you already
> have one running ...

I do agree with you that we should not play games with the postmaster.
Better to be safe than sorry. (So manually deleting the pid file is the
only safe option.) I was just suggesting a (possibly dangerous)
workaround.

Btw, I do check for a running postmaster using the full path (I don't want
to kill every postmaster on the system). Is this safe, or could there be a
race condition?


From: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 09:36:22
Message-ID: 200605241136.22497.andreak@officenet.no

On Tuesday 23 May 2006 19:36, Tom Lane wrote:
> Adis Nezirovic <adis(at)linux(dot)org(dot)ba> writes:
> > Well, maybe you could tweak the postgres startup script: add a check for
> > the postmaster (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'),
> > and delete the pid file on negative results.
>
> This is exactly what you should NOT do.
>
> A start script that thinks it is smarter than the postmaster is almost
> certainly wrong. It is certainly dangerous, too, because auto-deleting
> that pidfile destroys the interlock against having two postmasters
> running in the same data directory (which WILL corrupt your data,
> quickly and irretrievably). All it takes to cause a problem is to
> use the start script to start a postmaster, forgetting that you already
> have one running ...

My PG is not started with startup-scripts, but with this command:

pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start

--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Senior Software Developer / Manager
gpg public_key: http://dev.officenet.no/~andreak/public_key.asc
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Hoffsveien 17 | know how to do a thing and to watch |
PO. Box 425 Skøyen | somebody else doing it wrong, without |
0213 Oslo | comment. |
NORWAY | |
Phone : +47 22 13 01 00 | |
Direct: +47 22 13 10 03 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+


From: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 09:43:32
Message-ID: 200605241143.32994.andreak@officenet.no

On Wednesday 24 May 2006 11:36, Andreas Joseph Krogh wrote:
> On Tuesday 23 May 2006 19:36, Tom Lane wrote:
> > Adis Nezirovic <adis(at)linux(dot)org(dot)ba> writes:
> > > Well, maybe you could tweak the postgres startup script: add a check for
> > > the postmaster (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'),
> > > and delete the pid file on negative results.
> >
> > This is exactly what you should NOT do.
> >
> > A start script that thinks it is smarter than the postmaster is almost
> > certainly wrong. It is certainly dangerous, too, because auto-deleting
> > that pidfile destroys the interlock against having two postmasters
> > running in the same data directory (which WILL corrupt your data,
> > quickly and irretrievably). All it takes to cause a problem is to
> > use the start script to start a postmaster, forgetting that you already
> > have one running ...
>
> My PG is not started with startup-scripts, but with this command:
>
> pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start

... and manually after login, i.e. not at boot-time.

--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Senior Software Developer / Manager
gpg public_key: http://dev.officenet.no/~andreak/public_key.asc
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Hoffsveien 17 | know how to do a thing and to watch |
PO. Box 425 Skøyen | somebody else doing it wrong, without |
0213 Oslo | comment. |
NORWAY | |
Phone : +47 22 13 01 00 | |
Direct: +47 22 13 10 03 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+


From: "Andrej Ricnik-Bay" <andrej(dot)groups(at)gmail(dot)com>
To: "Andreas Joseph Krogh" <andreak(at)officenet(dot)no>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 18:52:27
Message-ID: b35603930605241152r63e6fccfud647e15e5e5c7df2@mail.gmail.com

On 5/24/06, Andreas Joseph Krogh <andreak(at)officenet(dot)no> wrote:

> > My PG is not started with startup-scripts, but with this command:
> >
> > pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start
>
> ... and manually after login, ie. not at boot-time.
I'd suggest trying to fix your Linux-install instead of mucking
about with Postgres, and this is really a pgsql-novice question,
not a -hackers thing.

Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm


From: korry <korry(at)appx(dot)com>
To: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 19:03:46
Message-ID: 1148497426.21335.46.camel@sakai.localdomain

> I'm sure there's a good reason for having it the way it is, having so many
> smart knowledgeable people working on this project. Could someone please
> explain the rationale of the current solution to me?

We've ignored Andreas' original question. Why not use a lock to
indicate that the postmaster is still running? At first blush, that
seems more reliable than checking for a (possibly recycled) process ID.

-- Korry


From: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
To: korry(at)appx(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 19:32:31
Message-ID: 200605242132.32006.andreak@officenet.no

On Wednesday 24 May 2006 21:03, korry wrote:
> > I'm sure there's a good reason for having it the way it is, having so
> > many smart knowledgeable people working on this project. Could someone
> > please explain the rationale of the current solution to me?
>
> We've ignored Andreas' original question. Why not use a lock to
> indicate that the postmaster is still running? At first blush, that
> seems more reliable than checking for a (possibly recycled) process ID.

As Tom replied: Portability.

--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Senior Software Developer / Manager
gpg public_key: http://dev.officenet.no/~andreak/public_key.asc
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Hoffsveien 17 | know how to do a thing and to watch |
PO. Box 425 Skøyen | somebody else doing it wrong, without |
0213 Oslo | comment. |
NORWAY | |
Phone : +47 22 13 01 00 | |
Direct: +47 22 13 10 03 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+


From: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 19:33:27
Message-ID: 200605242133.27512.andreak@officenet.no

On Wednesday 24 May 2006 20:52, Andrej Ricnik-Bay wrote:
> On 5/24/06, Andreas Joseph Krogh <andreak(at)officenet(dot)no> wrote:
> > > My PG is not started with startup-scripts, but with this command:
> > >
> > > pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start
> >
> > ... and manually after login, ie. not at boot-time.
>
> I'd suggest trying to fix your Linux-install instead of mucking
> about with Postgres, and this is really a pgsql-novice question,
> not a -hackers thing.

I'm sorry, can't resist, but this has to be *the* dumbest reply to this sort
of question. What makes you think it *only* happens when Linux freezes? (Btw,
I suspect my NVIDIA driver to be the problem on my laptop, not Linux itself.)
Still, PG *should* handle that situation too; it's like a power outage. I've
been using Linux exclusively since '96 and PG since 6.5, so I don't consider
myself a novice in either. Why PG doesn't use locking *is* definitely
a -hackers thing.

--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Senior Software Developer / Manager
gpg public_key: http://dev.officenet.no/~andreak/public_key.asc
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Hoffsveien 17 | know how to do a thing and to watch |
PO. Box 425 Skøyen | somebody else doing it wrong, without |
0213 Oslo | comment. |
NORWAY | |
Phone : +47 22 13 01 00 | |
Direct: +47 22 13 10 03 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+


From: korry <korry(at)appx(dot)com>
To: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 20:01:43
Message-ID: 1148500903.21335.51.camel@sakai.localdomain

> On Wednesday 24 May 2006 21:03, korry wrote:
> > > I'm sure there's a good reason for having it the way it is, having so
> > > many smart knowledgeable people working on this project. Could someone
> > > please explain the rationale of the current solution to me?
> >
> > We've ignored Andreas' original question. Why not use a lock to
> > indicate that the postmaster is still running? At first blush, that
> > seems more reliable than checking for a (possibly recycled) process ID.
>
> As Tom replied: Portability.

Thanks - I missed that part of Tom's message.

The only platform (although certainly not a minor issue) that I can
think of that would have a portability issue would be Win32. You can't
even read a locked byte in Win32. I usually solve that problem by
locking a byte past the end of the file (which is portable).

Is there some other portability issue that I'm missing?

-- Korry


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: korry <korry(at)appx(dot)com>
Cc: Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 20:06:36
Message-ID: 20060524200636.GG5028@surnet.cl

korry wrote:

> The only platform (although certainly not a minor issue) that I can
> think of that would have a portability issue would be Win32. You can't
> even read a locked byte in Win32. I usually solve that problem by
> locking a byte past the end of the file (which is portable).

Certainly on all platforms there must be *some* locking primitive. We
just need to figure out the appropriate parameters to fcntl() or flock()
or lockf() on each.

The Win32 API for locking seems mighty strange to me.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: korry <korry(at)appx(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 20:19:33
Message-ID: 1148501973.21335.62.camel@sakai.localdomain

> Certainly on all platforms there must be *some* locking primitive. We
> just need to figure out the appropriate parameters to fcntl() or flock()
> or lockf() on each.

Right.

> The Win32 API for locking seems mighty strange to me.

Linux/Unix byte locking is advisory (meaning that one lock can block
another lock, but it can't block a read). Win32 locking is mandatory
(at least in the most portable form) so a lock blocks a reader. To
avoid that problem, you lock a byte that you never intend to read (that
is, you lock a byte past the end of the file). Locking past the
end-of-file is portable to all Unix/Linux systems that I've seen (that
way, you can lock a region of a file before you grow the file).

-- Korry


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: korry <korry(at)appx(dot)com>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 20:30:12
Message-ID: 4474C254.2090801@dunslane.net

Alvaro Herrera wrote:
> korry wrote:
>
>
>> The only platform (although certainly not a minor issue) that I can
>> think of that would have a portability issue would be Win32. You can't
>> even read a locked byte in Win32. I usually solve that problem by
>> locking a byte past the end of the file (which is portable).
>>
>
> Certainly on all platforms there must be *some* locking primitive. We
> just need to figure out the appropriate parameters to fcntl() or flock()
> or lockf() on each.
>
> The Win32 API for locking seems mighty strange to me.
>
>

We use file locking on Win32 (and on all other platforms) in the
buildfarm ... it's done from perl so maybe perl does some magic under
the hood. The call looks just the same, and works fine on W32, I
believe. It is roughly:

use Fcntl qw(:flock);
open($lockfile,">builder.LCK") || die "opening lockfile";
exit(0) unless flock($lockfile,LOCK_EX|LOCK_NB);

cheers

andrew


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: korry <korry(at)appx(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 20:34:40
Message-ID: 20060524203440.GA6607@surnet.cl

korry wrote:

> > The Win32 API for locking seems mighty strange to me.
>
> Linux/Unix byte locking is advisory (meaning that one lock can block
> another lock, but it can't block a read).

No -- it is advisory meaning that a process that does not try to acquire
the lock is not locked out. You can certainly block a file in exclusive
mode, using the LOCK_EX flag. (And at least on my Linux system, there
is mandatory locking too, using the fcntl() interface).
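
A small sketch of what "advisory" means here, assuming a Unix-like system with a local filesystem and Python's `fcntl.flock`; the file name is made up. A process that never asks for the lock can read the file freely, while one that does ask is refused.

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "advisory.demo")
with open(path, "w") as f:
    f.write("12345\n")

holder = os.open(path, os.O_RDWR)
fcntl.flock(holder, fcntl.LOCK_EX)       # exclusive advisory lock on the whole file

# A reader that never requests the lock is not locked out.
with open(path) as f:
    contents = f.read()

# A cooperating opener that does request the lock is refused.
other = os.open(path, os.O_RDWR)
try:
    fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
    refused = False
except BlockingIOError:
    refused = True
finally:
    os.close(other)

os.close(holder)
```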

I think the next question is -- how would the lock interface be used?
We could acquire an exclusive lock on postmaster start (to make sure no
backend is running), then reduce it to a shared lock. Every backend
would inherit the shared lock. But the lock exchange is not guaranteed
to be atomic so a new postmaster could start just after we acquire the
lock and acquire the shared lock. It'd need to be complemented with
another lock.

> Win32 locking is mandatory (at least in the most portable form) so a
> lock blocks a reader.

There is also shared/exclusive locking of a file on Win32. My comment
was more directed at the fact that you have to "create some sort of
lock handle" from a file handle and then lock the lock handle, or
something like that. I don't recall the exact details but it was
strange (as opposed to just open and then flock).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: korry <korry(at)appx(dot)com>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 20:34:47
Message-ID: 1623.1148502887@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Certainly on all platforms there must be *some* locking primitive. We
> just need to figure out the appropriate parameters to fcntl() or flock()
> or lockf() on each.

Quite aside from the hassle factor of needing to deal with N variants of
the syscalls, I'm not convinced that it's guaranteed to work. ISTR that
for instance NFS file locking is pretty much Alice-in-Wonderland :-(

Since the entire point here is to have a guaranteed bulletproof check,
locks that work most of the time on most platforms/filesystems aren't
gonna be an improvement.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: korry <korry(at)appx(dot)com>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 20:36:54
Message-ID: 20060524203654.GB6607@surnet.cl

Andrew Dunstan wrote:

> We use file locking on Win32 (and on all other platforms) in the
> buildfarm ... it's done from perl so maybe perl does some magic under
> the hood. The call looks just the same, and works fine on W32, I
> believe. It is roughly:
>
> use Fcntl qw(:flock);
> open($lockfile,">builder.LCK") || die "opening lockfile";
> exit(0) unless flock($lockfile,LOCK_EX|LOCK_NB);

flock on Perl is implemented using platform-dependent system calls. Per
the docs,

flock FILEHANDLE,OPERATION
Calls flock(2), or an emulation of it, on FILEHANDLE. Returns
true for success, false on failure. Produces a fatal error if
used on a machine that doesn't implement flock(2), fcntl(2)
locking, or lockf(3). "flock" is Perl's portable file locking
interface, although it locks only entire files, not records.

Note that it may fail! This seems to indicate that some platforms do
not provide either locking mechanism.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, korry <korry(at)appx(dot)com>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 20:40:04
Message-ID: 20060524204004.GC6607@surnet.cl

Alvaro Herrera wrote:

> Note that it may fail! This seems to indicate that some platforms do
> not provide either locking mechanism.

(Which means the whole discussion is a waste of time)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, korry <korry(at)appx(dot)com>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 21:06:49
Message-ID: 4474CAE9.3030604@dunslane.net

Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>
>
>> Note that it may fail! This seems to indicate that some platforms do
>> not provide either locking mechanism.
>>
>
> (Which means the whole discussion is a waste of time)
>
>

Umm, no, I don't think so. It will block instead of failing unless you
request a non blocking call. Failure means someone else holds the lock.

But what Tom says about NFS is probably true, and a good enough reason
not to trust locking in general for this purpose, I think.

cheers

andrew


From: korry <korry(at)appx(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 21:23:57
Message-ID: 1148505837.21335.76.camel@sakai.localdomain

On Wed, 2006-05-24 at 16:34 -0400, Alvaro Herrera wrote:

> korry wrote:
>
> > > The Win32 API for locking seems mighty strange to me.
> >
> > Linux/Unix byte locking is advisory (meaning that one lock can block
> > another lock, but it can't block a read).
>
> No -- it is advisory meaning that a process that does not try to acquire
> the lock is not locked out.

Right, that's why I said "can block" instead of "will block". An
advisory lock will only block another locker, not another reader (except
in Win32).

> You can certainly block a file in exclusive
> mode, using the LOCK_EX flag. (And at least on my Linux system, there
> is mandatory locking too, using the fcntl() interface).

My fault - I'm not really talking about "file locking", I'm talking
about byte-range locking (via lockf() and family).

I don't believe that you can use byte-range locking to block read-access
to a file, you can only use byte-range locking to block other locks.

A simple exclusive lock on the first byte past the end of the file will
do.

> I think the next question is -- how would the lock interface be used?
> We could acquire an exclusive lock on postmaster start (to make sure no
> backend is running), then reduce it to a shared lock. Every backend
> would inherit the shared lock. But the lock exchange is not guaranteed
> to be atomic so a new postmaster could start just after we acquire the
> lock and acquire the shared lock. It'd need to be complemented with
> another lock.

You never need to reduce it to a shared lock. On postmaster startup,
try to lock the sentinel byte (one byte past the end-of-file). If you
can lock it, you know that no other postmaster has that byte locked. If
you can't lock it, another postmaster is running. It is an atomic
operation.
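
A minimal sketch of the sentinel-byte idea, in Python rather than C for brevity. The function name try_lock_sentinel is hypothetical, and the sketch assumes a local POSIX file system and a file whose size does not change between contenders (otherwise they would lock different bytes). It is not how PostgreSQL does its startup check.

```python
import errno
import fcntl
import os


def try_lock_sentinel(path):
    """Try to lock the byte one past end-of-file, without blocking.

    Returns an open descriptor that holds the lock, or None if another
    process already holds it. Hypothetical helper for illustration.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    sentinel = os.fstat(fd).st_size  # offset of the first byte past EOF
    try:
        # Lock exactly 1 byte starting at the sentinel offset.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, sentinel, os.SEEK_SET)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # someone else holds the sentinel byte
        raise
    return fd
```

The kernel drops the lock when the holding process exits or closes the file, which is the cleanup property the original post asked for.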

However, Tom may be correct about NFS locking, but I guess I'm surprised
that anyone would care :-)

> > Win32 locking is mandatory (at least in the most portable form) so a
> > lock blocks a reader.
>
> There is also shared/exclusive locking of a file on Win32.

Yes, but Win32 shared locking only works on NTFS-type file systems. And
you don't need shared locking anyway.

-- Korry


From: korry <korry(at)appx(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 21:28:46
Message-ID: 1148506126.21335.81.camel@sakai.localdomain
Lists: pgsql-hackers

> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Certainly on all platforms there must be *some* locking primitive. We
> > just need to figure out the appropriate parameters to fcntl() or flock()
> > or lockf() on each.

I use lockf() (not fcntl() or flock()) on every platform other than
Win32. Of course, I may not run on every system that PostgreSQL
supports.

>
> Quite aside from the hassle factor of needing to deal with N variants of
> the syscalls, I'm not convinced that it's guaranteed to work. ISTR that
> for instance NFS file locking is pretty much Alice-in-Wonderland :-(
>
> Since the entire point here is to have a guaranteed bulletproof check,
> locks that work most of the time on most platforms/filesystems aren't
> gonna be an improvement.

NFS file locking may certainly be problematic. I don't know about NFS
byte-range locking.

What we currently have in place is not bulletproof. I think holding a
byte-range lock in addition to the "is there some process with the right
pid?" check might be a little more bullet resistant :-)
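
For readers unfamiliar with the "is there some process with the right pid?" check, a rough Python sketch follows. The function name pid_file_looks_live is hypothetical, and the sketch assumes the first line of the file holds the PID. It only shows the general signal-0 probe, not the exact logic in the PostgreSQL source.

```python
import os


def pid_file_looks_live(path):
    """Return True if the PID recorded in `path` names an existing process.

    Hypothetical helper; assumes the PID is on the first line of the file.
    """
    try:
        with open(path) as f:
            pid = int(f.readline().strip())
    except (FileNotFoundError, ValueError):
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False  # no such process: the file is stale
    except PermissionError:
        return True  # a process exists but belongs to another user
    return True
```

Note that a True result only says some process has that PID. If the PID was recycled by an unrelated program, this check still reports a live postmaster, which is the false positive discussed in this thread.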

-- Korry


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: korry(at)appx(dot)com
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 21:33:02
Message-ID: 2321.1148506382@sss.pgh.pa.us
Lists: pgsql-hackers

korry <korry(at)appx(dot)com> writes:
> However, Tom may be correct about NFS locking, but I guess I'm surprised
> that anyone would care :-)

Whether we think it's a real good idea or not, *plenty* of people run
databases across NFS. We can't blow off that set of users.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: korry <korry(at)appx(dot)com>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 21:33:06
Message-ID: 20060524213306.GB7412@surnet.cl
Lists: pgsql-hackers

Andrew Dunstan wrote:
> Alvaro Herrera wrote:
> >Alvaro Herrera wrote:
> >
> >>Note that it may fail! This seems to indicate that some platforms do
> >>not provide either locking mechanism.
> >
> >(Which means the whole discussion is a waste of time)
>
> Umm, no, I don't think so. It will block instead of failing unless you
> request a non blocking call. Failure means someone else holds the lock.

I removed the part of the manual I had written which said that it will
raise an error if the platform it's running doesn't have any locking
primitive.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: korry <korry(at)appx(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 21:35:02
Message-ID: 20060524213502.GC7412@surnet.cl
Lists: pgsql-hackers

korry wrote:

> > I think the next question is -- how would the lock interface be used?
> > We could acquire an exclusive lock on postmaster start (to make sure no
> > backend is running), then reduce it to a shared lock. Every backend
> > would inherit the shared lock. But the lock exchange is not guaranteed
> > to be atomic so a new postmaster could start just after we acquire the
> > lock and acquire the shared lock. It'd need to be complemented with
> > another lock.
>
> You never need to reduce it to a shared lock. On postmaster startup,
> try to lock the sentinel byte (one byte past the end-of-file). If you
> can lock it, you know that no other postmaster has that byte locked. If
> you can't lock it, another postmaster is running. It is an atomic
> operation.

This doesn't work if the postmaster dies but a backend continues to run,
which is arguably the most important case we need to protect against.

> However, Tom may be correct about NFS locking, but I guess I'm surprised
> that anyone would care :-)

Quite a lot of people run NFS-mounted data directories ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: korry(at)appx(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 21:46:00
Message-ID: 2445.1148507160@sss.pgh.pa.us
Lists: pgsql-hackers

korry <korry(at)appx(dot)com> writes:
> What we currently have in place is not bulletproof.

Well, it fails in the safe direction: the postmaster may occasionally
refuse to start when it should, but it won't ever start when it should
not. It appears to me that anything relying on file locking will tend
to fail in the other direction, and that's not acceptable IMHO.

regards, tom lane


From: korry <korry(at)appx(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 22:21:14
Message-ID: 1148509274.21335.90.camel@sakai.localdomain
Lists: pgsql-hackers

> > What we currently have in place is not bulletproof.
>
> Well, it fails in the safe direction: the postmaster may occasionally
> refuse to start when it should, but it won't ever start when it should
> not. It appears to me that anything relying on file locking will tend
> to fail in the other direction, and that's not acceptable IMHO.

I was suggesting that we keep the current check in place too - if the
lock exists, another postmaster must be running, if the lock doesn't
exist, check the pid.

However...

Thinking a little harder about Andreas' original suggestion... what he's
really suggesting is an exclusion mechanism that relies on the kernel to
clean up after a shared process (with no danger of recycling, like a pid
will do).

How about a semaphore with a SEM_UNDO? That's guaranteed atomic (or it
better be :-), the kernel automatically cleans up after a failure, if
the mechanism fails, it fails in the safe direction (the kernel may not
have cleaned up the semaphore before a new postmaster starts). And, I
think it would be reasonably portable - I haven't carefully eyeballed
the Win32 semaphore code so I don't know if it supports SEM_UNDO.

(Sorry if this has been suggested before)

-- Korry


From: korry <korry(at)appx(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 22:34:30
Message-ID: 1148510070.21335.97.camel@sakai.localdomain
Lists: pgsql-hackers

> > You never need to reduce it to a shared lock. On postmaster startup,
> > try to lock the sentinel byte (one byte past the end-of-file). If you
> > can lock it, you know that no other postmaster has that byte locked. If
> > you can't lock it, another postmaster is running. It is an atomic
> > operation.
>
> This doesn't work if the postmaster dies but a backend continues to run,
> which is arguably the most important case we need to protect against.

I may be confused here, but I don't see the problem - byte-range locks
are not inherited across a fork. A backend would never hold the lock, a
backend would never even look for the lock.
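
The claim that byte-range locks are not inherited across fork() can be checked with a short Python experiment. The function name child_shares_parent_lock is hypothetical, and the result assumes POSIX record locks on a local file system.

```python
import fcntl
import os


def child_shares_parent_lock(path):
    """Return True if a forked child can re-take the parent's exclusive lock.

    If the lock were inherited, the child would own it too and the second
    lockf() call would succeed. Hypothetical helper for illustration.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    # Parent takes an exclusive lock on byte 0.
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
    pid = os.fork()
    if pid == 0:
        code = 1  # default: the child could not take the lock
        try:
            # Same descriptor, same byte, but a different process.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
            code = 0
        except OSError:
            pass
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    os.close(fd)
    return os.WEXITSTATUS(status) == 0
```

On a system with POSIX semantics this should return False: the child conflicts with the parent's lock instead of sharing it. The flip side is that a surviving child does nothing to keep the lock alive once the parent exits.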

> > However, Tom may be correct about NFS locking, but I guess I'm surprised
> > that anyone would care :-)
>
> Quite a lot of people run NFS-mounted data directories ...

I'm happy to take your word for that, and I agree that if NFS is
important and locking is brain-dead on NFS, then relying solely on a
lock is unacceptable.

-- Korry


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: korry(at)appx(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 22:36:47
Message-ID: 2800.1148510207@sss.pgh.pa.us
Lists: pgsql-hackers

korry <korry(at)appx(dot)com> writes:
>> Well, it fails in the safe direction: the postmaster may occasionally
>> refuse to start when it should, but it won't ever start when it should
>> not. It appears to me that anything relying on file locking will tend
>> to fail in the other direction, and that's not acceptable IMHO.

> I was suggesting that we keep the current check in place too - if the
> lock exists, another postmaster must be running, if the lock doesn't
> exist, check the pid.

But then you've not accomplished anything. The complaints about the
pid-based mechanism are about false positives, not false negatives.
Adding an independent check won't eliminate the false positives.
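
The argument can be spelled out as a small truth table. This Python sketch uses hypothetical names and models only the combined rule as korry described it (refuse if the lock is held, otherwise fall back to the pid check).

```python
def refuses_to_start(lock_held, pid_looks_alive):
    """Combined rule: refuse if either check suggests a running postmaster."""
    if lock_held:
        return True
    return pid_looks_alive


def pid_only_refuses(pid_looks_alive):
    """Current rule, reduced to the pid check alone."""
    return pid_looks_alive


# False-positive case: no postmaster is running and no lock is held, but the
# stale PID has been recycled by an unrelated process.
stale_but_recycled = {"lock_held": False, "pid_looks_alive": True}
```

In the stale_but_recycled case both rules refuse to start, so the extra lock check removes none of the false positives; it can only add more reasons to refuse.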

> How about a semaphore with a SEM_UNDO? That's guaranteed atomic (or it
> better be :-), the kernel automatically cleans up after a failure, if
> the mechanism fails, it fails in the safe direction (the kernel may not
> have cleaned up the semaphore before a new postmaster starts). And, I
> think it would be reasonably portable - I haven't carefully eyeballed
> the Win32 semaphore code so I don't know if it supports SEM_UNDO.

We already have two platforms that don't use the SysV semaphore
interface, and even on ones that have it, I wouldn't want to assume they
all support SEM_UNDO.

But aside from any portability issues, ISTM this would have its own
failure modes. In particular you still have to rely on a pid-file
(only now it's holding a semaphore ID not a PID), and there's still
a bit of a leap of faith required to get from the observation that
somebody is holding a lock on semaphore X to the conclusion that that
somebody is a conflicting postmaster. It doesn't look to me like this
is any better than the PID solution, really, as far as false positives
go. As for false negatives: ipcrm.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: korry <korry(at)appx(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 22:38:57
Message-ID: 20060524223857.GE7412@surnet.cl
Lists: pgsql-hackers

korry wrote:
> > > You never need to reduce it to a shared lock. On postmaster startup,
> > > try to lock the sentinel byte (one byte past the end-of-file). If you
> > > can lock it, you know that no other postmaster has that byte locked. If
> > > you can't lock it, another postmaster is running. It is an atomic
> > > operation.
> >
> > This doesn't work if the postmaster dies but a backend continues to run,
> > which is arguably the most important case we need to protect against.
>
> I may be confused here, but I don't see the problem - byte-range locks
> are not inherited across a fork. A backend would never hold the lock, a
> backend would never even look for the lock.

Well, you are wrong here. We _want_ every backend to hold a shared
lock. We need to stop a postmaster from starting if there is a backend
running that was started by a no-longer-running postmaster.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: korry <korry(at)appx(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 22:53:11
Message-ID: 1148511191.16791.1.camel@sakai.localdomain
Lists: pgsql-hackers

> > > > You never need to reduce it to a shared lock. On postmaster startup,
> > > > try to lock the sentinel byte (one byte past the end-of-file). If you
> > > > can lock it, you know that no other postmaster has that byte locked. If
> > > > you can't lock it, another postmaster is running. It is an atomic
> > > > operation.
> > >
> > > This doesn't work if the postmaster dies but a backend continues to run,
> > > which is arguably the most important case we need to protect against.
> >
> > I may be confused here, but I don't see the problem - byte-range locks
> > are not inherited across a fork. A backend would never hold the lock, a
> > backend would never even look for the lock.
>
> Well, you are wrong here. We _want_ every backend to hold a shared
> lock. We need to stop a postmaster from starting if there is a backend
> running that was started by a no-longer-running postmaster.

Oh... didn't know that. How is that accomplished now? There must be
some code beside the pid file check.

-- Korry


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: korry <korry(at)appx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 22:53:34
Message-ID: 2905.1148511214@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Well, you are wrong here. We _want_ every backend to hold a shared
> lock. We need to stop a postmaster from starting if there is a backend
> running that was started by a no-longer-running postmaster.

Note that we currently rely on checking for SysV shared memory attach
counts to protect against this case; the postmaster PID doesn't enter
into it. We don't have to insist on the postmaster interlock handling
this too. (Although surely it'd be nice to not depend on SysV attach
counts for this, because that's a portability issue in itself.)

regards, tom lane


From: korry <korry(at)appx(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 23:12:39
Message-ID: 1148512359.16791.14.camel@sakai.localdomain
Lists: pgsql-hackers

> We already have two platforms that don't use the SysV semaphore
> interface, and even on ones that have it, I wouldn't want to assume they
> all support SEM_UNDO.

Which platforms, just out of curiosity? I assume that Win32 is one of
them.

> But aside from any portability issues, ISTM this would have its own
> failure modes. In particular you still have to rely on a pid-file
> (only now it's holding a semaphore ID not a PID)

You've lost me... why would you store the semid and not the pid? I was
thinking that the semid might be a postgresql.conf thingie.

> and there's still
> a bit of a leap of faith required to get from the observation that
> somebody is holding a lock on semaphore X to the conclusion that that
> somebody is a conflicting postmaster.

Isn't that sort of like saying that if a postmaster.pid file exists, it
must have been written by a postmaster? Pick a semaphore id and
dedicate it to postmaster exclusion.

> It doesn't look to me like this
> is any better than the PID solution, really, as far as false positives
> go.

As long as the kernel cleans up SEM_UNDO semaphores, I guess I don't see
how you would have a false positive. Oh, I guess I should say that if
you use a SEM_UNDO semaphore, you don't need the pid check anymore.
And, no worry about NFS.

> As for false negatives: ipcrm.

Yes, that's a problem, but I think it's the same as "rm postmaster.pid",
isn't it?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: korry(at)appx(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-24 23:55:49
Message-ID: 5779.1148514949@sss.pgh.pa.us
Lists: pgsql-hackers

korry <korry(at)appx(dot)com> writes:
> Isn't that sort of like saying that if a postmaster.pid file exists, it
> must have been written by a postmaster? Pick a semaphore id and
> dedicate it to postmaster exclusion.

That's not workable, unless you want to assume that nothing on the
system except Postgres uses SysV semaphores. Otherwise something else
could randomly gobble up the semid you want to use. I don't care very
much for requiring a distinct semid to be hand-specified for each
postmaster on a machine, either. At least for my use, that would be a
grade-A PITA: I normally have several postmasters of different vintages
running on the same development machine, and having to configure each
one with its own semid is an extra step I'd rather not deal with.

> As long as the kernel cleans up SEM_UNDO semaphores, I guess I don't see
> how you would have a false positive.

My point was that you couldn't reliably tell a postmaster interested in
a different data directory from a postmaster interested in your own data
directory. Even with a configured semid, I don't see that that's real
reliable. I know the first thing I'd do is fix my postmaster start
scripts to specify semid on the command line rather than requiring it
to be in the conf file, and as soon as I do that, the connection to
the data directory is gone :-( --- now my security is utterly dependent
on not screwing up by launching a postmaster with the wrong semid for
the data directory it's pointed at.

The only scenario where the PID-based solution is at serious risk of
false positives is where there are multiple postmasters on the same
machine, so unless you've got a bulletproof answer for this case, you
haven't made an improvement over what we've got.

Anyway the real problem here is that neither PIDs nor semids are
strongly wired to a particular data directory, which is the thing you're
really trying to protect. File locks would really be much nicer all
around, if we could trust them, because they *would* be directly
connected to a data directory.

regards, tom lane


From: korry <korry(at)appx(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-25 12:35:14
Message-ID: 1148560514.18823.4.camel@sakai.localdomain
Lists: pgsql-hackers

> That's not workable, unless you want to assume that nothing on the
> system except Postgres uses SysV semaphores. Otherwise something else
> could randomly gobble up the semid you want to use. I don't care very
> much for requiring a distinct semid to be hand-specified for each
> postmaster on a machine, either.

Yeah, that does suck. Ok, naming problems seem to make semaphores
useless.

I'm back to byte-range locking, but if NFS is important and is truly
unreliable, then that's out too.

I've never had locking problems on NFS (probably because we tell our
users not to use NFS), but now that I think about it, SMB locking is
very unreliable so Win32 would be an issue too.

-- Korry


From: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-25 12:52:18
Message-ID: 200605251452.18327.andreak@officenet.no
Lists: pgsql-hackers

On Thursday 25 May 2006 14:35, korry wrote:
> > That's not workable, unless you want to assume that nothing on the
> > system except Postgres uses SysV semaphores. Otherwise something else
> > could randomly gobble up the semid you want to use. I don't care very
> > much for requiring a distinct semid to be hand-specified for each
> > postmaster on a machine, either.
>
> Yeah, that does suck. Ok, naming problems seem to make semaphores
> useless.
>
> I'm back to byte-range locking, but if NFS is important and is truly
> unreliable, then that's out too.
>
> I've never had locking problems on NFS (probably because we tell our
> users not to use NFS), but now that I think about it, SMB locking is
> very unreliable so Win32 would be an issue too.

What I don't get is why everybody thinks that because one solution doesn't fit
all needs on all platforms (or NFS), it shouldn't be implemented on those
platforms it *does* work on. Why can't those platforms (like Linux) benefit
from a better solution, if one exists? There are plenty of examples of
software providing better solutions on platforms supporting more features.

--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: file-locking and postmaster.pid
Date: 2006-05-25 14:10:41
Message-ID: 12640.1148566241@sss.pgh.pa.us
Lists: pgsql-hackers

Andreas Joseph Krogh <andreak(at)officenet(dot)no> writes:
> What I don't get is why everybody thinks that because one solution doesn't fit
> all needs on all platforms (or NFS), it shouldn't be implemented on those
> platforms it *does* work on.

(1) Because we're not really interested in supporting multiple fundamentally
different approaches to postmaster interlocking. The system is
complicated enough already.

(2) Because according to discussion so far, we can't rely on this "solution"
anywhere. Postgres can't easily tell whether its data directory is
mounted over NFS, for example.

regards, tom lane