Re: POSIX shared memory redux

From: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POSIX shared memory redux
Date: 2011-04-14 14:34:14
Message-ID: 69D09C0C-B16A-4BE6-B360-CFB72ABB83EA@themactionfaction.com
Lists: pgsql-hackers


On Apr 14, 2011, at 8:22 AM, Florian Weimer wrote:

> * Tom Lane:
>
>> Well, the fundamental point is that "ignoring NFS" is not the real
>> world. We can't tell people not to put data directories on NFS,
>> and even if we did tell them not to, they'd still do it. And NFS
>> locking is not trustworthy, because the remote lock daemon can crash
>> and restart (forgetting everything it ever knew) while your own machine
>> and the postmaster remain blissfully awake.
>
> Is this still the case with NFSv4? Does the local daemon still keep
> the lock state?

The lock handling has been fixed in NFSv4.

http://nfs.sourceforge.net/
"NFS Version 4 introduces support for byte-range locking and share reservation. Locking in NFS Version 4 is lease-based, so an NFS Version 4 client must maintain contact with an NFS Version 4 server to continue extending its open and lock leases."

http://linux.die.net/man/2/flock
"flock(2) does not lock files over NFS. Use fcntl(2) instead: that does work over NFS, given a sufficiently recent version of Linux and a server which supports locking."

I would need some more time to dig up which "recent version of Linux" that means, but NFSv4 is likely required.
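
To make the difference concrete, here is a minimal sketch (illustrative only, not PostgreSQL code; the lock file path is made up) of the kind of lock involved: an exclusive fcntl() byte-range lock, which, unlike flock(), is forwarded to the NFS server.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    const char *lockpath = "/pgdata/postmaster.lock";   /* made-up path */
    int         fd = open(lockpath, O_RDWR | O_CREAT, 0600);
    struct flock lk;

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    lk.l_type = F_WRLCK;        /* exclusive (write) lock */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;               /* length 0 = lock the whole file */

    /* F_SETLK fails immediately (EACCES/EAGAIN) if another process,
     * possibly on another NFS client, already holds the lock. */
    if (fcntl(fd, F_SETLK, &lk) < 0)
    {
        perror("fcntl(F_SETLK)");
        return 1;
    }

    printf("lock acquired\n");
    pause();                    /* lock is dropped on close or process exit */
    return 0;
}

Since locking is lease-based under NFSv4, a node that dies stops renewing its lease and the server eventually reclaims the lock, so a crashed holder cannot pin it forever.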

>
>> None of this is to say that an fcntl lock might not be a useful addition
>> to what we do already. It is to say that fcntl can't just replace what
>> we do already, because there are real-world failure cases that the
>> current solution handles and fcntl alone wouldn't.
>
> If it requires NFS misbehavior (possibly in an older version), and you
> have to start postmasters on separate nodes (which you normally
> wouldn't do), doesn't this make it increasingly unlikely that it's
> going to be triggered in the wild?

With the patch I offer, it would be possible to run failover PostgreSQL nodes on different machines against shared storage over NFS. (The second postmaster blocks and waits for the lock to be released.) Obviously, such a setup isn't as strong as using replication, but given a sufficiently fail-safe shared storage setup, it could be made reliable.
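
The blocking wait is just the F_SETLKW variant of the same fcntl() lock. A rough sketch of that behaviour, with a made-up helper name and path (this is not the patch itself):

#include <fcntl.h>
#include <unistd.h>

/* Block until the current holder releases the lock on the shared
 * data directory's lock file, then return the descriptor that now
 * holds it.  Illustrative only. */
static int
wait_for_data_dir_lock(const char *lockpath)
{
    int          fd = open(lockpath, O_RDWR | O_CREAT, 0600);
    struct flock lk = {0};

    if (fd < 0)
        return -1;

    lk.l_type = F_WRLCK;        /* exclusive lock on the whole file */
    lk.l_whence = SEEK_SET;

    /* F_SETLKW sleeps until the lock is granted; F_SETLK would fail at once */
    if (fcntl(fd, F_SETLKW, &lk) < 0)
    {
        close(fd);
        return -1;
    }
    return fd;                  /* keep fd open; closing it drops the lock */
}

A standby that calls something like this at startup simply sits in fcntl() until the primary exits (or its NFSv4 lease lapses), and then proceeds.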

Cheers,
M
