Re: POSIX shared memory support

Lists: pgsql-patches
From: Chris Marcellino <cmarcellino(at)apple(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: POSIX shared memory support
Date: 2007-02-27 05:00:09
Message-ID: 6C946C32-2887-416B-8DD3-EFAB9C13A4E8@apple.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Recapitulating and addressing some of the issues with my previous
attempt at this feature:

PostgreSQL currently uses the System V shared memory APIs to access
shared memory. On Mac OS X and other BSDs,
the default System V shared memory limits are often very low and
require adjustment for acceptable performance. Par-
ticularly, when Postgres is included as part of larger end-user
friendly software products, these kernel settings are often
difficult to change programatically.

The System V shared memory facilities provide a method to determine
who is attached to a shared memory segment.
This is used to prevent backends that were orphaned by crashed or
killed database processes from corrupting the data-
base as it is restarted. The same effect can be achieved with using
the POSIX APIs, but since the POSIX library does not
have a way to check who is attached to a segment, atomic segment
creation must be used to ensure exclusive access to
the database.

In order for this to work, the key name used to open and create the
shared memory segment must be unique for each
data directory. This is done by using a strong hash of the canonical
form of the data directory’s pathname. This also re-
moves any risk of other applications, or other databases’ memory
segments colliding with the current shared memory
segment, which conveniently simplifies the logic.

The algorithm changes are described by the before and after flowchart
PDF included in the tarball below.

There is also a Windows version of this patch included, which can
replace the current SysV-to-Win32 shared memory
layer as it currently does not check for orphaned backends in the
database. If this is used,
src/backend/port/win32/shmem.c and its makefile reference can be
removed.

Included is a patch for the configure.in file to opt-in all of the
supported platforms to use this POSIX shared memory
layer except those that are explicitly excluded (of course, this
could be done in reverse if desired). Notably, NetBSD and
OpenBSD do not support the POSIX shared memory calls and their
template patches are included. Other platforms
should be tested and excluded as needed. In addition, the
documentation section 16.4.1 ‘Shared Memory and Sema-
phores’ could be largely pruned and updated to reflect use of the
POSIX calls.

The patches are available here (the postings were being silently
dropped when I attached this large of a file):

http://homepage.mac.com/cmarcellino/postgres-posix-shmem.tar

Please let me know if there is a better way to post this.

Thanks for your feedback,
Chris Marcellino


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chris Marcellino <cmarcellino(at)apple(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-02-27 06:43:11
Message-ID: 13087.1172558591@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Chris Marcellino <cmarcellino(at)apple(dot)com> writes:
> The System V shared memory facilities provide a method to determine
> who is attached to a shared memory segment.
> This is used to prevent backends that were orphaned by crashed or
> killed database processes from corrupting the data-
> base as it is restarted. The same effect can be achieved with using
> the POSIX APIs,

... except that it can't ...

> but since the POSIX library does not
> have a way to check who is attached to a segment, atomic segment
> creation must be used to ensure exclusive access to
> the database.

How does that fix the problem? If you can't actually tell whether
someone is attached to an existing segment, then you're still up against
the basic rock-and-a-hard-place issue: either you assume there is no one
there (and corrupt your database if you're wrong) or you assume there is
someone there (and force manual intervention by the DBA to recover after
postmaster crashes). Neither of these alternatives is really acceptable.

> In order for this to work, the key name used to open and create the
> shared memory segment must be unique for each
> data directory. This is done by using a strong hash of the canonical
> form of the data directorys pathname.

"Strong hash" is not a guarantee, even if you could promise that you
could get a unique canonical path, which I doubt you can. In any case
this fails if the DBA decides to rename the directory on the fly (don't
laugh; not only are there instances of that in our archives, there are
people opining that we need to allow it --- even with the postmaster
still running).

> This also re-
> moves any risk of other applications, or other databases memory
> segments colliding with the current shared memory
> segment, which conveniently simplifies the logic.

How exactly does it remove that risk? I think you're wishfully-thinking
that if you are creating an unreadable hash value then there will never
be any collisions against someone else with the same touching faith that
*his* unreadable hash values will never collide with anyone else's.
Doesn't give me a lot of comfort. Not that it matters, since the
approach is broken even if this specific assumption were sustainable.

regards, tom lane


From: Chris Marcellino <cmarcellino(at)apple(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-02-27 07:29:10
Message-ID: B667A14D-EFA2-497B-9C5F-FEC70AFD7D57@apple.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


On Feb 26, 2007, at 10:43 PM, Tom Lane wrote:

> Chris Marcellino <cmarcellino(at)apple(dot)com> writes:
>> The System V shared memory facilities provide a method to determine
>> who is attached to a shared memory segment.
>> This is used to prevent backends that were orphaned by crashed or
>> killed database processes from corrupting the data-
>> base as it is restarted. The same effect can be achieved with using
>> the POSIX APIs,
>
> ... except that it can't ...
>
>> but since the POSIX library does not
>> have a way to check who is attached to a segment, atomic segment
>> creation must be used to ensure exclusive access to
>> the database.
>
> How does that fix the problem? If you can't actually tell whether
> someone is attached to an existing segment, then you're still up
> against
> the basic rock-and-a-hard-place issue: either you assume there is
> no one
> there (and corrupt your database if you're wrong) or you assume
> there is
> someone there (and force manual intervention by the DBA to recover
> after
> postmaster crashes). Neither of these alternatives is really
> acceptable.

Ignoring the case where backends are still alive in the database,
since they would require intervention or patience either way, there
are two options:
1) There is a postmaster/backend still running and you try to start
another postmaster: the unique segment cannot be closed and
atomically recreated and will fail as it does in the current
implementation.
2) There are no errant processes still in the database: the segment
can be closed and atomically recreated.

Try making a build with the patch, then start a postmaster for a
given folder, delete the lock file and start another postmaster (on a
different port) in that folder. Please let me know if I am
overlooking something.

>
>> In order for this to work, the key name used to open and create the
>> shared memory segment must be unique for each
>> data directory. This is done by using a strong hash of the canonical
>> form of the data directory’s pathname.
>
> "Strong hash" is not a guarantee, even if you could promise that you
> could get a unique canonical path, which I doubt you can. In any case
> this fails if the DBA decides to rename the directory on the fly
> (don't
> laugh; not only are there instances of that in our archives, there are
> people opining that we need to allow it --- even with the postmaster
> still running).

Strong hash is an effective guarantee that many computing paradigms
are based upon. The collision rate is astronomically small, and can
be made astronomically smaller with longer hashes.
(For MD5 there would need to be 10^15 postmasters on a server before
a collision is likely, and they all would need to have crashed and
left backends in the database, etc. )

True, renaming is a problem that I had had not anticipated at all.
Now that you mention it, hard links might be an issue on some
machines that don't canonicalize them to a unique path, since that
isn't required by the POSIX docs. Oh, the horrible degenerate cases.
Good point though.

Perhaps there is some other unique identifying feature of a given
database. A per-database persistent UUID would fit nicely here. It
could just be the shmem key.

>
>> This also re-
>> moves any risk of other applications, or other databases’ memory
>> segments colliding with the current shared memory
>> segment, which conveniently simplifies the logic.
>
> How exactly does it remove that risk?

This is fruitless due to the renaming issue, but the hash isn't an
issue. I'm not sure that a hex string beginning with \pg_xxxxx is any
less readable than the shmem id integers that are generated ad-hoc by
the current implementation.

> I think you're wishfully-thinking
> that if you are creating an unreadable hash value then there will
> never
> be any collisions against someone else with the same touching faith
> that
> *his* unreadable hash values will never collide with anyone else's.

I'm flattered that you hold my coding abilities with such devout
conviction, but I assure you that cryptography, even in this limited
use, is based in rational thought :).
In addition, the astronomically unlikely collision isn't a risk as
the database can't be damaged. The admin would then need to clear the
lockfile, after he won the lottery twice and was stuck by lightning
in his overturned car.

> Doesn't give me a lot of comfort.
> Not that it matters, since the
> approach is broken even if this specific assumption were sustainable.

Postmasters failing to load don't give me much comfort either, and
that isn't a pipe dream.

I suppose that the renaming issue relegates this patch to situations
where the database cannot be renamed or hard linked to and started
more than once, yet require this to start up databases without
restarting and needing to control how many other databases are
consuming shmem on the same box.

Thanks for the reply,
Chris Marcellino

>
> regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Chris Marcellino <cmarcellino(at)apple(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-02-27 08:56:12
Message-ID: 20070227085612.GD12721@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

On Mon, Feb 26, 2007 at 09:00:09PM -0800, Chris Marcellino wrote:

>
> There is also a Windows version of this patch included, which can
> replace the current SysV-to-Win32 shared memory
> layer as it currently does not check for orphaned backends in the
> database. If this is used,
> src/backend/port/win32/shmem.c and its makefile reference can be
> removed.

This code is pretty close to the one I've been working on. Didn't
reliase you would be working on a win32 specific version, thought you
were doing POSIX only :-O

Anyway. My version does not use hashing of the name, it just puts the
name all the way in there. What are peoples feeling about that - win32
supports 32768 characters (though those are UTF16, so maybe the real
value is 16384 - still, if someone has his data directory that deep down
in a path, he deserves not to have it work). Should we be using hashing
or just plaintext there? I can see the argument for plaintext being that
you can then see in process explorer what data directory a backend is
connected to. But I'm open to arguments as to why a hash would be better
:-)

While looking at this, I noticed that Windows 2003 SP1 adds a parameter
SEC_LARGE_PAGES that enables "large pages to be used when mapping images
or backing from the pagefile". I would assume this is for performance
;-) Does anybody know anything more about this? Worth testing out to see
if it increases performance?

> The patches are available here (the postings were being silently
> dropped when I attached this large of a file):

Actually, I don't think it's becauseo f the size of the file. Last time
I checked we still had a configuration which had the lists silently drop
certain file extensions, including .tar, .zip and .tar.gz. I've had
several patches dropped that way. I've had no confirmation it has been
fixed, so I'm assuming it's still a problem.

//Magnus


From: Chris Marcellino <cmarcellino(at)apple(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-02-27 09:09:46
Message-ID: 220335F2-1356-4309-ADBB-1EAD2CD2A19D@apple.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

The Win32 version didn't materialize until very recently. The Win32
calls are similar semantically to the POSIX ones, so it was somewhat
straightforward.

Plaintext is nice if you can fit it, since Windows permits you to
have slashes and all sorts of other non-filename characters in them,
unlike POSIX shmem. Also for POSIX, certain platforms (ahem Darwin
has 30 chars) have very small segment name limits. As Tom said, the
renaming-while-running issue might be a deal breaker, but I'm not
sure that is a problem on Windows. It sounds like you are on the
right track.

I just mentioned this to Tom, but a solution for POSIX might be to
use the device and inode combination for the data directory, since
that is constant across renames and is more certain to be unique.
Like replacing the GenerateIPCName() function with something this:

static void
GenerateIPCName(char destIPCName[IPCSegNameLength])
{
struct stat statbuf;

/* Get the data directory's device and inode */
if (stat(DataDir, &statbuf) < 0)
ereport(FATAL,
(errcode_for_file_access(),
errmsg("could not stat data directory \"%s\": %m",
DataDir)));

/*
* POSIX requires that shared memory names begin with a single slash.
* They should not have any others slashes or any non-alphanumerics to
* maintain the broadest assumption of what is permitted in a filename.
* Also, case sensitivity should not be presumed.
*/
snprintf(destIPCName, IPCSegNameLength, "/PostgreSQL.%jx.%jx",
(intmax_t) statbuf.st_dev, (intmax_t) statbuf.st_ino);
}

Does Windows have a method to get a unique ID number for a given data
directory, or a token file in that directory? It would need to be
constant while the database is open. Perhaps
GetFileInformationByHandle? It returns a struct with a nFileIndex
value that seems to be that, although I'm not certain.
This might make it easier to avoid the complexity of fitting the
filename in the segment name, and avoid the rename problem,

Thanks,
Chris Marcellino

On Feb 27, 2007, at 12:56 AM, Magnus Hagander wrote:

> On Mon, Feb 26, 2007 at 09:00:09PM -0800, Chris Marcellino wrote:
>
>>
>> There is also a Windows version of this patch included, which can
>> replace the current SysV-to-Win32 shared memory
>> layer as it currently does not check for orphaned backends in the
>> database. If this is used,
>> src/backend/port/win32/shmem.c and its makefile reference can be
>> removed.
>
> This code is pretty close to the one I've been working on. Didn't
> reliase you would be working on a win32 specific version, thought you
> were doing POSIX only :-O
>
> Anyway. My version does not use hashing of the name, it just puts the
> name all the way in there. What are peoples feeling about that - win32
> supports 32768 characters (though those are UTF16, so maybe the real
> value is 16384 - still, if someone has his data directory that deep
> down
> in a path, he deserves not to have it work). Should we be using
> hashing
> or just plaintext there? I can see the argument for plaintext being
> that
> you can then see in process explorer what data directory a backend is
> connected to. But I'm open to arguments as to why a hash would be
> better
> :-)
>
>
> While looking at this, I noticed that Windows 2003 SP1 adds a
> parameter
> SEC_LARGE_PAGES that enables "large pages to be used when mapping
> images
> or backing from the pagefile". I would assume this is for performance
> ;-) Does anybody know anything more about this? Worth testing out
> to see
> if it increases performance?
>
>
>> The patches are available here (the postings were being silently
>> dropped when I attached this large of a file):
>
> Actually, I don't think it's becauseo f the size of the file. Last
> time
> I checked we still had a configuration which had the lists silently
> drop
> certain file extensions, including .tar, .zip and .tar.gz. I've had
> several patches dropped that way. I've had no confirmation it has been
> fixed, so I'm assuming it's still a problem.
>
> //Magnus


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Chris Marcellino <cmarcellino(at)apple(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-02-27 09:30:15
Message-ID: 20070227093015.GF12721@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

On Tue, Feb 27, 2007 at 01:09:46AM -0800, Chris Marcellino wrote:
> The Win32 version didn't materialize until very recently. The Win32
> calls are similar semantically to the POSIX ones, so it was somewhat
> straightforward.
>
> Plaintext is nice if you can fit it, since Windows permits you to
> have slashes and all sorts of other non-filename characters in them,
> unlike POSIX shmem. Also for POSIX, certain platforms (ahem Darwin
> has 30 chars) have very small segment name limits. As Tom said, the
> renaming-while-running issue might be a deal breaker, but I'm not
> sure that is a problem on Windows. It sounds like you are on the
> right track.

This cannot happen on windows - you cannot rename a directory which
someone has open files in. The infamous sharing violation error.

> Does Windows have a method to get a unique ID number for a given data
> directory, or a token file in that directory? It would need to be
> constant while the database is open. Perhaps
> GetFileInformationByHandle? It returns a struct with a nFileIndex
> value that seems to be that, although I'm not certain.
> This might make it easier to avoid the complexity of fitting the
> filename in the segment name, and avoid the rename problem,

Yes, you could use the fileindex value. You need that one and the volume
serial number, total of 64+32 bits of data.

So yeah, we cuold use that instead of the full path name if we want to.
The advantage of this one is that it's shorter, the advantage of the
full path name is that you can see where the backend is from.

However, in most cases you will be able to see where the backend is from
anyway, because it is likely to have some other file open in the data
directory, so maybe that isn't such a big point after all?

//Magnus


From: Chris Marcellino <cmarcellino(at)apple(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-02-27 09:36:26
Message-ID: 384404A0-A7B7-450B-8C22-6A36FAA30B82@apple.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

The comment on that method vexes me:
"This value is useful only while the file is open by at least one
process. If no processes have it open, the index may change the next
time the file is opened."

I wonder how this applies to directories. I.e. is a directory open if
a file in it is open? You could use the path of an arbitrary file in
that directory (like postgresql.conf) if this wasn't the case. We
would need to be sure it was already opened by postgres somehow
(which it should be).

Chris

On Feb 27, 2007, at 1:30 AM, Magnus Hagander wrote:

> On Tue, Feb 27, 2007 at 01:09:46AM -0800, Chris Marcellino wrote:
>> The Win32 version didn't materialize until very recently. The Win32
>> calls are similar semantically to the POSIX ones, so it was somewhat
>> straightforward.
>>
>> Plaintext is nice if you can fit it, since Windows permits you to
>> have slashes and all sorts of other non-filename characters in them,
>> unlike POSIX shmem. Also for POSIX, certain platforms (ahem Darwin
>> has 30 chars) have very small segment name limits. As Tom said, the
>> renaming-while-running issue might be a deal breaker, but I'm not
>> sure that is a problem on Windows. It sounds like you are on the
>> right track.
>
> This cannot happen on windows - you cannot rename a directory which
> someone has open files in. The infamous sharing violation error.
>
>> Does Windows have a method to get a unique ID number for a given data
>> directory, or a token file in that directory? It would need to be
>> constant while the database is open. Perhaps
>> GetFileInformationByHandle? It returns a struct with a nFileIndex
>> value that seems to be that, although I'm not certain.
>> This might make it easier to avoid the complexity of fitting the
>> filename in the segment name, and avoid the rename problem,
>
> Yes, you could use the fileindex value. You need that one and the
> volume
> serial number, total of 64+32 bits of data.
>
> So yeah, we cuold use that instead of the full path name if we want
> to.
> The advantage of this one is that it's shorter, the advantage of the
> full path name is that you can see where the backend is from.
>
> However, in most cases you will be able to see where the backend is
> from
> anyway, because it is likely to have some other file open in the data
> directory, so maybe that isn't such a big point after all?
>
> //Magnus


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Chris Marcellino <cmarcellino(at)apple(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-02-27 09:40:07
Message-ID: 20070227094007.GI12721@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

On Tue, Feb 27, 2007 at 10:30:15AM +0100, Magnus Hagander wrote:
> > Does Windows have a method to get a unique ID number for a given data
> > directory, or a token file in that directory? It would need to be
> > constant while the database is open. Perhaps
> > GetFileInformationByHandle? It returns a struct with a nFileIndex
> > value that seems to be that, although I'm not certain.
> > This might make it easier to avoid the complexity of fitting the
> > filename in the segment name, and avoid the rename problem,
>
> Yes, you could use the fileindex value. You need that one and the volume
> serial number, total of 64+32 bits of data.
>
> So yeah, we cuold use that instead of the full path name if we want to.
> The advantage of this one is that it's shorter, the advantage of the
> full path name is that you can see where the backend is from.
>
> However, in most cases you will be able to see where the backend is from
> anyway, because it is likely to have some other file open in the data
> directory, so maybe that isn't such a big point after all?

Actually, I'm not sure we can. It's only stable as long as someone has
the file open. It will change if it's closed and re-opened later.

Given that we don't actually open the directory, and only files inside
it, I don't know how that works.

//Magnus


From: Chris Marcellino <cmarcellino(at)apple(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-02-27 10:24:46
Message-ID: F2E8E75C-100C-49FB-A04D-66A29F1BDA9D@apple.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

I believe that all we need is the ID to be constant and unique while
the postmaster or its associated backends are running. If anything
from a given generation has the database open, it will remain
constant before any new process can connect to it successfully. Would
it be feasible to lookup the ID of an important file in the DataDir?

As far as the POSIX version goes, here is an updated patch. I changed
it to use the inode/device ID instead of the filename to avoid the
renaming cases. It also no longer needs the more clunky than
necessary hash stuff.

Attachment Content-Type Size
configure.in.patch application/octet-stream 1007 bytes
netbsd.patch application/octet-stream 345 bytes
openbsd.patch application/octet-stream 217 bytes
posix_shmem.c application/octet-stream 15.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chris Marcellino <cmarcellino(at)apple(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-02-28 05:21:21
Message-ID: 21936.1172640081@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Chris Marcellino <cmarcellino(at)apple(dot)com> writes:
> Ignoring the case where backends are still alive in the database,

I grow weary of explaining this, but: that is EXACTLY the case you
cannot ignore.

regards, tom lane


From: Chris Marcellino <cmarcellino(at)apple(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-03-03 04:28:58
Message-ID: D9BA8B9B-A1CF-44E9-9F7A-B166875AEABB@apple.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

In case you haven't had enough, here is another version of the code
to make Postgres use POSIX shared memory. Along with the issues that
have already been addressed, this version ensures that orphaned
backends are not in the database when restarting Postgres by using a
single 1 byte SysV segment to see who is attached to the segment
using shmctl/IPC_STAT/nattach.

This effectively frees Postgres from the SHMMAX and SHMALL limits.
Since this still takes one SysV segment, SHMMNI can still be reached
on most platforms if a ton of databases are opened simultaneously
(i.e. 32 on Mac OS X, 256 on Linux and Solaris).

If you have the need to ship a product with Postgres embedded in it
and are unable to change kernel settings (like myself), this might be
of use to you. I have tested all of the failure situations I could
think of by various combinations of deleting lockfiles while in use,
changing the PID inside the lockfile and trying to restart and run
more than one postmaster simultaneously.

Of course, this since this requires both POSIX and SysV shared
memory, this doesn't increase the portability of Postgres which might
make it less appropriate for mass distribution; I thought I would put
it out there for any feedback either way.

Thanks again,
Chris Marcellino

Attachment Content-Type Size
posix_shmem.c application/octet-stream 18.8 KB

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Chris Marcellino <cmarcellino(at)apple(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-03-03 17:09:47
Message-ID: 45E9ABDB.908@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


> If you have the need to ship a product with Postgres embedded in it and
> are unable to change kernel settings (like myself), this might be of use
> to you. I have tested all of the failure situations I could think of by
> various combinations of deleting lockfiles while in use, changing the
> PID inside the lockfile and trying to restart and run more than one
> postmaster simultaneously.
>
> Of course, this since this requires both POSIX and SysV shared memory,
> this doesn't increase the portability of Postgres which might make it
> less appropriate for mass distribution; I thought I would put it out
> there for any feedback either way.

Well that depends, what systems don't use (or have) POSIX shared memory?
This sounds very interesting to me. Oddly enough I went to do some
digging on what various differences and I came up with:

http://www.nabble.com/POSIX-shared-memory-support-t3298386.html

Which happens to be you ;)

Sincerely,

Joshua D. Drake

>
> Thanks again,
> Chris Marcellino
>
>
>
>
> ------------------------------------------------------------------------
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: Chris Marcellino <cmarcellino(at)apple(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-03-05 02:51:29
Message-ID: 3F2BEE0B-ED00-42A7-B978-3AC9E97A86A8@apple.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

AFAIK, the only systems supported by Postgres that this patch won't
work on are NetBSD and OpenBSD.

The POSIX calls free the user from the SHMMAX and SHMALL limitations
of the SysV shared memory calls on platforms that support it. Since
this still takes one SysV segment, SHMMNI can still be reached on
some platforms if a ton of databases are opened simultaneously (i.e.
256 on Linux and Solaris, 100 on SCO Unix, 512 on HP-UX, 32 on Mac OS
X, unlimited on FreeBSD). This is the case without the patch anyhow.

Chris Marcellino

On Mar 3, 2007, at 9:09 AM, Joshua D. Drake wrote:

>
>> If you have the need to ship a product with Postgres embedded in
>> it and
>> are unable to change kernel settings (like myself), this might be
>> of use
>> to you. I have tested all of the failure situations I could think
>> of by
>> various combinations of deleting lockfiles while in use, changing the
>> PID inside the lockfile and trying to restart and run more than one
>> postmaster simultaneously.
>>
>> Of course, this since this requires both POSIX and SysV shared
>> memory,
>> this doesn't increase the portability of Postgres which might make it
>> less appropriate for mass distribution; I thought I would put it out
>> there for any feedback either way.
>
> Well that depends, what systems don't use (or have) POSIX shared
> memory?
> This sounds very interesting to me. Oddly enough I went to do some
> digging on what various differences and I came up with:
>
> http://www.nabble.com/POSIX-shared-memory-support-t3298386.html
>
> Which happens to be you ;)
>
> Sincerely,
>
> Joshua D. Drake
>
>
>
>>
>> Thanks again,
>> Chris Marcellino
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> ---
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 2: Don't 'kill -9' the postmaster
>
>
> --
>
> === The PostgreSQL Company: Command Prompt, Inc. ===
> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
> Providing the most comprehensive PostgreSQL solutions since 1997
> http://www.commandprompt.com/
>
> Donate to the PostgreSQL Project: http://www.postgresql.org/about/
> donate
> PostgreSQL Replication: http://www.commandprompt.com/products/
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: explain analyze is your friend


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Chris Marcellino <cmarcellino(at)apple(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: POSIX shared memory support
Date: 2007-03-25 02:07:18
Message-ID: 200703250207.l2P27Iu13311@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


Newest version added:

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Chris Marcellino wrote:
> In case you haven't had enough, here is another version of the code
> to make Postgres use POSIX shared memory. Along with the issues that
> have already been addressed, this version ensures that orphaned
> backends are not in the database when restarting Postgres by using a
> single 1 byte SysV segment to see who is attached to the segment
> using shmctl/IPC_STAT/nattach.
>
> This effectively frees Postgres from the SHMMAX and SHMALL limits.
> Since this still takes one SysV segment, SHMMNI can still be reached
> on most platforms if a ton of databases are opened simultaneously
> (i.e. 32 on Mac OS X, 256 on Linux and Solaris).
>
> If you have the need to ship a product with Postgres embedded in it
> and are unable to change kernel settings (like myself), this might be
> of use to you. I have tested all of the failure situations I could
> think of by various combinations of deleting lockfiles while in use,
> changing the PID inside the lockfile and trying to restart and run
> more than one postmaster simultaneously.
>
> Of course, this since this requires both POSIX and SysV shared
> memory, this doesn't increase the portability of Postgres which might
> make it less appropriate for mass distribution; I thought I would put
> it out there for any feedback either way.
>
> Thanks again,
> Chris Marcellino
>

[ Attachment, skipping... ]

>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +