Re: dynamic shared memory

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dynamic shared memory
Date: 2013-09-02 22:52:22
Message-ID: 20130902225222.GD11503@awork2.anarazel.de
Lists: pgsql-hackers

Hi Noah!

On 2013-09-01 12:07:04 -0400, Noah Misch wrote:
> On Sun, Sep 01, 2013 at 05:08:38PM +0200, Andres Freund wrote:
> > On 2013-09-01 09:24:00 -0400, Noah Misch wrote:
> > > The difficulty depends on whether processes other than the segment's creator
> > > will attach anytime or only as they start. Attachment at startup is enough
> > > for parallel query, but it's not enough for something like lock table
> > > expansion. I'll focus on the attach-anytime case since it's more general.
> >
> > Even on startup it might get more complicated than one immediately
> > imagines on EXEC_BACKEND type platforms because their memory layout
> > doesn't need to be the same. The more shared memory you need, the harder
> > that will be. Afair
>
> Non-Windows EXEC_BACKEND is already facing a dead end that way.

Not sure whether you mean non-Windows EXEC_BACKEND isn't going to be
supported for much longer or that it already has problems.

> > > On a system supporting MAP_FIXED, implement this by having the postmaster
> > > reserve address space under a PROT_NONE mapping, then carving out from that
> > > mapping for each fixed-address dynamic segment. The size of the reservation
> > > would be controlled by a GUC; one might set it to several times anticipated
> > > peak usage. (The overhead of doing that depends on the kernel.) Windows
> > > permits the same technique with its own primitives.
> >
> > Note that allocating a large mapping, even without using it, has
> > noticeable cost, at least under Linux. For every fork afterwards, the
> > kernel has to create & copy data to track each page's state (without
> > copying the memory contents themselves, due to COW). If you don't
> > believe me, check the whole discussion about Go's (the language) memory
> > management...
>
> I believe you, but I'd appreciate a link to the discussion you have in mind.

Unfortunately I could only find the first half of the discussion about
the issue. Turns out it's not the greatest idea to name your fancy new
programming language "go" (yes, yes, pet peeve of mine).

http://lkml.org/lkml/2011/2/8/118
https://lwn.net/Articles/428100/

So, after reading up on the issue a bit more and reading some more
kernel code: a large mmap(PROT_NONE, MAP_PRIVATE) won't cause many
problems except counting against ulimit -v. It will *not* cause overcommit
violations. mmap(PROT_NONE, MAP_SHARED) will, though, even if not yet
faulted in. That means that, to be reliable and not violate overcommit,
we'd need to munmap() a chunk of the PROT_NONE, MAP_PRIVATE memory and
immediately (without intervening mallocs, using mmap itself) map it again.
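
A minimal C sketch of the reserve-then-carve scheme, assuming Linux/glibc.
reserve_region() and carve_segment() are hypothetical helpers for
illustration, not PostgreSQL functions. One deliberate swap: instead of
munmap() followed by a fresh mmap(), the sketch maps the segment with
MAP_FIXED directly over the PROT_NONE reservation. POSIX specifies that a
MAP_FIXED mapping replaces any previous mappings for those pages, so the
range is never left unmapped for a malloc() to grab in between.

```c
#define _GNU_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve address space without committing memory.
 * A PROT_NONE, MAP_PRIVATE anonymous mapping only consumes address space
 * (it counts against ulimit -v, not against overcommit). */
static void *reserve_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: carve a shared read/write segment out of the
 * reservation at base + offset. MAP_FIXED replaces the PROT_NONE pages in
 * place. offset and len must be page multiples and stay inside the
 * reservation. Returns NULL on failure. */
static void *carve_segment(void *base, size_t reserved, size_t offset,
                           size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || offset % (size_t) page != 0 || len % (size_t) page != 0)
        return NULL;
    if (offset > reserved || len > reserved - offset)
        return NULL;                /* would spill outside the reservation */

    void *want = (char *) base + offset;
    void *p = mmap(want, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

In a real server the reservation would presumably be made in the
postmaster before forking, so every child inherits it at the same address,
and each process would map a shared file or shm_open() object (rather than
MAP_ANONYMOUS) into its own copy of the reservation at the agreed offset.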

It only gets really expensive, in the sense of making fork expensive, if
you set protections on many regions in that mapping individually. Each
mprotect() call will split the VMA into distinct pieces, and they won't
get merged even if there are neighbors with the same settings.
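
A small Linux-only sketch that makes the VMA splitting visible by counting
the lines of /proc/self/maps (one per VMA) that overlap a mapping.
count_vmas() and split_demo() are hypothetical helpers written for this
illustration; the sketch only demonstrates the split, not the merging
behaviour.

```c
#define _GNU_SOURCE          /* for getline() and MAP_ANONYMOUS on glibc */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count /proc/self/maps entries overlapping [start, start + len).
 * Returns -1 if the file cannot be opened. */
static int count_vmas(const void *start, size_t len)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    uintptr_t lo = (uintptr_t) start, hi = lo + len;
    char *line = NULL;
    size_t cap = 0;
    int n = 0;

    while (getline(&line, &cap, f) != -1) {
        unsigned long s, e;
        /* Each line starts with "start-end" in hex. */
        if (sscanf(line, "%lx-%lx", &s, &e) == 2 && s < hi && e > lo)
            n++;
    }
    free(line);
    fclose(f);
    return n;
}

/* Map three PROT_NONE pages, then change the protection of only the
 * middle page; report the VMA count before and after. */
static int split_demo(int *before, int *after)
{
    size_t pg = (size_t) sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, 3 * pg, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = count_vmas(p, 3 * pg);
    if (mprotect(p + pg, pg, PROT_READ) != 0) {
        munmap(p, 3 * pg);
        return -1;
    }
    *after = count_vmas(p, 3 * pg);
    munmap(p, 3 * pg);
    return 0;
}
```

On Linux, split_demo() reports one VMA before the mprotect() call and
three after it; each such piece is another VMA the kernel has to copy on
fork.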

> > > I don't foresee fundamental differences on 32-bit. All the allocation
> > > maximums scale down, but that's the usual story for 32-bit.
> >
> > If you actually want to allocate memory after starting up, without
> > carving a section out for that from the beginning, memory
> > fragmentation will make it very hard to find the same free address
> > range in every process.
>
> True. I wouldn't feel bad if total dynamic shared memory usage above, say,
> 256 MiB were unreliable on 32-bit. If you're still running 32-bit in 2015,
> you probably have a low-memory platform.

Not sure. I think that will partially depend on whether x32 has any
success, which I still find hard to judge.

> I think the take-away is that we have a lot of knobs available, not a bright
> line between possible and impossible. Robert opted to omit provision for
> reliable fixed addresses, and the upsides of that decision are the absence of
> a DBA-unfriendly space-reservation GUC, trivial overhead when the APIs are not
> used, and a clearer portability outlook.

I guess my point is that if we want to develop stuff that requires
reliable addresses, we should build support for that from a low level
up, not rely on a hack^Wlayer on top of the actual dynamic shared memory
API.
That is, there should be a dsm_create() flag saying that we require a
fixed address, and dsm_attach() will then automatically use that address
or die trying. Requiring implementations to take care of passing addresses
around and fiddling with mmap/the Windows API to make sure those mappings
are possible doesn't strike me as a good idea.
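
A minimal POSIX sketch of the "use that address or die trying" attach
semantics: the attaching process passes the creator's address to mmap()
as a hint and treats any other placement as failure. create_backing() and
attach_at() are hypothetical helpers, not part of PostgreSQL's dsm API,
and an unlinked temporary file stands in for the real segment. Leaving out
MAP_FIXED means an occupied range is never clobbered; Linux 4.17 and later
also offer MAP_FIXED_NOREPLACE for the same purpose.

```c
#define _GNU_SOURCE          /* for mkstemp()/ftruncate() declarations */
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the creator's backing object: an unlinked
 * temporary file of len bytes. Returns a descriptor or -1. */
static int create_backing(size_t len)
{
    char path[] = "/tmp/dsm-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* keep only the descriptor */
    if (ftruncate(fd, (off_t) len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: map len bytes of fd at exactly addr, or fail.
 * Without MAP_FIXED the address is only a hint; if the kernel places the
 * mapping elsewhere, undo it. A real server would raise an error here. */
static void *attach_at(void *addr, size_t len, int fd)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (p != addr) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

With the check inside the attach path, callers only need to know whether
they asked for a fixed address, rather than each one passing addresses
around and fiddling with mmap() themselves.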

In the end, you're going to be the primary/first user as far as I
understand things, so you'll have to argue whether we need fixed
addresses or not. I don't think it's a good idea to skip this decision
at this layer and bolt another one on top if we decide it's necessary.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
