dynamically allocating chunks from shared memory

Lists:	pgsql-hackers

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	dynamically allocating chunks from shared memory
Date:	2010-07-02 23:44:46
Message-ID:	ab0cd52a64e788f4ecb4515d1e6e4691@localhost

Hi,

For quite some time, I've been under the impression that there's still
one disadvantage left from using processes instead of threads: we can
only use statically sized chunks of shared memory. Every component that
wants to use shared memory needs to pre-allocate whatever it thinks is
sufficient. It cannot enlarge its share, nor can unused memory be
allocated to other components.

Having written a very primitive kind of a dynamic memory allocator for
imessages [1], I've always wanted a better alternative. So I've
investigated a bit, refactored step-by-step, and finally came up with
the attached lock-based dynamic shared memory allocator. Its interface
is as simple as malloc() and free(). A restart of the postmaster should
truncate the whole area.

Since the allocator is itself a component that needs to pre-allocate its
area in shared memory, you need to define a maximum size for the pool of
dynamically allocatable memory. That's currently defined in shmem.h
instead of a GUC.
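
To make the malloc()/free()-style interface over a fixed-size pool concrete, here is a minimal sketch in C. It is not the algorithm from the attached patch (which derives from a lock-free design); it is a plain first-fit allocator with block headers, and it omits the locking a shared memory version would need. The names demo_shmem_alloc, demo_shmem_free and DEMO_POOL_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool size; stands in for the constant defined in shmem.h. */
#define DEMO_POOL_SIZE 4096
#define DEMO_ALIGN     8
#define ALIGN_UP(n)    (((n) + DEMO_ALIGN - 1) & ~((size_t) DEMO_ALIGN - 1))

typedef struct BlockHeader
{
    size_t size;    /* payload bytes that follow this header */
    int    in_use;  /* 1 while allocated, 0 while free */
} BlockHeader;

#define HDR_SIZE ALIGN_UP(sizeof(BlockHeader))

/* A static array plays the role of the pre-allocated shared memory area. */
static union
{
    long long     force_align;
    unsigned char bytes[DEMO_POOL_SIZE];
} pool;
static int pool_ready = 0;

/* Return the block following b, or NULL if b is the last block in the pool. */
static BlockHeader *next_block(BlockHeader *b)
{
    unsigned char *p = (unsigned char *) b + HDR_SIZE + b->size;

    return (p < pool.bytes + DEMO_POOL_SIZE) ? (BlockHeader *) p : NULL;
}

/* First-fit allocation; a real version would hold a lock around this. */
void *demo_shmem_alloc(size_t size)
{
    BlockHeader *b;

    if (!pool_ready)
    {
        /* Start with one free block that spans the whole pool. */
        b = (BlockHeader *) pool.bytes;
        b->size = DEMO_POOL_SIZE - HDR_SIZE;
        b->in_use = 0;
        pool_ready = 1;
    }
    if (size == 0 || size > DEMO_POOL_SIZE)
        return NULL;
    size = ALIGN_UP(size);

    for (b = (BlockHeader *) pool.bytes; b != NULL; b = next_block(b))
    {
        if (b->in_use || b->size < size)
            continue;
        /* Split off the remainder if it can hold a header plus some payload. */
        if (b->size >= size + HDR_SIZE + DEMO_ALIGN)
        {
            BlockHeader *rest = (BlockHeader *) ((unsigned char *) b + HDR_SIZE + size);

            rest->size = b->size - size - HDR_SIZE;
            rest->in_use = 0;
            b->size = size;
        }
        b->in_use = 1;
        return (unsigned char *) b + HDR_SIZE;
    }
    return NULL;                /* pool exhausted: the caller must cope */
}

/* Mark the block free, then merge every run of adjacent free blocks. */
void demo_shmem_free(void *ptr)
{
    BlockHeader *b, *n;

    if (ptr == NULL)
        return;
    assert((unsigned char *) ptr >= pool.bytes + HDR_SIZE &&
           (unsigned char *) ptr < pool.bytes + DEMO_POOL_SIZE);
    b = (BlockHeader *) ((unsigned char *) ptr - HDR_SIZE);
    b->in_use = 0;

    for (b = (BlockHeader *) pool.bytes; b != NULL; b = next_block(b))
        if (!b->in_use)
            while ((n = next_block(b)) != NULL && !n->in_use)
                b->size += HDR_SIZE + n->size;
}
```

The point of the sketch is the contract: allocation can fail with NULL once the fixed pool is used up, and freed space becomes available to any other caller.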

This kind of feature was requested at the Tokyo Clustering Meeting
(by myself) in 2009 and is listed on the Wiki [2].

I'm now using that allocator as the basis for a reworked imessages
patch, which I've attached as well. Both are tested as a basis for
Postgres-R.

While I think other components could use this dynamic memory allocator,
too, I didn't write any code for that. Imessages currently is the only
user available. (So please apply the dynshmem patch first, then
imessages).

Comments?

Greetings from Oxford, and thanks to Joachim Wieland for providing me
the required Internet connectivity ;-)

Markus Wanner

[1]: Postgres-R: internal messages
http://archives.postgresql.org/message-id/4886DB0B.1090508@bluegap.ch

[2]: Mentioned Cluster Feature
http://wiki.postgresql.org/wiki/ClusterFeatures#Dynamic_shared_memory_allocation

For git addicts: here's a git repository with both patches applied:
http://git.postgres-r.org/?p=imessages;a=summary

Attachment	Content-Type	Size
dynshmem.20100703.diff	text/x-diff	21.8 KB
imessages.20100703.diff	text/x-diff	13.0 KB

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-20 17:50:20
Message-ID:	1279647881-sup-2390@alvh.no-ip.org

Excerpts from Markus Wanner's message of Fri Jul 02 19:44:46 -0400 2010:

> Having written a very primitive kind of a dynamic memory allocator for
> imessages [1], I've always wanted a better alternative. So I've
> investigated a bit, refactored step-by-step, and finally came up with
> the attached lock-based dynamic shared memory allocator. Its interface
> is as simple as malloc() and free(). A restart of the postmaster should
> truncate the whole area.

Interesting, thanks.

I gave it a skim and found that it badly needs a lot more code comments.

I'm also unconvinced that spinlocks are the best locking primitive here.
Why not lwlocks?

> Since the allocator is itself a component that needs to pre-allocate its
> area in shared memory, you need to define a maximum size for the pool of
> dynamically allocatable memory. That's currently defined in shmem.h
> instead of a GUC.

This should be an easy change; I agree that it needs to be configurable.

I'm not sure what kind of resistance you'll see to the idea of a
dynamically allocatable shmem area. Maybe we could use this in other
areas such as allocating space for heavyweight lock objects. Right now
the memory usage for them could grow due to a transitory increase in
lock traffic, leading to out-of-memory conditions later in other
modules. We've seen reports of that problem, so it'd be nice to be able
to fix that with this infrastructure.

I didn't look at the imessages patch (except to notice that I didn't
very much like the handling of out-of-memory, but you already knew that).

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-20 18:23:11
Message-ID:	AANLkTimbtRx2VBwJRk-5fVCijHj8yW1KLR=zWiF=GB7R@mail.gmail.com

On Tue, Jul 20, 2010 at 1:50 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> I'm not sure what kind of resistance you'll see to the idea of a
> dynamically allocatable shmem area. Maybe we could use this in other
> areas such as allocating space for heavyweight lock objects. Right now
> the memory usage for them could grow due to a transitory increase in
> lock traffic, leading to out-of-memory conditions later in other
> modules. We've seen reports of that problem, so it'd be nice to be able
> to fix that with this infrastructure.

Well, you can't really fix that problem with this infrastructure,
because this infrastructure only allows shared memory to be
dynamically allocated from a pool set aside for such allocations in
advance. If a surge in demand can exhaust all the heavyweight lock
space in the system, it can also exhaust the shared pool from which
more heavyweight lock space can be allocated. The failure might
manifest itself in a totally different subsystem though, since the
allocation that failed wouldn't necessarily be a heavyweight lock
allocation, but some other allocation that failed as a result of space
used by the heavyweight locks.

It would be more interesting if you could expand (or contract) the
size of shared memory as a whole while the system is up and running.
Then, perhaps, max_locks_per_transaction and other, similar GUCs could
be made PGC_SIGHUP, which would give you a way out of such situations
that didn't involve taking down the entire cluster. I'm not too sure
how to do that, though.

With respect to imessages specifically, what is the motivation for
using shared memory rather than something like an SLRU? The new
LISTEN implementation uses an SLRU and handles variable-size messages,
so it seems like it might be well-suited to this task.

Incidentally, the link for the imessages patch on the CommitFest page
points to http://archives.postgresql.org/message-id/ab0cd52a64e788f4ecb4515d1e6e4691@localhost
- which is the dynamic shmem patch. So I'm not sure where to find the
latest imessages patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-20 18:36:55
Message-ID:	4C45ECC7.7080802@bluegap.ch

Hello Alvaro,

thank you for looking through this code.

On 07/20/2010 07:50 PM, Alvaro Herrera wrote:
> Interesting, thanks.
>
> I gave it a skim and found that it badly needs a lot more code comments.

Hm.. yeah, the dynshmem stuff could probably use more comments. (The
bgworker stuff is probably a better example).

> I'm also unconvinced that spinlocks are the best locking primitive here.
> Why not lwlocks?

It's derived from a completely lock-free algorithm, as proposed by Maged
M. Michael in: Scalable Lock-Free Dynamic Memory Allocation. I dropped
all of the CAS primitives along with their surrounding retry loops and
did further simplifications. Spinlocks simply looked like the simplest
thing to fall back to. But yeah, splitting into read and write accesses
and using lwlocks might be a win. Or it might not. I honestly don't
know. And it's probably not the best performing allocator ever. But it's
certainly better than nothing.

I did recently release the lock-free variant as well as a lock-based
one, see http://www.bluegap.ch/projects/wamalloc/ for more information.

> I'm not sure what kind of resistance you'll see to the idea of a
> dynamically allocatable shmem area.

So far neither resistance nor applause. I'd love to hear more of an
echo. Even if it's resistance.

> Maybe we could use this in other
> areas

..which is why I've published this separately from Postgres-R.

> such as allocating space for heavyweight lock objects. Right now
> the memory usage for them could grow due to a transitory increase in
> lock traffic, leading to out-of-memory conditions later in other
> modules. We've seen reports of that problem, so it'd be nice to be able
> to fix that with this infrastructure.

Maybe, yes. Sounds like a nice idea.

> I didn't look at the imessages patch (except to notice that I didn't
> very much like the handling of out-of-memory, but you already knew that).

As all of the allocation problem has now been ripped out, the imessages
patch got quite a bit smaller. imsg.c now consists of only around 370
lines of code.

The handling of out-of-(shared)-memory situations could certainly be
improved, yes. Note that I've already separated out an
IMessageCreateInternal() function, which simply returns NULL in that case.
Is that the API you'd prefer?

Getting back to the dynshmem stuff: I don't mind much *which* allocator
to use. I also looked at jemalloc, but haven't been able to integrate it
into Postgres. So I've extended my experiment with wamalloc and turned
it into something usable for Postgres.

Regards

Markus Wanner

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-20 18:54:42
Message-ID:	4C45F0F2.7090109@bluegap.ch

Hi,

On 07/20/2010 08:23 PM, Robert Haas wrote:
> Well, you can't really fix that problem with this infrastructure,

No, but it would allow you to better use the existing amount of shared
memory. Possibly avoiding the problem in certain scenarios.

> The failure might
> manifest itself in a totally different subsystem though, since the
> allocation that failed wouldn't necessarily be a heavyweight lock
> allocation, but some other allocation that failed as a result of space
> used by the heavyweight locks.

Yeah, that's a valid concern. Maybe it could be addressed by keeping
track of dynshmem usage per module and somehow informing the user about
the usage pattern in case of OOM.
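
One way such per-module accounting might look is sketched below in C. This is only an illustration under stated assumptions: the module list, the 1024-byte limit, and the names tracked_alloc, tracked_free and tracked_usage are hypothetical, and plain malloc() stands in for the shared pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical set of modules drawing from the common pool. */
typedef enum { MOD_LOCKS, MOD_IMESSAGES, MOD_OTHER, MOD_COUNT } DemoModule;

static const char *const module_names[MOD_COUNT] = { "locks", "imessages", "other" };
static size_t module_usage[MOD_COUNT];   /* bytes currently held per module */
static const size_t pool_limit = 1024;   /* hypothetical total pool size */
static size_t pool_used = 0;

/* Header stored in front of every chunk so free() knows whom to credit. */
typedef union
{
    struct { DemoModule owner; size_t size; } info;
    max_align_t align;                   /* keeps the payload well aligned */
} TrackedHeader;

/* On OOM, tell the user which module holds how much of the pool. */
static void report_usage(FILE *out)
{
    int m;

    fprintf(out, "out of dynamic shared memory (%zu of %zu bytes used)\n",
            pool_used, pool_limit);
    for (m = 0; m < MOD_COUNT; m++)
        fprintf(out, "  %-10s %zu bytes\n", module_names[m], module_usage[m]);
}

void *tracked_alloc(DemoModule owner, size_t size)
{
    TrackedHeader *h;

    if (size > pool_limit - pool_used)
    {
        report_usage(stderr);
        return NULL;
    }
    h = malloc(sizeof(TrackedHeader) + size);
    if (h == NULL)
        return NULL;
    h->info.owner = owner;
    h->info.size = size;
    module_usage[owner] += size;
    pool_used += size;
    return h + 1;                        /* payload starts right after the header */
}

void tracked_free(void *ptr)
{
    TrackedHeader *h;

    if (ptr == NULL)
        return;
    h = (TrackedHeader *) ptr - 1;
    module_usage[h->info.owner] -= h->info.size;
    pool_used -= h->info.size;
    free(h);
}

size_t tracked_usage(DemoModule module)
{
    return module_usage[module];
}
```

With this in place, a failed imessages allocation would print a report showing that, say, the lock module holds most of the pool, which addresses the concern that the failure shows up far from its cause.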

> It would be more interesting

Sure, but then you'd definitely need a dynamic allocator, no?

> With respect to imessages specifically, what is the motivation for
> using shared memory rather than something like an SLRU? The new
> LISTEN implementation uses an SLRU and handles variable-size messages,
> so it seems like it might be well-suited to this task.

Well, imessages predates the new LISTEN implementation by some moons.
They are intended to replace (unix-ish) pipes between processes. I fail
to see the immediate link between (S)LRU and inter-process message
passing. It might be more useful for multiple LISTENers, but I bet it
has slightly different semantics than imessages.

But to be honest, I don't know too much about the new LISTEN
implementation. Do you think a loss-less
(single)-process-to-(single)-process message passing system could be
built on top of it?

> Incidentally, the link for the imessages patch on the CommitFest page
> points to http://archives.postgresql.org/message-id/ab0cd52a64e788f4ecb4515d1e6e4691@localhost
> - which is the dynamic shmem patch. So I'm not sure where to find the
> latest imessages patch.

The archive doesn't display attachments very well. But the imessages
patch is part of that mail. Maybe you can still find it in your local mailbox?

In the archive view, it starts at the line that says:
*** src/backend/storage/ipc/imsg.c dc149eef487eafb43409a78b8a33c70e7d3c2bfa

(and, well, the dynshmem stuff ends just before that line. Those were
two .diff files attached, IIRC).

Regards

Markus Wanner

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-20 19:05:11
Message-ID:	1279652545-sup-5894@alvh.no-ip.org

Excerpts from Markus Wanner's message of Tue Jul 20 14:36:55 -0400 2010:

> > I'm also unconvinced that spinlocks are the best locking primitive here.
> > Why not lwlocks?
>
> It's derived from a completely lock-free algorithm, as proposed by Maged
> M. Michael in: Scalable Lock-Free Dynamic Memory Allocation.

Hmm, deriving code from a paper published by IBM sounds like bad news --
who knows what patents they hold on the techniques there?

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-20 19:14:23
Message-ID:	4C45F58F.8070105@bluegap.ch

Hi,

On 07/20/2010 09:05 PM, Alvaro Herrera wrote:
> Hmm, deriving code from a paper published by IBM sounds like bad news --
> who knows what patents they hold on the techniques there?

Yeah, that might be an issue. Note, however, that the lock-based variant
differs substantially from what's been published. And I sort of doubt
their patents cover a lot of stuff that's not lock-free-ish.

But again, I'd also very much welcome any other allocator. In my
opinion, it's the most annoying drawback of the process-based design
compared to a threaded variant (from the perspective of a developer).

Regards

Markus Wanner

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-20 21:46:53
Message-ID:	1279661901-sup-5184@alvh.no-ip.org

Excerpts from Markus Wanner's message of Tue Jul 20 14:54:42 -0400 2010:

> > With respect to imessages specifically, what is the motivation for
> > using shared memory rather than something like an SLRU? The new
> > LISTEN implementation uses an SLRU and handles variable-size messages,
> > so it seems like it might be well-suited to this task.
>
> Well, imessages predates the new LISTEN implementation by some moons.
> They are intended to replace (unix-ish) pipes between processes. I fail
> to see the immediate link between (S)LRU and inter-process message
> passing. It might be more useful for multiple LISTENers, but I bet it
> has slightly different semantics than imessages.

I guess what Robert is saying is that you don't need shmem to pass
messages around. The new LISTEN implementation was just an example.
imessages aren't supposed to use it directly. Rather, the idea is to
store the messages in a new SLRU area. Thus you don't need to mess with
dynamically allocating shmem at all.

> But to be honest, I don't know too much about the new LISTEN
> implementation. Do you think a loss-less
> (single)-process-to-(single)-process message passing system could be
> built on top of it?

I don't think you should build on top of LISTEN but of slru.c. This is
probably more similar to multixact (see multixact.c) than to the new
LISTEN implementation.

I think it should be rather straightforward. There would be a unique
append-point; each process desiring to send a new message to another
backend would add a new message at that point. There would be one read
pointer per backend, and it would be advanced as messages are consumed.
Old segments could be trimmed as backends advance their read pointer,
similar to how sinval queue is handled.
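
The scheme just described (one shared append point, one read pointer per backend, trimming behind the slowest reader) can be sketched roughly as follows in C. This is an in-memory toy under stated assumptions: a small array stands in for the SLRU pages, messages are single integers, and all names (DemoQueue, queue_send, queue_receive) are hypothetical. No locking is shown.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BACKENDS 3
#define DEMO_SLOTS    8     /* stands in for the SLRU storage */

typedef struct
{
    int recipient;          /* backend the message is addressed to */
    int payload;
} DemoMsg;

typedef struct
{
    DemoMsg       slots[DEMO_SLOTS];
    unsigned long head;                     /* the unique append point */
    unsigned long tail;                     /* oldest entry still retained */
    unsigned long read_pos[DEMO_BACKENDS];  /* one read pointer per backend */
} DemoQueue;

void queue_init(DemoQueue *q)
{
    memset(q, 0, sizeof(*q));
}

/* Everything older than the slowest reader's position can be discarded. */
static void queue_trim(DemoQueue *q)
{
    unsigned long min = q->read_pos[0];
    int i;

    for (i = 1; i < DEMO_BACKENDS; i++)
        if (q->read_pos[i] < min)
            min = q->read_pos[i];
    q->tail = min;
}

/* Append at the shared append point; returns 0 if the queue is full. */
int queue_send(DemoQueue *q, int recipient, int payload)
{
    assert(recipient >= 0 && recipient < DEMO_BACKENDS);
    if (q->head - q->tail == DEMO_SLOTS)
        return 0;
    q->slots[q->head % DEMO_SLOTS].recipient = recipient;
    q->slots[q->head % DEMO_SLOTS].payload = payload;
    q->head++;
    return 1;
}

/*
 * Advance this backend's read pointer until a message addressed to it is
 * found. Messages for other backends are skipped over, one by one.
 */
int queue_receive(DemoQueue *q, int backend, int *payload)
{
    int found = 0;

    assert(backend >= 0 && backend < DEMO_BACKENDS);
    while (!found && q->read_pos[backend] < q->head)
    {
        const DemoMsg *m = &q->slots[q->read_pos[backend] % DEMO_SLOTS];

        q->read_pos[backend]++;
        if (m->recipient == backend)
        {
            *payload = m->payload;
            found = 1;
        }
    }
    queue_trim(q);
    return found;
}
```

Note that space is only reclaimed once every backend has advanced past an entry, so a single idle backend holds back trimming for all of them.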

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-20 23:52:42
Message-ID:	AANLkTikaBL05C1RSe3Ggn7+fnOyrf=3Ba4LtEPj06iL_@mail.gmail.com

On Tue, Jul 20, 2010 at 5:46 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Excerpts from Markus Wanner's message of Tue Jul 20 14:54:42 -0400 2010:
>
>> > With respect to imessages specifically, what is the motivation for
>> > using shared memory rather than something like an SLRU? The new
>> > LISTEN implementation uses an SLRU and handles variable-size messages,
>> > so it seems like it might be well-suited to this task.
>>
>> Well, imessages predates the new LISTEN implementation by some moons.
>> They are intended to replace (unix-ish) pipes between processes. I fail
>> to see the immediate link between (S)LRU and inter-process message
>> passing. It might be more useful for multiple LISTENers, but I bet it
>> has slightly different semantics than imessages.
>
> I guess what Robert is saying is that you don't need shmem to pass
> messages around. The new LISTEN implementation was just an example.
> imessages aren't supposed to use it directly. Rather, the idea is to
> store the messages in a new SLRU area. Thus you don't need to mess with
> dynamically allocating shmem at all.

Right. I might be full of bull, but that's what I'm saying. :-)

>> But to be honest, I don't know too much about the new LISTEN
>> implementation. Do you think a loss-less
>> (single)-process-to-(single)-process message passing system could be
>> built on top of it?
>
> I don't think you should build on top of LISTEN but of slru.c. This is
> probably more similar to multixact (see multixact.c) than to the new
> LISTEN implementation.
>
> I think it should be rather straightforward. There would be a unique
> append-point; each process desiring to send a new message to another
> backend would add a new message at that point. There would be one read
> pointer per backend, and it would be advanced as messages are consumed.
> Old segments could be trimmed as backends advance their read pointer,
> similar to how sinval queue is handled.

If the messages are mostly unicast, it might be nice to contrive a
method whereby backends didn't need to explicitly advance over
messages destined only for other backends. Like maybe allocate a
small, fixed amount of shared memory sufficient for two "pointers"
into the SLRU area per backend, and then use the SLRU to store each
message with a header indicating where the next message is to be
found. For each backend, you store one pointer to the first queued
message and one pointer to the last queued message. New messages can
be added by making the current last message point to a newly added
message and updating the last message pointer for that backend. You'd
need to think about the locking and reference counting carefully to
make sure you eventually freed up unused pages, but it seems like it
might be doable. Of course, if the messages are mostly multi/anycast,
or if the rate of messaging is low enough that the aforementioned
complexity is not worth bothering with, then, what you said.
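
A rough sketch of the per-backend first/last pointer idea, in C, might look like this. It is a toy under stated assumptions: an append-only array stands in for the SLRU area, slot indexes stand in for the "pointers", and the page reclamation and locking mentioned above are left out entirely. All names are hypothetical.

```c
#include <assert.h>

#define DEMO_BACKENDS   3
#define DEMO_AREA_SLOTS 16      /* stands in for the SLRU area */
#define DEMO_NONE       (-1)

/* Each stored message carries a header saying where the next one is. */
typedef struct
{
    int next;                   /* slot of the next message for the same backend */
    int payload;
} ChainedMsg;

/* The small, fixed amount of shared memory needed per backend. */
typedef struct
{
    int first;                  /* first queued message, or DEMO_NONE */
    int last;                   /* last queued message, or DEMO_NONE */
} BackendPtrs;

typedef struct
{
    ChainedMsg  area[DEMO_AREA_SLOTS];
    int         used;           /* next unused slot in the area */
    BackendPtrs ptrs[DEMO_BACKENDS];
} ChainedQueues;

void chained_init(ChainedQueues *cq)
{
    int i;

    cq->used = 0;
    for (i = 0; i < DEMO_BACKENDS; i++)
        cq->ptrs[i].first = cq->ptrs[i].last = DEMO_NONE;
}

/* Store the message, then link it behind the backend's current last message. */
int chained_send(ChainedQueues *cq, int backend, int payload)
{
    BackendPtrs *bp;
    int idx;

    assert(backend >= 0 && backend < DEMO_BACKENDS);
    if (cq->used == DEMO_AREA_SLOTS)
        return 0;               /* area full; slot reuse is not modelled here */
    bp = &cq->ptrs[backend];
    idx = cq->used++;
    cq->area[idx].next = DEMO_NONE;
    cq->area[idx].payload = payload;
    if (bp->last == DEMO_NONE)
        bp->first = idx;
    else
        cq->area[bp->last].next = idx;
    bp->last = idx;
    return 1;
}

/* Follow the backend's own chain; messages for others are never visited. */
int chained_receive(ChainedQueues *cq, int backend, int *payload)
{
    BackendPtrs *bp;
    int idx;

    assert(backend >= 0 && backend < DEMO_BACKENDS);
    bp = &cq->ptrs[backend];
    if (bp->first == DEMO_NONE)
        return 0;
    idx = bp->first;
    *payload = cq->area[idx].payload;
    bp->first = cq->area[idx].next;
    if (bp->first == DEMO_NONE)
        bp->last = DEMO_NONE;
    return 1;
}
```

Compared with a single shared read position per backend, a receiver here touches only its own messages, at the cost of having to work out separately when a slot (or page) is no longer referenced.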

One big advantage of attacking the problem with an SLRU is that
there's no fixed upper limit on the amount of data that can be
enqueued at any given time. You can spill to disk or whatever as
needed (although hopefully you won't normally do so, for performance
reasons).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-21 08:33:51
Message-ID:	4C46B0EF.60303@bluegap.ch

On 07/21/2010 01:52 AM, Robert Haas wrote:
> On Tue, Jul 20, 2010 at 5:46 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
>> I guess what Robert is saying is that you don't need shmem to pass
>> messages around. The new LISTEN implementation was just an example.
>> imessages aren't supposed to use it directly. Rather, the idea is to
>> store the messages in a new SLRU area. Thus you don't need to mess with
>> dynamically allocating shmem at all.

Okay, so I just need to grok the SLRU stuff. Thanks for clarifying.

Note that I sort of /want/ to mess with shared memory. It's what I know
how to deal with. It's how threaded programs work as well. Ya know,
locks, condition variables, mutexes, all those nice things that allow
you to shoot your foot so terribly nicely... Oh, well...

>> I think it should be rather straightforward. There would be a unique
>> append-point;

Unique append-point? Sounds like what I had before. That'd be a step
backwards, compared to the per-backend queue and an allocator that
hopefully scales well with the number of CPU cores.

>> each process desiring to send a new message to another
>> backend would add a new message at that point. There would be one read
>> pointer per backend, and it would be advanced as messages are consumed.
>> Old segments could be trimmed as backends advance their read pointer,
>> similar to how sinval queue is handled.

That leads to pretty nasty fragmentation. A dynamic allocator should do
much better in that regard. (Wamalloc certainly does).

> If the messages are mostly unicast, it might be nice to contrive a
> method whereby backends didn't need to explicitly advance over
> messages destined only for other backends. Like maybe allocate a
> small, fixed amount of shared memory sufficient for two "pointers"
> into the SLRU area per backend, and then use the SLRU to store each
> message with a header indicating where the next message is to be
> found.

That's pretty much how imessages currently work. A single list of
messages queued per backend.

> For each backend, you store one pointer to the first queued
> message and one pointer to the last queued message. New messages can
> be added by making the current last message point to a newly added
> message and updating the last message pointer for that backend. You'd
> need to think about the locking and reference counting carefully to
> make sure you eventually freed up unused pages, but it seems like it
> might be doable.

I've just read through slru.c, but still don't have a clue how it could
replace a dynamic allocator.

At the moment, the creator of an imessage allocs memory, copies the
payload there and then activates the message by appending it to the
recipient's queue. Upon getting signaled, the recipient consumes the
message by removing it from the queue and is obliged to release the
memory the message occupies after having processed it. Simple and
straightforward, IMO.

The queue addition and removal is clear. But how would I do the
alloc/free part with SLRU? Its blocks are fixed size (BLCKSZ) and the
API with ReadPage and WritePage is rather unlike a pair of alloc() and
free().
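
For readers unfamiliar with imessages, the lifecycle described above (allocate, copy payload, activate, consume, release) can be sketched in C as follows. This is not the code from imsg.c: malloc() and free() stand in for the dynamic shared memory allocator, signaling and locking are omitted, and every name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A message is one variable-size chunk: header plus payload. */
typedef struct DemoIMessage
{
    struct DemoIMessage *next;      /* next message in the recipient's queue */
    size_t               size;      /* payload size in bytes */
    unsigned char        payload[]; /* flexible array member */
} DemoIMessage;

/* One queue per recipient. */
typedef struct
{
    DemoIMessage *head;
    DemoIMessage *tail;
} DemoInbox;

/* Sender: allocate a chunk and copy the payload into it. */
DemoIMessage *demo_imsg_create(const void *data, size_t size)
{
    DemoIMessage *msg;

    assert(data != NULL || size == 0);
    msg = malloc(sizeof(DemoIMessage) + size);
    if (msg == NULL)
        return NULL;                /* out of memory: the caller decides what to do */
    msg->next = NULL;
    msg->size = size;
    if (size > 0)
        memcpy(msg->payload, data, size);
    return msg;
}

/* Sender: activate the message by appending it to the recipient's queue. */
void demo_imsg_activate(DemoInbox *inbox, DemoIMessage *msg)
{
    assert(msg != NULL);
    if (inbox->tail == NULL)
        inbox->head = msg;
    else
        inbox->tail->next = msg;
    inbox->tail = msg;
}

/* Recipient: remove the oldest message from the queue, or return NULL. */
DemoIMessage *demo_imsg_consume(DemoInbox *inbox)
{
    DemoIMessage *msg = inbox->head;

    if (msg == NULL)
        return NULL;
    inbox->head = msg->next;
    if (inbox->head == NULL)
        inbox->tail = NULL;
    msg->next = NULL;
    return msg;
}

/* Recipient: release the memory once the message has been processed. */
void demo_imsg_release(DemoIMessage *msg)
{
    free(msg);
}
```

The sketch shows why an alloc()/free() pair fits this usage: each message is a single variable-size chunk whose lifetime ends at an arbitrary later point, decided by a different process than the one that allocated it.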

> One big advantage of attacking the problem with an SLRU is that
> there's no fixed upper limit on the amount of data that can be
> enqueued at any given time. You can spill to disk or whatever as
> needed (although hopefully you won't normally do so, for performance
> reasons).

Yes, imessages shouldn't ever be spilled to disk. There naturally must
be an upper limit for them. (Be it total available memory, as for
threaded things or a given and size-constrained pool, as is the case for
dynshmem).

To me it rather sounds like SLRU is a candidate for using dynamically
allocated shared memory underneath, instead of allocating a fixed amount
of slots in advance. That would allow more efficient use of shared
memory. (Given SLRU's ability to spill to disk, it could even be used to
'balance' out anomalies to some extent).

Regards

Markus Wanner

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-21 17:25:05
Message-ID:	AANLkTikwOiNWPGt+hAAQqWXOm52GwYHNmJb=s=K2pqun@mail.gmail.com

On Wed, Jul 21, 2010 at 4:33 AM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
> Okay, so I just need to grok the SLRU stuff. Thanks for clarifying.
>
> Note that I sort of /want/ to mess with shared memory. It's what I know how
> to deal with. It's how threaded programs work as well. Ya know, locks,
> condition variables, mutexes, all those nice things that allow you to shoot
> your foot so terribly nicely... Oh, well...

For what it's worth, I feel your pain. I think the SLRU method is
*probably* better, but I feel your pain anyway.

>> For each backend, you store one pointer to the first queued
>> message and one pointer to the last queued message. New messages can
>> be added by making the current last message point to a newly added
>> message and updating the last message pointer for that backend. You'd
>> need to think about the locking and reference counting carefully to
>> make sure you eventually freed up unused pages, but it seems like it
>> might be doable.
>
> I've just read through slru.c, but still don't have a clue how it could
> replace a dynamic allocator.
>
> At the moment, the creator of an imessage allocs memory, copies the payload
> there and then activates the message by appending it to the recipient's
> queue. Upon getting signaled, the recipient consumes the message by removing
> it from the queue and is obliged to release the memory the message occupies
> after having processed it. Simple and straightforward, IMO.
>
> The queue addition and removal is clear. But how would I do the alloc/free
> part with SLRU? Its blocks are fixed size (BLCKSZ) and the API with ReadPage
> and WritePage is rather unlike a pair of alloc() and free().

Given what you're trying to do, it does sound like you're going to
need some kind of an algorithm for space management; but you'll be
managing space within the SLRU rather than within shared_buffers. For
example, you might end up putting a header on each SLRU page or
segment and using that to track the available freespace within that
segment for messages to be read and written. It'll probably be a bit
more complex than the one for listen (see asyncQueueAddEntries).
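
As a very rough illustration of the page-header idea, the C sketch below tracks used space in a fixed-size page and appends variable-size messages to it. It makes several assumptions: DEMO_PAGE_SIZE stands in for BLCKSZ, the header layout is invented, and reclaiming space from consumed messages (the hard part) is not modelled. It does not use the slru.c API.

```c
#include <assert.h>
#include <string.h>

#define DEMO_PAGE_SIZE 8192     /* stands in for BLCKSZ */

/* Hypothetical per-page header tracking how much of the page is occupied. */
typedef struct
{
    unsigned short used;        /* bytes of data[] already occupied */
} DemoPageHeader;

typedef struct
{
    DemoPageHeader hdr;
    unsigned char  data[DEMO_PAGE_SIZE - sizeof(DemoPageHeader)];
} DemoPage;

void page_init(DemoPage *page)
{
    page->hdr.used = 0;
}

/* Free space left for further messages on this page. */
size_t page_free_space(const DemoPage *page)
{
    return sizeof(page->data) - page->hdr.used;
}

/*
 * Append a message to the page. Returns the offset within data[] at which
 * it was stored, or -1 if it does not fit and the caller must move on to
 * another page.
 */
int page_append(DemoPage *page, const void *msg, unsigned short len)
{
    int off;

    assert(msg != NULL);
    if (len == 0 || len > page_free_space(page))
        return -1;
    off = page->hdr.used;
    memcpy(page->data + off, msg, len);
    page->hdr.used = (unsigned short) (page->hdr.used + len);
    return off;
}
```

Even this toy shows the shape of the problem: messages larger than the remaining space force a move to another page, and freeing a message in the middle of a page would need extra bookkeeping beyond a single counter.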

>> One big advantage of attacking the problem with an SLRU is that
>> there's no fixed upper limit on the amount of data that can be
>> enqueued at any given time. You can spill to disk or whatever as
>> needed (although hopefully you won't normally do so, for performance
>> reasons).
>
> Yes, imessages shouldn't ever be spilled to disk. There naturally must be an
> upper limit for them. (Be it total available memory, as for threaded things
> or a given and size-constrained pool, as is the case for dynshmem).

I guess experience has taught me to be wary of things that are wired
in memory. Under extreme memory pressure, something's got to give, or
the whole system will croak. Consider also the contrary situation,
where the imessages stuff is not in use (even for a short period of
time, like a few minutes). Then we'd really rather not still have
memory carved out for it.

> To me it rather sounds like SLRU is a candidate for using dynamically
> allocated shared memory underneath, instead of allocating a fixed amount of
> slots in advance. That would allow more efficient use of shared memory.
> (Given SLRU's ability to spill to disk, it could even be used to 'balance'
> out anomalies to some extent).

I think what would be even better is to merge the SLRU pools with the
shared_buffer pool, so that the two can duke it out for who is in most
need of the limited amount of memory available.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-21 18:53:35
Message-ID:	4C47422F.9070607@bluegap.ch

Hi,

first of all, thanks for your feedback, I enjoy the discussion.

On 07/21/2010 07:25 PM, Robert Haas wrote:
> Given what you're trying to do, it does sound like you're going to
> need some kind of an algorithm for space management; but you'll be
> managing space within the SLRU rather than within shared_buffers. For
> example, you might end up putting a header on each SLRU page or
> segment and using that to track the available freespace within that
> segment for messages to be read and written. It'll probably be a bit
> more complex than the one for listen (see asyncQueueAddEntries).

But what would that buy us? Also consider that pretty much all available
dynamic allocators use shared memory (either from the OS directly, or
via an mmap()'d area).

>> Yes, imessages shouldn't ever be spilled to disk. There naturally must be an
>> upper limit for them. (Be it total available memory, as for threaded things
>> or a given and size-constrained pool, as is the case for dynshmem).
>
> I guess experience has taught me to be wary of things that are wired
> in memory. Under extreme memory pressure, something's got to give, or
> the whole system will croak.

I absolutely agree with that last sentence. However, experience has taught
/me/ to be wary of things that needlessly swap to disk for hours before
reporting any kind of error (AKA swap hell). I prefer systems that
adjust to the OOM condition, instead of just ignoring it and falling
back to disk (which doesn't provide infinite space either, so that's just
pushing the limits).

The solution for imessages certainly isn't spilling to disk, which would
consume even more resources. Instead the process(es) for which there are
pending imessages should be allowed to consume them.

That's why upon OOM, IMessageCreate currently simply blocks the process
that wants to create an imessage. And yes, that's not quite perfect
(that process should still consume messages for itself), and it might
not play well with other potential users of dynamically allocated
memory. But it certainly works better than spilling to disk (and yes, I
tested that behavior within Postgres-R).

> Consider also the contrary situation,
> where the imessages stuff is not in use (even for a short period of
> time, like a few minutes). Then we'd really rather not still have
> memory carved out for it.

Huh? That's exactly what dynamic allocation could give you: not having
memory carved out for stuff you currently don't need, but instead being
able to dynamically use memory where most needed. SLRU has memory (not
disk space) carved out for pretty much every sub-system separately, if
I'm reading that code correctly.

> I think what would be even better is to merge the SLRU pools with the
> shared_buffer pool, so that the two can duke it out for who is in most
> need of the limited amount of memory available.

..well, just add the shared_buffer pool to the list of candidates that
could use dynamically allocated shared memory. It would need some
thinking about boundaries (i.e. when to spill to disk, for those modules
that /want/ to spill to disk) and dealing with OOM situations, but
that's about it.

Regards

Markus

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-21 22:11:38
Message-ID:	AANLkTim7crg+wUB7FEXvzbbkGHmtAyeG2v60_M1Ndfj5@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jul 21, 2010 at 2:53 PM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
>> Consider also the contrary situation,
>> where the imessages stuff is not in use (even for a short period of
>> time, like a few minutes). Then we'd really rather not still have
>> memory carved out for it.
>
> Huh? That's exactly what dynamic allocation could give you: not having
> memory carved out for stuff you currently don't need, but instead being able
> to dynamically use memory where most needed. SLRU has memory (not disk
> space) carved out for pretty much every sub-system separately, if I'm
> reading that code correctly.

Yeah, I think you are right. :-(

>> I think what would be even better is to merge the SLRU pools with the
>> shared_buffer pool, so that the two can duke it out for who is in most
>> need of the limited amount of memory available.
>
> ..well, just add the shared_buffer pool to the list of candidates that could
> use dynamically allocated shared memory. It would need some thinking about
> boundaries (i.e. when to spill to disk, for those modules that /want/ to
> spill to disk) and dealing with OOM situations, but that's about it.

I'm not sure why merging the SLRU pools with shared_buffers would
benefit from dynamically allocated shared memory.

I might be at (or possibly beyond) the limit of my ability to comment
intelligently on this without looking more at what you want to use
these imessages for, but I'm still pretty skeptical about the idea of
storing them directly in shared memory. It's possible, though, that I
am all wet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-22 07:01:32
Message-ID:	4C47ECCC.3040602@bluegap.ch
Lists:	pgsql-hackers

Hi,

On 07/22/2010 12:11 AM, Robert Haas wrote:
> I'm not sure why merging the SLRU pools with shared_buffers would
> benefit from dynamically allocated shared memory.

Well, I'm not sure how you'd merge SLRU pools with shared_buffers. IMO
that inherently leads to the problem of allocating memory dynamically.

With such an allocator, I'd say you just port one module after another
to use that, instead of pre-allocated, fixed portions of shared memory.

> I might be at (or possibly beyond) the limit of my ability to comment
> intelligently on this without looking more at what you want to use
> these imessages for, but I'm still pretty skeptical about the idea of
> storing them directly in shared memory. It's possible, though, that I
> am all wet.

Imessages are meant to be a replacement for unix pipes. (To my
knowledge, those don't spill to disk either, but are blocking as soon as
Linux considers the pipe to be 'full'. Whenever that is. Or am I wrong
here?)

The reasons for replacing them were: they consume lots of file
descriptors, they can only be established between the parent and its
child process (at least for anonymous pipes that's the case) and last
but not least, I got told they still aren't fully portable. Another nice
thing about imessages compared to unix pipes is that it's a zero-copy
approach.

Hope that makes my opinions and decisions clearer. Thank you for sharing
your concerns and for explaining SLRU to me.

Regards

Markus Wanner

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-22 11:04:32
Message-ID:	AANLkTin1FQwzEVbYyQ0fedVjY8LBvC1rv6YcpwCwiY3d@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jul 22, 2010 at 3:01 AM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
> On 07/22/2010 12:11 AM, Robert Haas wrote:
>>
>> I'm not sure why merging the SLRU pools with shared_buffers would
>> benefit from dynamically allocated shared memory.
>
> Well, I'm not sure how you'd merge SLRU pools with shared_buffers. IMO that
> inherently leads to the problem of allocating memory dynamically.
>
> With such an allocator, I'd say you just port one module after another to
> use that, instead of pre-allocated, fixed portions of shared memory.

Well, shared_buffers has to be allocated as one contiguous slab
because we index into it that way. So I don't really see how
dynamically allocating memory could help. What you'd need is a
different system for assigning buffer tags, so that a particular tag
could refer to a buffer with either kind of contents.
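
The "index into it" constraint can be sketched as follows. This is a
simplified illustration, not the buffer manager's actual code:
buffer_address and BLOCK_SIZE are hypothetical names, and 8 kB is assumed
as the page size. Because a buffer's address is computed from the slab's
base address and the buffer number alone, every buffer has to live inside
one contiguous region.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 8192     /* assumed page size in bytes */

/* Address of buffer number buf_id (0-based) inside a contiguous slab. */
char *buffer_address(char *slab_base, int buf_id)
{
    assert(buf_id >= 0);
    return slab_base + (size_t) buf_id * BLOCK_SIZE;
}
```

A buffer obtained from a separate dynamic allocation would have no
meaningful buf_id under this scheme, which is why a different way of
mapping tags to buffers would be needed.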

>> I might be at (or possibly beyond) the limit of my ability to comment
>> intelligently on this without looking more at what you want to use
>> these imessages for, but I'm still pretty skeptical about the idea of
>> storing them directly in shared memory. It's possible, though, that I
>> am all wet.
>
> Imessages are meant to be a replacement for unix pipes. (To my knowledge,
> those don't spill to disk either, but are blocking as soon as Linux
> considers the pipe to be 'full'. Whenever that is. Or am I wrong here?)

I think you're right about that.

> The reasons for replacing them were: they consume lots of file descriptors,
> they can only be established between the parent and its child process (at
> least for anonymous pipes that's the case) and last but not least, I got
> told they still aren't fully portable. Another nice thing about imessages
> compared to unix pipes is that it's a zero-copy approach.

That's sort of approaching the question from the opposite end from
what I was concerned about - I was wondering why you need a unicast
message-passing system.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-22 12:49:29
Message-ID:	4C483E59.9070806@bluegap.ch
Lists:	pgsql-hackers

Hi,

On 07/22/2010 01:04 PM, Robert Haas wrote:
> Well, shared_buffers has to be allocated as one contiguous slab
> because we index into it that way. So I don't really see how
> dynamically allocating memory could help. What you'd need is a
> different system for assigning buffer tags, so that a particular tag
> could refer to a buffer with either kind of contents.

Hm.. okay, then it might not be that easy. Thanks for pointing that out.

> That's sort of approaching the question from the opposite end from
> what I was concerned about - I was wondering why you need a unicast
> message-passing system.

Well, the initial Postgres-R approach, being based on Postgres
6.4.something, used unix pipes. I coded imessages as a replacement.

Postgres-R basically uses imessages to pass around change sets and other
information required to keep replicas in sync. The thinking in terms of
message passing seems to originate from the GCS, which in itself is a
message passing system (with some nice extras and varying delivery
guarantees).

In Postgres-R the coordinator process receives messages from the GCS,
does some minor controlling and book-keeping, but basically passes on
the data via imessages to a background worker.

Of course, as mentioned in the bgworker patch, this could be done
differently. Using solely shared memory, or maybe SLRU to store change
sets. However, I certainly like the abstraction and guarantees such a
message passing system provides. It makes things easier to reason about,
IMO.

For another example, see the bgworker patches, steps 1 and 2, where I've
changed the current autovacuum infrastructure to use imessages (between
launcher and worker).

[ And I've heard it said that current multi-core CPU designs tend to like
message passing systems. Not sure how much that applies to imessages
and/or how it's used in bgworkers or Postgres-R, though. ]

So much for why I'm using a unicast message-passing system.

Regards

Markus Wanner

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-22 13:59:08
Message-ID:	4C484EAC.10302@2ndquadrant.com
Lists:	pgsql-hackers

Markus Wanner wrote:
> On 07/20/2010 09:05 PM, Alvaro Herrera wrote:
>> Hmm, deriving code from a paper published by IBM sounds like bad news --
>> who knows what patents they hold on the techniques there?
>
> Yeah, that might be an issue. Note, however, that the lock-based
> variant differs substantially from what's been published. And I sort
> of doubt their patents cover a lot of stuff that's not lock-free-ish.

There's a fairly good mapping of what techniques are patented and which
were only mentioned in research in the Sun dynamic memory patent at
http://www.freepatentsonline.com/7328316.html ; that mentions an earlier
paper by the author of the technique Markus is using, but this was from
before that one was written. It looks like Sun has a large portion of
the patent portfolio in this area, which is particularly troublesome now.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-22 15:01:47
Message-ID:	4C485D5B.2020405@bluegap.ch
Lists:	pgsql-hackers

Greg,

On 07/22/2010 03:59 PM, Greg Smith wrote:
> There's a fairly good mapping of what techniques are patented and which
> were only mentioned in research in the Sun dynamic memory patent at
> http://www.freepatentsonline.com/7328316.html ; that mentions an earlier
> paper by the author of the technique Markus is using, but this was from
> before that one was written. It looks like Sun has a large portion of
> the patent portfolio in this area, which is particularly troublesome now.

Thanks for the pointer, very helpful.

Anybody ever checked jemalloc, or any other OSS allocator out there
against these patents?

Remembering similar patent-discussions, it might be better to not bother
too much and just go with something widely used, based on the assumption
that such a thing is going to enjoy broad support in case of an attack
from a patent troll.

What do you think? What'd be your favorite allocator?

Regards

Markus Wanner

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-22 18:31:41
Message-ID:	1279823238-sup-6350@alvh.no-ip.org
Lists:	pgsql-hackers

Excerpts from Markus Wanner's message of jue jul 22 08:49:29 -0400 2010:

> Of course, as mentioned in the bgworker patch, this could be done
> differently. Using solely shared memory, or maybe SLRU to store change
> sets. However, I certainly like the abstraction and guarantees such a
> message passing system provides. It makes things easier to reason about,
> IMO.

FWIW I don't think you should be thinking in terms of "replacing imessages
with SLRU". I rather think you should be thinking about how you can implement
the imessages API on top of SLRU. So as far as the coordinator and
background worker are concerned, there wouldn't be any difference --
they keep using the same API they are using today.

Also let me repeat my earlier comment about imessages being more similar
to multixact than to notify. The content of each multixact entry is
just an arbitrary amount of bytes. If imessages are numbered from a
monotonically increasing sequence, it should be possible to use a very
similar technique, and perhaps you should be able to reduce locking
requirements as well (write messages with only a shared lock, after
you've determined and reserved the area you're going to write).
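
The reserve-then-write idea can be sketched like this. It is a
single-process illustration under stated assumptions, not multixact's or
SLRU's actual code: MessageArea, reserve_space and write_reserved are
hypothetical names, the locks appear only as comments, and wraparound and
reclaiming of space are left out. Only the reservation step needs to be
serialized; once a region is reserved, nobody else will touch it, so the
copy itself can proceed under a weaker lock.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AREA_SIZE 256

typedef struct {
    size_t next_offset;     /* next free byte; only advanced while reserving */
    char   data[AREA_SIZE];
} MessageArea;

/*
 * Step 1 (would hold the exclusive lock): reserve len bytes.
 * Returns the start offset, or (size_t) -1 if there is no room left.
 */
size_t reserve_space(MessageArea *area, size_t len)
{
    if (len > AREA_SIZE - area->next_offset)
        return (size_t) -1;
    size_t off = area->next_offset;
    area->next_offset += len;
    return off;
}

/* Step 2 (a shared lock would suffice): fill in the reserved region. */
void write_reserved(MessageArea *area, size_t off, const void *src, size_t len)
{
    assert(off <= AREA_SIZE && len <= AREA_SIZE - off);
    memcpy(area->data + off, src, len);
}
```

Note that writers may complete step 2 in any order, since their regions
never overlap.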

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-22 19:09:49
Message-ID:	4C48977D.3080702@bluegap.ch
Lists:	pgsql-hackers

Hi,

On 07/22/2010 08:31 PM, Alvaro Herrera wrote:
> FWIW I don't think you should be thinking in terms of "replacing imessages
> with SLRU". I rather think you should be thinking about how you can implement
> the imessages API on top of SLRU.

Well, I'm rather comparing SLRU with the dynamic allocator. So far I'm
unconvinced that SLRU would be a better base for imessages than a
dynamic allocator. (And I'm arguing that SLRU should use a dynamic
allocator underneath).

> So as far as the coordinator and
> background worker are concerned, there wouldn't be any difference --
> they keep using the same API they are using today.

Agreed, the imessages API to the upper layer doesn't need to care about
the underlying stuff.

> Also let me repeat my earlier comment about imessages being more similar
> to multixact than to notify. The content of each multixact entry is
> just an arbitrary amount of bytes. If imessages are numbered from a
> monotonically increasing sequence,

Well, there's absolutely no need to serialize imessages. So they don't
currently carry any such number. And as opposed to multixact entries, they
are clearly directed at exactly one single consumer. Every consumer has
its own receive queue. Sending messages concurrently to different
recipients may happen completely parallelized, without any (b)locking in
between.
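
A minimal sketch of such per-recipient queues, assuming hypothetical names
(Message, RecvQueue, send_message, receive_message) and a fixed number of
recipients; this is not the imessages patch itself. Locking is omitted
here; since each recipient has its own queue, senders addressing different
recipients never touch the same queue. The message is linked into the
queue by pointer, so the payload is never copied.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Message {
    struct Message *next;   /* intrusive link: enqueueing copies no payload */
    int             sender;
    size_t          size;   /* payload would follow this header */
} Message;

typedef struct {
    Message *head;
    Message *tail;
} RecvQueue;

#define MAX_RECIPIENTS 4
RecvQueue queues[MAX_RECIPIENTS];   /* zero-initialized: all queues empty */

/* Append msg to the receive queue of exactly one recipient. */
void send_message(int recipient, Message *msg)
{
    assert(recipient >= 0 && recipient < MAX_RECIPIENTS);
    RecvQueue *q = &queues[recipient];
    msg->next = NULL;
    if (q->tail)
        q->tail->next = msg;
    else
        q->head = msg;
    q->tail = msg;
}

/* Remove and return the oldest pending message, or NULL if there is none. */
Message *receive_message(int recipient)
{
    assert(recipient >= 0 && recipient < MAX_RECIPIENTS);
    RecvQueue *q = &queues[recipient];
    Message *msg = q->head;
    if (msg) {
        q->head = msg->next;
        if (q->head == NULL)
            q->tail = NULL;
        msg->next = NULL;
    }
    return msg;
}
```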

The dynamic allocator is the only part of the chain which might need to
do some locking to protect the shared resource (memory) against
concurrent access. Note, however, that wamalloc (as any modern dynamic
allocator) is parallelized to some extent, i.e. concurrent malloc/free
calls don't necessarily need to block each other.

> it should be possible to use a very
> similar technique, and perhaps you should be able to reduce locking
> requirements as well (write messages with only a shared lock, after
> you've determined and reserved the area you're going to write).

Writing to the message is currently (i.e. imessages-on-dynshmem) done
without *any* kind of lock held. So that would rather increase locking
requirements and lower parallelism, I fear.

Regards

Markus

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-26 12:52:46
Message-ID:	AANLkTi=J7SrdceDkzB7c=tC-vBBx8KdC-M04BaUD0MWO@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jul 22, 2010 at 3:09 PM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
>> FWIW I don't think you should be thinking in terms of "replacing imessages
>> with SLRU". I rather think you should be thinking about how you can
>> implement the imessages API on top of SLRU.
>
> Well, I'm rather comparing SLRU with the dynamic allocator. So far I'm
> unconvinced that SLRU would be a better base for imessages than a dynamic
> allocator. (And I'm arguing that SLRU should use a dynamic allocator
> underneath).

Here's another idea. Instead of making imessages use an SLRU, how
about having it steal pages from shared_buffers? This would require
segmenting messages into small enough chunks that they'd fit, but the
nice part is that it would avoid the need to have a completely
separate shared memory arena. Ideally, we'd make the infrastructure
general enough that things like SLRU could use it also; and get rid of
or reduce in size some of the special-purpose chunks we're now
allocating.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-26 14:31:50
Message-ID:	1280154484-sup-7009@alvh.no-ip.org
Lists:	pgsql-hackers

Excerpts from Robert Haas's message of lun jul 26 08:52:46 -0400 2010:

> Here's another idea. Instead of making imessages use an SLRU, how
> about having it steal pages from shared_buffers? This would require
> segmenting messages into small enough chunks that they'd fit, but the
> nice part is that it would avoid the need to have a completely
> separate shared memory arena. Ideally, we'd make the infrastructure
> general enough that things like SLRU could use it also; and get rid of
> or reduce in size some of the special-purpose chunks we're now
> allocating.

What's the problem you see with "another shared memory arena"? Right
now we allocate a single large arena, and the lot of shared_buffers,
SLRU pools, locking objects, etc are all allocated from there. If we
want another 2 MB for "dynamic shmem", we'd just allocate 2 MB more in
that large arena and give those to this new code.
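
The arrangement described here can be sketched as a simple bump allocator
over one arena. This is a simplified illustration with hypothetical names
(Arena, arena_carve) and an assumed 8-byte alignment, not PostgreSQL's
shared memory code. Each subsystem asks for its chunk once at startup;
there is no way to give a chunk back or to move the boundary between two
chunks afterwards.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char  *base;    /* start of the single large arena */
    size_t size;    /* total bytes in the arena */
    size_t used;    /* bytes already carved out */
} Arena;

/*
 * Hand out the next len bytes, rounded up to a multiple of 8.
 * There is no corresponding free. Returns NULL when the arena is exhausted.
 */
void *arena_carve(Arena *a, size_t len)
{
    size_t aligned = (len + 7) & ~(size_t) 7;
    if (aligned < len || aligned > a->size - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}
```

Adding "2 MB more for dynamic shmem" in this model means growing the arena
by that amount at startup and carving one more chunk out of it.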

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-26 16:33:35
Message-ID:	AANLkTinYgABHqFuxaMkhZwc=nkxp=iYPX1muqWS=1O8S@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jul 26, 2010 at 10:31 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Excerpts from Robert Haas's message of lun jul 26 08:52:46 -0400 2010:
>> Here's another idea. Instead of making imessages use an SLRU, how
>> about having it steal pages from shared_buffers? This would require
>> segmenting messages into small enough chunks that they'd fit, but the
>> nice part is that it would avoid the need to have a completely
>> separate shared memory arena. Ideally, we'd make the infrastructure
>> general enough that things like SLRU could use it also; and get rid of
>> or reduce in size some of the special-purpose chunks we're now
>> allocating.
>
> What's the problem you see with "another shared memory arena"? Right
> now we allocate a single large arena, and the lot of shared_buffers,
> SLRU pools, locking objects, etc are all allocated from there. If we
> want another 2 MB for "dynamic shmem", we'd just allocate 2 MB more in
> that large arena and give those to this new code.

But that's not a very flexible design. If you discover that you need
3MB instead of 2MB, you get to restart the entire cluster. If you
discover that you need 1MB instead of 2MB, you get to either restart
the entire cluster, or waste 1MB of shared memory. And since actual
usage will almost certainly fluctuate, you'll almost certainly be
wasting some shared memory that could otherwise be used for other
purposes some of the time. Now, granted, we have this problem already
today, and granted also, 2MB is not an enormous amount of memory on
today's machines. If we really think that 2MB will always be adequate
for every purpose for which we wish to use unicast messaging, then
perhaps it's OK, but I'm not convinced that's true.

It would be nice to think, for example, that this could be used as
infrastructure for parallel query to stream results back from worker
processes to the backend connected to the user. If you're using 16
processors to concurrently scan 16 partitions of an appendrel and
stream those results back to the master, will 128kB/backend be enough
memory to avoid pipeline stalls? What if there's replication going on
at the same time? What if there's other concurrent activity that also
uses imessages? Or even better, what if there's other concurrent
activity that uses the dynamic allocator but NOT imessages? If the
point of having a dynamic allocator is that it's eventually going to
be used by lots of different subsystems, then we had better have a
fairly high degree of confidence that it actually will, but in fact
we've made very little effort to characterize who the other users
might be and whether the stated implementation limitations will be
adequate for them. Frankly, I doubt it. One of the major reasons why
malloc() is so powerful is that you don't have to decide in advance
how much memory you're going to need, as you would if you put the
structure in the data segment. Dynamically allocating out of a 2MB
segment gives up most of that flexibility.
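
The 128kB/backend figure follows from splitting the 2MB pool evenly among
the 16 scanning processes. As a small worked check (per_backend_share is a
hypothetical helper, and an even split is an assumption of the example,
not something the allocator enforces):

```c
#include <assert.h>
#include <stddef.h>

/* Even split of a fixed pool among n concurrent senders. */
size_t per_backend_share(size_t pool_bytes, size_t n_backends)
{
    assert(n_backends > 0);
    return pool_bytes / n_backends;
}
```

With pool_bytes = 2 * 1024 * 1024 and n_backends = 16 this yields 131072
bytes, i.e. 128kB, before any other user of the pool is accounted for.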

What I think will end up happening here is that you'll always have to
size the segment used by the dynamic allocator considerably larger
than the amount of memory you expect to actually be used, so that
performance doesn't go into the toilet when it fills up. As Markus
pointed out upthread, you'll always need some hard limit on the amount
of space that imessages can use, but you can make that limit much
larger if it's not reserved for a single purpose. If you use the
"temporarily allocated shared buffers" method, then you could set the
default limit to something like "64MB, but not more than 1/8th of
shared buffers". Since the memory won't get used unless it's needed,
you don't really have to care whether a particular installation is
likely to need some, none, or all of that; whereas if you're
allocating nailed-down memory, you're going to want a much smaller
default - a couple of MB, at most. Furthermore, if you do happen to
be running on a 64GB machine with 8GB of shared_buffers and 64MB isn't
adequate, you can easily make it possible to bump that value up by
changing a GUC and hitting reload. With the "nailed-down shared
memory" approach, you're locked into whatever you decide at postmaster
start.
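
The suggested default, "64MB, but not more than 1/8th of shared buffers",
amounts to taking the smaller of two values. A minimal sketch, where
borrow_limit is a hypothetical name and sizes are in bytes:

```c
#include <assert.h>
#include <stdint.h>

#define MB ((uint64_t) 1024 * 1024)
#define GB (1024 * MB)

/* Smaller of 64MB and one eighth of shared_buffers. */
uint64_t borrow_limit(uint64_t shared_buffers_bytes)
{
    uint64_t cap = 64 * MB;
    uint64_t eighth = shared_buffers_bytes / 8;
    return eighth < cap ? eighth : cap;
}
```

For the 8GB example above, one eighth is 1GB, so the 64MB cap applies; for
a small installation with 128MB of shared_buffers, the limit would be 16MB.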

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-26 16:34:07
Message-ID:	4C4DB8FF.2010702@bluegap.ch
Lists:	pgsql-hackers

Hi,

On 07/26/2010 04:31 PM, Alvaro Herrera wrote:
> Excerpts from Robert Haas's message of lun jul 26 08:52:46 -0400 2010:
>> Here's another idea. Instead of making imessages use an SLRU, how
>> about having it steal pages from shared_buffers? This would require
>> segmenting messages into small enough chunks that they'd fit, but the
>> nice part is that it would avoid the need to have a completely
>> separate shared memory arena. Ideally, we'd make the infrastructure
>> general enough that things like SLRU could use it also; and get rid of
>> or reduce in size some of the special-purpose chunks we're now
>> allocating.

To me that sounds like solving the same kind of problem for every module
separately and somewhat differently. I tend to like general solutions
(often too much, but that's another story), and to me it still seems a
completely dynamic memory allocator solves that generically (and way
more elegant than 'stealing pages' sounds).

> Right
> now we allocate a single large arena, and the lot of shared_buffers,
> SLRU pools, locking objects, etc are all allocated from there.

Uh.. they all allocate from different, statically sized pools, don't they?

> If we
> want another 2 MB for "dynamic shmem", we'd just allocate 2 MB more in
> that large arena and give those to this new code.

That's how it could work if we used a dynamic allocator. But currently,
if I understand correctly, once the shared_buffers pool is full, it
cannot steal memory from the SLRU pools. Or am I mistaken?

Regards

Markus Wanner

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-26 16:51:04
Message-ID:	4C4DBCF8.6020903@bluegap.ch
Lists:	pgsql-hackers

Hi,

On 07/26/2010 06:33 PM, Robert Haas wrote:
> It would be nice to think, for example, that this could be used as
> infrastructure for parallel query to stream results back from worker
> processes to the backend connected to the user. If you're using 16
> processors to concurrently scan 16 partitions of an appendrel and
> stream those results back to the master

Now, *that* sounds like music to my ears ;-)

Or put another way: yes, I think imessages and the bgworker
infrastructure stuff could enable or at least help that goal.

> Dynamically allocating out of a 2MB
> segment gives up most of that flexibility.

Absolutely, that's why I'd like to see other modules that use the
dynamic allocator. The more the better.

> What I think will end up happening here is that you'll always have to
> size the segment used by the dynamic allocator considerably larger
> than the amount of memory you expect to actually be used, so that
> performance doesn't go into the toilet when it fills up. As Markus
> pointed out upthread, you'll always need some hard limit on the amount
> of space that imessages can use, but you can make that limit much
> larger if it's not reserved for a single purpose. If you use the
> "temporarily allocated shared buffers" method, then you could set the
> default limit to something like "64MB, but not more than 1/8th of
> shared buffers".

I've been thinking about such rules as well. They quickly get more
complex if you begin to take OOM situations and their counter-measures
into account.

In a way, fixing every separate pool to its specific size is just the
very simplest rule-set I can think of. The dynamic allocator buys you
more flexibility, but choosing good limits and rules between the
sub-systems is another issue.

Regards

Markus Wanner

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-26 17:16:14
Message-ID:	AANLkTimP21qj5RP07=nouAMdQ=GP3HEg1DbrPsz0uy=f@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jul 26, 2010 at 12:51 PM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
>> Dynamically allocating out of a 2MB
>> segment gives up most of that flexibility.
>
> Absolutely, that's why I'd like to see other modules that use the dynamic
> allocator. The more the better.

Right, I agree. The problem is that I don't think they can. The
elephant in the room is shared_buffers, which I believe to be
typically BY FAR the largest consumer of shared memory. It would be
absolutely fantastic if we had a shared_buffers implementation that
could free up unused buffers when they're not needed, or add more when
required. But there are several reasons why I don't believe that will
ever happen. One, much of the code that uses shared_buffers relies on
shared_buffers being located at a fixed memory address on a contiguous
chunk, and it's hard to see how we could change that assumption
without sacrificing performance. Two, the overall size of the shared
memory arena is largely dependent on the size of shared_buffers, so
unless you also have the ability to resize the arena on the fly (which
is well-nigh impossible with our current architecture, and maybe
with any architecture), resizing shared_buffers doesn't actually add
that much flexibility. Three, the need for shared buffers is elastic
rather than absolute: stealing a few shared buffers for a defined
purpose (like sending imessages) is perfectly reasonable, but it's
rarely going to be a good idea for the buffer manager to proactively
free up memory just in case some other part of the system might need
some. If you have a system that normally has 4GB of shared buffers
and some other module borrows 100MB and then returns it, the system
will just cache less data while that memory is in use and then start
right back up caching more again once it's returned. That's very
nice, and it's hard to see how else to achieve that result.

Of course, there are other parts of the system (a whole bunch of them)
that use shared memory also, and perhaps some of those could be
modified to use the dynamic allocator as well. But they're getting by
without it now, so maybe they don't really need it. The SLRU stuff, I
think, works more or less like shared buffers (so you have the same
set of issues) and I think most of the other users are allocating
small, fixed-size chunks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-26 17:50:48
Message-ID:	4C4DCAF8.5000200@bluegap.ch
Lists:	pgsql-hackers

Hi,

On 07/26/2010 07:16 PM, Robert Haas wrote:
> Of course, there are other parts of the system (a whole bunch of them)
> that use shared memory also, and perhaps some of those could be
> modified to use the dynamic allocator as well. But they're getting by
> without it now, so maybe they don't really need it. The SLRU stuff, I
> think, works more or less like shared buffers (so you have the same
> set of issues) and I think most of the other users are allocating
> small, fixed-size chunks.

Yeah, I see your point(s).

Note however, that a thread based design doesn't have this problem *at
all*. Memory generally is shared (between threads) and you can
dynamically allocate more or less (until Linux' OOM killer hits you..
yet another story). The OS reuses memory you don't currently need even
for other applications.

Users as well as developers know the threaded model (arguably, much
better than the process based one). So that's what we get compared to.
And what developers (including me) are used to.

I think we are getting by with fixed allocations at the moment, because
we did a lot to get by with it. By working around these limitations.

However, that's just my thinking. Thank you for your inputs.

Regards

Markus Wanner

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-26 18:56:37
Message-ID:	AANLkTi=Ffrj9aU+ELbKmREasygTfoy0d-OzLT2ufmHOJ@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jul 26, 2010 at 1:50 PM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
> Note however, that a thread based design doesn't have this problem *at all*.
> Memory generally is shared (between threads) and you can dynamically
> allocate more or less (until Linux' OOM killer hits you.. yet another
> story). The OS reuses memory you don't currently need even for other
> applications.
>
> Users as well as developers know the threaded model (arguably, much better
> than the process based one). So that's what we get compared to. And what
> developers (including me) are used to.

I'm sort of used to the process model, myself, but I may be in the minority.

> I think we are getting by with fixed allocations at the moment, because we
> did a lot to get by with it. By working around these limitations.
>
> However, that's just my thinking. Thank you for your inputs.

I completely agree with you that fixed allocations suck. We're just
disagreeing (hopefully, in a friendly and collegial fashion) about
what to do about it.

I actually think that memory management is one of the weakest elements
of our current architecture, though I think for somewhat different
reasons than what you're thinking about. Besides the fact that we
have various smaller pools of dynamically shared memory (e.g. a
separate ring of buffers for each SLRU), I'm also unhappy about some
of the things we do with backend-private memory, work_mem being the
biggest culprit by far, because it's very difficult for the DBA to set
the knobs in a way that uses all of the memory he wants to allocate to
the database efficiently, with no overruns and none left over. The case
where you can count on the database and all of your temporary files,
etc. to fit in RAM is really an exceptional case: in general, you need
to assume that there will be more demand for memory than there will be
memory available, and as much as possible you want the system (rather
than the user) to decide how it should optimally be allocated. The
query planner and executor actually do have most of what is needed to
execute queries using more or less memory, but they lack the global
intelligence needed for intelligent decision-making. Letting the OS
buffer cache rather than the PG buffer cache handle most of the
system's memory helps, but it's not a complete solution.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Markus Wanner" <markus(at)bluegap(dot)ch>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-26 19:16:27
Message-ID:	4C4D98BB0200002500033CFE@gw.wicourts.gov
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I actually think that memory management is one of the weakest
> elements of our current architecture

I'm actually pretty impressed by the memory contexts in PostgreSQL.
Apparently I'm not alone in that, either; a paper by Hellerstein,
Stonebraker, and Hamilton[1] has this in section 7.2 (Memory
Allocator):

"The interested reader may want to browse the open-source PostgreSQL
code. This utilizes a fairly sophisticated memory allocator."

I think the problem here is that we don't extend that sophistication
to shared memory.

-Kevin

[1] Joseph M. Hellerstein, Michael Stonebraker and James Hamilton.
2007. Architecture of a Database System. Foundations and Trends(R)
in Databases Vol. 1, No. 2 (2007) 141-259.
http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-07-26 20:27:38
Message-ID:	AANLkTikbNe7m2=PxhmROLWyT53dS-NFXJiww7wuG=mV7@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jul 26, 2010 at 3:16 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> I actually think that memory management is one of the weakest
>> elements of our current architecture
>
> I'm actually pretty impressed by the memory contexts in PostgreSQL.
> Apparently I'm not alone in that, either; a paper by Hellerstein,
> Stonebraker, and Hamilton[1] has this in section 7.2 (Memory
> Allocator):
>
> "The interested reader may want to browse the open-source PostgreSQL
> code. This utilizes a fairly sophisticated memory allocator."
>
> I think the problem here is that we don't extend that sophistication
> to shared memory.

That's one aspect of it, and the other is that we don't have much
global coordination about how we use it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 15:02:07
Message-ID:	201008091502.o79F27O26068@momjian.us
Lists:	pgsql-hackers

Markus Wanner wrote:
> Hi,
>
> On 07/26/2010 07:16 PM, Robert Haas wrote:
> > Of course, there are other parts of the system (a whole bunch of them)
> > that use shared memory also, and perhaps some of those could be
> > modified to use the dynamic allocator as well. But they're getting by
> > without it now, so maybe they don't really need it. The SLRU stuff, I
> > think, works more or less like shared buffers (so you have the same
> > set of issues) and I think most of the other users are allocating
> > small, fixed-size chunks.
>
> Yeah, I see your point(s).
>
> Note however, that a thread based design doesn't have this problem *at
> all*. Memory generally is shared (between threads) and you can
> dynamically allocate more or less (until Linux' OOM killer hits you..
> yet another story). The OS reuses memory you don't currently need even
> for other applications.

[ Sorry to be jumping into this thread late.]

I am not sure threads would greatly help us. The major problem is that
all of our structures are currently contiguous in memory for quick
access. I don't see how threading would help with that. We could use
realloc(), but we can do the same in shared memory if we had a chunk
infrastructure, though concurrent access to that memory would hurt us in
either threads or shared memory.

Fundamentally, recreating the libc memory allocation routines is not
that hard. (Everyone has to detach from the shared memory segment, but
they have to stop using it too, so it doesn't seem that hard.)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 15:11:04
Message-ID:	AANLkTiktqC=DkNZbQuEcd_xfCiKekUPS9pX+M4Vu+=-j@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 9, 2010 at 11:02 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> I am not sure threads would greatly help us. The major problem is that
> all of our structures are currently contiguous in memory for quick
> access. I don't see how threading would help with that. We could use
> realloc(), but we can do the same in shared memory if we had a chunk
> infrastructure, though concurrent access to that memory would hurt us in
> either threads or shared memory.
>
> Fundamentally, recreating the libc memory allocation routines is not
> that hard. (Everyone has to detach from the shared memory segment, but
> they have to stop using it too, so it doesn't seem that hard.)

I actually don't think that's true. The advantage (and disadvantage)
of using threads is that everything runs in one address space. So you
just allocate more memory and everyone immediately sees it. In a
process environment, that's not the case: to expand or shrink the size
of the shared memory arena, everyone needs to explicitly change their
own mapping.

So imagine that thread-or-process A allocates a new chunk of
memory and then writes a pointer to the new chunk in a previously
allocated section of memory. Thread-or-process B then follows the
pointer. In a threaded model, this is guaranteed to be safe. In a
process model, it's not: A might have enlarged the shared memory
mapping while B has not yet done so. So I think in our model any sort
of change to the shared memory segment is going to require extremely
careful gymnastics, and be pretty expensive.
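
To make the hazard concrete, here is a minimal sketch in C of the check
that process B would need before following a reference published by A.
It is an illustration only, not PostgreSQL code: ArenaHeader, LocalView
and the view_* functions are hypothetical names, and references are
stored as offsets from the arena base on the assumption that each
process may map the arena at a different address and length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical header stored at the start of the shared arena. */
typedef struct ArenaHeader {
    size_t total_size;   /* current arena size, updated by whoever grows it */
} ArenaHeader;

/* Hypothetical per-process view of the arena. */
typedef struct LocalView {
    char  *base;         /* where this process mapped the arena */
    size_t mapped_size;  /* how many bytes this process has mapped */
} LocalView;

/* True if another process grew the arena and this one has not remapped. */
static bool view_is_stale(const LocalView *v, const ArenaHeader *h)
{
    return v->mapped_size < h->total_size;
}

/* True if [off, off + len) lies inside this process's own mapping. */
static bool view_can_follow(const LocalView *v, size_t off, size_t len)
{
    return len <= v->mapped_size && off <= v->mapped_size - len;
}

/* Turn an offset into a local pointer, or NULL if a remap is needed first. */
static void *view_resolve(const LocalView *v, size_t off, size_t len)
{
    assert(v->base != NULL);
    return view_can_follow(v, off, len) ? (void *) (v->base + off) : NULL;
}
```

A threaded program needs none of this bookkeeping, since there is only
one mapping. In the process model every dereference of a shared offset
would have to go through something like view_resolve(), and reading
total_size would additionally need a lock or memory barrier, which the
sketch leaves out.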

I don't care to take a position in the religious war over threads vs.
processes, but I do think threads simplify the handling of this
particular case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 15:35:41
Message-ID:	4C60204D.4030400@bluegap.ch
Lists:	pgsql-hackers

Hi,

On 08/09/2010 05:02 PM, Bruce Momjian wrote:
> [ Sorry to be jumping into this thread late.]

No problem at all.

> I am not sure threads would greatly help us.

Note that I'm absolutely, certainly not advocating the use of threads
for Postgres.

> The major problem is that
> all of our structures are currently contiguous in memory for quick
> access. I don't see how threading would help with that. We could use
> realloc(), but we can do the same in shared memory if we had a chunk
> infrastructure, though concurrent access to that memory would hurt us in
> either threads or shared memory.

I don't quite follow what you are trying to say here. Whether or not
structures are contiguous in memory might affect performance, but I
don't see the relation to programmer's habits and/or knowledge.

With our process-based design, the default is private memory (i.e. not
shared). If you need shared memory, you must specify a certain amount in
advance. That chunk of shared memory then is reserved and can't ever be
used by another subsystem. Even if you barely ever need that much shared
memory for the subsystem in question.

That's opposed to what lots of people are used to with the threaded
approach, where shared memory is the default. And where you can easily
and dynamically allocate *shared* memory. Whatever chunk of shared
memory one subsystem doesn't need is available to another one (modulo
fragmentation of the dynamic allocator, perhaps, but..)

> Fundamentally, recreating the libc memory allocation routines is not
> that hard.

Uh.. well, writing a good, scalable, dynamic allocator certainly poses
some very interesting problems. Writing one that doesn't violate any
patent or other IP as an additional requirement seems like a pretty
tough problem to me.

> (Everyone has to detach from the shared memory segment, but
> they have to stop using it too, so it doesn't seem that hard.)

So far, I only considered dynamically allocating from a pool of shared
memory that's initially fixed in size. So as to be able to make better
use of shared memory.

Resizing the overall pool the easy way, requiring every backend to
detach, would cost a lot of performance. So that's certainly not
something you want to do often.

The purpose of such a dynamic allocator as I see it rather is to be able
to re-allocate unused memory of one subsystem to another one *on the
fly*. Not just for performance, but also for ease of use for the admin
and the developer, IMO.
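
As a rough illustration of allocating from a pool that is fixed in size
up front, here is a minimal first-fit allocator sketch in C. It is not
the allocator from the posted patch: the names (pool_init, pool_alloc,
pool_free), the 4096-byte pool and the block layout are assumptions made
for the example, a malloc()ed buffer stands in for the shared memory
area, and the locking a real shared allocator needs is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SIZE 4096   /* fixed up front; never grows afterwards */
#define ALIGN     16
#define HDR       ((sizeof(Block) + ALIGN - 1) / ALIGN * ALIGN)

/* Every block, free or in use, starts with this header. */
typedef struct Block {
    size_t size;         /* payload bytes following the header */
    int    is_free;
} Block;

static unsigned char *pool;  /* stand-in for the shared memory area */

static int pool_init(void)
{
    pool = malloc(POOL_SIZE);
    if (pool == NULL)
        return -1;
    Block *b = (Block *) pool;
    b->size = POOL_SIZE - HDR;   /* one big free block to start with */
    b->is_free = 1;
    return 0;
}

/* Next block in address order, or NULL at the end of the pool. */
static Block *next_block(Block *b)
{
    unsigned char *p = (unsigned char *) b + HDR + b->size;
    return p < pool + POOL_SIZE ? (Block *) p : NULL;
}

/* First-fit allocation; returns NULL when no free block is large enough. */
static void *pool_alloc(size_t n)
{
    if (n == 0 || n > POOL_SIZE)
        return NULL;
    n = (n + ALIGN - 1) / ALIGN * ALIGN;
    for (Block *b = (Block *) pool; b != NULL; b = next_block(b)) {
        if (!b->is_free || b->size < n)
            continue;
        if (b->size >= n + HDR + ALIGN) {   /* split off the remainder */
            Block *rest = (Block *) ((unsigned char *) b + HDR + n);
            rest->size = b->size - n - HDR;
            rest->is_free = 1;
            b->size = n;
        }
        b->is_free = 0;
        return (unsigned char *) b + HDR;
    }
    return NULL;
}

/* Mark a block free, then merge neighbouring free blocks in one pass. */
static void pool_free(void *p)
{
    assert(p != NULL);
    ((Block *) ((unsigned char *) p - HDR))->is_free = 1;
    for (Block *b = (Block *) pool; b != NULL; b = next_block(b)) {
        Block *nx;
        while (b->is_free && (nx = next_block(b)) != NULL && nx->is_free)
            b->size += HDR + nx->size;
    }
}
```

The point of the sketch is that memory freed by one caller can be handed
to any other caller, which per-subsystem static areas cannot do. The
cost is fragmentation and contention on whatever lock protects the block
list.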

Regards

Markus Wanner

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 15:41:24
Message-ID:	696.1281368484@sss.pgh.pa.us
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> So imagine that thread-or-process A allocates a new chunk of
> memory and then writes a pointer to the new chunk in a previously
> allocated section of memory. Thread-or-process B then follows the
> pointer. In a threaded model, this is guaranteed to be safe. In a
> process model, it's not: A might have enlarged the shared memory
> mapping while B has not yet done so. So I think in our model any sort
> of change to the shared memory segment is going to require extremely
> careful gymnastics, and be pretty expensive.

... and on some platforms, it'll be flat out impossible. We looked at
this years ago and concluded that changing the size of the shmem segment
after postmaster start was impractical from a portability standpoint.
I have not seen anything to change that conclusion.

> I don't care to take a position in the religious war over threads vs.
> processes, but I do think threads simplify the handling of this
> particular case.

You meant "I don't think", right? I agree. The only way threads would
simplify this is if we went over to a mysql-style model where there was
only one process, period, and all backends were threads inside that.
No shared memory as such, at all.

regards, tom lane

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 16:03:42
Message-ID:	201008091603.o79G3gp05774@momjian.us
Lists:	pgsql-hackers

Robert Haas wrote:
> On Mon, Aug 9, 2010 at 11:02 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > I am not sure threads would greatly help us. The major problem is that
> > all of our structures are currently contiguous in memory for quick
> > access. I don't see how threading would help with that. We could use
> > realloc(), but we can do the same in shared memory if we had a chunk
> > infrastructure, though concurrent access to that memory would hurt us in
> > either threads or shared memory.
> >
> > Fundamentally, recreating the libc memory allocation routines is not
> > that hard. (Everyone has to detach from the shared memory segment, but
> > they have to stop using it too, so it doesn't seem that hard.)
>
> I actually don't think that's true. The advantage (and disadvantage)
> of using threads is that everything runs in one address space. So you
> just allocate more memory and everyone immediately sees it. In a
> process environment, that's not the case: to expand or shrink the size
> of the shared memory arena, everyone needs to explicitly change their
> own mapping.

You can't expand the size of malloc'ed memory --- you have to call
realloc(), and then you effectively get a new pointer. Shared memory
has a similar limitation. If you allocate shared memory in chunks so
you don't need to change the location, you are effectively doing another
malloc(), like you would in a threaded process.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 16:10:53
Message-ID:	201008091610.o79GAru12011@momjian.us
Lists:	pgsql-hackers

Markus Wanner wrote:
> Hi,
>
> On 08/09/2010 05:02 PM, Bruce Momjian wrote:
> > [ Sorry to be jumping into this thread late.]
>
> No problem at all.
>
> > I am not sure threads would greatly help us.
>
> Note that I'm absolutely, certainly not advocating the use of threads
> for Postgres.
>
> > The major problem is that
> > all of our structures are currently contiguous in memory for quick
> > access. I don't see how threading would help with that. We could use
> > realloc(), but we can do the same in shared memory if we had a chunk
> > infrastructure, though concurrent access to that memory would hurt us in
> > either threads or shared memory.
>
> I don't quite follow what you are trying to say here. Whether or not
> structures are contiguous in memory might affect performance, but I
> don't see the relation to programmer's habits and/or knowledge.
>
> With our process-based design, the default is private memory (i.e. not
> shared). If you need shared memory, you must specify a certain amount in
> advance. That chunk of shared memory then is reserved and can't ever be
> used by another subsystem. Even if you barely ever need that much shared
> memory for the subsystem in question.

Once multiple threads are using the same local memory, you have the same
issues of being unable to resize it because repalloc can change the
pointer location.

> That's opposed to what lots of people are used to with the threaded
> approach, where shared memory is the default. And where you can easily
> and dynamically allocate *shared* memory. Whatever chunk of shared
> memory one subsystem doesn't need is available to another one (modulo
> fragmentation of the dynamic allocator, perhaps, but..)

Well, this could be done with shared memory as well.

My point is that you can treat malloc the same as "add shared memory",
to some extent, with the same limitations.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 16:31:45
Message-ID:	201008091631.o79GVj815207@momjian.us
Lists:	pgsql-hackers

Bruce Momjian wrote:
> > With our process-based design, the default is private memory (i.e. not
> > shared). If you need shared memory, you must specify a certain amount in
> > advance. That chunk of shared memory then is reserved and can't ever be
> > used by another subsystem. Even if you barely ever need that much shared
> > memory for the subsystem in question.
>
> Once multiple threads are using the same local memory, you have the same
> issues of being unable to resize it because repalloc can change the
> pointer location.

Let me be more concrete. Suppose you are using threads, and you want to
increase your shared memory from 20MB to 30MB. How do you do that? If
you want it contiguous, you have to use realloc, which might move the
pointer. If you allocate another 10MB chunk, you then have shared
memory fragments, which is the same as adding another shared memory
segment.
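
One common way to live with a block that realloc() may move is to store
offsets from the block's base instead of raw pointers. The sketch below
shows that idea in C. The Region type and the region_* functions are
hypothetical names for this example, and plain malloc()/realloc() stand
in for whatever would back the memory in a real system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A growable region addressed by offsets rather than raw pointers. */
typedef struct Region {
    char  *base;   /* may change whenever the region grows */
    size_t size;
} Region;

static int region_init(Region *r, size_t size)
{
    r->base = malloc(size);
    r->size = r->base ? size : 0;
    return r->base ? 0 : -1;
}

/* Grow the region. The base address may move; offsets remain valid. */
static int region_grow(Region *r, size_t new_size)
{
    assert(new_size >= r->size);
    char *p = realloc(r->base, new_size);
    if (p == NULL)
        return -1;           /* on failure the old block is untouched */
    r->base = p;
    r->size = new_size;
    return 0;
}

/* Translate an offset into a pointer valid until the next region_grow(). */
static void *region_at(const Region *r, size_t off)
{
    return off < r->size ? (void *) (r->base + off) : NULL;
}
```

Anything that cached a raw pointer across region_grow() may be left
dangling, in threads and processes alike. Offsets avoid that, at the
price of a translation on every access.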

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 17:47:09
Message-ID:	1281376029.2142.1228.camel@ebony
Lists:	pgsql-hackers

On Mon, 2010-08-09 at 11:41 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > So imagine that thread-or-process A allocates a new chunk of
> > memory and then writes a pointer to the new chunk in a previously
> > allocated section of memory. Thread-or-process B then follows the
> > pointer. In a threaded model, this is guaranteed to be safe. In a
> > process model, it's not: A might have enlarged the shared memory
> > mapping while B has not yet done so. So I think in our model any sort
> > of change to the shared memory segment is going to require extremely
> > careful gymnastics, and be pretty expensive.
>
> ... and on some platforms, it'll be flat out impossible. We looked at
> this years ago and concluded that changing the size of the shmem segment
> after postmaster start was impractical from a portability standpoint.
> I have not seen anything to change that conclusion.

As caches get larger, downtime gets longer. Downtime of more than a few
minutes per year is enough to blow claims of high availability.

At some point, this project will need to face this particular hurdle. We
may need to balance utility for the majority against portability for the
minority.

We should be laying out an architectural roadmap, not just saying no. We
can make multi-year plans if we wish to.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:07:41
Message-ID:	4C6043ED.9000100@bluegap.ch
Lists:	pgsql-hackers

Hi,

On 08/09/2010 05:41 PM, Tom Lane wrote:
> ... and on some platforms, it'll be flat out impossible. We looked at
> this years ago and concluded that changing the size of the shmem segment
> after postmaster start was impractical from a portability standpoint.
> I have not seen anything to change that conclusion.

I haven't tried, but I tend to believe that's true.

However, I'd like to get back to the original intent of the posted
patch. Which is about dynamically allocating memory *within a fixed size
pool*.

That's something SLRU or shared_buffers do to some extent, but with lots
of limitations. And without the ability to move free memory between
sub-systems (i.e. between different SLRU buffers).

> You meant "I don't think", right? I agree. The only way threads would
> simplify this is if we went over to a mysql-style model where there was
> only one process, period, and all backends were threads inside that.
> No shared memory as such, at all.

That's how the threaded model normally is used, yes. And with that
model, allocation of shared memory is very easy. It has none of the
pre-allocation requirements we are currently facing.

Regards

Markus Wanner

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:13:05
Message-ID:	4C604531.8060102@bluegap.ch
Lists:	pgsql-hackers

Hi,

On 08/09/2010 06:10 PM, Bruce Momjian wrote:
> My point is that you can treat malloc the same as "add shared memory",
> to some extent, with the same limitations.

Once one of the SLRU buffers is full, it cannot currently allocate from
another SLRU buffer's unused memory area. That memory there is plain
wasted at that moment. That's my point and the problem the allocator I
posted tries to solve.

I fail to see how malloc could help here. malloc() only allocates
process-local memory.

Regards

Markus Wanner

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:16:47
Message-ID:	AANLkTim5rx_EKuKxfobn=bo7F=pAyTOPg2pdqjA1hiSA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 9, 2010 at 11:41 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> So imagine that thread-or-process A allocates a new chunk of
>> memory and then writes a pointer to the new chunk in a previously
>> allocated section of memory. Thread-or-process B then follows the
>> pointer. In a threaded model, this is guaranteed to be safe. In a
>> process model, it's not: A might have enlarged the shared memory
>> mapping while B has not yet done so. So I think in our model any sort
>> of change to the shared memory segment is going to require extremely
>> careful gymnastics, and be pretty expensive.
>
> ... and on some platforms, it'll be flat out impossible. We looked at
> this years ago and concluded that changing the size of the shmem segment
> after postmaster start was impractical from a portability standpoint.
> I have not seen anything to change that conclusion.

I haven't done extensive research into this, but I did take a look at
it briefly. It looked to me like the style of shared memory we're
using now (I guess it's System V) has no way to resize a shared memory
segment at all, and certainly no way that's portable. However it also
looked as though POSIX shm (shm_open, etc.) can be resized using
ftruncate(). Whether this is portable to all the platforms we run on,
or whether the behavior of ftruncate() in combination with shm_open()
is in the standard, I'm not sure. I believe I went back and reread
the old threads on this topic and it seems like the sticking point as
far as POSIX shm goes is that it lacks a readable equivalent of
shm_nattch. I think it was proposed to use a small sysv shm and then
do the main shared memory arena with shm_open, but at that point you
start to wonder why you're messing around with it at all.
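
For reference, the shm_open() plus ftruncate() approach might look
roughly like the C sketch below. It is a sketch under assumptions, not a
portability claim: the helper names (shm_create_sized, shm_map,
shm_grow) are invented for the example, whether resizing an existing
object with ftruncate() works everywhere is exactly the open question
above, and older glibc versions need -lrt at link time.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create or open a POSIX shm object and set its size. Returns fd or -1. */
static int shm_create_sized(const char *name, size_t size)
{
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t) size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the first `size` bytes of the object; returns NULL on failure. */
static void *shm_map(int fd, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Grow the object and replace the calling process's mapping.  Any other
 * process keeps its old, shorter mapping until it remaps on its own.
 */
static void *shm_grow(int fd, void *old, size_t old_size, size_t new_size)
{
    assert(new_size >= old_size);
    if (ftruncate(fd, (off_t) new_size) != 0)
        return NULL;
    void *p = shm_map(fd, new_size);
    if (p != NULL)
        munmap(old, old_size);
    return p;
}
```

Note that shm_grow() only fixes up the caller. Getting every other
attached process to notice the new size and remap is the expensive part,
and the sketch does not attempt it.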

But I can't help but be intrigued by it, even so. Suppose, for
example, that we kept things that were really fixed-size in shared
memory but moved, say, shared_buffers to a POSIX shm. Would that
allow you to then make shared_buffers PGC_SIGHUP? The obvious answer
is "no", because there are a whole bunch of knock-on issues. Changing
the size of shared_buffers also means changing the number of LWLocks,
changing the number of buffer descriptors, etc. So maybe it can't be
done. But I can't stop wondering if there's a way to make it work...

>> I don't care to take a position in the religious war over threads vs.
>> processes, but I do think threads simplify the handling of this
>> particular case.
>
> You meant "I don't think", right? I agree. The only way threads would
> simplify this is if we went over to a mysql-style model where there was
> only one process, period, and all backends were threads inside that.
> No shared memory as such, at all.

I think we're saying the same thing in different ways; I agree with
everything in that paragraph that follows the question mark. By "this
particular case", I meant "shared memory allocation"; it would amount
to just calling malloc() [or palloc()]. But yeah, clearly that only
works in a single-process model.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:17:52
Message-ID:	AANLkTin=V+ahqXOMRH92GB-b64J-g9LR+ENN41V31Sf3@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 9, 2010 at 12:03 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> On Mon, Aug 9, 2010 at 11:02 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> > I am not sure threads would greatly help us. The major problem is that
>> > all of our structures are currently contiguous in memory for quick
>> > access. I don't see how threading would help with that. We could use
>> > realloc(), but we can do the same in shared memory if we had a chunk
>> > infrastructure, though concurrent access to that memory would hurt us in
>> > either threads or shared memory.
>> >
>> > Fundamentally, recreating the libc memory allocation routines is not
>> > that hard. (Everyone has to detach from the shared memory segment, but
>> > they have to stop using it too, so it doesn't seem that hard.)
>>
>> I actually don't think that's true. The advantage (and disadvantage)
>> of using threads is that everything runs in one address space. So you
>> just allocate more memory and everyone immediately sees it. In a
>> process environment, that's not the case: to expand or shrink the size
>> of the shared memory arena, everyone needs to explicitly change their
>> own mapping.
>
> You can't expand the size of malloc'ed memory --- you have to call
> realloc(), and then you effectively get a new pointer. Shared memory
> has a similar limitation. If you allocate shared memory in chunks so
> you don't need to change the location, you are effectively doing another
> malloc(), like you would in a threaded process.

The point isn't what happens when you resize individual chunks; it's
what happens when you need to expand the arena.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:19:05
Message-ID:	AANLkTik6S7LCfCr72odSo0rUbXfms-pSFxzEP3b9jbkX@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 9, 2010 at 12:31 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Bruce Momjian wrote:
>> > With our process-based design, the default is private memory (i.e. not
>> > shared). If you need shared memory, you must specify a certain amount in
>> > advance. That chunk of shared memory then is reserved and can't ever be
>> > used by another subsystem. Even if you barely ever need that much shared
>> > memory for the subsystem in question.
>>
>> Once multiple threads are using the same local memory, you have the same
>> issues of being unable to resize it because repalloc can change the
>> pointer location.
>
> Let me be more concrete. Suppose you are using threads, and you want to
> increase your shared memory from 20MB to 30MB. How do you do that? If
> you want it contiguous, you have to use realloc, which might move the
> pointer. If you allocate another 10MB chunk, you then have shared
> memory fragments, which is the same as adding another shared memory
> segment.

You probably wouldn't do either of those things. You'd just allocate
small chunks here and there for whatever you need them for.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:28:24
Message-ID:	4C6048C8.5020405@bluegap.ch
Lists:	pgsql-hackers

Hi,

On 08/09/2010 06:31 PM, Bruce Momjian wrote:
> Let me be more concrete. Suppose you are using threads, and you want to
> increase your shared memory from 20MB to 30MB. How do you do that?

There's absolutely no need to pre-allocate 20 MB in advance in a
threaded environment. You just allocate memory in small chunks. In a
threaded model, that memory is shared by default, so the total amount of
shared memory can grow and shrink very easily. (It even makes unused
memory available to other processes, not just other threads.)

> If
> you want it contiguous, you have to use realloc, which might move the
> pointer. If you allocate another 10MB chunk, you then have shared
> memory fragments, which is the same as adding another shared memory
> segment.

Okay, I think I now understand the requirement of contiguity you
mentioned earlier already. I agree that with the current approach, we
cannot simply use such a dynamic allocator to solve all of our problems.

Every subsystem would need to be converted to something that allocates
shared memory in smaller chunks for such a dynamic allocator to be of
any use. Robert already pointed out that this may be troublesome for
shared_buffers, which is by far the largest consumer of shared memory. I
didn't look into this, yet. And I'd like to hear more about the
feasibility of that approach for other subsystems.

Another issue to be discussed would be the limits of sharing free memory
between subsystems. Maybe we even reach the conclusion that we
absolutely *want* fixed maximum sizes for every single subsystem so as
to be able to guarantee a certain amount of multi-xact or SLRU entries
at any point in time (otherwise one memory hungry subsystem could
possibly eat it all up with another subsystem getting the OOM error when
trying to allocate for its very first entry).
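
As a rough illustration of that trade-off (not code from the posted patch), the sketch below tracks usage of a fixed pool so that each subsystem keeps a guaranteed reservation while any surplus is shared. The names PoolAccounting, pool_try_charge and pool_release, the number of subsystems, and all sizes are hypothetical, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SUBSYSTEMS 3        /* hypothetical number of pool users */

typedef struct PoolAccounting
{
    size_t total;                       /* bytes in the fixed-size pool */
    size_t reserved[NUM_SUBSYSTEMS];    /* guaranteed minimum per subsystem */
    size_t used[NUM_SUBSYSTEMS];        /* bytes currently held per subsystem */
} PoolAccounting;

/* Bytes currently handed out across all subsystems. */
static size_t
total_used(const PoolAccounting *p)
{
    size_t sum = 0;

    for (int i = 0; i < NUM_SUBSYSTEMS; i++)
        sum += p->used[i];
    return sum;
}

/* Bytes that must stay free so the *other* subsystems can reach their minimum. */
static size_t
unmet_reservations(const PoolAccounting *p, int except)
{
    size_t sum = 0;

    for (int i = 0; i < NUM_SUBSYSTEMS; i++)
    {
        if (i != except && p->used[i] < p->reserved[i])
            sum += p->reserved[i] - p->used[i];
    }
    return sum;
}

/* Account for nbytes on behalf of a subsystem; refuse if that would eat
 * into memory still owed to another subsystem's reservation. */
bool
pool_try_charge(PoolAccounting *p, int subsystem, size_t nbytes)
{
    assert(subsystem >= 0 && subsystem < NUM_SUBSYSTEMS);

    size_t committed = total_used(p) + unmet_reservations(p, subsystem);

    if (committed > p->total || nbytes > p->total - committed)
        return false;
    p->used[subsystem] += nbytes;
    return true;
}

/* Give nbytes back to the pool. */
void
pool_release(PoolAccounting *p, int subsystem, size_t nbytes)
{
    assert(subsystem >= 0 && subsystem < NUM_SUBSYSTEMS);
    assert(nbytes <= p->used[subsystem]);
    p->used[subsystem] -= nbytes;
}
```

With a 100-byte pool and 10 bytes reserved for each of three subsystems, subsystem 0 can take at most 80 bytes; the remaining 20 stay available for the first entries of the other two.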

Thanks for bringing this discussion back to life.

Regards

Markus Wanner

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:33:08
Message-ID:	201008091833.o79IX8F11111@momjian.us
Lists:	pgsql-hackers

Markus Wanner wrote:
> Hi,
>
> On 08/09/2010 06:10 PM, Bruce Momjian wrote:
> > My point is that you can treat malloc the same as "add shared memory",
> > to some extent, with the same limitations.
>
> Once one of the SLRU buffers is full, it cannot currently allocate from
> another SLRU buffer's unused memory area. That memory there is plain
> wasted at that moment. That's my point and the problem the allocator I
> posted tries to solve.
>
> I fail to see how malloc could help here. malloc() only allocates
> process-local memory.

My point is that we have the same limitations with malloc()/threads, as
we have with shared memory.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:33:54
Message-ID:	201008091833.o79IXsh11280@momjian.us
Lists:	pgsql-hackers

Robert Haas wrote:
> On Mon, Aug 9, 2010 at 12:31 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Bruce Momjian wrote:
> >> > With our process-based design, the default is private memory (i.e. not
> >> > shared). If you need shared memory, you must specify a certain amount in
> >> > advance. That chunk of shared memory then is reserved and can't ever be
> >> > used by another subsystem. Even if you barely ever need that much shared
> >> > memory for the subsystem in question.
> >>
> >> Once multiple threads are using the same local memory, you have the same
> >> issues of being unable to resize it because repalloc can change the
> >> pointer location.
> >
> > Let me be more concrete. Suppose you are using threads, and you want to
> > increase your shared memory from 20MB to 30MB. How do you do that? If
> > you want it contiguous, you have to use realloc, which might move the
> > pointer. If you allocate another 10MB chunk, you then have shared
> > memory fragments, which is the same as adding another shared memory
> > segment.
>
> You probably wouldn't do either of those things. You'd just allocate
> small chunks here and there for whatever you need them for.

Well, then we do that with shared memory then --- my point is that it is
the same problem with threads or processes.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:34:22
Message-ID:	201008091834.o79IYML11311@momjian.us
Lists:	pgsql-hackers

Markus Wanner wrote:
> Hi,
>
> On 08/09/2010 06:31 PM, Bruce Momjian wrote:
> > Let me be more concrete. Suppose you are using threads, and you want to
> > increase your shared memory from 20MB to 30MB. How do you do that?
>
> There's absolutely no need to pre-allocate 20 MB in advance in a
> threaded environment. You just allocate memory in small chunks. In a
> threaded model, that memory is shared by default, so the total amount of
> shared memory can grow and shrink very easily. (It even makes unused
> memory available to other processes, not just other threads.)
>
> > If
> > you want it contiguous, you have to use realloc, which might move the
> > pointer. If you allocate another 10MB chunk, you then have shared
> > memory fragments, which is the same as adding another shared memory
> > segment.
>
> Okay, I think I now understand the requirement of contiguity you
> mentioned earlier already. I agree that with the current approach, we
> cannot simply use such a dynamic allocator to solve all of our problems.
>
> Every subsystem would need to be converted to something that allocates
> shared memory in smaller chunks for such a dynamic allocator to be of
> any use. Robert already pointed out that this may be troublesome for
> shared_buffers, which is by far the largest consumer of shared memory. I
> didn't look into this, yet. And I'd like to hear more about the
> feasibility of that approach for other subsystems.
>
> Another issue to be discussed would be the limits of sharing free memory
> between subsystems. Maybe we even reach the conclusion that we
> absolutely *want* fixed maximum sizes for every single subsystem so as
> to be able to guarantee a certain amount of multi-xact or SLRU entries
> at any point in time (otherwise one memory hungry subsystem could
> possibly eat it all up with another subsystem getting the OOM error when
> trying to allocate for its very first entry).

Yep, you would have to use chunks in threads/malloc, and you have to do
the same thing with shared memory.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:39:57
Message-ID:	4C604B7D.9060703@bluegap.ch
Lists:	pgsql-hackers

On 08/09/2010 08:33 PM, Bruce Momjian wrote:
> Robert Haas wrote:
>> You probably wouldn't do either of those things. You'd just allocate
>> small chunks here and there for whatever you need them for.
>
> Well, then we do that with shared memory then --- my point is that it is
> the same problem with threads or processes.

That's what my patch allows you to do, yes. Currently you are bound to
pre-allocate shared memory at startup. Or how would you allocate small
chunks from shared memory at the moment?

Regards

Markus

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:45:53
Message-ID:	AANLkTimQk=KK4hxP3ufbm1VOOUfOmRZYEx+txCJAh4T7@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 9, 2010 at 2:28 PM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
> Another issue to be discussed would be the limits of sharing free memory
> between subsystems. Maybe we even reach the conclusion that we absolutely
> *want* fixed maximum sizes for every single subsystem so as to be able to
> guarantee a certain amount of multi-xact or SLRU entries at any point in
> time (otherwise one memory hungry subsystem could possibly eat it all up
> with another subsystem getting the OOM error when trying to allocate for its
> very first entry).

Yeah, I think that's a real concern. I think we need to distinguish
memory needs from memory wants. Ideally, we'd like our entire
database to be cached in RAM. But that may or may not be feasible, so
we page what we can into shared_buffers and page out as necessary to
make room for other things. In contrast, the traditional malloc()
approach doesn't give you much flexibility: if it returns NULL, you
pretty much have to fail whatever operation you were trying to
perform. For some things, that's OK. For other things, it's not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:48:53
Message-ID:	AANLkTik0W-wRy6-uhy1ZfwFaMYEEtWG2yrW4Py3RHVNf@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 9, 2010 at 2:33 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> > Let me be more concrete. Suppose you are using threads, and you want to
>> > increase your shared memory from 20MB to 30MB. How do you do that? If
>> > you want it contiguous, you have to use realloc, which might move the
>> > pointer. If you allocate another 10MB chunk, you then have shared
>> > memory fragments, which is the same as adding another shared memory
>> > segment.
>>
>> You probably wouldn't do either of those things. You'd just allocate
>> small chunks here and there for whatever you need them for.
>
> Well, then we do that with shared memory then --- my point is that it is
> the same problem with threads or processes.

Well, I think your point is wrong, then. :-)

It's not the same at all. If you have a bunch of threads in one
address space, "shared" memory is really just process-local. You can
grow the total amount of allocated space just by calling malloc().
With our architecture, you can't.
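
The difference can be seen in a few lines of C. This is only a sketch: it uses an anonymous shared mmap() as a stand-in for a shared memory segment (not the System V segment PostgreSQL actually uses), and the function names are made up for the example. A write made by a forked child to malloc()'d memory never reaches the parent, whereas a write to a region that was mapped as shared before the fork does.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes 42 into *cell.  Returns 1 if the parent sees
 * the write afterwards, 0 if it does not, -1 on error. */
static int
child_write_visible(int *cell)
{
    assert(cell != NULL);
    *cell = 0;

    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        *cell = 42;             /* child: modify "its" memory and leave */
        _exit(0);
    }

    int status;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return *cell == 42;
}

/* Private heap memory: each process has its own copy after fork(). */
int
private_heap_visible(void)
{
    int *cell = malloc(sizeof(int));

    if (cell == NULL)
        return -1;

    int result = child_write_visible(cell);

    free(cell);
    return result;
}

/* Memory mapped as shared *before* the fork is visible to both sides. */
int
shared_mapping_visible(void)
{
    int *cell = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (cell == MAP_FAILED)
        return -1;

    int result = child_write_visible(cell);

    munmap(cell, sizeof(int));
    return result;
}
```

In a single-process, multi-threaded server the first case would behave like the second, which is why plain malloc() is enough to grow "shared" memory there.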

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:49:15
Message-ID:	201008091849.o79InFi13563@momjian.us
Lists:	pgsql-hackers

Markus Wanner wrote:
> On 08/09/2010 08:33 PM, Bruce Momjian wrote:
> > Robert Haas wrote:
> >> You probably wouldn't do either of those things. You'd just allocate
> >> small chunks here and there for whatever you need them for.
> >
> > Well, then we do that with shared memory then --- my point is that it is
> > the same problem with threads or processes.
>
> That's what my patch allows you to do, yes. Currently you are bound to
> pre-allocate shared memory at startup. Or how would you allocate small
> chunks from shared memory at the moment?

We don't --- we allocate it all at startup.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:50:45
Message-ID:	201008091850.o79Ioje13741@momjian.us
Lists:	pgsql-hackers

Robert Haas wrote:
> On Mon, Aug 9, 2010 at 2:33 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >> > Let me be more concrete. Suppose you are using threads, and you want to
> >> > increase your shared memory from 20MB to 30MB. How do you do that? If
> >> > you want it contiguous, you have to use realloc, which might move the
> >> > pointer. If you allocate another 10MB chunk, you then have shared
> >> > memory fragments, which is the same as adding another shared memory
> >> > segment.
> >>
> >> You probably wouldn't do either of those things. You'd just allocate
> >> small chunks here and there for whatever you need them for.
> >
> > Well, then we do that with shared memory then --- my point is that it is
> > the same problem with threads or processes.
>
> Well, I think your point is wrong, then. :-)
>
> It's not the same at all. If you have a bunch of threads in one
> address space, "shared" memory is really just process-local. You can
> grow the total amount of allocated space just by calling malloc().
> With our architecture, you can't.

You effectively have to add infrastructure to add/remove shared memory
segments to match memory requests. It is another step, but it is the
same behavior.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:55:59
Message-ID:	4C604F3F.1060002@bluegap.ch
Lists:	pgsql-hackers

On 08/09/2010 08:49 PM, Bruce Momjian wrote:
> Markus Wanner wrote:
>> That's what my patch allows you to do, yes. Currently you are bound to
>> pre-allocate shared memory at startup. Or how would you allocate small
>> chunks from shared memory at the moment?
>
> We don't --- we allocate it all at startup.

Exactly. And that's the difference from a thread-based approach. The
downside is that you need to know in advance how much shared
memory each of the subsystems is going to need. The upside is the
certainty that you already have the memory allocated and cannot run out
of it. You just have what you have.

(Note that you could do that as well with the thread-based approach, if
you want. Most other programs I know don't choose that approach, though,
but instead try to cope with OOM).

Regards

Markus

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 18:59:02
Message-ID:	AANLkTikp4Z9eTXrAO5sf5ahz0NEv0D8FVVVMAAgRq_mm@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 9, 2010 at 2:50 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> On Mon, Aug 9, 2010 at 2:33 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> >> > Let me be more concrete. Suppose you are using threads, and you want to
>> >> > increase your shared memory from 20MB to 30MB. How do you do that? If
>> >> > you want it contiguous, you have to use realloc, which might move the
>> >> > pointer. If you allocate another 10MB chunk, you then have shared
>> >> > memory fragments, which is the same as adding another shared memory
>> >> > segment.
>> >>
>> >> You probably wouldn't do either of those things. You'd just allocate
>> >> small chunks here and there for whatever you need them for.
>> >
>> > Well, then we do that with shared memory then --- my point is that it is
>> > the same problem with threads or processes.
>>
>> Well, I think your point is wrong, then. :-)
>>
>> It's not the same at all. If you have a bunch of threads in one
>> address space, "shared" memory is really just process-local. You can
>> grow the total amount of allocated space just by calling malloc().
>> With our architecture, you can't.
>
> You effectively have to add infrastructure to add/remove shared memory
> segments to match memory requests. It is another step, but it is the
> same behavior.

That would be one way to tackle the problem, but there are
difficulties. If we just created new shared memory segments at need,
we might end up with a lot of shared memory segments. I suspect that
would get complicated and present many management difficulties - which
is why I'm so far of the opinion that we should try to architect the
system to avoid the need for this functionality. I don't think it's
going to be too easy to provide, short of (as Tom says) moving to the
MySQL model of many threads working in a single process.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 19:00:20
Message-ID:	201008091900.o79J0KK15418@momjian.us
Lists:	pgsql-hackers

Robert Haas wrote:
> On Mon, Aug 9, 2010 at 2:50 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Robert Haas wrote:
> >> On Mon, Aug 9, 2010 at 2:33 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >> >> > Let me be more concrete. Suppose you are using threads, and you want to
> >> >> > increase your shared memory from 20MB to 30MB. How do you do that? If
> >> >> > you want it contiguous, you have to use realloc, which might move the
> >> >> > pointer. If you allocate another 10MB chunk, you then have shared
> >> >> > memory fragments, which is the same as adding another shared memory
> >> >> > segment.
> >> >>
> >> >> You probably wouldn't do either of those things. You'd just allocate
> >> >> small chunks here and there for whatever you need them for.
> >> >
> >> > Well, then we do that with shared memory then --- my point is that it is
> >> > the same problem with threads or processes.
> >>
> >> Well, I think your point is wrong, then. :-)
> >>
> >> It's not the same at all. If you have a bunch of threads in one
> >> address space, "shared" memory is really just process-local. You can
> >> grow the total amount of allocated space just by calling malloc().
> >> With our architecture, you can't.
> >
> > You effectively have to add infrastructure to add/remove shared memory
> > segments to match memory requests. It is another step, but it is the
> > same behavior.
> > same behavior.
>
> That would be one way to tackle the problem, but there are
> difficulties. If we just created new shared memory segments at need,
> we might end up with a lot of shared memory segments. I suspect that
> would get complicated and present many management difficulties - which
> is why I'm so far of the opinion that we should try to architect the
> system to avoid the need for this functionality. I don't think it's
> going to be too easy to provide, short of (as Tom says) moving to the
> MySQL model of many threads working in a single process.

You could allocate shared memory in chunks and then pass that out to
requestors, the same way sbrk() does it.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 19:02:33
Message-ID:	4C6050C9.6050802@bluegap.ch
Lists:	pgsql-hackers

On 08/09/2010 08:50 PM, Bruce Momjian wrote:
> You effectively have to add infrastructure to add/remove shared memory
> segments to match memory requests. It is another step, but it is the
> same behavior.

That's of no use without a dynamic allocator, I think. Or else it is a
vague description of a dynamic allocator.

I'm approaching the problem from another perspective: trying to
implement a dynamic allocator on top of a fixed size memory pool, first.
Once we have that, we may start to think about dynamically adding or
removing underlying segments.
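
To give an idea of what "a dynamic allocator on top of a fixed size memory pool" means, here is a deliberately naive first-fit sketch. It is not the allocator from the patch: the names pool_init, pool_alloc and pool_free, the 4096-byte pool and the 16-byte alignment are all assumptions made for the example, and the locking that a real shared memory allocator needs is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4096          /* hypothetical fixed pool size */
#define ALIGNMENT 16            /* hypothetical payload alignment */
#define ALIGN_UP(n) (((n) + (ALIGNMENT - 1)) & ~(size_t) (ALIGNMENT - 1))
#define HEADER_SIZE ALIGN_UP(sizeof(BlockHeader))

typedef struct BlockHeader
{
    size_t size;                /* payload bytes following this header */
    int    is_free;
} BlockHeader;

/* The pool is tiled by blocks: header, payload, header, payload, ... */
static _Alignas(ALIGNMENT) unsigned char pool[POOL_SIZE];
static int pool_ready = 0;

static BlockHeader *
first_block(void)
{
    return (BlockHeader *) pool;
}

static BlockHeader *
next_block(BlockHeader *b)
{
    unsigned char *next = (unsigned char *) b + HEADER_SIZE + b->size;

    return next < pool + POOL_SIZE ? (BlockHeader *) next : NULL;
}

void
pool_init(void)
{
    BlockHeader *b = first_block();

    b->size = POOL_SIZE - HEADER_SIZE;  /* one big free block */
    b->is_free = 1;
    pool_ready = 1;
}

void *
pool_alloc(size_t nbytes)
{
    assert(pool_ready);
    if (nbytes == 0 || nbytes > POOL_SIZE)
        return NULL;
    nbytes = ALIGN_UP(nbytes);

    for (BlockHeader *b = first_block(); b != NULL; b = next_block(b))
    {
        if (!b->is_free || b->size < nbytes)
            continue;
        /* Split only if the remainder can hold a header plus some payload. */
        if (b->size >= nbytes + HEADER_SIZE + ALIGNMENT)
        {
            BlockHeader *rest =
                (BlockHeader *) ((unsigned char *) b + HEADER_SIZE + nbytes);

            rest->size = b->size - nbytes - HEADER_SIZE;
            rest->is_free = 1;
            b->size = nbytes;
        }
        b->is_free = 0;
        return (unsigned char *) b + HEADER_SIZE;
    }
    return NULL;                /* pool exhausted or too fragmented */
}

void
pool_free(void *ptr)
{
    if (ptr == NULL)
        return;
    ((BlockHeader *) ((unsigned char *) ptr - HEADER_SIZE))->is_free = 1;

    /* Merge every run of adjacent free blocks into one. */
    for (BlockHeader *b = first_block(); b != NULL;)
    {
        BlockHeader *n = next_block(b);

        if (b->is_free && n != NULL && n->is_free)
            b->size += HEADER_SIZE + n->size;   /* stay on b, it may merge again */
        else
            b = n;
    }
}
```

Because the pool itself never grows, pool_alloc() returning NULL is the only out-of-memory condition, and freed blocks become available to whichever caller asks next.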

Regards

Markus Wanner

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 19:08:49
Message-ID:	4C605241.1050108@bluegap.ch
Lists:	pgsql-hackers

On 08/09/2010 09:00 PM, Bruce Momjian wrote:
> You could allocate shared memory in chunks and then pass that out to
> requestors, the same way sbrk() does it.

sbrk() is described [1] as a "low-level memory allocator", which "is
typically only used by the high-level malloc memory allocator
implemented in the C library".

Think of my patch as the high(er)-level variant ;-) It's certainly
doable using processes and shared memory. Yes. My patch shows one way to
take a step in that direction.

Regards

Markus Wanner

[1]: http://www.cs.utah.edu/flux/moss/node39.html

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 19:11:06
Message-ID:	AANLkTimtveDThmUmdhV9TOQ_PMNHcxORk1vrOM4R4-Ak@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 9, 2010 at 3:00 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> That would be one way to tackle the problem, but there are
>> difficulties. If we just created new shared memory segments at need,
>> we might end up with a lot of shared memory segments. I suspect that
>> would get complicated and present many management difficulties - which
>> is why I'm so far of the opinion that we should try to architect the
>> system to avoid the need for this functionality. I don't think it's
>> going to be too easy to provide, short of (as Tom says) moving to the
>> MySQL model of many threads working in a single process.
>
> You could allocate shared memory in chunks and then pass that out to
> requestors, the same way sbrk() does it.

Sure. But I don't think that gets you very far. The management of
the chunks is really hard. I go back to my previous example: you
can't store a pointer that might point to another chunk, because the
chunks won't get mapped into all the address spaces synchronously.
Even if you don't care about doing that (and I bet you do), mapping
and unmapping chunks is a heavyweight operation that requires every
backend to notice that it needs to do something (and, incidentally, if
any of them fail, you pretty much have to PANIC). I just can't
imagine us building a reliable system this way.
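
One part of the pointer problem can be shown in isolation. The sketch below is an assumption-laden illustration, not PostgreSQL code: it stores links as byte offsets from the start of a segment instead of raw pointers, and simulates "the same segment mapped at a different address" by copying the bytes into a second buffer. The names RelPtr, Node, build_list, sum_list and sum_after_remap are hypothetical. Offsets keep working wherever the segment lands, but they do nothing about the harder issue of getting every backend to map a new chunk at the right time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef size_t RelPtr;          /* byte offset from segment start; 0 means null */

typedef struct Node
{
    int    value;
    RelPtr next;                /* offset of the next node, not an address */
} Node;

/* Turn an offset into an address within the mapping that starts at base. */
static void *
rel_to_abs(void *base, RelPtr off)
{
    return off == 0 ? NULL : (unsigned char *) base + off;
}

/* Lay out n nodes back to back, starting one Node past the base so that
 * offset 0 stays free to mean "null".  Returns the offset of the head. */
static RelPtr
build_list(void *base, size_t seg_size, const int *values, size_t n)
{
    assert((n + 1) * sizeof(Node) <= seg_size);

    for (size_t i = 0; i < n; i++)
    {
        RelPtr off = (i + 1) * sizeof(Node);
        Node  *node = rel_to_abs(base, off);

        node->value = values[i];
        node->next = (i + 1 < n) ? off + sizeof(Node) : 0;
    }
    return n > 0 ? sizeof(Node) : 0;
}

/* Walk the list through whatever address the segment is mapped at. */
static int
sum_list(void *base, RelPtr head)
{
    int sum = 0;

    for (Node *node = rel_to_abs(base, head); node != NULL;
         node = rel_to_abs(base, node->next))
        sum += node->value;
    return sum;
}

/* Build a list in one buffer, "remap" it by copying the bytes elsewhere,
 * wipe the original, and sum the list through the new location. */
int
sum_after_remap(void)
{
    enum { SEG_SIZE = 256 };
    const int values[] = {1, 2, 3};
    void *seg_a = calloc(1, SEG_SIZE);
    void *seg_b = calloc(1, SEG_SIZE);

    if (seg_a == NULL || seg_b == NULL)
    {
        free(seg_a);
        free(seg_b);
        return -1;
    }

    RelPtr head = build_list(seg_a, SEG_SIZE, values, 3);

    memcpy(seg_b, seg_a, SEG_SIZE);
    memset(seg_a, 0, SEG_SIZE);     /* raw pointers into seg_a would now be garbage */

    int sum = sum_list(seg_b, head);

    free(seg_a);
    free(seg_b);
    return sum;
}
```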

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 19:11:48
Message-ID:	4C6052F4.1030607@bluegap.ch
Lists:	pgsql-hackers

Hi,

On 08/09/2010 08:45 PM, Robert Haas wrote:
> Yeah, I think that's a real concern. I think we need to distinguish
> memory needs from memory wants. Ideally, we'd like our entire
> database to be cached in RAM. But that may or may not be feasible, so
> we page what we can into shared_buffers and page out as necessary to
> make room for other things. In contrast, the traditional malloc()
> approach doesn't give you much flexibility: if it returns NULL, you
> pretty much have to fail whatever operation you were trying to
> perform. For some things, that's OK. For other things, it's not.

Agreed, it's going to be a difficult compromise and it possibly is very
hard to find a good one automatically. However, I doubt our current
approach with hard limits between subsystems is the best compromise.

Regards

Markus Wanner

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 19:14:27
Message-ID:	12219.1281381267@sss.pgh.pa.us
Lists:	pgsql-hackers

Markus Wanner <markus(at)bluegap(dot)ch> writes:
> However, I'd like to get back to the original intent of the posted
> patch. Which is about dynamically allocating memory *within a fixed size
> pool*.

> That's something SLRU or shared_buffers do to some extent, but with lots
> of limitations. And without the ability to move free memory between
> sub-systems (i.e. between different SLRU buffers).

As far as SLRU is concerned, the already-agreed-to plan is to get rid of
the separate arenas for SLRU and merge those things into the main shared
buffers arena. IIRC, the motivation for designing SLRU the way it is
was to ensure that SLRU uses couldn't be starved for memory due to high
demand for shared buffers. But that was back when people frequently ran
PG with only a few meg for shared buffers; I think that worry is
obsolete.

So I don't see this patch as offering anything at all that we care about
so far as the core server is concerned. Maybe there are extensions that
need it badly enough to justify such a feature in core, but SLRU is not
a good argument for it.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 19:20:21
Message-ID:	12332.1281381621@sss.pgh.pa.us
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Aug 9, 2010 at 11:41 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... and on some platforms, it'll be flat out impossible. We looked at
>> this years ago and concluded that changing the size of the shmem segment
>> after postmaster start was impractical from a portability standpoint.
>> I have not seen anything to change that conclusion.

> I haven't done extensive research into this, but I did take a look at
> it briefly. It looked to me like the style of shared memory we're
> using now (I guess it's System V) has no way to resize a shared memory
> segment at all, and certainly no way that's portable. However it also
> looked as though POSIX shm (shm_open, etc.) can be resized using
> ftruncate(). Whether this is portable to all the platforms we run on,
> or whether the behavior of ftruncate() in combination with shm_open()
> is in the standard, I'm not sure.

It's not portable. That's exactly what we were looking into back when.

> I believe I went back and reread
> the old threads on this topic and it seems like the sticking point as
> far as POSIX shm goes is that it lacks a readable equivalent of
> shm_nattch.

Yeah, that was another little problem. In principle though we only need
one SysV-style shmem segment to get the required interlock, and there
could be add-on shmem segments using POSIX or other APIs. But that
doesn't get you out from under the portability issue or the memory space
management issue (it's unlikely you can enlarge a segment without
remapping it).

regards, tom lane

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 19:27:24
Message-ID:	4C60569C.3020700@bluegap.ch
Lists:	pgsql-hackers

Hi,

On 08/09/2010 09:14 PM, Tom Lane wrote:
> As far as SLRU is concerned, the already-agreed-to plan is to get rid of
> the separate arenas for SLRU and merge those things into the main shared
> buffers arena.

I didn't know about that plan. Sounds good. (I'm personally thinking
this is trying to solve the same problem in a more specific fashion).

> IIRC, the motivation for designing SLRU the way it is
> was to ensure that SLRU uses couldn't be starved for memory due to high
> demand for shared buffers. But that was back when people frequently ran
> PG with only a few meg for shared buffers; I think that worry is
> obsolete.

Good to know.

> So I don't see this patch as offering anything at all that we care about
> so far as the core server is concerned. Maybe there are extensions that
> need it badly enough to justify such a feature in core, but SLRU is not
> a good argument for it.

Fair enough.

(Patch is already marked as "returned with feedback" on the commitfest
app, thanks again for additional feedback)

Regards

Markus Wanner

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 20:09:14
Message-ID:	AANLkTi=ZUDA73qYpnp80pKS2N8kR5EiOQN0m83Drmn-4@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 9, 2010 at 3:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Aug 9, 2010 at 11:41 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> ... and on some platforms, it'll be flat out impossible. We looked at
>>> this years ago and concluded that changing the size of the shmem segment
>>> after postmaster start was impractical from a portability standpoint.
>>> I have not seen anything to change that conclusion.
>
>> I haven't done extensive research into this, but I did take a look at
>> it briefly. It looked to me like the style of shared memory we're
>> using now (I guess it's System V) has no way to resize a shared memory
>> segment at all, and certainly no way that's portable. However it also
>> looked as though POSIX shm (shm_open, etc.) can be resized using
>> ftruncate(). Whether this is portable to all the platforms we run on,
>> or whether the behavior of ftruncate() in combination with shm_open()
>> is in the standard, I'm not sure.
>
> It's not portable. That's exactly what we were looking into back when.

Uggh, that sucks. Can you provide any more details?

>> I believe I went back and reread
>> the old threads on this topic and it seems like the sticking point as
>> far as POSIX shm goes is that it lacks a readable equivalent of
>> shm_nattch.
>
> Yeah, that was another little problem. In principle though we only need
> one SysV-style shmem segment to get the required interlock, and there
> could be add-on shmem segments using POSIX or other APIs. But that
> doesn't get you out from under the portability issue or the memory space
> management issue (it's unlikely you can enlarge a segment without
> remapping it).

Unlikely is probably an understatement. Still, enlarging a segment
with remapping might be workable for some useful subset of the cases.
But, if enlarging it can't be done portably, then we're pretty much
dead in the water.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 20:18:26
Message-ID:	13346.1281385106@sss.pgh.pa.us
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Aug 9, 2010 at 3:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It's not portable. That's exactly what we were looking into back when.

> Uggh, that sucks. Can you provide any more details?

You don't really have to go further than consulting the relevant
standards, eg SUS says at
http://www.opengroup.org/onlinepubs/007908799/xsh/mmap.html

    If the size of the mapped file changes after the call to mmap() as a
    result of some other operation on the mapped file, the effect of
    references to portions of the mapped region that correspond to added
    or removed portions of the file is unspecified.

Particular implementations might cope with such cases in useful ways, or
then again they might not. And even if your platform does, you've set
an upper limit for the possible segment size in your mmap() call.

Further down the page, SUS also takes pains to point out that you
probably can't have an unlimited number of mapped regions, so adding
more mmap'd segments isn't a way out either.

regards, tom lane

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc:	"Markus Wanner" <markus(at)bluegap(dot)ch>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 20:34:08
Message-ID:	4C601FF00200002500034394@gw.wicourts.gov
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I don't think it's going to be too easy to provide, short of (as
> Tom says) moving to the MySQL model of many threads working in a
> single process.

Well, it's a bit misleading to refer to it as the MySQL model. It's
used by Microsoft SQL Server, MySQL, Informix, and Sybase. IBM DB2
supports four different process models, and OS threads in a single
process is the default for them on an OS with good threading
support; otherwise they default to one process per connection.

Just because MySQL uses a particular technique doesn't
*automatically* mean it's a bad one; it's just not in itself a
confidence-builder. ;-)

-Kevin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 20:44:52
Message-ID:	AANLkTim3AEyuwad5Z5-z4QMOU2s+qqeK7JLZCu=7p87-@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 9, 2010 at 4:18 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Aug 9, 2010 at 3:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> It's not portable. That's exactly what we were looking into back when.
>
>> Uggh, that sucks. Can you provide any more details?
>
> You don't really have to go further than consulting the relevant
> standards, eg SUS says at
> http://www.opengroup.org/onlinepubs/007908799/xsh/mmap.html
>
> If the size of the mapped file changes after the call to mmap() as a
> result of some other operation on the mapped file, the effect of
> references to portions of the mapped region that correspond to added
> or removed portions of the file is unspecified.
>
> Particular implementations might cope with such cases in useful ways, or
> then again they might not.

That doesn't seem like a big problem to me. I was assuming we'd need
to remap when the size changed. Also, I was assuming that we were
going to use shms, not files. Take a look at this:

http://www.opengroup.org/onlinepubs/007908799/xsh/shm_open.html -and-
http://www.opengroup.org/onlinepubs/007908799/xsh/ftruncate.html

From the ftruncate page: "If fildes references a shared memory object,
ftruncate() sets the size of the shared memory object to length."

> And even if your platform does, you've set
> an upper limit for the possible segment size in your mmap() call.
>
> Further down the page, SUS also takes pains to point out that you
> probably can't have an unlimited number of mapped regions, so adding
> more mmap'd segments isn't a way out either.

Yeah. I think any approach that is based on allocating new segments
as needed is pretty much DOA. I think the point of this would be to
be able to resize things like shared_buffers on the fly - that is, an
explicit administrator action might trigger a resize-and-remap cycle,
but general system activity would not. The reality is that as
PostgreSQL is used in more and more 24x7 contexts and people put more
and more critical data into it, forced server restarts become more and
more of a problem. IMHO, we really need to do some creative thinking
about how to crank PGC_POSTMASTER GUCs down to PGC_SIGHUP.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
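
The resize-and-remap cycle described in this message can be sketched roughly as follows. This is a minimal illustration, not code from the patch or from PostgreSQL: the Segment struct and the segment_create(), segment_grow(), segment_destroy() and segment_grow_demo() names are hypothetical, and an unlinked temporary file from mkstemp() stands in for a shm_open() object so that the example builds without extra libraries. A descriptor obtained from shm_open() would be passed to ftruncate() and mmap() in the same way. Only growth is handled.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one resizable shared mapping. */
typedef struct {
    int    fd;     /* descriptor of the backing object */
    void  *base;   /* current mapping address in this process */
    size_t size;   /* current mapping length in bytes */
} Segment;

/* Create the backing object, size it, and map it. Returns 0 on success. */
int segment_create(Segment *seg, size_t size)
{
    char path[] = "/tmp/shm_sketch_XXXXXX";

    seg->fd = mkstemp(path);          /* stand-in for shm_open() */
    if (seg->fd < 0)
        return -1;
    unlink(path);                     /* keep only the descriptor */
    if (ftruncate(seg->fd, (off_t) size) != 0) {
        close(seg->fd);
        return -1;
    }
    seg->base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     seg->fd, 0);
    if (seg->base == MAP_FAILED) {
        close(seg->fd);
        return -1;
    }
    seg->size = size;
    return 0;
}

/*
 * Grow the object, then create a fresh mapping of the new size instead of
 * touching the added range through the old mapping. The new mapping may be
 * placed at a different address. On failure the old mapping is left intact.
 */
int segment_grow(Segment *seg, size_t new_size)
{
    void *new_base;

    assert(new_size >= seg->size);    /* shrinking is out of scope here */
    if (ftruncate(seg->fd, (off_t) new_size) != 0)
        return -1;
    new_base = mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                    seg->fd, 0);
    if (new_base == MAP_FAILED)
        return -1;
    munmap(seg->base, seg->size);
    seg->base = new_base;
    seg->size = new_size;
    return 0;
}

void segment_destroy(Segment *seg)
{
    munmap(seg->base, seg->size);
    close(seg->fd);
}

/* Returns 0 if old contents survive the remap and the new range is usable. */
int segment_grow_demo(void)
{
    Segment seg;
    int     ok;

    if (segment_create(&seg, 4096) != 0)
        return -1;
    strcpy((char *) seg.base, "hello");
    if (segment_grow(&seg, 8192) != 0) {
        segment_destroy(&seg);
        return -1;
    }
    ((char *) seg.base)[5000] = 'x';
    ok = strcmp((char *) seg.base, "hello") == 0 &&
         ((char *) seg.base)[5000] == 'x';
    segment_destroy(&seg);
    return ok ? 0 : -1;
}
```

Mapping the new range before unmapping the old one keeps the old contents reachable if mmap() fails, at the cost of briefly needing address space for both mappings. In a multi-process server every attached process would have to perform the same remap, which this single-process sketch does not attempt.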

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 20:58:48
Message-ID:	20100809205848.GA9408@svana.org
Lists:	pgsql-hackers

On Mon, Aug 09, 2010 at 02:16:47PM -0400, Robert Haas wrote:
> I believe I went back and reread
> the old threads on this topic and it seems like the sticking point as
> far as POSIX shm goes is that it lacks a readable equivalent of
> shm_nattch. I think it was proposed to use a small sysv shm and then
> do the main shared memory arena with shm_open, but at that point you
> start to wonder why you're messing around with it at all.

About using a small sysV segment for shm_nattch and allocating the rest
another way: the reason to do it is that "the other way" can be
anything other than sysV. Namely, sysV has pathetic default limits
whereas you can mmap() a few gig anonymously and the kernel won't bat
an eyelid.

Even if "the other way" didn't allow you to resize anything (which is
what people appear to be talking about here) the benefit of being able
to specify useful sizes of shared buffers without having to reconfigure
the kernel makes it (ISTM) worthwhile doing irrespective of anything
else.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle
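
The hybrid layout discussed here, a tiny SysV segment kept only for its attach count plus a large arena obtained some other way, might look roughly like this. It is a sketch under assumptions: the HybridShmem struct and the hybrid_* function names are hypothetical, MAP_ANONYMOUS is a widely available extension rather than part of the SUS text cited earlier in the thread, and an anonymous shared mapping is only visible to processes forked after it is created.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>

/* Hypothetical pairing of a SysV interlock segment with an mmap'd arena. */
typedef struct {
    int    interlock_id;   /* SysV segment id, used only for shm_nattch */
    void  *interlock;      /* this process's attachment to that segment */
    void  *arena;          /* anonymous shared mapping holding the data */
    size_t arena_size;
} HybridShmem;

/* Returns 0 on success, -1 on failure (nothing is left allocated). */
int hybrid_create(HybridShmem *h, size_t arena_size)
{
    assert(arena_size > 0);

    /* One byte is enough: the segment exists for its bookkeeping only. */
    h->interlock_id = shmget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (h->interlock_id < 0)
        return -1;
    h->interlock = shmat(h->interlock_id, NULL, 0);
    if (h->interlock == (void *) -1) {
        shmctl(h->interlock_id, IPC_RMID, NULL);
        return -1;
    }

    /* The bulk of the memory is not subject to the SysV size limits. */
    h->arena = mmap(NULL, arena_size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (h->arena == MAP_FAILED) {
        shmdt(h->interlock);
        shmctl(h->interlock_id, IPC_RMID, NULL);
        return -1;
    }
    h->arena_size = arena_size;
    return 0;
}

/* Number of processes currently attached to the interlock segment. */
long hybrid_attach_count(const HybridShmem *h)
{
    struct shmid_ds ds;

    if (shmctl(h->interlock_id, IPC_STAT, &ds) != 0)
        return -1;
    return (long) ds.shm_nattch;
}

void hybrid_destroy(HybridShmem *h)
{
    munmap(h->arena, h->arena_size);
    shmdt(h->interlock);
    shmctl(h->interlock_id, IPC_RMID, NULL);
}
```

A caller would create the structure before forking children, so that each child inherits both the attachment (and is counted in shm_nattch) and the arena mapping.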

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-09 23:17:48
Message-ID:	16034.1281395868@sss.pgh.pa.us
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Aug 9, 2010 at 4:18 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Particular implementations might cope with such cases in useful ways, or
>> then again they might not.

> That doesn't seem like a big problem to me. I was assuming we'd need
> to remap when the size changed.

Well, as long as you can do that, sure. I'm concerned about what
happens if/when remapping fails (not at all unlikely in 32-bit address
spaces in particular). You mentioned that that would probably have to
be a PANIC condition, which I think I agree with; and that idea pretty
much kills any argument that this would be a good way to improve server
uptime.

Another issue is that if you're doing dynamic remapping you almost
certainly can't assume that the segment will appear at the same
addresses in every backend. We could live with that for shared buffers
without too much pain, but not so much for most other shared
data structures.

> Also, I was assuming that we were
> going to use shms, not files.

It looked to me like the spec for mmap was the same either way.

regards, tom lane
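
To illustrate the address issue raised above: a structure that stores raw pointers breaks as soon as the segment is mapped at a different address, whereas one that stores offsets from the segment start does not. The following is a minimal sketch of the offset approach, not how PostgreSQL's shared structures are written. The shm_off type, the Node layout and all function names are hypothetical, and two malloc() buffers stand in for the same segment mapped at two different addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Byte offset from the start of the segment; 0 is reserved as "null". */
typedef size_t shm_off;

typedef struct Node {
    int     value;
    shm_off next;      /* offset of the next Node, or 0 at the end */
} Node;

/* Translate an offset into a pointer valid for this particular mapping. */
static void *off_to_ptr(void *base, shm_off off)
{
    return off == 0 ? NULL : (char *) base + off;
}

/*
 * Build the list 1 -> 2 -> ... -> n inside the segment. Slot 0 is left
 * unused so that offset 0 can mean "null". Returns the head's offset.
 */
shm_off build_list(void *base, size_t size, int n)
{
    shm_off head = 0;
    int     i;

    assert((size_t) (n + 1) * sizeof(Node) <= size);
    for (i = n; i >= 1; i--) {
        shm_off off = (shm_off) i * sizeof(Node);
        Node   *node = off_to_ptr(base, off);

        node->value = i;
        node->next = head;
        head = off;
    }
    return head;
}

/* Walk the list through whatever address the segment has in this process. */
int sum_list(void *base, shm_off head)
{
    int   sum = 0;
    Node *node;

    for (node = off_to_ptr(base, head); node != NULL;
         node = off_to_ptr(base, node->next))
        sum += node->value;
    return sum;
}

/* Build at one address, read back at another; returns the list sum. */
int relocated_sum_demo(void)
{
    size_t  size = 4 * sizeof(Node);
    void   *a = malloc(size);
    void   *b = malloc(size);
    shm_off head;
    int     sum;

    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }
    head = build_list(a, size, 3);
    memcpy(b, a, size);        /* "same segment" at a different address */
    memset(a, 0, size);        /* make sure nothing still reads from a */
    free(a);
    sum = sum_list(b, head);
    free(b);
    return sum;
}
```

The cost is an extra addition on every dereference and the loss of plain pointer syntax, which is presumably part of why this is tolerable for shared buffers but painful for most other shared structures.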

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-10 01:06:38
Message-ID:	AANLkTi=38rsRTUVpr9S5J198-Tr8hpiMB=DqF=D=ZRd5@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 9, 2010 at 9:44 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> That doesn't seem like a big problem to me. I was assuming we'd need
> to remap when the size changed.

I had thought about this in the past too, just for supporting run-time
changes to shared_buffers. I always assumed we would just allocate
shared memory in chunks and create separate mappings for each chunk.

--
greg
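
The chunked scheme mentioned here could be sketched as a table of chunk base addresses plus a lookup that turns a buffer number into an address. Everything below is an assumption for illustration: the ChunkedPool struct, the pool_* names and the three constants are hypothetical, and malloc() stands in for creating a separate shared mapping per chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE        8192   /* assumed size of one buffer */
#define BUFFERS_PER_CHUNK 128    /* assumed allocation granularity */
#define MAX_CHUNKS        64     /* the number of mappings is bounded */

typedef struct {
    char  *chunk_base[MAX_CHUNKS];   /* start address of each chunk */
    size_t nchunks;                  /* chunks currently mapped */
} ChunkedPool;

/* Add one chunk; existing chunks keep their addresses. 0 on success. */
int pool_add_chunk(ChunkedPool *pool)
{
    char *base;

    assert(pool != NULL);
    if (pool->nchunks >= MAX_CHUNKS)
        return -1;
    /* Stand-in for mapping a new shared chunk. */
    base = malloc((size_t) BLOCK_SIZE * BUFFERS_PER_CHUNK);
    if (base == NULL)
        return -1;
    pool->chunk_base[pool->nchunks++] = base;
    return 0;
}

size_t pool_nbuffers(const ChunkedPool *pool)
{
    return pool->nchunks * BUFFERS_PER_CHUNK;
}

/* Address of buffer buf_id, or NULL if it lies beyond the mapped chunks. */
char *pool_buffer(const ChunkedPool *pool, size_t buf_id)
{
    size_t chunk = buf_id / BUFFERS_PER_CHUNK;
    size_t slot = buf_id % BUFFERS_PER_CHUNK;

    if (chunk >= pool->nchunks)
        return NULL;
    return pool->chunk_base[chunk] + slot * BLOCK_SIZE;
}

void pool_free(ChunkedPool *pool)
{
    while (pool->nchunks > 0)
        free(pool->chunk_base[--pool->nchunks]);
}
```

Growing the pool never moves existing buffers, so no remap of live data is needed, but the fixed MAX_CHUNKS reflects the limit on the number of mapped regions that was pointed out earlier in the thread.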

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dynamically allocating chunks from shared memory
Date:	2010-08-10 01:29:27
Message-ID:	AANLkTi=z0dHD2C-OqqnOx9Xz1Bqt+ca4ss9TJGA-d+m5@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 9, 2010 at 7:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Aug 9, 2010 at 4:18 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Particular implementations might cope with such cases in useful ways, or
>>> then again they might not.
>> That doesn't seem like a big problem to me. I was assuming we'd need
>> to remap when the size changed.
> Well, as long as you can do that, sure. I'm concerned about what
> happens if/when remapping fails (not at all unlikely in 32-bit address
> spaces in particular). You mentioned that that would probably have to
> be a PANIC condition, which I think I agree with; and that idea pretty
> much kills any argument that this would be a good way to improve server
> uptime.

In some cases, you might be able to get by with FATAL. Still, it's
easier to imagine using this in cases for things like resizing
shared_buffers (where the alternative is to restart the server anyway)
than it is to use it for routine memory allocation.

> Another issue is that if you're doing dynamic remapping you almost
> certainly can't assume that the segment will appear at the same
> addresses in every backend. We could live with that for shared buffers
> without too much pain, but not so much for most other shared
> data structures.

Hmm.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company