PATCH: two slab-like memory allocators

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: PATCH: two slab-like memory allocators
Date: 2016-08-02 15:44:35
Message-ID: d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com
Lists: pgsql-hackers

Hi,

Back in the bug #14231 thread [1], dealing with performance issues in
reorderbuffer due to an excessive number of expensive free() calls, I
proposed to resolve that with a custom slab-like memory allocator, suitable
for fixed-size allocations. I'd like to put this into the next CF; it's
probably too invasive a change to count as a bugfix anyway.

[1]
https://www.postgresql.org/message-id/flat/20160706185502.1426.28143%40wrigleys.postgresql.org

This patch actually includes two new memory allocators (not one). A very
brief summary (for a more detailed explanation of the ideas, see the
comments at the beginning of slab.c and genslab.c):

Slab
----
* suitable for fixed-length allocations (palloc() of any other size fails)
* much simpler than AllocSet (no global freelist management etc.)
* free space is tracked per block (using a simple bitmap)
* which allows freeing the block once all chunks are freed (AllocSet
will hold the memory forever, in the hope of reusing it)

GenSlab
-------
* suitable for non-fixed-length allocations, but with chunks of mostly
the same size (initially unknown, the context will tune itself)
* a combination of AllocSet and Slab (or a sequence of Slab allocators)
* the goal is to do most allocations in the Slab context
* there's always a single 'current' Slab context, and every now and
then it's replaced with a new generation (with the chunk size computed
from recent requests)
* the AllocSet context is used for chunks too large for the current Slab
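
To make the split concrete, here is a minimal structural sketch of the
GenSlab idea (hypothetical field names, not the actual patch code):

    typedef struct GenSlabData
    {
        MemoryContextData header;   /* standard memory-context fields */
        MemoryContext slab;         /* current-generation Slab context */
        MemoryContext aset;         /* AllocSet for oversized chunks */
        Size        chunkSize;      /* chunk size of current generation */
        uint64      nallocations;   /* recent request stats, used to    */
        uint64      nbytes;         /* pick the next generation's size  */
    } GenSlabData;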

So none of this is meant as a universal replacement for AllocSet, but in
suitable cases the results seem really promising. For example, for the
simple test query in [1], the performance improvement is this:

      N |   master | patched
--------+----------+--------
  10000 |    100ms |   100ms
  50000 |  15000ms |   350ms
 100000 | 146000ms |   700ms
 200000 |        ? |  1400ms

That's a fairly significant improvement, and the submitted version of
the patches should perform even better (~2x, IIRC).

There's a bunch of TODOs - e.g. handling of realloc() calls in the
GenSlab, and probably things I haven't thought about.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-simple-slab-allocator-fixed-size-allocations.patch binary/octet-stream 35.3 KB
0002-generational-slab-auto-tuning-allocator.patch binary/octet-stream 16.2 KB

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-09-25 18:48:41
Message-ID: c5653004-fe91-5a13-409c-db9f34642b0c@2ndquadrant.com
Lists: pgsql-hackers

Hi Tomas,

On 02/08/16 17:44, Tomas Vondra wrote:
>
> This patch actually includes two new memory allocators (not one). Very
> brief summary (for more detailed explanation of the ideas, see comments
> at the beginning of slab.c and genslab.c):
>
> Slab
> ----
> * suitable for fixed-length allocations (other pallocs fail)
> * much simpler than AllocSet (no global freelist management etc.)
> * free space is tracked per block (using a simple bitmap)
> * which allows freeing the block once all chunks are freed (AllocSet
> will hold the memory forever, in the hope of reusing it)
>
> GenSlab
> -------
> * suitable for non-fixed-length allocations, but with chunks of mostly
> the same size (initially unknown, the context will tune itself)
> * a combination of AllocSet and Slab (or a sequence of Slab allocators)
> * the goal is to do most allocations in Slab context
> * there's always a single 'current' Slab context, and every now and
> then it's replaced with a new generation (with the chunk size computed
> from recent requests)
> * the AllocSet context is used for chunks too large for current Slab
>

So it's just a wrapper around the other two allocators, to make this
use case easier to handle. Do you expect there will eventually be uses
for GenSlab outside of the reorderbuffer?

> So none of this is meant as a universal replacement of AllocSet, but in
> the suitable cases the results seem really promising. For example for
> the simple test query in [1], the performance improvement is this:
>
>       N |   master | patched
> --------+----------+--------
>   10000 |    100ms |   100ms
>   50000 |  15000ms |   350ms
>  100000 | 146000ms |   700ms
>  200000 |        ? |  1400ms
>
> That's a fairly significant improvement, and the submitted version of
> the patches should perform even better (~2x, IIRC).
>

I agree that it improves performance quite nicely and that reorderbuffer
could use this.

About the code. I am not quite sure this needs to be split into two
patches, especially since 1/3 of the second patch is the removal of the
code added by the first one, and otherwise it's quite small and
straightforward. That is, unless you expect the GenSlab not to go in.

Slab:
In general it seems understandable; the initial description helps to
understand what's happening well enough.

One thing I don't understand, however, is why the freelist is both an
array and a doubly linked list, and why there is a new implementation of
said doubly linked list given that we already have dlist.

> +/*
> + * SlabContext is our standard implementation of MemoryContext.
> + *

Really?

> +/*
> + * SlabChunk
> + * The prefix of each piece of memory in an SlabBlock
> + *
> + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
> + */

Is this true? Why? And if it is, then why doesn't SlabChunk actually
match StandardChunkHeader?

> +#define SlabPointerIsValid(pointer) PointerIsValid(pointer)

What's the point of this given that it's defined in the .c file?

> +static void *
> +SlabAlloc(MemoryContext context, Size size)
> +{
> + Slab set = (Slab) context;
> + SlabBlock block;
> + SlabChunk chunk;
> + int idx;
> +
> + AssertArg(SlabIsValid(set));
> +
> + Assert(size == set->chunkSize);

I wonder if there should be stronger protection (ie, elog) for the size
matching.

> +static void *
> +SlabRealloc(MemoryContext context, void *pointer, Size size)
> +{
> + Slab set = (Slab)context;
> +
> + /* can't do actual realloc with slab, but let's try to be gentle */
> + if (size == set->chunkSize)
> + return pointer;
> +
> + /* we can't really do repalloc with this allocator */
> + Assert(false);

This IMHO should definitely be elog.

> +static void
> +SlabCheck(MemoryContext context)
> +{
> + /* FIXME */
> +}

Do you plan to implement this interface?

> +#define SLAB_DEFAULT_BLOCK_SIZE 8192
> +#define SLAB_LARGE_BLOCK_SIZE (8 * 1024 * 1024)

I am guessing this is based on max_cached_tuplebufs? Maybe these could
be written in the same style?

GenSlab:

Since this is a relatively simple wrapper, it looks mostly OK to me. The
only issue I have here is that I am not quite sure about those FIXME
functions (Free, Realloc, GetChunkSpace). It's slightly weird to call into
mcxt, but I guess the alternative is to error out, so this is probably
preferable. I would want to hear other opinions here.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-09-25 20:17:41
Message-ID: 68b837a3-ce55-f9a2-f685-9eafac536f21@2ndquadrant.com
Lists: pgsql-hackers

On 09/25/2016 08:48 PM, Petr Jelinek wrote:
> Hi Tomas,
>
> On 02/08/16 17:44, Tomas Vondra wrote:
>>
>> This patch actually includes two new memory allocators (not one). Very
>> brief summary (for more detailed explanation of the ideas, see comments
>> at the beginning of slab.c and genslab.c):
>>
>> Slab
>> ----
>> * suitable for fixed-length allocations (other pallocs fail)
>> * much simpler than AllocSet (no global freelist management etc.)
>> * free space is tracked per block (using a simple bitmap)
>> * which allows freeing the block once all chunks are freed (AllocSet
>> will hold the memory forever, in the hope of reusing it)
>>
>> GenSlab
>> -------
>> * suitable for non-fixed-length allocations, but with chunks of mostly
>> the same size (initially unknown, the context will tune itself)
>> * a combination of AllocSet and Slab (or a sequence of Slab allocators)
>> * the goal is to do most allocations in Slab context
>> * there's always a single 'current' Slab context, and every now and
>> then it's replaced with a new generation (with the chunk size computed
>> from recent requests)
>> * the AllocSet context is used for chunks too large for current Slab
>>
>
> So it's just a wrapper around the other two allocators, to make this
> use case easier to handle. Do you expect there will eventually be uses
> for GenSlab outside of the reorderbuffer?
>

Yes, you might say it's just a wrapper around the other two allocators,
but it *also* includes the logic of recomputing the chunk size etc.

I haven't thought very much about other places that might benefit from
these new allocators - in general, they're useful for places that produce
a stream of equally-sized items (GenSlab relaxes this) that are pfree()d
in a ~FIFO manner (i.e. roughly in the order of allocation).

>> So none of this is meant as a universal replacement of AllocSet,
>> but in the suitable cases the results seem really promising. For
>> example for the simple test query in [1], the performance
>> improvement is this:
>>
>>       N |   master | patched
>> --------+----------+--------
>>   10000 |    100ms |   100ms
>>   50000 |  15000ms |   350ms
>>  100000 | 146000ms |   700ms
>>  200000 |        ? |  1400ms
>>
>> That's a fairly significant improvement, and the submitted version
>> of the patches should perform even better (~2x, IIRC).
>>
>
> I agree that it improves performance quite nicely and that
> reorderbuffer could use this.
>
> About the code. I am not quite sure that this needs to be split into
> two patches especially if 1/3 of the second patch is the removal of
> the code added by the first one and otherwise it's quite small and
> straightforward. That is unless you expect the GenSlab to not go in.
>

I don't know - it seemed natural to first introduce the Slab, as it's
easier to discuss it separately, and it works for 2 of the 3 contexts
needed in reorderbuffer.

GenSlab is an improvement of Slab, or rather based on it, so that it
works for the third context. And it introduces some additional ideas
(particularly the generational design, etc.)

Of course, none of this means it has to be committed in two separate
chunks, or that I don't expect GenSlab to get committed ...

> Slab:
> In general it seems understandable, the initial description helps to
> understand what's happening well enough.
>
> One thing I don't understand however is why the freelist is both
> array and doubly linked list and why there is new implementation of
> said doubly linked list given that we have dlist.
>

Hmm, perhaps that should be explained better.

In AllocSet, we only have a global freelist of chunks, i.e. we have a
list of free chunks for each possible size (there are 11 sizes, starting
at 8 bytes and doubling from there). So freelist[0] is a list of
free 8B chunks, freelist[1] is a list of free 16B chunks, etc.

In Slab, the freelist has two levels - first there's a bitmap on each
block (which is possible, as the chunks have constant size), tracking
which chunks of that particular block are free. This makes it trivial to
check that all chunks on the block are free, and free the whole block
(which is impossible with AllocSet).

Second, the freelist at the context level tracks blocks with a given
number of free chunks - so freelist[0] tracks completely full blocks,
freelist[1] is a list of blocks with 1 free chunk, etc. This is used to
reuse space on almost full blocks first, in the hope that some of the
less full blocks will get completely empty (and freed to the OS).
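
A minimal sketch of the structures implied by the above (hypothetical
field names, not the actual patch code):

    typedef struct SlabBlockData
    {
        struct SlabBlockData *prev; /* doubly linked list of blocks,    */
        struct SlabBlockData *next; /* within one freelist bucket       */
        int         nfree;          /* number of free chunks on block   */
        char       *bitmapptr;      /* 1 bit per chunk: is it free?     */
    } SlabBlockData;

    typedef struct SlabData
    {
        MemoryContextData header;   /* standard memory-context fields   */
        Size        chunkSize;      /* all chunks have this size        */
        int         chunksPerBlock;
        int         minFreeCount;   /* nfree of the fullest block that  */
                                    /* still has a free chunk           */
        SlabBlockData **freelist;   /* bucket [i] = blocks with exactly */
                                    /* i free chunks                    */
    } SlabData;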

Is that clear?

>> +/*
>> + * SlabContext is our standard implementation of MemoryContext.
>> + *
>
> Really?
>

Meh, that's clearly a bogus comment.

>> +/*
>> + * SlabChunk
>> + * The prefix of each piece of memory in an SlabBlock
>> + *
>> + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
>> + */
>
> Is this true? Why? And if it is then why doesn't the SlabChunk
> actually match the StandardChunkHeader?

It is true - a lot of stuff in the MemoryContext infrastructure relies on
it. For example, when you do pfree(ptr), we actually do something like

header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader))

to get the chunk header, which includes a pointer to the memory context
and other useful stuff.

This also means we can put additional fields before StandardChunkHeader
as that does not break this pointer arithmetic, i.e. SlabChunkData is
effectively defined like this:

typedef struct SlabChunkData
{
    /* block owning this chunk */
    void       *block;

    /* standard header */
    StandardChunkHeader header;
} SlabChunkData;
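
So, given a chunk pointer, both the standard header and the owning block
stay reachable with simple pointer arithmetic - roughly (a sketch assuming
the layout above, with Slab* being pointer typedefs as in the patch):

    StandardChunkHeader *header;
    SlabChunk   chunk;
    SlabBlock   block;

    header = (StandardChunkHeader *)
        ((char *) ptr - sizeof(StandardChunkHeader));
    chunk = (SlabChunk) ((char *) ptr - sizeof(SlabChunkData));
    block = (SlabBlock) chunk->block;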

>
>> +#define SlabPointerIsValid(pointer) PointerIsValid(pointer)
>
> What's the point of this given that it's defined in the .c file?
>

Meh, I've copied this from aset.c, but I see it's useless there too.

>
>> +static void *
>> +SlabAlloc(MemoryContext context, Size size)
>> +{
>> + Slab set = (Slab) context;
>> + SlabBlock block;
>> + SlabChunk chunk;
>> + int idx;
>> +
>> + AssertArg(SlabIsValid(set));
>> +
>> + Assert(size == set->chunkSize);
>
> I wonder if there should be stronger protection (ie, elog) for the size
> matching.
>

Perhaps, I'm not opposed to that.

>
>> +static void *
>> +SlabRealloc(MemoryContext context, void *pointer, Size size)
>> +{
>> + Slab set = (Slab)context;
>> +
>> + /* can't do actual realloc with slab, but let's try to be gentle */
>> + if (size == set->chunkSize)
>> + return pointer;
>> +
>> + /* we can't really do repalloc with this allocator */
>> + Assert(false);
>
> This IMHO should definitely be elog.
>

Yeah, you're probably right.

>
>> +static void
>> +SlabCheck(MemoryContext context)
>> +{
>> + /* FIXME */
>> +}
>
> Do you plan to implement this interface?
>

Yes, although I'm not sure what checks should go there. The only thing I
can think of right now is checking that the number of free chunks on a
block (according to the bitmap) matches the freelist index.

>> +#define SLAB_DEFAULT_BLOCK_SIZE 8192
>> +#define SLAB_LARGE_BLOCK_SIZE (8 * 1024 * 1024)
>
> I am guessing this is based on max_cached_tuplebufs? Maybe these
> could be written with same style?
>

Not sure I understand what you mean by "based on"? I don't quite
remember how I came up with those constants, but I guess 8kB and 8MB
seemed like good values.

Also, what style do you mean? I've used the same style as for the
ALLOCSET_* constants in the same file.

> GenSlab:
>
> Since this is relatively simple wrapper it looks mostly ok to me. The
> only issue I have here is that I am not quite sure about those FIXME
> functions (Free, Realloc, GetChunkSpace). It's slightly weird to call to
> mcxt but I guess the alternative there is to error out so this is
> probably preferable. Would want to hear other opinions here.
>

Yeah, I'd like to get some opinions on that too - that's why I left
there the FIXMEs, actually.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-09-25 21:05:34
Message-ID: 885540a7-ac72-5a21-6dae-a4d5b50672da@2ndquadrant.com
Lists: pgsql-hackers

On 25/09/16 22:17, Tomas Vondra wrote:
> On 09/25/2016 08:48 PM, Petr Jelinek wrote:
>
>> Slab:
>> In general it seems understandable, the initial description helps to
>> understand what's happening well enough.
>>
>> One thing I don't understand however is why the freelist is both
>> array and doubly linked list and why there is new implementation of
>> said doubly linked list given that we have dlist.
>>
>
> Hmm, perhaps that should be explained better.
>
> In AllocSet, we only have a global freelist of chunks, i.e. we have a
> list of free chunks for each possible size (there are 11 sizes, starting
> at 8 bytes and doubling from there). So freelist[0] is a list of
> free 8B chunks, freelist[1] is a list of free 16B chunks, etc.
>
> In Slab, the freelist has two levels - first there's a bitmap on each
> block (which is possible, as the chunks have constant size), tracking
> which chunks of that particular block are free. This makes it trivial to
> check that all chunks on the block are free, and free the whole block
> (which is impossible with AllocSet).
>
> Second, the freelist at the context level tracks blocks with a given
> number of free chunks - so freelist[0] tracks completely full blocks,
> freelist[1] is a list of blocks with 1 free chunk, etc. This is used to
> reuse space on almost full blocks first, in the hope that some of the
> less full blocks will get completely empty (and freed to the OS).
>
> Is that clear?

Ah okay, that makes sense. The documentation of this could be improved,
though, as it's all squashed into a single sentence that wasn't quite
clear to me.

>
>>> +/*
>>> + * SlabChunk
>>> + * The prefix of each piece of memory in an SlabBlock
>>> + *
>>> + * NB: this MUST match StandardChunkHeader as defined by
>>> utils/memutils.h.
>>> + */
>>
>> Is this true? Why? And if it is then why doesn't the SlabChunk
>> actually match the StandardChunkHeader?
>
> It is true, a lot of stuff in MemoryContext infrastructure relies on
> that. For example when you do pfree(ptr), we actually do something like
>
> header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader))
>
> to get the chunk header - which includes pointer to the memory context
> and other useful stuff.
>
> This also means we can put additional fields before StandardChunkHeader
> as that does not break this pointer arithmetic, i.e. SlabChunkData is
> effectively defined like this:
>
> typedef struct SlabChunkData
> {
>     /* block owning this chunk */
>     void       *block;
>
>     /* standard header */
>     StandardChunkHeader header;
> } SlabChunkData;
>

Yes, but your struct then does not match StandardChunkHeader exactly, so
it should be explained in more detail (aset.c, where this comment is also
present, has a struct that matches StandardChunkHeader, so it's sufficient
there).

>>
>>> +static void
>>> +SlabCheck(MemoryContext context)
>>> +{
>>> + /* FIXME */
>>> +}
>>
>> Do you plan to implement this interface?
>>
>
> Yes, although I'm not sure what checks should go there. The only thing I
> can think of right now is checking that the number of free chunks on a
> block (according to the bitmap) matches the freelist index.
>

Yeah, this context does not seem like it needs too much checking. The
freelist vs. free chunks check sounds OK to me. I guess GenSlab will then
call the checks for the underlying contexts.

>>> +#define SLAB_DEFAULT_BLOCK_SIZE 8192
>>> +#define SLAB_LARGE_BLOCK_SIZE (8 * 1024 * 1024)
>>
>> I am guessing this is based on max_cached_tuplebufs? Maybe these
>> could be written with same style?
>>
>
> Not sure I understand what you mean by "based on"? I don't quite
> remember how I came up with those constants, but I guess 8kB and 8MB
> seemed like good values.
>
> Also, what style you mean? I've used the same style as for ALLOCSET_*
> constants in the same file.
>

I mean using 8 * 1024 for SLAB_DEFAULT_BLOCK_SIZE so that it's more
readable. The ALLOCSET_* constants do that too (with the exception of
ALLOCSET_SEPARATE_THRESHOLD, which, I have no idea why, is different from
the rest of the code).
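
I.e. something like this (same values, just the more readable style):

    #define SLAB_DEFAULT_BLOCK_SIZE   (8 * 1024)
    #define SLAB_LARGE_BLOCK_SIZE     (8 * 1024 * 1024)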

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-09-27 02:10:17
Message-ID: 03110c11-ea72-7b4d-027d-6f19eb300b2a@2ndquadrant.com
Lists: pgsql-hackers

Hi,

Attached is v2 of the patch, updated based on the review. That means:

- Better comment explaining how free chunks are tracked in Slab context.

- Removed the unused SlabPointerIsValid macro.

- Modified the comment before SlabChunkData, explaining how it relates
to StandardChunkHeader.

- Replaced the two Assert() calls with elog().

- Implemented SlabCheck(). I've ended up with quite a few checks there,
checking pointers between the context, block and chunks, checks due
to MEMORY_CONTEXT_CHECKING etc. And of course, cross-checking the
number of free chunks (bitmap, freelist vs. chunk header).

- I've also modified SlabContextCreate() to compute chunksPerBlock a
  bit more efficiently (using a simple formula, sketched below, instead of
  the loop, which might be a bit too expensive for large blocks / small
  chunks).
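
The formula I have in mind is roughly this (a sketch, not the exact patch
code - it ignores that the bitmap gets rounded up to whole bytes): each
chunk needs fullChunkSize bytes plus one bit in the bitmap, so

    chunksPerBlock = (8 * (blockSize - sizeof(SlabBlockData)))
                   / (8 * fullChunkSize + 1);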

I haven't done any changes to GenSlab, but I do have a few notes:

Firstly, I've realized there's an issue when chunkSize gets too large -
once it exceeds blockSize, the SlabContextCreate() fails as it's
impossible to place a single chunk into the block. In reorderbuffer,
this may happen when the tuples (allocated in tup_context) get larger
than 8MB, as the context uses SLAB_LARGE_BLOCK_SIZE (which is 8MB).

For Slab the elog(ERROR) is fine as both parameters are controlled by
the developer directly, but GenSlab computes the chunkSize on the fly,
so we must not let it fail like that - that'd result in unpredictable
failures, which is not very nice.

I see two ways to fix this. We may either increase the block size
automatically - e.g. instead of specifying chunkSize and blockSize when
creating the Slab, specify chunkSize and chunksPerBlock (and then choose
the smallest 2^k block large enough). For example with chunkSize=96 and
chunksPerBlock=1000, we'd get 128kB blocks, as that's the closest 2^k
block larger than 96000 bytes.
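
A sketch of that computation, using the numbers from the example:

    Size    needed = chunkSize * chunksPerBlock;    /* 96 * 1000 = 96000 */
    Size    blockSize = 1024;

    while (blockSize < needed)
        blockSize *= 2;     /* ends at 131072, i.e. the 128kB block */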

But maybe there's a simpler solution - we may simply cap the chunkSize
(in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because AllocSet handles
those requests in a special way - for example instead of tracking them in
the freelist, those chunks get freed immediately.
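
That would be a one-line cap in the GenSlab chunk-size recomputation,
something like (a sketch):

    chunkSize = Min(chunkSize, ALLOC_CHUNK_LIMIT);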

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-simple-slab-allocator-fixed-size-allocations.patch binary/octet-stream 39.7 KB
0002-generational-slab-auto-tuning-allocator.patch binary/octet-stream 16.2 KB

From: John Gorman <johngorman2(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-01 22:23:16
Message-ID: CALkS6B9W0xHS25OOq2sWhUWtS79XuPiiAOFpsh6F8v56uAvy_Q@mail.gmail.com
Lists: pgsql-hackers

I reproduced the quadratic pfree performance problem and verified that
these patches solved it.

The slab.c data structures and functions contain no quadratic components.

I noticed the sizing loop in SlabContextCreate() and came up with
a similar formula to determine chunksPerBlock that you arrived at.

> Firstly, I've realized there's an issue when chunkSize gets too
> large - once it exceeds blockSize, the SlabContextCreate() fails
> as it's impossible to place a single chunk into the block. In
> reorderbuffer, this may happen when the tuples (allocated in
> tup_context) get larger than 8MB, as the context uses
> SLAB_LARGE_BLOCK_SIZE (which is 8MB).
>
> But maybe there's a simpler solution - we may simply cap the
> chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
> AllocSet handles those requests in a special way - for example
> instead of tracking them in freelist, those chunks got freed
> immediately.

I like this approach because it fixes the performance problems
with smaller allocations and doesn't change how larger
allocations are handled.

In slab.c it looks like a line in the top comments could be clearer.
Perhaps this is what is meant:

< * (plus alignment), now wasting memory.
> * (plus alignment), not wasting memory.

In slab.c some lines are over 80 characters and could be folded.

It would be nice to give each patch version a unique file name.

Nice patch, I enjoyed reading it!

Best, John

John Gorman

On Mon, Sep 26, 2016 at 10:10 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

> Hi,
>
> Attached is v2 of the patch, updated based on the review. That means:
>
> - Better comment explaining how free chunks are tracked in Slab context.
>
> - Removed the unused SlabPointerIsValid macro.
>
> - Modified the comment before SlabChunkData, explaining how it relates
> to StandardChunkHeader.
>
> - Replaced the two Assert() calls with elog().
>
> - Implemented SlabCheck(). I've ended up with quite a few checks there,
> checking pointers between the context, block and chunks, checks due
> to MEMORY_CONTEXT_CHECKING etc. And of course, cross-checking the
> number of free chunks (bitmap, freelist vs. chunk header).
>
> - I've also modified SlabContextCreate() to compute chunksPerBlock a
> bit more efficiently (use a simple formula instead of the loop, which
> might be a bit too expensive for large blocks / small chunks).
>
>
> I haven't done any changes to GenSlab, but I do have a few notes:
>
> Firstly, I've realized there's an issue when chunkSize gets too large -
> once it exceeds blockSize, the SlabContextCreate() fails as it's impossible
> to place a single chunk into the block. In reorderbuffer, this may happen
> when the tuples (allocated in tup_context) get larger than 8MB, as the
> context uses SLAB_LARGE_BLOCK_SIZE (which is 8MB).
>
> For Slab the elog(ERROR) is fine as both parameters are controlled by the
> developer directly, but GenSlab computes the chunkSize on the fly, so we
> must not let it fail like that - that'd result in unpredictable failures,
> which is not very nice.
>
> I see two ways to fix this. We may either increase the block size
> automatically - e.g. instead of specifying chunkSize and
> blockSize when creating the Slab, specify chunkSize and chunksPerBlock (and
> then choose the smallest 2^k block large enough). For example with
> chunkSize=96 and chunksPerBlock=1000, we'd get 128kB blocks, as that's the
> closest 2^k block larger than 96000 bytes.
>
> But maybe there's a simpler solution - we may simply cap the chunkSize (in
> GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because AllocSet handles those
> requests in a special way - for example instead of tracking them in
> freelist, those chunks got freed immediately.
>
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-01 23:53:16
Message-ID: e2f2e9ac-e13b-5d1b-b3c1-1799f86024bc@BlueTreble.com
Lists: pgsql-hackers

On 9/26/16 9:10 PM, Tomas Vondra wrote:
> Attached is v2 of the patch, updated based on the review. That means:

+ /* make sure the block can store at least one chunk (with 1B for a bitmap)? */
(and the comment below it)

I find the question to be confusing... I think these would be better as

+ /* make sure the block can store at least one chunk (with 1B for a bitmap) */
and
+ /* number of chunks we can fit into a block, including header and bitmap */

I'm also wondering if a 1B freespace bitmap is actually useful. If
nothing else, there should probably be a #define for the initial bitmap
size.

+ /* otherwise add it to the proper freelist bin */
Looks like something went missing... :)

Should zeroing the block in SlabAlloc be optional like it is with
palloc/palloc0?

+ /*
+ * If there are no free chunks in any existing block, create a new block
+ * and put it to the last freelist bucket.
+ */
+ if (set->minFreeCount == 0)
Couldn't there be blocks that have a free count > minFreeCount? (In
which case you don't want to just alloc a new block, no?)

Nevermind, after reading further down I understand what's going on. I
got confused by "we've decreased nfree for a block with the minimum
count" until I got to the part that deals with minFreeCount = 0. I think
it'd be helpful if the "we've decreased nfree" comment mentioned that if
nfree is 0 we'll look for the correct value after updating the freelists.

+ block->bitmapptr[idx/8] |= (0x01 << (idx % 8));
Did you consider using a pre-defined map of values to avoid the shift? I
know I've seen that somewhere in the code...
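
I mean something along these lines (illustrative only):

    static const uint8 bitmask[8] = {
        0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
    };

    block->bitmapptr[idx / 8] |= bitmask[idx % 8];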

Patch 2...

Doesn't GenSlabReset() need to actually free something? If the call
magically percolates to the other contexts it'd be nice to note that in
the comment.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-02 00:34:12
Message-ID: e400bfa9-bec6-0760-4e92-6416c605a001@2ndquadrant.com
Lists: pgsql-hackers

On 10/02/2016 01:53 AM, Jim Nasby wrote:
> On 9/26/16 9:10 PM, Tomas Vondra wrote:
>> Attached is v2 of the patch, updated based on the review. That means:
>
> + /* make sure the block can store at least one chunk (with 1B for a
> bitmap)? */
> (and the comment below it)
>
> I find the question to be confusing... I think these would be better as
>
> + /* make sure the block can store at least one chunk (with 1B for a
> bitmap) */
> and
> + /* number of chunks can we fit into a block, including header and
> bitmap */
>

Thanks, will rephrase the comments a bit.

> I'm also wondering if a 1B freespace bitmap is actually useful. If
> nothing else, there should probably be a #define for the initial
> bitmap size.

That's not the point. The point is that we need to store at least one
entry per block, with one bit in the bitmap. But we can't allocate just a
single bit - we always allocate at least 1 byte. So it's more an
explanation of the "1" literal in the check, nothing more.
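
In other words, the check is essentially this (a sketch with hypothetical
names):

    /* block header + one chunk + at least 1 byte of bitmap must fit */
    if (blockSize < sizeof(SlabBlockData) + fullChunkSize + 1)
        elog(ERROR, "block size %zu too small for chunk size %zu",
             blockSize, chunkSize);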

>
> + /* otherwise add it to the proper freelist bin */
> Looks like something went missing... :)
>

Ummm? The patch contains this:

+ /* otherwise add it to the proper freelist bin */
+ if (set->freelist[block->nfree])
+     set->freelist[block->nfree]->prev = block;
+
+ block->next = set->freelist[block->nfree];
+ set->freelist[block->nfree] = block;

Which does exactly the thing it should do. Or what is missing?

>
> Should zeroing the block in SlabAlloc be optional like it is with
> palloc/palloc0?
>

Good catch. The memset(...,0) should not be in SlabAlloc() at all, as the
zeroing is handled in mcxt.c, independently of the implementation.
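
For reference, mcxt.c does roughly this for the zeroing variant
(paraphrased, not copied verbatim):

    void *
    MemoryContextAllocZero(MemoryContext context, Size size)
    {
        void       *ret;

        ret = (*context->methods->alloc) (context, size);
        MemSetAligned(ret, 0, size);

        return ret;
    }

so the per-context alloc method never needs to zero anything itself.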

> + /*
> + * If there are no free chunks in any existing block, create a new
> block
> + * and put it to the last freelist bucket.
> + */
> + if (set->minFreeCount == 0)
> Couldn't there be blocks that have a free count > minFreeCount? (In
> which case you don't want to just alloc a new block, no?)
>
> Nevermind, after reading further down I understand what's going on. I
> got confused by "we've decreased nfree for a block with the minimum
> count" until I got to the part that deals with minFreeCount = 0. I think
> it'd be helpful if the "we've decreased nfree" comment mentioned that if
> nfree is 0 we'll look for the correct value after updating the freelists.
>

Right. I think it'd be good to add an assert ensuring the minFreeCount
value matches the block freelist. Or at least SlabCheck() could check this.

> + block->bitmapptr[idx/8] |= (0x01 << (idx % 8));
> Did you consider using a pre-defined map of values to avoid the shift? I
> know I've seen that somewhere in the code...
>
> Patch 2...
>
> Doesn't GenSlabReset() need to actually free something? If the call
> magically percolates to the other contexts it'd be nice to note that in
> the comment.

No, the other contexts are created as children of GenSlab, so the memory
context infrastructure resets them automatically. I don't think this
needs to be mentioned in the comments - it's a pretty basic part of the
parent-child context relationship.

Thanks!

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: John Gorman <johngorman2(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-02 00:36:39
Message-ID: 9b74fc50-940f-8ca5-36e4-32010fcde2c4@2ndquadrant.com
Lists: pgsql-hackers

On 10/02/2016 12:23 AM, John Gorman wrote:
> I reproduced the quadratic pfree performance problem and verified
> that these patches solved it.
>
> The slab.c data structures and functions contain no quadratic
> components.
>
> I noticed the sizing loop in SlabContextCreate() and came up with a
> similar formula to determine chunksPerBlock that you arrived at.
>

;-)

>> Firstly, I've realized there's an issue when chunkSize gets too
>> large - once it exceeds blockSize, the SlabContextCreate() fails
>> as it's impossible to place a single chunk into the block. In
>> reorderbuffer, this may happen when the tuples (allocated in
>> tup_context) get larger than 8MB, as the context uses
>> SLAB_LARGE_BLOCK_SIZE (which is 8MB).
>>
>> But maybe there's a simpler solution - we may simply cap the
>> chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
>> AllocSet handles those requests in a special way - for example
>> instead of tracking them in freelist, those chunks got freed
>> immediately.
>
> I like this approach because it fixes the performance problems
> with smaller allocations and doesn't change how larger
> allocations are handled.
>

Right.

> In slab.c it looks like a line in the top comments could be clearer.
> Perhaps this is what is meant.
>
> < * (plus alignment), now wasting memory.
>> * (plus alignment), not wasting memory.
>
> In slab.c some lines are over 80 characters could be folded.
>
> It would be nice to give each patch version a unique file name.
>

OK, will fix.

> Nice patch, I enjoyed reading it!
>

;-)

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: John Gorman <johngorman2(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-02 01:15:40
Message-ID: e342a3f5-6fd2-0c1f-ca12-f6b1dd7ad88f@2ndquadrant.com
Lists: pgsql-hackers

On 10/02/2016 12:23 AM, John Gorman wrote:
> I reproduced the quadratic pfree performance problem and verified that
> these patches solved it.
>
> The slab.c data structures and functions contain no quadratic components.
>
> I noticed the sizing loop in SlabContextCreate() and came up with
> a similar formula to determine chunksPerBlock that you arrived at.
>
>> Firstly, I've realized there's an issue when chunkSize gets too
>> large - once it exceeds blockSize, the SlabContextCreate() fails
>> as it's impossible to place a single chunk into the block. In
>> reorderbuffer, this may happen when the tuples (allocated in
>> tup_context) get larger than 8MB, as the context uses
>> SLAB_LARGE_BLOCK_SIZE (which is 8MB).
>>
>> But maybe there's a simpler solution - we may simply cap the
>> chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
>> AllocSet handles those requests in a special way - for example
>> instead of tracking them in freelist, those chunks got freed
>> immediately.
>
> I like this approach because it fixes the performance problems
> with smaller allocations and doesn't change how larger
> allocations are handled.
>

One more comment about GenSlab, particularly about the unpredictability
of the repalloc() behavior. It works for "large" chunks allocated in the
AllocSet part, and mostly does not work for "small" chunks allocated in
the SlabContext. Moreover, the chunkSize changes over time, so for two
chunks of the same size, repalloc() may work on one of them and fail on
the other, depending on the time of allocation.

With SlabContext it's perfectly predictable - the repalloc() call fails
unless the chunk size is exactly the same as before (which is perhaps a
bit pointless, but if we decide to fail even in this case it'll work 100%
of the time).

But with GenSlabContext it's unclear whether the call fails or not.

I don't like this unpredictability - I'd much rather have consistent
failures (making sure people don't do repalloc() with GenSlab). But I
don't see a nice way to achieve that - the repalloc() call does not go
through GenSlabRealloc() at all, but directly to SlabRealloc() or
AllocSetRealloc().

The best solution I can think of is adding an alternate version of
AllocSetMethods, pointing to a different AllocSetRealloc implementation.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: John Gorman <johngorman2(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-03 01:44:16
Message-ID: CAB7nPqRSeXqEFXtvrivVMfmvjSapkKR=1xvY80OkuhP+a9TGSQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Oct 2, 2016 at 10:15 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> One more comment about GenSlab, particularly about unpredictability of the
> repalloc() behavior. It works for "large" chunks allocated in the AllocSet
> part, and mostly does not work for "small" chunks allocated in the
> SlabContext. Moreover, the chunkSize changes over time, so for two chunks of
> the same size, repalloc() may work on one of them and fail on the other,
> depending on time of allocation.
>
> With SlabContext it's perfectly predictable - repalloc() call fails unless
> the chunk size is exactly the same as before (which is perhaps a bit
> pointless, but if we decide to fail even in this case it'll work 100% time).
>
> But with GenSlabContext it's unclear whether the call fails or not.
>
> I don't like this unpredictability - I'd much rather have consistent
> failures (making sure people don't do repalloc() with GenSlab). But I
> don't see a nice way to achieve that - the repalloc() call does not go
> through GenSlabRealloc() at all, but directly to SlabRealloc() or
> AllocSetRealloc().
>
> The best solution I can think of is adding an alternate version of
> AllocSetMethods, pointing to a different AllocSetRealloc implementation.

You guys are still playing with this patch, so I've moved it to the next
CF with "waiting on author".
--
Michael


From: John Gorman <johngorman2(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-04 19:44:27
Message-ID: CALkS6B8hisTEJJ7ZCQ_GiBLNVzKfsevAD+92rYN6myyFK=UQtw@mail.gmail.com
Lists: pgsql-hackers

SlabContext has a parent context. It can delegate requests that
it cannot handle to the parent context, which is GenSlab. GenSlab
can send them to the sister aset context.

This could handle all reallocs so there will be no surprises.

Remind me again why we cannot realloc in place for sizes
smaller than chunkSize? GenSlab is happy to allocate smaller
sizes and put them into the fixed size chunks.

Maybe SlabAlloc can be happy with sizes up to chunkSize.

    if (size <= set->chunkSize)
        return MemoryContextAlloc(set->slab, size);
    else
        return MemoryContextAlloc(set->aset, size);

On Sat, Oct 1, 2016 at 10:15 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

> On 10/02/2016 12:23 AM, John Gorman wrote:
>
>> I reproduced the quadratic pfree performance problem and verified that
>> these patches solved it.
>>
>> The slab.c data structures and functions contain no quadratic components.
>>
>> I noticed the sizing loop in SlabContextCreate() and came up with
>> a similar formula to determine chunksPerBlock that you arrived at.
>>
>>> Firstly, I've realized there's an issue when chunkSize gets too
>>> large - once it exceeds blockSize, the SlabContextCreate() fails
>>> as it's impossible to place a single chunk into the block. In
>>> reorderbuffer, this may happen when the tuples (allocated in
>>> tup_context) get larger than 8MB, as the context uses
>>> SLAB_LARGE_BLOCK_SIZE (which is 8MB).
>>>
>>> But maybe there's a simpler solution - we may simply cap the
>>> chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
>>> AllocSet handles those requests in a special way - for example
>>> instead of tracking them in freelist, those chunks got freed
>>> immediately.
>>>
>>
>> I like this approach because it fixes the performance problems
>> with smaller allocations and doesn't change how larger
>> allocations are handled.
>>
>>
> One more comment about GenSlab, particularly about unpredictability of the
> repalloc() behavior. It works for "large" chunks allocated in the AllocSet
> part, and mostly does not work for "small" chunks allocated in the
> SlabContext. Moreover, the chunkSize changes over time, so for two chunks
> of the same size, repalloc() may work on one of them and fail on the other,
> depending on time of allocation.
>
> With SlabContext it's perfectly predictable - repalloc() call fails unless
> the chunk size is exactly the same as before (which is perhaps a bit
> pointless, but if we decide to fail even in this case it'll work 100% time).
>
> But with GenSlabContext it's unclear whether the call fails or not.
>
> I don't like this unpredictability - I'd much rather have consistent
> failures (making sure people don't do repalloc() with GenSlab). But I
> don't see a nice way to achieve that - the repalloc() call does not go
> through GenSlabRealloc() at all, but directly to SlabRealloc() or
> AllocSetRealloc().
>
> The best solution I can think of is adding an alternate version of
> AllocSetMethods, pointing to a different AllocSetRealloc implementation.
>
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: John Gorman <johngorman2(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-05 01:11:23
Message-ID: 11821eb9-95a5-353e-ccba-13e8e9351e62@2ndquadrant.com
Lists: pgsql-hackers

On 10/04/2016 09:44 PM, John Gorman wrote:
> SlabContext has a parent context. It can delegate requests that
> it cannot handle to the parent context, which is GenSlab. GenSlab
> can send them to the sister aset context.
>

But Slab may also be used separately, not just as part of GenSlab
(actually, reorderbuffer has two such contexts). That complicates things
quite a bit, and it also seems a bit awkward, because:

(a) It'd require a flag in SlabContext (or perhaps a pointer to the
second context), which introduces coupling between the contexts.

(b) SlabContext was meant to be extremely simple (based on the "single
chunk size" idea), and this contradicts that a bit.

(c) It'd move chunks between the memory contexts in unpredictable ways
(although the user should treat it as a single context, and not reset
the parts independently for example).

>
> This could handle all reallocs so there will be no surprises.
>

Yeah, but it's also

>
> Remind me again why we cannot realloc in place for sizes
> smaller than chunkSize? GenSlab is happy to allocate smaller
> sizes and put them into the fixed size chunks.
>
> Maybe SlabAlloc can be happy with sizes up to chunkSize.
>
> if (size <= set->chunkSize)
>     return MemoryContextAlloc(set->slab, size);
> else
>     return MemoryContextAlloc(set->aset, size);
>

That'd certainly be possible, but it seems a bit strange, as the whole
Slab is based on the idea that all chunks have the same size. Moreover,
people usually realloc() to a larger chunk, so it would not really fix
anything in practice.

In GenSlab, the situation is more complicated. But I don't like the
coupling / moving chunks between contexts, etc.

If realloc() support is a hard requirement, it immediately rules out
SlabContext() as an independent memory context. Which seems stupid, as
independent Slab contexts are quite useful for the reorderbuffer use case.

For GenSlab the situation is less clear, as there probably are ways to
make it work, but I'd vote to keep it simple for now, and simply do
elog(ERROR) in the realloc() methods - both for Slab and GenSlab. The
current use case (reorderbuffer) does not need that, and it seems like a
can of worms to me.
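
I.e. something like this in both realloc methods (a sketch):

    static void *
    SlabRealloc(MemoryContext context, void *pointer, Size size)
    {
        /* realloc is not supported on this context type, period */
        elog(ERROR, "slab allocator does not support realloc()");

        return NULL;            /* keep compiler quiet */
    }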

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: John Gorman <johngorman2(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-05 04:22:19
Message-ID: b8ded0aa-2515-7ffd-162b-8f515df5eca1@2ndquadrant.com
Lists: pgsql-hackers

Hi,

attached is v3 of the patches, with a few minor fixes in Slab, and much
larger fixes in GenSlab.

Slab (minor fixes)
------------------
- Removed the unnecessary memset() of new blocks in SlabAlloc(),
although we still need to zero the free bitmap at the end of the block.
- Renamed minFreeCount to minFreeChunks, added a few comments explaining
why/how firstFreeChunk and minFreeChunks are maintained.
- Fixed / improved a bunch of additional comments, based on feedback.

GenSlab
-------
Fixed a bunch of bugs that made GenSlab utterly useless. Firstly,
chunkSize was not stored in GenSlabContextCreate(), so this check in
GenSlabAlloc()

    if (size <= set->chunkSize)
        return MemoryContextAlloc(set->slab, set->chunkSize);
    else
        return MemoryContextAlloc(set->aset, size);

always fell through to the set->aset case, not allocating stuff in the
Slab at all.

Secondly, the nallocations / nbytes counters were not updated at all, so
the Slab was never recreated, and GenSlab was not really generational.

This only affected 1 of 3 contexts in ReorderBuffer, but apparently one
important enough to affect performance.

Both issues are fixed in the attached v3, which also introduces two
additional improvements discussed in this thread:

- The chunk size is limited by ALLOCSET_SEPARATE_THRESHOLD, as for large
chunks AllocSet works just fine (thanks to keeping them out of the free
list etc.)

- Instead of specifying blockSize and chunkSize, GenSlabCreate() now
accepts three parameters - minBlockSize, minChunkCount and chunkSize,
and computes the minimum block size (>= minBlockSize), sufficient to
store minChunkCount chunks, each chunkSize bytes. This works much better
in the auto-tuning scenario.
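
A sketch of the sizing rule (hypothetical, not the exact patch code):

    /* smallest block size >= minBlockSize that fits minChunkCount chunks */
    Size    needed = sizeof(SlabBlockData)
                   + minChunkCount * (chunkSize + sizeof(SlabChunkData));
    Size    blockSize = minBlockSize;

    while (blockSize < needed)
        blockSize *= 2;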

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
slab-allocators-v3.tgz application/x-compressed-tar 16.4 KB

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-05 04:26:43
Message-ID: 2a7c720f-9934-8306-7247-ba5271511e56@2ndquadrant.com
Lists: pgsql-hackers

On 05/10/16 03:11, Tomas Vondra wrote:
> On 10/04/2016 09:44 PM, John Gorman wrote:
>>
>> Remind me again why we cannot realloc in place for sizes
>> smaller than chunkSize? GenSlab is happy to allocate smaller
>> sizes and put them into the fixed size chunks.
>>
>> Maybe SlabAlloc can be happy with sizes up to chunkSize.
>>
>> if (size <= set->chunkSize)
>>     return MemoryContextAlloc(set->slab, size);
>> else
>>     return MemoryContextAlloc(set->aset, size);
>>
>
> That'd be certainly possible, but it seems a bit strange as the whole
> Slab is based on the idea that all chunks have the same size. Moreover,
> people usually realloc() to a larger chunk, so it does not really fix
> anything in practice.
>
> In GenSlab, the situation is more complicated. But I don't like the
> coupling / moving chunks between contexts, etc.
>
> If realloc() support is a hard requirement, it immediately rules out
> SlabContext() as an independent memory context. Which seems stupid, as
> independent Slab contexts are quite useful for reorderbuffer use case.
>
> For GenSlab the situation is less clear, as there probably are ways to
> make it work, but I'd vote to keep it simple for now, and simply do
> elog(ERROR) in the realloc() methods - both for Slab and GenSlab. The
> current use case (reorderbuffer) does not need that, and it seems like a
> can of worms to me.
>

Hmm, so in practice this means that the caller still has to know the
details of what chunks go where. I would prefer it if the realloc always
just failed, and "don't do realloc on GenSlab" were part of the spec of
that context; the randomness that you described originally is the main
problem IMHO. Maybe you could add a new "constructor" function for Aset
that would create an Aset which can't realloc, for use inside the GenSlab?

An alternative would of course be having the individual API calls behind
Aset and Slab exported and used by GenSlab directly, instead of using
child contexts. Then all the calls would go to GenSlab, which could
decide what to do (and move the chunks between the allocators).

But honestly, given the use cases for GenSlab, I would at the moment
prefer just having a predictable error, as that can be done more cleanly;
nobody needs the functionality so far, and it can be revisited once we
actually do need it.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: John Gorman <johngorman2(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-05 10:40:43
Message-ID: CALkS6B_=b7i4-TH-8CuDWZ98TZz=zU07F-Z81kKPgg-kKgcwBg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 4, 2016 at 10:11 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

> For GenSlab the situation is less clear, as there probably are ways to
> make it work, but I'd vote to keep it simple for now, and simply do
> elog(ERROR) in the realloc() methods - both for Slab and GenSlab. The
> current use case (reorderbuffer) does not need that, and it seems like a
> can of worms to me.

Good plan. Realloc can be added later if there is an actual use case.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: John Gorman <johngorman2(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-18 20:25:51
Message-ID: CA+Tgmob7u4thiBbdvyseQk1aVApJeG10C_rFFt8LPV03-UzD7g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> attached is v3 of the patches, with a few minor fixes in Slab, and much
> larger fixes in GenSlab.
>
> Slab (minor fixes)
> ------------------
> - Removed the unnecessary memset() of new blocks in SlabAlloc(), although we
> still need to zero the free bitmap at the end of the block.
> - Renamed minFreeCount to minFreeChunks, added a few comments explaining
> why/how firstFreeChunk and minFreeChunks are maintained.
> - Fixed / improved a bunch of additional comments, based on feedback.

I had a look at 0001 today, but it seems to me that it still needs
work. It's still got a lot of remnants of where you've
copy-and-pasted aset.c. I dispute this allegation:

+ * SlabContext is our standard implementation of MemoryContext.

And all of this is just a direct copy-paste; I don't really want two copies:

+ * When running under Valgrind, we want a NOACCESS memory region both before
+ * and after the allocation. The chunk header is tempting as the preceding
+ * region, but mcxt.c expects to able to examine the standard chunk header
+ * fields. Therefore, we use, when available, the requested_size field and
+ * any subsequent padding. requested_size is made NOACCESS before returning
+ * a chunk pointer to a caller. However, to reduce client request traffic,
+ * it is kept DEFINED in chunks on the free list.

And then there's this:

+#ifdef HAVE_ALLOCINFO
+#define SlabFreeInfo(_cxt, _chunk) \
+ fprintf(stderr, "AllocFree: %s: %p, %d\n", \
+ (_cxt)->header.name, (_chunk), (_chunk)->size)
+#define SlabAllocInfo(_cxt, _chunk) \
+ fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
+ (_cxt)->header.name, (_chunk), (_chunk)->size)

Well, it's pretty stupid that AllocSetAlloc is reporting its name as
AllocAlloc, a thing that, as far as I can tell, is not real. But
having this new context type also pretend to be AllocAlloc is even
dumber.

+static void
+randomize_mem(char *ptr, size_t size)
+{
+    static int save_ctr = 1;
+    size_t     remaining = size;
+    int        ctr;
+
+    ctr = save_ctr;
+    VALGRIND_MAKE_MEM_UNDEFINED(ptr, size);
+    while (remaining-- > 0)
+    {
+        *ptr++ = ctr;
+        if (++ctr > 251)
+            ctr = 1;
+    }
+    VALGRIND_MAKE_MEM_UNDEFINED(ptr - size, size);
+    save_ctr = ctr;
+}
+#endif   /* RANDOMIZE_ALLOCATED_MEMORY */

Another copy of this doesn't seem like a good idea, either.

More broadly, I'm not sure I like this design very much. The whole
point of a slab context is that all of the objects are the same size.
I wouldn't find it too difficult to support this patch if we were
adding an allocator for fixed-size objects that was then being used to
allocate objects which are fixed size. However, what we seem to be
doing is creating an allocator for fixed-size objects and then using
it for variable-size tuples. That's really pretty weird. Isn't the
root of this problem that aset.c is utterly terrible at handling large
numbers of allocations? Maybe we should try to attack that problem
more directly.

On a related note, the autodestruct thing is a weird hack that's only
necessary because of the hijinks already discussed in the previous
paragraph. The context has no fixed lifetime; we're just trying to
find a way of coping with possibly-shifting tuple sizes over time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-18 22:27:39
Message-ID: a4593432-c989-8ec4-2320-943e3f97ce6d@2ndquadrant.com
Lists: pgsql-hackers

On 18/10/16 22:25, Robert Haas wrote:
> On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> attached is v3 of the patches, with a few minor fixes in Slab, and much
>> larger fixes in GenSlab.
>>
>> Slab (minor fixes)
>> ------------------
>> - Removed the unnecessary memset() of new blocks in SlabAlloc(), although we
>> still need to zero the free bitmap at the end of the block.
>> - Renamed minFreeCount to minFreeChunks, added a few comments explaining
>> why/how firstFreeChunk and minFreeChunks are maintained.
>> - Fixed / improved a bunch of additional comments, based on feedback.
>
> I had a look at 0001 today, but it seems to me that it still needs
> work. It's still got a lot of remnants of where you've
> copy-and-pasted aset.c. I dispute this allegation:
>
> + * SlabContext is our standard implementation of MemoryContext.
>

Are you looking at an old version of the patch? I complained about this as
well and Tomas has changed that.

> And then there's this:
>
> +#ifdef HAVE_ALLOCINFO
> +#define SlabFreeInfo(_cxt, _chunk) \
> + fprintf(stderr, "AllocFree: %s: %p, %d\n", \
> + (_cxt)->header.name, (_chunk), (_chunk)->size)
> +#define SlabAllocInfo(_cxt, _chunk) \
> + fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
> + (_cxt)->header.name, (_chunk), (_chunk)->size)
>
> Well, it's pretty stupid that AllocSetAlloc is reporting its name as
> AllocAlloc, a thing that, as far as I can tell, is not real. But
> having this new context type also pretend to be AllocAlloc is even
> dumber.

You are definitely looking at an old version.

>
> More broadly, I'm not sure I like this design very much. The whole
> point of a slab context is that all of the objects are the same size.
> I wouldn't find it too difficult to support this patch if we were
> adding an allocator for fixed-size objects that was then being used to
> allocate objects which are fixed size. However, what we seem to be
> doing is creating an allocator for fixed-size objects and then using
> it for variable-size tuples. That's really pretty weird. Isn't the
> root of this problem that aset.c is utterly terrible at handling large
> numbers of allocations? Maybe we should try to attack that problem
> more directly.

It's used for TXNs, which are fixed-size, and for some tuples (there is an
assumption that the decoded tuple sizes have a more or less normal
distribution).

I agree though that the usability beyond the ReorderBuffer is limited,
because anything that wants to use it for part of its allocations gets
much more complicated by the fact that it has to use two different
allocators.

I was wondering if, rather than trying to implement a new allocator, we
should maybe implement palloc_fixed(), which would use some optimized
algorithm for fixed-size objects in our current allocator. The advantage
of that would be that we could, for example, easily use it for things
like ListCell (memory management of which I see quite often in
profiles).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-19 12:51:21
Message-ID: e88baf4f-afcc-3b48-0524-89a71527f6e3@2ndquadrant.com
Lists: pgsql-hackers

On 10/19/2016 12:27 AM, Petr Jelinek wrote:
> On 18/10/16 22:25, Robert Haas wrote:
>> On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra
>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>> attached is v3 of the patches, with a few minor fixes in Slab, and much
>>> larger fixes in GenSlab.
>>>
>>> Slab (minor fixes)
>>> ------------------
>>> - Removed the unnecessary memset() of new blocks in SlabAlloc(), although we
>>> still need to zero the free bitmap at the end of the block.
>>> - Renamed minFreeCount to minFreeChunks, added a few comments explaining
>>> why/how firstFreeChunk and minFreeChunks are maintained.
>>> - Fixed / improved a bunch of additional comments, based on feedback.
>>
>> I had a look at 0001 today, but it seems to me that it still needs
>> work. It's still got a lot of remnants of where you've
>> copy-and-pasted aset.c. I dispute this allegation:
>>
>> + * SlabContext is our standard implementation of MemoryContext.
>>
>
> Are you looking at an old version of the patch? I complained about this as
> well and Tomas has changed that.
>
>> And then there's this:
>>
>> +#ifdef HAVE_ALLOCINFO
>> +#define SlabFreeInfo(_cxt, _chunk) \
>> + fprintf(stderr, "AllocFree: %s: %p, %d\n", \
>> + (_cxt)->header.name, (_chunk), (_chunk)->size)
>> +#define SlabAllocInfo(_cxt, _chunk) \
>> + fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
>> + (_cxt)->header.name, (_chunk), (_chunk)->size)
>>
>> Well, it's pretty stupid that AllocSetAlloc is reporting its name as
>> AllocAlloc, a thing that, as far as I can tell, is not real. But
>> having this new context type also pretend to be AllocAlloc is even
>> dumber.
>
> You are definitely looking at an old version.
>

Yeah.

>>
>> More broadly, I'm not sure I like this design very much. The whole
>> point of a slab context is that all of the objects are the same size.
>> I wouldn't find it too difficult to support this patch if we were
>> adding an allocator for fixed-size objects that was then being used to
>> allocate objects which are fixed size. However, what we seem to be
>> doing is creating an allocator for fixed-size objects and then using
>> it for variable-size tuples. That's really pretty weird. Isn't the
>> root of this problem that aset.c is utterly terrible at handling large
>> numbers of allocations? Maybe we should try to attack that problem
>> more directly.
>
> It's used for TXNs, which are fixed-size, and for some tuples (there is
> an assumption that the decoded tuple sizes have a more or less normal
> distribution).
>

Yeah. There are three contexts in reorder buffers:

- changes (fixed size)
- txns (fixed size)
- tuples (variable size)

The first two work perfectly fine with Slab.
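For illustration, creating such a fixed-size context might look something
like this (a sketch using the SlabContextCreate() signature from the patch;
the parent context, the 8kB block size and the field names are assumptions,
not the exact patch):

/* one Slab context per fixed-size object type (hypothetical sketch) */
rb->change_context = SlabContextCreate(rb->context, "Change",
                                       8 * 1024,    /* blockSize, assumed */
                                       sizeof(ReorderBufferChange));
rb->txn_context = SlabContextCreate(rb->context, "TXN",
                                    8 * 1024,
                                    sizeof(ReorderBufferTXN));

Every palloc() in such a context then hands out a chunk of exactly that
size, which is what makes the per-block free bitmap and the block-level
freeing possible.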

The last one (tuples) is used to allocate variable-sized bits, so I've
tried to come up with something smart - a sequence of Slabs + overflow
AllocSet. I agree that in hindsight it's a bit strange, and that the
"generational" aspect is the key aspect here - i.e. it might be possible
to implement a memory context that allocates variable-length chunks but
still segregates them into generations. That is, don't build this on top
of Slab. That would also fix the issue with two allocators in GenSlab.
I'll think about this.

> I agree though that the usability beyond the ReorderBuffer is limited,
> because anything that wants to use it for part of its allocations gets
> much more complicated by the fact that it has to use two different
> allocators.
>

It wasn't my (primary) goal to provide allocators usable outside
ReorderBuffer. I intended to show that perhaps using AllocSet and
then trying to compensate for the pfree() issues is the wrong direction,
and that perhaps a different allocation strategy (exploiting the
ReorderBuffer specifics) would work much better. And I think the two
allocators prove that.

>
> I was wondering if, rather than trying to implement a new allocator, we
> should maybe implement palloc_fixed(), which would use some optimized
> algorithm for fixed-size objects in our current allocator. The advantage
> of that would be that we could, for example, easily use it for things
> like ListCell (memory management of which I see quite often in
> profiles).
>

I don't see how inventing palloc_fixed() solves any of the problems, and
I think it's going to be much more complicated than you think. The idea
of injecting this into AllocSet seems like a dead-end to me, as the code
is already complex enough and it's likely to cause regressions no matter
what you do.

I prefer the idea of implementing separate specialized memory contexts.
If the bar is moved to "implement palloc_fixed()" or something like
that, someone else will have to do that - I'm not all that interested in
ReorderBuffer (this was the first time I actually saw that code), so my
motivation to spend much more time on this is rather small.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-20 13:35:22
Message-ID: ae88ac6a-526d-d7c2-2304-375d74d5696f@2ndquadrant.com
Lists: pgsql-hackers

On 10/19/2016 02:51 PM, Tomas Vondra wrote:
> ...
>
> Yeah. There are three contexts in reorder buffers:
>
> - changes (fixed size)
> - txns (fixed size)
> - tuples (variable size)
>
> The first two work perfectly fine with Slab.
>
> The last one (tuples) is used to allocate variable-sized bits, so I've
> tried to come up with something smart - a sequence of Slabs + overflow
> AllocSet. I agree that in hindsight it's a bit strange, and that the
> "generational" aspect is the key aspect here - i.e. it might be possible
> to implement a memory context that allocates variable-length chunks but
> still segregates them into generations. That is, don't build this on top
> of Slab. That would also fix the issue with two allocators in GenSlab.
> I'll think about this.

And here is a fairly complete prototype of this idea, adding a "Gen"
generational memory context based only on the "similar lifespan"
assumption (and abandoning the fixed-size assumption). It's much simpler
than GenSlab (which it's supposed to replace), and abandoning the idea
of composing two memory contexts also fixed the warts with some of the
API methods (e.g. repalloc).

I've also been thinking that perhaps "Gen" would be useful for all three
contexts in ReorderBuffer - so I've done a quick test comparing the
various combinations (using the test1() function used before).

      N |  master | slabs | slabs+genslab | slabs+gen | gens
 -------+---------+-------+---------------+-----------+------
    50k |   18700 |   210 |           220 |       190 |  190
   100k |  160000 |   380 |           470 |       350 |  350
   200k |     N/A |   750 |           920 |       740 |  679
   500k |     N/A |  2250 |          2240 |      1790 | 1740
  1000k |     N/A |  4600 |          5000 |      3910 | 3700

Where:

* master - 23ed2ba812117
* slabs - all three contexts use Slab (patch 0001)
* slabs+genslab - third context is GenSlab (patch 0002)
* slabs+gen - third context is the new Gen (patch 0003)
* gens - all three contexts use Gen

The results are a bit noisy, but I think it's clear the new Gen context
performs well - it actually seems a bit faster than GenSlab, and using
only Gen for all three contexts does not hurt performance.

This is most likely due to the trivial (practically absent) freespace
management in the Gen context, compared to both Slab and GenSlab. So
speed is not the only criterion - I haven't measured memory consumption,
but I'm pretty sure there are cases where Slab consumes much less memory
than Gen, thanks to reusing free space.

I'd say throwing away GenSlab and keeping Slab+Gen is the way to go.

There's still a fair bit of work on this, particularly implementing the
missing API methods in Gen - GenCheck() and GenStats(). As Robert
pointed out, there's also quite a bit of duplicated code between the
different memory contexts (randomization and valgrind-related), so this
needs to be moved to a shared place.

I'm also thinking that we need better names, particularly for the Gen
allocator. It's supposed to mean Generational, although there are no
explicit generations anymore. Slab is probably OK - it does not match
any of the existing kernel slab allocators exactly, but it follows the
same principles, which is what matters.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
slab-allocators-v4.tgz application/x-compressed-tar 21.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-20 14:43:18
Message-ID: CA+TgmobJrM74FRTcmn2JDMVbLB0a5pj5e3ieLBMzkDNtrxF4dw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 18, 2016 at 6:27 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> I agree though that the usability beyond the ReorderBuffer is limited,
> because anything that wants to use it for part of its allocations gets
> much more complicated by the fact that it has to use two different
> allocators.
>
> I was wondering if, rather than trying to implement a new allocator, we
> should maybe implement palloc_fixed(), which would use some optimized
> algorithm for fixed-size objects in our current allocator. The advantage
> of that would be that we could, for example, easily use it for things
> like ListCell (memory management of which I see quite often in
> profiles).

The sb_alloc allocator I proposed a couple of years ago would work
well for this case, I think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-22 18:30:41
Message-ID: 9818b54a-fcb6-f018-0b16-96d6a1d1da27@2ndquadrant.com
Lists: pgsql-hackers

On 10/20/2016 04:43 PM, Robert Haas wrote:
> On Tue, Oct 18, 2016 at 6:27 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> I agree though that the usability beyond the ReorderBuffer is limited,
>> because anything that wants to use it for part of its allocations gets
>> much more complicated by the fact that it has to use two different
>> allocators.
>>
>> I was wondering if, rather than trying to implement a new allocator, we
>> should maybe implement palloc_fixed(), which would use some optimized
>> algorithm for fixed-size objects in our current allocator. The advantage
>> of that would be that we could, for example, easily use it for things
>> like ListCell (memory management of which I see quite often in
>> profiles).
>
> The sb_alloc allocator I proposed a couple of years ago would work
> well for this case, I think.
>

Maybe, but it does not follow the Memory Context design at all, if I
understand it correctly. I was willing to give it a spin anyway and see
how it compares to the two other allocators, but this is a significant
paradigm shift and certainly a much larger step than what I proposed.

I'm not even sure it's possible to implement a MemoryContext based on
the same ideas as sb_alloc(), because one of the important points of
sb_alloc design seems to be throwing away the chunk header. While that
may be possible, it would certainly affect the whole tree (not just the
reorderbuffer bit), and it'd require way more work.

Moreover, the two allocators I proposed significantly benefit from the
"same lifespan" assumption. I don't think sb_alloc can do that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-23 14:26:28
Message-ID: 34a6d4ea-7f3d-a62c-fff1-7a9108c078d7@2ndquadrant.com
Lists: pgsql-hackers

On 10/22/2016 08:30 PM, Tomas Vondra wrote:
> On 10/20/2016 04:43 PM, Robert Haas wrote:
>>
>> ...
>>
>> The sb_alloc allocator I proposed a couple of years ago would work
>> well for this case, I think.
>>
>
> Maybe, but it does not follow the Memory Context design at all, if I
> understand it correctly. I was willing to give it a spin anyway and see
> how it compares to the two other allocators, but this is a significant
> paradigm shift and certainly a much larger step than what I proposed.
>
> I'm not even sure it's possible to implement a MemoryContext based on
> the same ideas as sb_alloc(), because one of the important points of
> sb_alloc design seems to be throwing away the chunk header. While that
> may be possible, it would certainly affect the whole tree (not just the
> reorderbuffer bit), and it'd require way more work.
>
> Moreover, the two allocators I proposed significantly benefit from the
> "same lifespan" assumption. I don't think sb_alloc can do that.
>

I've given the sb_alloc patch another try - essentially hacking it into
reorderbuffer, ignoring the issues mentioned yesterday. And yes, it's
faster than the allocators discussed in this thread. Based on a few very
quick tests on my laptop, the difference is usually ~5-10%.

That might seem like a significant improvement, but it's negligible
compared to the "master -> slab/gen" improvement, which improves
performance by orders of magnitude (at least for the tested cases).

Moreover, the slab/gen allocators proposed here seem like a better fit
for reorderbuffer, e.g. because they release memory. I haven't looked at
sb_alloc too closely, but I think it behaves more like AllocSet in this
regard (i.e. keeping the memory indefinitely).

FWIW I'm not making any conclusions about sb_alloc benefits outside
reorderbuffer.c - it might easily be worth pursuing, no doubt about
that. The amount of remaining work seems quite high, though.

Attached is the modified sb_alloc patch that I used - it's mostly v1
with removed uses in nbtree etc. FWIW the patch does not implement
sb_destroy_private_allocator (it's only declared in the header), which
seems like a bug.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
sballoc-v2-tomas.patch text/x-diff 137.6 KB

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-23 15:26:18
Message-ID: f2aaf4a5-412c-a4b9-389d-c9f682fbdc77@2ndquadrant.com
Lists: pgsql-hackers

On 23/10/16 16:26, Tomas Vondra wrote:
> On 10/22/2016 08:30 PM, Tomas Vondra wrote:
>> On 10/20/2016 04:43 PM, Robert Haas wrote:
>>>
>>> ...
>>>
>>> The sb_alloc allocator I proposed a couple of years ago would work
>>> well for this case, I think.
>>>
>>
>> Maybe, but it does not follow the Memory Context design at all, if I
>> understand it correctly. I was willing to give it a spin anyway and see
>> how it compares to the two other allocators, but this is a significant
>> paradigm shift and certainly a much larger step than what I proposed.
>>
>> I'm not even sure it's possible to implement a MemoryContext based on
>> the same ideas as sb_alloc(), because one of the important points of
>> sb_alloc design seems to be throwing away the chunk header. While that
>> may be possible, it would certainly affect the whole tree (not just the
>> reorderbuffer bit), and it'd require way more work.
>>
>> Moreover, the two allocators I proposed significantly benefit from the
>> "same lifespan" assumption. I don't think sb_alloc can do that.
>>
>
> I've given the sb_alloc patch another try - essentially hacking it into
> reorderbuffer, ignoring the issues mentioned yesterday. And yes, it's
> faster than the allocators discussed in this thread. Based on a few very
> quick tests on my laptop, the difference is usually ~5-10%.
>
> That might seem like a significant improvement, but it's negligible
> compared to the "master -> slab/gen" improvement, which improves
> performance by orders of magnitude (at least for the tested cases).
>
> Moreover, the slab/gen allocators proposed here seem like a better fit
> for reorderbuffer, e.g. because they release memory. I haven't looked at
> sb_alloc too closely, but I think it behaves more like AllocSet in this
> regard (i.e. keeping the memory indefinitely).
>

For reorderbuffer, from what I've seen in practice, I'd prefer proper
freeing to a 5% performance gain, as I've seen walsenders taking GBs of
memory due to reorderbuffer allocations that are never properly freed.

About your actual patch. I do like both the Slab and the Gen allocators
and think that we should proceed with them for the moment. You
definitely need to rename the Gen one (don't ask me to what, though), as
it sounds like "generic", and do some finishing touches, but I think it's
the way to go. I don't see any point in GenSlab anymore.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-25 21:48:10
Message-ID: 4f176a6c-c197-165d-dee4-5de8f279f1f4@2ndquadrant.com
Lists: pgsql-hackers

On 10/23/2016 05:26 PM, Petr Jelinek wrote:
> On 23/10/16 16:26, Tomas Vondra wrote:
>> On 10/22/2016 08:30 PM, Tomas Vondra wrote:
>> ...
>> Moreover, the slab/gen allocators proposed here seem like a better
>> fit for reorderbuffer, e.g. because they release memory. I haven't
>> looked at sb_alloc too closely, but I think it behaves more like
>> AllocSet in this regard (i.e. keeping the memory indefinitely).
>>
>
> For reorderbuffer, from what I've seen in practice, I'd prefer
> proper freeing to a 5% performance gain, as I've seen walsenders taking
> GBs of memory due to reorderbuffer allocations that are never properly
> freed.
>

Right.

>
> About your actual patch. I do like both the Slab and the Gen allocators
> and think that we should proceed with them for the moment. You
> definitely need to rename the Gen one (don't ask me to what, though), as
> it sounds like "generic", and do some finishing touches, but I think it's
> the way to go. I don't see any point in GenSlab anymore.
>

Attached is v5 of the patch, which does this, i.e. throws away the
GenSlab allocator and modifies reorderbuffer in two steps.

First (0001) it adds the Slab allocator for TXN/Change allocations, and
keeps the local slab cache for TupleBuf allocations (with a separate
AllocSet context).

Then (in 0002) it adds the Gen allocator for TupleBuf, removing the last
bits of the local slab cache.

I do think this version is as simple as it gets - there's not much
more we could simplify / remove.

The main issue that bugs me is the name of the Gen allocator, but I
don't have any good naming ideas :( The basic characteristic of Gen is
that it does not reuse space released by pfree(), relying on the fact
that the whole block will become free. That should be reflected in the
name somehow, I guess.
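To make that concrete, the core of the idea is roughly the following (a
simplified sketch, not the actual patch - all names are made up, and block
chaining, alignment and chunk headers are omitted):

#include <stdlib.h>

typedef struct GenBlock
{
    struct GenBlock *next;      /* chain of blocks in the context */
    int         nchunks;        /* chunks handed out from this block */
    int         nfree;          /* chunks pfree'd again */
    char       *freeptr;        /* bump pointer for the next allocation */
    char       *endptr;         /* first byte past the end of the block */
} GenBlock;

/* allocation is a simple bump of freeptr within the current block */
static void *
gen_alloc(GenBlock *block, size_t size)
{
    char       *ptr = block->freeptr;

    if (ptr + size > block->endptr)
        return NULL;            /* caller switches to a fresh block */
    block->freeptr += size;
    block->nchunks++;
    return ptr;
}

/* freed space is never reused; the whole block is released once every
 * chunk allocated from it has been freed */
static void
gen_free(GenBlock *block)
{
    if (++block->nfree == block->nchunks)
        free(block);
}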

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
slab-allocators-v5.tgz application/x-compressed-tar 18.3 KB

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-25 22:54:47
Message-ID: b5d04d86-5a89-457c-e82f-c52b8ebc6be6@BlueTreble.com
Lists: pgsql-hackers

On 10/1/16 7:34 PM, Tomas Vondra wrote:
>> + /* otherwise add it to the proper freelist bin */
>> Looks like something went missing... :)
>>
>
> Ummm? The patch contains this:
>
> + /* otherwise add it to the proper freelist bin */
> + if (set->freelist[block->nfree])
> + set->freelist[block->nfree]->prev = block;
> +
> + block->next = set->freelist[block->nfree];
> + set->freelist[block->nfree] = block;
>
> Which does exactly the thing it should do. Or what is missing?

What's confusing is the "otherwise" right at the beginning of the function:

+static void
+add_to_freelist(Slab set, SlabBlock block)
+{
+ /* otherwise add it to the proper freelist bin */
+ if (set->freelist[block->nfree])
+ set->freelist[block->nfree]->prev = block;
+
+ block->next = set->freelist[block->nfree];
+ set->freelist[block->nfree] = block;
+}

Otherwise what? What's the other option?

(Haven't looked at the newer patch, so maybe this isn't an issue anymore.)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-10-25 22:57:59
Message-ID: 08542375-63f8-db53-daa0-f73745fdaa00@BlueTreble.com
Lists: pgsql-hackers

On 10/25/16 4:48 PM, Tomas Vondra wrote:
> The main issue that bugs me is the name of the Gen allocator, but I
> don't have any good naming ideas :( The basic characteristic of Gen is
> that it does not reuse space released by pfree(), relying on the fact
> that the whole block will become free. That should be reflected in the
> name somehow, I guess.

OneTime? OneUse? OneShot? AllocOnce?

OneHitWonder? ;P
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-11-12 19:12:57
Message-ID: 20161112191257.3a4n6kzegvu637qn@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

Subject: [PATCH 1/2] slab allocator

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 6ad7e7d..520f295 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c

I'd rather have that in a separate followup commit...

+ * IDENTIFICATION
+ * src/backend/utils/mmgr/slab.c
+ *
+ *
+ * The constant allocation size allows significant simplification and various
+ * optimizations that are not possible in AllocSet. Firstly, we can get rid
+ * of the doubling and carve the blocks into chunks of exactly the right size
+ * (plus alignment), not wasting memory.

References to AllocSet aren't necessarily a good idea; they'll quite
possibly get out of date. The argument can quite easily be made
without referring to concrete behaviour elsewhere.

+ *
+ * At the context level, we use 'freelist' array to track blocks grouped by
+ * number of free chunks. For example freelist[0] is a list of completely full
+ * blocks, freelist[1] is a block with a single free chunk, etc.

Hm. Those arrays are going to be quite large for small allocations w/
big blocks (an imo sensible combination). Maybe it'd make more sense to
model it as a linked list of blocks? Full blocks are at one end, empty
ones at the other?

+ * About MEMORY_CONTEXT_CHECKING:
+ *
+ * Since we usually round request sizes up to the next power of 2, there
+ * is often some unused space immediately after a requested data area.

I assume the "round up" stuff is copy-paste?

+ * Thus, if someone makes the common error of writing past what they've
+ * requested, the problem is likely to go unnoticed ... until the day when
+ * there *isn't* any wasted space, perhaps because of different memory
+ * alignment on a new platform, or some other effect. To catch this sort
+ * of problem, the MEMORY_CONTEXT_CHECKING option stores 0x7E just beyond
+ * the requested space whenever the request is less than the actual chunk
+ * size, and verifies that the byte is undamaged when the chunk is freed.
+ *
+ *
+ * About USE_VALGRIND and Valgrind client requests:
+ *
+ * Valgrind provides "client request" macros that exchange information with
+ * the host Valgrind (if any). Under !USE_VALGRIND, memdebug.h stubs out
+ * currently-used macros.
+ *
+ * When running under Valgrind, we want a NOACCESS memory region both before
+ * and after the allocation. The chunk header is tempting as the preceding
+ * region, but mcxt.c expects to be able to examine the standard chunk header
+ * fields. Therefore, we use, when available, the requested_size field and
+ * any subsequent padding. requested_size is made NOACCESS before returning
+ * a chunk pointer to a caller. However, to reduce client request traffic,
+ * it is kept DEFINED in chunks on the free list.
+ *
+ * The rounded-up capacity of the chunk usually acts as a post-allocation
+ * NOACCESS region. If the request consumes precisely the entire chunk,
+ * there is no such region; another chunk header may immediately follow. In
+ * that case, Valgrind will not detect access beyond the end of the chunk.
+ *
+ * See also the cooperating Valgrind client requests in mcxt.c.

I think we need a preliminary patch moving a lot of this into something
like mcxt_internal.h. Copying this comment, and a lot of the utility
functions, into every memory context implementation is a bad pattern.

+typedef struct SlabBlockData *SlabBlock; /* forward reference */
+typedef struct SlabChunkData *SlabChunk;

Can we please not continue hiding pointers behind typedefs? It's a bad
pattern, and that it's fairly widely used isn't a good excuse to
introduce further usages of it.

+/*
+ * SlabContext is a specialized implementation of MemoryContext.
+ */
+typedef struct SlabContext
+{
+ MemoryContextData header; /* Standard memory-context fields */
+ /* Allocation parameters for this context: */
+ Size chunkSize; /* chunk size */
+ Size fullChunkSize; /* chunk size including header and alignment */
+ Size blockSize; /* block size */
+ int chunksPerBlock; /* number of chunks per block */
+ int minFreeChunks; /* min number of free chunks in any block */
+ int nblocks; /* number of blocks allocated */
+ /* Info about storage allocated in this context: */
+ SlabBlock freelist[1]; /* free lists (block-level) */

I assume this is a variable-length array? If so, it a) should be
documented and b) should use FLEXIBLE_ARRAY_MEMBER as the length - not
doing so will actually cause compiler warnings and potential misoptimizations.

+/*
+ * SlabBlockData
+ * Structure of a single block in SLAB allocator.
+ *
+ * slab: context owning this block

What do we need this for?

+ * prev, next: used for doubly-linked list of blocks in global freelist

I'd prefer using an embedded list here (cf. ilist.h).

+/*
+ * SlabChunk
+ * The prefix of each piece of memory in an SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ * However it's possible to add fields in front of the StandardChunkHeader
+ * fields, which is used here to add a pointer to the block.
+ */

Wouldn't that be easier to enforce - particularly around alignment
requirements - by embedding a StandardChunkHeader here? That'd also
avoid redundancies.
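For illustration, the embedded variant might look like this (a sketch
only - StandardChunkHeader is as declared in utils/memutils.h, the rest
is assumed):

typedef struct SlabChunkData
{
    SlabBlock   block;          /* slab-specific field, in front */
    StandardChunkHeader header; /* standard fields last, so they directly
                                 * precede the user data that mcxt.c
                                 * steps back from */
} SlabChunkData;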

+/* ----------
+ * Debug macros
+ * ----------
+ */
+#ifdef HAVE_ALLOCINFO
+#define SlabFreeInfo(_cxt, _chunk) \
+ fprintf(stderr, "SlabFree: %s: %p, %d\n", \
+ (_cxt)->header.name, (_chunk), (_chunk)->size)
+#define SlabAllocInfo(_cxt, _chunk) \
+ fprintf(stderr, "SlabAlloc: %s: %p, %d\n", \
+ (_cxt)->header.name, (_chunk), (_chunk)->size)
+#else
+#define SlabFreeInfo(_cxt, _chunk)
+#define SlabAllocInfo(_cxt, _chunk)
+#endif

Do we really have to copy that stuff from aset.c? Obviously no-one uses
that, since it doesn't even compile cleanly if HAVE_ALLOCINFO is
defined:
/home/andres/src/postgresql/src/backend/utils/mmgr/aset.c:302:20: warning: format ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘Size {aka long unsigned int}’ [-Wformat=]
fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \

+#ifdef CLOBBER_FREED_MEMORY
+
+/* Wipe freed memory for debugging purposes */
+static void
+wipe_mem(void *ptr, size_t size)

+#ifdef MEMORY_CONTEXT_CHECKING
+static void
+set_sentinel(void *base, Size offset)
+
+static bool
+sentinel_ok(const void *base, Size offset)
+#endif

+#ifdef RANDOMIZE_ALLOCATED_MEMORY
+static void
+randomize_mem(char *ptr, size_t size)

Let's move these into an mcxt_internal.h or mcxt_impl.h or such, as
static inlines.

+MemoryContext
+SlabContextCreate(MemoryContext parent,
+ const char *name,
+ Size blockSize,
+ Size chunkSize)
+{
+ int chunksPerBlock;
+ Size fullChunkSize;
+ Slab set;
+
+ /* chunk, including SLAB header (both addresses nicely aligned) */
+ fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));

+ /* make sure the block can store at least one chunk (plus a bitmap) */
+ if (blockSize - sizeof(SlabChunkData) < fullChunkSize + 1)
+ elog(ERROR, "block size %ld for slab is too small for chunks %ld",
+ blockSize, chunkSize);

I assume the 1 is the bitmap size?

+ /* so how many chunks can we fit into a block, including header and bitmap? */
+ chunksPerBlock
+ = (8 * (blockSize - sizeof(SlabBlockData)) - 7) / (8 * fullChunkSize + 1);

I'm slightly drunk due to bad airline wine, but right now that seems a
bit odd and/or documentation worthy. I understand the (xxx + 7) / 8
pattern elsewhere, but right now I can't follow the - 7.

+/*
+ * SlabAlloc
+ * Returns pointer to allocated memory of given size or NULL if
+ * request could not be completed; memory is added to the set.
+ *
+ * No request may exceed:
+ * MAXALIGN_DOWN(SIZE_MAX) - SLAB_BLOCKHDRSZ - SLAB_CHUNKHDRSZ
+ * All callers use a much-lower limit.

That seems like a meaningless comment in the context of a slab allocator
with a fixed size.

+ /*
+ * If there are no free chunks in any existing block, create a new block
+ * and put it to the last freelist bucket.
+ *
+ * (set->minFreeChunks == 0) means there are no blocks with free chunks,
+ * thanks to how minFreeChunks is updated at the end of SlabAlloc().
+ */
+ if (set->minFreeChunks == 0)
+ {
+ block = (SlabBlock)malloc(set->blockSize);

Space after cast - maybe run pgindent over the file before submission?
Doing that manually helps to avoid ugly damage by the per-release run
later. I'm pretty sure there'll be a significant number of changes.

+ if (block->nfree == 0)
+ block->firstFreeChunk = set->chunksPerBlock;
+ else
+ {
+ /* look for the next free chunk in the block, after the first one */
+ while ((++block->firstFreeChunk) < set->chunksPerBlock)
+ {
+ int byte = block->firstFreeChunk / 8;
+ int bit = block->firstFreeChunk % 8;
+
+ /* stop when you find 0 (unused chunk) */
+ if (! (block->bitmapptr[byte] & (0x01 << bit)))
+ break;
+ }

I haven't profiled (or even compiled) this patchset yet, but FWIW, in
the tuple deforming code, I could measure a noticeable speedup by
accessing bitmap-bytes in the native word-size, rather than char. I'm
*NOT* saying you should do that, but if this ever shows up as a
bottleneck, it might be worthwhile to optimize.
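For the record, the word-at-a-time variant could look roughly like this (a
sketch, not part of the patch; it assumes the bitmap is padded to whole
64-bit words and that the unused tail bits are kept set):

uint64     *words = (uint64 *) block->bitmapptr;
int         nwords = (set->chunksPerBlock + 63) / 64;
int         i,
            bit;

for (i = 0; i < nwords; i++)
{
    /* skip words with no zero (free) bits */
    if (words[i] != ~(uint64) 0)
    {
        for (bit = 0; bit < 64; bit++)
        {
            if (!(words[i] & ((uint64) 1 << bit)))
                break;
        }
        block->firstFreeChunk = i * 64 + bit;
        break;
    }
}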

+ /*
+ * And finally update minFreeChunks, i.e. the index to the block with the
+ * lowest number of free chunks. We only need to do that when the block
+ * got full (otherwise we know the current block is the right one).
+ * We'll simply walk the freelist until we find a non-empty entry.
+ */
+ if (set->minFreeChunks == 0)
+ for (idx = 1; idx <= set->chunksPerBlock; idx++)
+ if (set->freelist[idx])
+ {
+ set->minFreeChunks = idx;
+ break;
+ }

Yuck. This definitely needs braces.
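For illustration, the braced version would be:

if (set->minFreeChunks == 0)
{
    for (idx = 1; idx <= set->chunksPerBlock; idx++)
    {
        if (set->freelist[idx])
        {
            set->minFreeChunks = idx;
            break;
        }
    }
}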

Regards,

Andres


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-11-14 00:20:00
Message-ID: efd47210-cf95-d7ee-e984-a873f76596a8@2ndquadrant.com
Lists: pgsql-hackers

On 11/12/2016 08:12 PM, Andres Freund wrote:
> Hi,
>
> Subject: [PATCH 1/2] slab allocator
>
> diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> index 6ad7e7d..520f295 100644
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
>
> I'd rather have that in a separate followup commit...
>
>
> + * IDENTIFICATION
> + * src/backend/utils/mmgr/slab.c
> + *
> + *
> + * The constant allocation size allows significant simplification and various
> + * optimizations that are not possible in AllocSet. Firstly, we can get rid
> + * of the doubling and carve the blocks into chunks of exactly the right size
> + * (plus alignment), not wasting memory.
>
> References to AllocSet aren't necessarily a good idea; they'll quite
> possibly get out of date. The argument can quite easily be made
> without referring to concrete behaviour elsewhere.
>

Yeah, that's probably true.

> + *
> + * At the context level, we use 'freelist' array to track blocks grouped by
> + * number of free chunks. For example freelist[0] is a list of completely full
> + * blocks, freelist[1] is a block with a single free chunk, etc.
>
> Hm. Those arrays are going to be quite large for small allocations w/
> big blocks (an imo sensible combination). Maybe it'd make more sense to
> model it as a linked list of blocks? Full blocks are at one end, empty
> ones at the other?

So there'd be one huge list of blocks, sorted by the number of empty
chunks? Hmm, that might work I guess.

I don't think the combination of large blocks with small allocations is
particularly sensible, though - what exactly would be the benefit of
such a combination? I would even consider enforcing some upper limit on
the number of chunks per block - say, 256, for example.

>
>
> + * About MEMORY_CONTEXT_CHECKING:
> + *
> + * Since we usually round request sizes up to the next power of 2, there
> + * is often some unused space immediately after a requested data area.
>
> I assume the "round up" stuff is copy-paste?
>

Yeah, sorry about that.

>
> + * Thus, if someone makes the common error of writing past what they've
> + * requested, the problem is likely to go unnoticed ... until the day when
> + * there *isn't* any wasted space, perhaps because of different memory
> + * ...
> + *
> + * See also the cooperating Valgrind client requests in mcxt.c.
>
> I think we need a preliminary patch moving a lot of this into something
> like mcxt_internal.h. Copying this comment, and a lot of the utility
> functions, into every memory context implementation is a bad pattern.
>

Yes, I planned to do that for the next version of the patch. Laziness.

>
> +typedef struct SlabBlockData *SlabBlock; /* forward reference */
> +typedef struct SlabChunkData *SlabChunk;
>
> Can we please not continue hiding pointers behind typedefs? It's a bad
> pattern, and that it's fairly widely used isn't a good excuse to
> introduce further usages of it.
>

Why is it a bad pattern?

> +/*
> + * SlabContext is a specialized implementation of MemoryContext.
> + */
> +typedef struct SlabContext
> +{
> + MemoryContextData header; /* Standard memory-context fields */
> + /* Allocation parameters for this context: */
> + Size chunkSize; /* chunk size */
> + Size fullChunkSize; /* chunk size including header and alignment */
> + Size blockSize; /* block size */
> + int chunksPerBlock; /* number of chunks per block */
> + int minFreeChunks; /* min number of free chunks in any block */
> + int nblocks; /* number of blocks allocated */
> + /* Info about storage allocated in this context: */
> + SlabBlock freelist[1]; /* free lists (block-level) */
>
> I assume this is a variable-length array? If so, it a) should be
> documented and b) should use FLEXIBLE_ARRAY_MEMBER as the length - not
> doing so will actually cause compiler warnings and potential misoptimizations.
>

Will fix, thanks.

> +/*
> + * SlabBlockData
> + * Structure of a single block in SLAB allocator.
> + *
> + * slab: context owning this block
>
> What do we need this for?
>

You're right that the pointer to the owning context is unnecessary - there's
nothing like a "standard block header", and we already have the pointer in
the standard chunk header. But maybe keeping the pointer at least with
MEMORY_CONTEXT_CHECKING would be a good idea?

> + * prev, next: used for doubly-linked list of blocks in global freelist
>
> I'd prefer using an embedded list here (cf. ilist.h).
>

Makes sense.

> +/*
> + * SlabChunk
> + * The prefix of each piece of memory in an SlabBlock
> + *
> + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
> + * However it's possible to add fields in front of the StandardChunkHeader
> + * fields, which is used here to add a pointer to the block.
> + */
>
> Wouldn't that be easier to enforce - particularly around alignment
> requirements - by embedding a StandardChunkHeader here? That'd also
> avoid redundancies.
>

Also makes sense.

> +/* ----------
> + * Debug macros
> + * ----------
> + */
> +#ifdef HAVE_ALLOCINFO
> +#define SlabFreeInfo(_cxt, _chunk) \
> + fprintf(stderr, "SlabFree: %s: %p, %d\n", \
> + (_cxt)->header.name, (_chunk), (_chunk)->size)
> +#define SlabAllocInfo(_cxt, _chunk) \
> + fprintf(stderr, "SlabAlloc: %s: %p, %d\n", \
> + (_cxt)->header.name, (_chunk), (_chunk)->size)
> +#else
> +#define SlabFreeInfo(_cxt, _chunk)
> +#define SlabAllocInfo(_cxt, _chunk)
> +#endif
>
> Do we really have to copy that stuff from aset.c? Obviously no-one uses
> that, since it doesn't even compile cleanly if HAVE_ALLOCINFO is
> defined:
> /home/andres/src/postgresql/src/backend/utils/mmgr/aset.c:302:20: warning: format ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘Size {aka long unsigned int}’ [-Wformat=]
> fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
>

I don't really care. Sure, we should fix the warning, but not supporting
HAVE_ALLOCINFO in the new allocator(s) seems wrong - we should either
support it everywhere, or we should rip it out. That's not the purpose
of this patch, though.

>
> +#ifdef CLOBBER_FREED_MEMORY
> +
> +/* Wipe freed memory for debugging purposes */
> +static void
> +wipe_mem(void *ptr, size_t size)
>
> +#ifdef MEMORY_CONTEXT_CHECKING
> +static void
> +set_sentinel(void *base, Size offset)
> +
> +static bool
> +sentinel_ok(const void *base, Size offset)
> +#endif
>
> +#ifdef RANDOMIZE_ALLOCATED_MEMORY
> +static void
> +randomize_mem(char *ptr, size_t size)
>
> Let's move these into an mcxt_internal.h or mcxt_impl.h or such, as
> static inlines.
>

Yes, next to the valgrind stuff.

> +MemoryContext
> +SlabContextCreate(MemoryContext parent,
> + const char *name,
> + Size blockSize,
> + Size chunkSize)
> +{
> + int chunksPerBlock;
> + Size fullChunkSize;
> + Slab set;
> +
> + /* chunk, including SLAB header (both addresses nicely aligned) */
> + fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));
>
> + /* make sure the block can store at least one chunk (plus a bitmap) */
> + if (blockSize - sizeof(SlabChunkData) < fullChunkSize + 1)
> + elog(ERROR, "block size %ld for slab is too small for chunks %ld",
> + blockSize, chunkSize);
>
> I assume the 1 is the bitmap size?
>

Yes, the smallest bitmap is 1 byte.

>
> + /* so how many chunks can we fit into a block, including header and bitmap? */
> + chunksPerBlock
> + = (8 * (blockSize - sizeof(SlabBlockData)) - 7) / (8 * fullChunkSize + 1);
>
> I'm slightly drunk due to bad airline wine, but right now that seems a
> bit odd and/or documentation worthy. I understand the (xxx + 7) / 8
> pattern elsewhere, but right now I can't follow the - 7.
>

We need all the pieces (block header, chunks and bitmap) to fit into the
block, so this needs to hold:

blockSize >= sizeof(SlabBlockData) +
             chunksPerBlock * fullChunkSize +
             (chunksPerBlock + 7) / 8

Multiply both sides by 8 and collect the chunksPerBlock terms:

8 * blockSize >= 8 * sizeof(SlabBlockData) +
                 chunksPerBlock * (8 * fullChunkSize + 1) + 7

then solve for 'chunksPerBlock' and you'll get the above formula. Moving
the 7 to the other side of the inequality is the reason for the minus.

But documenting this is probably a good idea.
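A quick sanity check with made-up numbers - an 8kB block, a (hypothetical)
32-byte block header and 64-byte full chunks:

chunksPerBlock = (8 * (8192 - 32) - 7) / (8 * 64 + 1)
               = 65273 / 513
               = 127

and indeed 32 + 127 * 64 + (127 + 7) / 8 = 8176 <= 8192, while 128 chunks
would already need 8240 bytes.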

> +/*
> + * SlabAlloc
> + * Returns pointer to allocated memory of given size or NULL if
> + * request could not be completed; memory is added to the set.
> + *
> + * No request may exceed:
> + * MAXALIGN_DOWN(SIZE_MAX) - SLAB_BLOCKHDRSZ - SLAB_CHUNKHDRSZ
> + * All callers use a much-lower limit.
>
> That seems like a meaningless comment in the context of a slab allocator
> with a fixed size.
>

Why? It might be worth moving this to SlabContextCreate(), though.

>
> + /*
> + * If there are no free chunks in any existing block, create a new block
> + * and put it to the last freelist bucket.
> + *
> + * (set->minFreeChunks == 0) means there are no blocks with free chunks,
> + * thanks to how minFreeChunks is updated at the end of SlabAlloc().
> + */
> + if (set->minFreeChunks == 0)
> + {
> + block = (SlabBlock)malloc(set->blockSize);
>
> Space after cast - maybe run pgindent over the file before submission?
> Doing that manually helps to avoid ugly damage by the per-release run
> later. I'm pretty sure there'll be a significant number of changes.
>

Will do.

>
>
> + if (block->nfree == 0)
> + block->firstFreeChunk = set->chunksPerBlock;
> + else
> + {
> + /* look for the next free chunk in the block, after the first one */
> + while ((++block->firstFreeChunk) < set->chunksPerBlock)
> + {
> + int byte = block->firstFreeChunk / 8;
> + int bit = block->firstFreeChunk % 8;
> +
> + /* stop when you find 0 (unused chunk) */
> + if (! (block->bitmapptr[byte] & (0x01 << bit)))
> + break;
> + }
>
> I haven't profiled (or even compiled) this patchset yet, but FWIW, in
> the tuple deforming code, I could measure a noticeable speedup by
> accessing bitmap-bytes in the native word-size, rather than char. I'm
> *NOT* saying you should do that, but if this ever shows up as a
> bottleneck, it might be worthwhile to optimize.
>

OK, will try, although I don't expect this branch to be very hot.

> + /*
> + * And finally update minFreeChunks, i.e. the index to the block with the
> + * lowest number of free chunks. We only need to do that when the block
> + * got full (otherwise we know the current block is the right one).
> + * We'll simply walk the freelist until we find a non-empty entry.
> + */
> + if (set->minFreeChunks == 0)
> + for (idx = 1; idx <= set->chunksPerBlock; idx++)
> + if (set->freelist[idx])
> + {
> + set->minFreeChunks = idx;
> + break;
> + }
>
> Yuck. This definitely needs braces.
>

OK ;-)

thanks

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-11-15 00:44:35
Message-ID: 0b14e90b-1985-bb04-0261-2fcdc28755ab@2ndquadrant.com
Lists: pgsql-hackers

Attached is v6 of the patch series, fixing most of the points:

* common bits (valgrind/randomization/wipe) moved to memdebug.h/c

Instead of introducing a new header file, I've added the prototypes to
memdebug.h (which was already used for the valgrind stuff anyway), and
the implementations to a new memdebug.c file. Not sure what you meant by
"static inlines" though.

So the patch series now has three parts - 0001 with memdebug stuff, 0002
with slab and 0003 with gen (still a poor name).

* removing AllocSet references from both new memory contexts

* using FLEXIBLE_ARRAY_ELEMENT in SlabContext

* using dlist instead of the custom linked list

I've however kept SlabContext->freelist as an array, because there may
be many blocks with the same number of free chunks, in which case moving
the block within a single sorted list would be expensive. This way it's
simply dlist_delete + dlist_push (see the sketch below, after the list
of changes).

* use StandardChunkHeader instead of the common fields

* removing pointer to context from block header for both contexts

* fix format in FreeInfo/AllocInfo (including for AllocSet)

* improved a bunch of comments (bitmap size, chunksPerBlock formula)

* did a pgindent run on the patch

* implement the missing methods in Gen (Stats/Check)

* fix a few minor bugs in both contexts
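For illustration, moving a block between the freelist buckets then boils
down to something like this (a sketch against the ilist.h API; the node
field name is an assumption):

/* unlink the block from the bucket for its previous free-chunk count... */
dlist_delete(&block->node);

/* ...and push it to the bucket matching the updated count */
dlist_push_head(&slab->freelist[block->nfree], &block->node);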

I haven't done anything with hiding pointers behind typedefs, because I
don't know what's so wrong about that.

I also haven't done anything with the bitmap access in SlabAlloc - I
haven't found any reasonable case where the byte-at-a-time access would
matter, and I don't expect it to be measurable in practice.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
slab-allocators-v6.tgz application/x-compressed-tar 18.2 KB

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-11-27 18:25:17
Message-ID: 9cdbe694-1b49-0c61-b0b2-d95ef2b3cb31@2ndquadrant.com
Lists: pgsql-hackers

On 15/11/16 01:44, Tomas Vondra wrote:
> Attached is v6 of the patch series, fixing most of the points:
>
> * common bits (valgrind/randomization/wipe) moved to memdebug.h/c
>
> Instead of introducing a new header file, I've added the prototypes to
> memdebug.h (which was already used for the valgrind stuff anyway), and
> the implementations to a new memdebug.c file. Not sure what you meant by
> "static inlines" though.

I think Andres wanted to put the implementations into static inline
functions directly in the header (see parts of pg_list or how atomics
work), but I guess your way works too.

>
> I've however kept SlabContext->freelist as an array, because there may
> be many blocks with the same number of free chunks, in which case moving
> the block in the list would be expensive. This way it's simply
> dlist_delete + dlist_push.
>

+1

I get mxact isolation test failures in test_decoding with this version
of the patch:
step s0w: INSERT INTO do_write DEFAULT VALUES;
+ WARNING: problem in slab TXN: number of free chunks 33 in block
0x22beba0 does not match bitmap 34
step s0start: SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', 'false');
data
and
step s0alter: ALTER TABLE do_write ADD column ts timestamptz;
step s0w: INSERT INTO do_write DEFAULT VALUES;
+ WARNING: problem in slab TXN: number of free chunks 33 in block
0x227c3f0 does not match bitmap 34
step s0start: SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', 'false');
data

Also, let's just rename the Gen to Generation.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-11-27 18:42:58
Message-ID: 98735a6c-c376-86bb-8cb2-1878a0c13bdb@2ndquadrant.com
Lists: pgsql-hackers

On 11/27/2016 07:25 PM, Petr Jelinek wrote:
> On 15/11/16 01:44, Tomas Vondra wrote:
>> Attached is v6 of the patch series, fixing most of the points:
>>
>> * common bits (valgrind/randomization/wipe) moved to memdebug.h/c
>>
>> Instead of introducing a new header file, I've added the prototypes to
>> memdebug.h (which was already used for the valgrind stuff anyway), and
>> the implementations to a new memdebug.c file. Not sure what you meant by
>> "static inlines" though.
>
> I think Andres wanted to put the implementations into static inline
> functions directly in the header (see parts of pg_list or how atomics
> work), but I guess your way works too.
>

I see. Well, turning that into static inlines just like in pg_list is
possible. I guess the main reason is performance - for pg_list that
probably makes sense, but the memory randomization/valgrind stuff is
only ever used for debugging, which already does a lot of expensive
stuff anyway.

>>
>> I've however kept SlabContext->freelist as an array, because there may
>> be many blocks with the same number of free chunks, in which case moving
>> the block in the list would be expensive. This way it's simply
>> dlist_delete + dlist_push.
>>
>
> +1
>
> I get mxact isolation test failures in test_decoding with this version
> of the patch:
> step s0w: INSERT INTO do_write DEFAULT VALUES;
> + WARNING: problem in slab TXN: number of free chunks 33 in block
> 0x22beba0 does not match bitmap 34
> step s0start: SELECT data FROM
> pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
> 'include-xids', 'false');
> data
> and
> step s0alter: ALTER TABLE do_write ADD column ts timestamptz;
> step s0w: INSERT INTO do_write DEFAULT VALUES;
> + WARNING: problem in slab TXN: number of free chunks 33 in block
> 0x227c3f0 does not match bitmap 34
> step s0start: SELECT data FROM
> pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
> 'include-xids', 'false');
> data
>

D'oh! I believe this is a simple thinko in SlabCheck, which iterates
over chunks like this:

for (j = 0; j <= slab->chunksPerBlock; j++)
...

which is of course an off-by-one error (and the 33 vs. 34 error message is
consistent with this theory).
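i.e. the loop condition needs '<' instead of '<=':

for (j = 0; j < slab->chunksPerBlock; j++)
    ...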

>
> Also, let's just rename the Gen to Generation.
>

OK.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-11-27 20:47:40
Message-ID: 20161127204740.dr3swudwsbafjuof@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

> > +typedef struct SlabBlockData *SlabBlock; /* forward reference */
> > +typedef struct SlabChunkData *SlabChunk;
> >
> > Can we please not continue hiding pointers behind typedefs? It's a bad
> > pattern, and that it's fairly widely used isn't a good excuse to
> > introduce further usages of it.
> >
>
> Why is it a bad pattern?

It hides what is passed by reference, and what by value, and it makes it
a guessing game whether you need -> or . since you don't know whether
it's a pointer or the actual object. All to save a * in parameter and
variable declaration?...
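
To illustrate (hypothetical declarations, not from the patch):

    typedef struct SlabBlockData *SlabBlock;      /* pointer hidden in typedef */

    extern void foo(SlabBlock block);             /* by value, or by reference? */
    extern void bar(struct SlabBlockData *block); /* explicit - clearly a pointer */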

Andres


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-11-27 21:21:49
Message-ID: a4b35a6e-7b8a-1550-1543-e5e77183ab5f@2ndquadrant.com
Lists: pgsql-hackers

On 27/11/16 21:47, Andres Freund wrote:
> Hi,
>
>>> +typedef struct SlabBlockData *SlabBlock; /* forward reference */
>>> +typedef struct SlabChunkData *SlabChunk;
>>>
>>> Can we please not continue hiding pointers behind typedefs? It's a bad
>>> pattern, and that it's fairly widely used isn't a good excuse to
>>> introduce further usages of it.
>>>
>>
>> Why is it a bad pattern?
>
> It hides what is passed by reference, and what by value, and it makes it
> a guessing game whether you need -> or . since you don't know whether
> it's a pointer or the actual object. All to save a * in parameter and
> variable declaration?...
>

FWIW I don't like that pattern either although it's used in many parts
of our code-base.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-11-27 22:02:14
Message-ID: 20161127220214.v5fnwccwtxwq5jih@alap3.anarazel.de
Lists: pgsql-hackers

On 2016-11-27 22:21:49 +0100, Petr Jelinek wrote:
> On 27/11/16 21:47, Andres Freund wrote:
> > Hi,
> >
> >>> +typedef struct SlabBlockData *SlabBlock; /* forward reference */
> >>> +typedef struct SlabChunkData *SlabChunk;
> >>>
> >>> Can we please not continue hiding pointers behind typedefs? It's a bad
> >>> pattern, and that it's fairly widely used isn't a good excuse to
> >>> introduce further usages of it.
> >>>
> >>
> >> Why is it a bad pattern?
> >
> > It hides what is passed by reference, and what by value, and it makes it
> > a guessing game whether you need -> or . since you don't know whether
> > it's a pointer or the actual object. All to save a * in parameter and
> > variable declaration?...
> >
>
> FWIW I don't like that pattern either although it's used in many parts
> of our code-base.

But relatively few new ones, most of it is pretty old.


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-12-01 02:26:56
Message-ID: 40888631-2761-6eaa-e76f-3fdf7a09bd03@2ndquadrant.com
Lists: pgsql-hackers

On 11/27/2016 at 11:02 PM, Andres Freund wrote:
> On 2016-11-27 22:21:49 +0100, Petr Jelinek wrote:
>> On 27/11/16 21:47, Andres Freund wrote:
>>> Hi,
>>>
>>>>> +typedef struct SlabBlockData *SlabBlock; /* forward reference */
>>>>> +typedef struct SlabChunkData *SlabChunk;
>>>>>
>>>>> Can we please not continue hiding pointers behind typedefs? It's a bad
>>>>> pattern, and that it's fairly widely used isn't a good excuse to
>>>>> introduce further usages of it.
>>>>>
>>>>
>>>> Why is it a bad pattern?
>>>
>>> It hides what is passed by reference, and what by value, and it makes it
>>> a guessing game whether you need -> or . since you don't know whether
>>> it's a pointer or the actual object. All to save a * in parameter and
>>> variable declaration?...
>>>
>>
>> FWIW I don't like that pattern either although it's used in many
>> parts of our code-base.
>
> But relatively few new ones, most of it is pretty old.
>

I do agree it's not a particularly pretty pattern, but in this case it's
fairly isolated in the mmgr sources, and I quite value the consistency
in this part of the code (i.e. that aset.c, slab.c and generation.c all
use the same approach). So I haven't changed this.

The attached v7 fixes the off-by-one error in slab.c, which was causing
failures in the test_decoding isolation tests, and renames Gen to
Generation, as proposed by Petr.

regards
Tomas

Attachment Content-Type Size
slab-allocators-v7.tgz application/x-compressed-tar 19.0 KB

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-12-02 06:50:02
Message-ID: CAJrrPGf-F_ApFJNwd3xdJk05n6Vg-aCC89h974U-8BX9ONneAQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 1, 2016 at 1:26 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

>
>
> On 11/27/2016 at 11:02 PM, Andres Freund wrote:
>
>> On 2016-11-27 22:21:49 +0100, Petr Jelinek wrote:
>>
>>> On 27/11/16 21:47, Andres Freund wrote:
>>>
>>>> Hi,
>>>>
>>>> +typedef struct SlabBlockData *SlabBlock; /* forward
>>>>>> reference */
>>>>>> +typedef struct SlabChunkData *SlabChunk;
>>>>>>
>>>>>> Can we please not continue hiding pointers behind typedefs? It's a bad
>>>>>> pattern, and that it's fairly widely used isn't a good excuse to
>>>>>> introduce further usages of it.
>>>>>>
>>>>>>
>>>>> Why is it a bad pattern?
>>>>>
>>>>
>>>> It hides what is passed by reference, and what by value, and it makes it
>>>> a guessing game whether you need -> or . since you don't know whether
>>>> it's a pointer or the actual object. All to save a * in parameter and
>>>> variable declaration?...
>>>>
>>>>
>>> FWIW I don't like that pattern either although it's used in many
>>> parts of our code-base.
>>>
>>
>> But relatively few new ones, most of it is pretty old.
>>
>>
> I do agree it's not a particularly pretty pattern, but in this case it's
> fairly isolated in the mmgr sources, and I quite value the consistency in
> this part of the code (i.e. that aset.c, slab.c and generation.c all use
> the same approach). So I haven't changed this.
>
> The attached v7 fixes the off-by-one error in slab.c, which was causing
> failures in the test_decoding isolation tests, and renames Gen to
> Generation, as proposed by Petr.
>

Moved to next CF with same status (needs review).

Regards,
Hari Babu
Fujitsu Australia


From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-12-12 04:05:36
Message-ID: 7947fc5f-127f-914b-f050-ac6560496731@2ndquadrant.com
Lists: pgsql-hackers

On 01/12/16 03:26, Tomas Vondra wrote:
>
> On 11/27/2016 at 11:02 PM, Andres Freund wrote:
>> On 2016-11-27 22:21:49 +0100, Petr Jelinek wrote:
>>> On 27/11/16 21:47, Andres Freund wrote:
>>>> Hi,
>>>>
>>>>>> +typedef struct SlabBlockData *SlabBlock; /* forward
>>>>>> reference */
>>>>>> +typedef struct SlabChunkData *SlabChunk;
>>>>>>
>>>>>> Can we please not continue hiding pointers behind typedefs? It's a
>>>>>> bad
>>>>>> pattern, and that it's fairly widely used isn't a good excuse to
>>>>>> introduce further usages of it.
>>>>>>
>>>>>
>>>>> Why is it a bad pattern?
>>>>
>>>> It hides what is passed by reference, and what by value, and it
>>>> makes it
>>>> a guessing game whether you need -> or . since you don't know whether
>>>> it's a pointer or the actual object. All to save a * in parameter and
>>>> variable declaration?...
>>>>
>>>
>>> FWIW I don't like that pattern either although it's used in many
>>> parts of our code-base.
>>
>> But relatively few new ones, most of it is pretty old.
>>
>
> I do agree it's not a particularly pretty pattern, but in this case it's
> fairly isolated in the mmgr sources, and I quite value the consistency
> in this part of the code (i.e. that aset.c, slab.c and generation.c all
> use the same approach). So I haven't changed this.
>
> The attached v7 fixes the off-by-one error in slab.c, which was causing
> failures in the test_decoding isolation tests, and renames Gen to
> Generation, as proposed by Petr.
>

I'd be happy with this patch now (as in committer ready) except that it
does have some merge conflicts after the recent commits, so rebase is
needed.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-12-12 22:39:34
Message-ID: 0fdc7920-f1b4-ea5c-0168-ed4f56448e53@2ndquadrant.com
Lists: pgsql-hackers

On 12/12/2016 05:05 AM, Petr Jelinek wrote:
>
> I'd be happy with this patch now (as in committer ready) except that it
> does have some merge conflicts after the recent commits, so rebase is
> needed.
>

Attached is a rebased version of the patch, resolving the Makefile merge
conflicts.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
slab-allocators-v7.tgz application/x-compressed-tar 18.2 KB

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-12-13 00:45:13
Message-ID: 20ca0c69-e8c3-cb50-4456-f03908c672ec@2ndquadrant.com
Lists: pgsql-hackers

On 12/12/2016 11:39 PM, Tomas Vondra wrote:
> On 12/12/2016 05:05 AM, Petr Jelinek wrote:
>>
>> I'd be happy with this patch now (as in committer ready) except that it
>> does have some merge conflicts after the recent commits, so rebase is
>> needed.
>>
>
> Attached is a rebased version of the patch, resolving the Makefile merge
> conflicts.
>

Meh, managed to rebase the wrong branch, missing the fix to the
off-by-one error (fixed in v6). Attached is v8, hopefully the correct one.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-move-common-bits-to-memdebug-v8.patch binary/octet-stream 11.0 KB
0002-slab-allocator-v8.patch binary/octet-stream 32.4 KB
0003-generational-context-v8.patch binary/octet-stream 53.6 KB

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-12-13 01:32:22
Message-ID: 63a5c597-b1f2-e6f5-ef5d-e9530636be31@2ndquadrant.com
Lists: pgsql-hackers

On 13/12/16 01:45, Tomas Vondra wrote:
> On 12/12/2016 11:39 PM, Tomas Vondra wrote:
>> On 12/12/2016 05:05 AM, Petr Jelinek wrote:
>>>
>>> I'd be happy with this patch now (as in committer ready) except that it
>>> does have some merge conflicts after the recent commits, so rebase is
>>> needed.
>>>
>>
>> Attached is a rebased version of the patch, resolving the Makefile merge
>> conflicts.
>>
>
> Meh, managed to rebase the wrong branch, missing the fix to the off-by-one
> error (fixed in v6). Attached is v8, hopefully the correct one.
>

Okay, this version looks good to me, marked as RfC.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-01 04:19:31
Message-ID: CAB7nPqR=GrEPAJ+Sm4KUqC09P4XP6JMMud1mU8s=DdgHB=8B5w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 13, 2016 at 10:32 AM, Petr Jelinek
<petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
> Okay, this version looks good to me, marked as RfC.

The patches still apply, moved to CF 2017-03 with same status: RfC.
--
Michael


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-09 21:37:32
Message-ID: 20170209213732.pykbsixeezcthotr@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote:
> src/backend/utils/mmgr/Makefile | 2 +-
> src/backend/utils/mmgr/aset.c | 115 +--------------------------------
> src/backend/utils/mmgr/memdebug.c | 131 ++++++++++++++++++++++++++++++++++++++
> src/include/utils/memdebug.h | 22 +++++++
> 4 files changed, 156 insertions(+), 114 deletions(-)
> create mode 100644 src/backend/utils/mmgr/memdebug.c

I'm a bit loath to move these to a .c file - won't this likely make
these debugging tools even slower? Seems better to put some of them
in a header as static inlines (not randomize, but the rest).

> From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001
> From: Tomas Vondra <tomas(at)2ndquadrant(dot)com>
> Date: Wed, 30 Nov 2016 15:36:23 +0100
> Subject: [PATCH 2/3] slab allocator
>
> ---
> src/backend/replication/logical/reorderbuffer.c | 82 +--
> src/backend/utils/mmgr/Makefile | 2 +-
> src/backend/utils/mmgr/slab.c | 803 ++++++++++++++++++++++++
> src/include/nodes/memnodes.h | 2 +-
> src/include/nodes/nodes.h | 1 +
> src/include/replication/reorderbuffer.h | 15 +-
> src/include/utils/memutils.h | 9 +

I'd like to see the reorderbuffer changes split into a separate commit
from the slab allocator introduction.

> +/*-------------------------------------------------------------------------
> + *
> + * slab.c
> + * SLAB allocator definitions.
> + *
> + * SLAB is a custom MemoryContext implementation designed for cases of
> + * equally-sized objects.
> + *
> + *
> + * Portions Copyright (c) 2016, PostgreSQL Global Development Group

Bump, before a committer forgets it.

> + * IDENTIFICATION
> + * src/backend/utils/mmgr/slab.c
> + *
> + *
> + * The constant allocation size allows significant simplification and various
> + * optimizations. Firstly, we can get rid of the doubling and carve the blocks
> + * into chunks of exactly the right size (plus alignment), not wasting memory.

Getting rid of it relative to what? I'd try to phrase it so these
comments stand on their own.

> + * Each block includes a simple bitmap tracking which chunks are used/free.
> + * This makes it trivial to check if all chunks on the block are free, and
> + * eventually free the whole block (which is almost impossible with a global
> + * freelist of chunks, storing chunks from all blocks).

Why is checking a potentially somewhat long-ish bitmap better than a
simple counter, or a "linked list" of "next free chunk-number" or such
(where free chunks simply contain the id of the subsequent chunk)?
Using a list instead of a bitmap would also make it possible to get
'lifo' behaviour, which is good for cache efficiency. A simple
chunk-number based singly linked list would only imply a minimum
allocation size of 4 - that seems perfectly reasonable?

> + * At the context level, we use 'freelist' to track blocks ordered by number
> + * of free chunks, starting with blocks having a single allocated chunk, and
> + * with completely full blocks on the tail.

Why that way round? Filling chunks up as much as possible is good for
cache and TLB efficiency, and allows for earlier de-allocation of
partially used blocks? Oh, I see you do that in the next comment,
but it still leaves me wondering.

Also, is this actually a list? It's more an array of lists, right?
I.e. it should be named freelists?

Thirdly, isn't that approach going to result in a quite long freelists
array, when you have small items and a decent blocksize? That seems like
a fairly reasonable thing to do?

> + * This also allows various optimizations - for example when searching for
> + * free chunk, we the allocator reuses space from the most full blocks first,
> + * in the hope that some of the less full blocks will get completely empty
> + * (and returned back to the OS).

Might be worth mentioning tlb/cache efficiency too.

> + * For each block, we maintain pointer to the first free chunk - this is quite
> + * cheap and allows us to skip all the preceding used chunks, eliminating
> + * a significant number of lookups in many common usage patters. In the worst
> + * case this performs as if the pointer was not maintained.

Hm, so that'd be eliminated if we maintained a linked list of chunks (by
chunk number) and a free_chunk_cnt or such.

> +
> +#include "postgres.h"
> +
> +#include "utils/memdebug.h"
> +#include "utils/memutils.h"
> +#include "lib/ilist.h"

Move ilist up, above memdebug, so the list is alphabetically ordered.

> +/*
> + * SlabPointer
> + * Aligned pointer which may be a member of an allocation set.
> + */
> +typedef void *SlabPointer;
> +typedef SlabContext *Slab;

I personally won't commit this with pointer-hiding typedefs. If
somebody else does, I can live with it, but for me it's in bad enough
taste that I won't.

> +/*
> + * SlabContext is a specialized implementation of MemoryContext.
> + */
> +typedef struct SlabContext
> +{
> + MemoryContextData header; /* Standard memory-context fields */
> + /* Allocation parameters for this context: */
> + Size chunkSize; /* chunk size */
> + Size fullChunkSize; /* chunk size including header and alignment */
> + Size blockSize; /* block size */
> + int chunksPerBlock; /* number of chunks per block */
> + int minFreeChunks; /* min number of free chunks in any block */
> + int nblocks; /* number of blocks allocated */
> + /* blocks with free space, grouped by number of free chunks: */
> + dlist_head freelist[FLEXIBLE_ARRAY_MEMBER];
> +} SlabContext;
> +

Why aren't these ints something unsigned?

> +/*
> + * SlabIsValid
> + * True iff set is valid allocation set.
> + */
> +#define SlabIsValid(set) PointerIsValid(set)

It's not your fault, but this "iff" is obviously a lot stronger than the
actual test ;). I seriously doubt this macro is worth anything...

> +/*
> + * SlabReset
> + * Frees all memory which is allocated in the given set.
> + *
> + * The code simply frees all the blocks in the context - we don't keep any
> + * keeper blocks or anything like that.
> + */

Why don't we? Seems quite worthwhile? Thinking about this, won't this
result in a drastic increase of system malloc/mmap/brk traffic when
there are lots of short transactions in reorderbuffer?

> +static void
> +SlabReset(MemoryContext context)
> +{
> + /* walk over freelists and free the blocks */
> + for (i = 0; i <= set->chunksPerBlock; i++)
> + {
> + dlist_mutable_iter miter;
> +
> + dlist_foreach_modify(miter, &set->freelist[i])
> + {
> + SlabBlock block = dlist_container(SlabBlockData, node, miter.cur);
> +
> + dlist_delete(miter.cur);
> +
> + /* Normal case, release the block */

What does "normal case" refer to here? Given that there's no alternative
case...

> + /*
> + * We need to update index of the next free chunk on the block. If we used
> + * the last free chunk on this block, set it to chunksPerBlock (which is
> + * not a valid chunk index). Otherwise look for the next chunk - we know
> + * that it has to be above the current firstFreeChunk value, thanks to how
> + * we maintain firstFreeChunk here and in SlabFree().
> + */
> + if (block->nfree == 0)
> + block->firstFreeChunk = set->chunksPerBlock;
> + else
> + {
> + /* look for the next free chunk in the block, after the first one */
> + while ((++block->firstFreeChunk) < set->chunksPerBlock)
> + {
> + int byte = block->firstFreeChunk / 8;
> + int bit = block->firstFreeChunk % 8;
> +
> + /* stop when you find 0 (unused chunk) */
> + if (!(block->bitmapptr[byte] & (0x01 << bit)))
> + break;
> + }
> +
> + /* must have found the free chunk */
> + Assert(block->firstFreeChunk != set->chunksPerBlock);
> + }

This and previous code just re-affirms my opinion that a bitmap is not
the best structure here.

> + /* move the whole block to the right place in the freelist */
> + dlist_delete(&block->node);
> + dlist_push_head(&set->freelist[block->nfree], &block->node);

Hm. What if we, instead of the array of doubly linked lists, just kept
a single linked list of blocks, and kept that list sorted by number of
free chunks? Given that freeing / allocation never changes the number
of allocated chunks by more than 1, we'll never have to move an entry
far in that list to keep it sorted.

> +/*
> + * SlabRealloc
> + * As Slab is designed for allocating equally-sized chunks of memory, it
> + * can't really do an actual realloc.
> + *
> + * We try to be gentle and allow calls with exactly the same size as in that
> + * case we can simply return the same chunk. When the size differs, we fail
> + * with assert failure or return NULL.
> + *
> + * We might be even support cases with (size < chunkSize). That however seems
> + * rather pointless - Slab is meant for chunks of constant size, and moreover
> + * realloc is usually used to enlarge the chunk.
> + *
> + * XXX Perhaps we should not be gentle at all and simply fails in all cases,
> + * to eliminate the (mostly pointless) uncertainty.

I think I'm in favor of that. This seems more likely to hide a bug than
to be actually helpful.

Regards,

Andres


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-11 01:13:59
Message-ID: b72dfa3d-c241-8a3e-97ad-bf214b754aca@2ndquadrant.com
Lists: pgsql-hackers

On 02/09/2017 10:37 PM, Andres Freund wrote:
> Hi,
>
> On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote:
>> src/backend/utils/mmgr/Makefile | 2 +-
>> src/backend/utils/mmgr/aset.c | 115 +--------------------------------
>> src/backend/utils/mmgr/memdebug.c | 131 ++++++++++++++++++++++++++++++++++++++
>> src/include/utils/memdebug.h | 22 +++++++
>> 4 files changed, 156 insertions(+), 114 deletions(-)
>> create mode 100644 src/backend/utils/mmgr/memdebug.c
>
> I'm a bit loath to move these to a .c file - won't this likely make
> these debugging tools even slower? Seems better to put some of them
> in a header as static inlines (not randomize, but the rest).
>

Do you have any numbers to support that? AFAICS compilers got really
good in inlining stuff on their own.

>
>> From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001
>> From: Tomas Vondra <tomas(at)2ndquadrant(dot)com>
>> Date: Wed, 30 Nov 2016 15:36:23 +0100
>> Subject: [PATCH 2/3] slab allocator
>>
>> ---
>> src/backend/replication/logical/reorderbuffer.c | 82 +--
>> src/backend/utils/mmgr/Makefile | 2 +-
>> src/backend/utils/mmgr/slab.c | 803 ++++++++++++++++++++++++
>> src/include/nodes/memnodes.h | 2 +-
>> src/include/nodes/nodes.h | 1 +
>> src/include/replication/reorderbuffer.h | 15 +-
>> src/include/utils/memutils.h | 9 +
>
> I'd like to see the reorderbuffer changes split into a separate commit
> from the slab allocator introduction.
>

I rather dislike patches that only add a bunch of code, without actually
using it anywhere. But if needed, this is trivial to do at commit time -
just don't commit the reorderbuffer bits.

>
>
>
>> +/*-------------------------------------------------------------------------
>> + *
>> + * slab.c
>> + * SLAB allocator definitions.
>> + *
>> + * SLAB is a custom MemoryContext implementation designed for cases of
>> + * equally-sized objects.
>> + *
>> + *
>> + * Portions Copyright (c) 2016, PostgreSQL Global Development Group
>
> Bump, before a committer forgets it.
>

OK.

>
>> + * IDENTIFICATION
>> + * src/backend/utils/mmgr/slab.c
>> + *
>> + *
>> + * The constant allocation size allows significant simplification and various
>> + * optimizations. Firstly, we can get rid of the doubling and carve the blocks
>> + * into chunks of exactly the right size (plus alignment), not wasting memory.
>
> Getting rid of it relative to what? I'd try to phrase it so these
> comments stand on their own.
>

OK, will reword.

>
>> + * Each block includes a simple bitmap tracking which chunks are used/free.
>> + * This makes it trivial to check if all chunks on the block are free, and
>> + * eventually free the whole block (which is almost impossible with a global
>> + * freelist of chunks, storing chunks from all blocks).
>
> Why is checking a potentially somewhat long-ish bitmap better than a
> simple counter, or a "linked list" of "next free chunk-number" or such
> (where free chunks simply contain the id of the subsequent chunk)?
> Using a list instead of a bitmap would also make it possible to get
> 'lifo' behaviour, which is good for cache efficiency. A simple
> chunk-number based singly linked list would only imply a minimum
> allocation size of 4 - that seems perfectly reasonable?
>

A block-level counter would be enough to decide if all chunks on the
block are free, but it's not sufficient to identify which chunks are
free / available for reuse.

The bitmap only has a single bit per chunk, so I find "potentially
long-ish" a bit misleading. Any linked list implementation will
require much more per-chunk overhead - as the chunks are fixed-length,
it's possible to use a chunk index (instead of 64-bit pointers) to save
some space. But with large blocks / small chunks that's still at least 2
or 4 bytes per index, and you'll need two (to implement a doubly-linked
list, to make add/remove efficient).

For example, assume an 8kB block and 64B chunks, i.e. 128 chunks. With a
bitmap that's 16B to track all free space on the block. A doubly linked
list would require 1B per chunk index, 2 indexes per chunk. That's 128*2
= 256B.

I have a hard time believing that the cache efficiency of linked lists
(which may or may not be real in this case) outweighs this, but if you
want to try, be my guest.
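
For context, the bitmap bookkeeping is plain bit arithmetic (a sketch;
bitmapptr is the per-block bitmap from the patch, the macros are made
up here):

    #define CHUNK_BYTE(idx) ((idx) / 8)
    #define CHUNK_BIT(idx)  ((idx) % 8)

    /* mark chunk idx as used */
    block->bitmapptr[CHUNK_BYTE(idx)] |= (0x01 << CHUNK_BIT(idx));

    /* mark chunk idx as free */
    block->bitmapptr[CHUNK_BYTE(idx)] &= ~(0x01 << CHUNK_BIT(idx));

    /* test whether chunk idx is used */
    used = (block->bitmapptr[CHUNK_BYTE(idx)] & (0x01 << CHUNK_BIT(idx))) != 0;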

>
>> + * At the context level, we use 'freelist' to track blocks ordered by number
>> + * of free chunks, starting with blocks having a single allocated chunk, and
>> + * with completely full blocks on the tail.
>
> Why that way round? Filling chunks up as much as possible is good for
> cache and TLB efficiency, and allows for earlier de-allocation of
> partially used blocks? Oh, I see you do that in the next comment,
> but it still leaves me wondering.
>
> Also, is this actually a list? It's more an array of lists, right?
> I.e. it should be named freelists?
>

Possibly. Naming things is hard.

>
> Thirdly, isn't that approach going to result in a quite long freelists
> array, when you have small items and a decent blocksize? That seems like
> a fairly reasonable thing to do?
>

I'm confused. Why wouldn't that be reasonable? Or rather, what would be
a more reasonable way?

>
>> + * This also allows various optimizations - for example when searching for
>> + * free chunk, we the allocator reuses space from the most full blocks first,
>> + * in the hope that some of the less full blocks will get completely empty
>> + * (and returned back to the OS).
>
> Might be worth mentioning tlb/cache efficiency too.
>

I haven't really considered tlb/cache very much. The main goal of this
design was to free blocks (instead of keeping many partially-used blocks
around). If you have comments on this, feel free to add them.

>
>> + * For each block, we maintain pointer to the first free chunk - this is quite
>> + * cheap and allows us to skip all the preceding used chunks, eliminating
>> + * a significant number of lookups in many common usage patters. In the worst
>> + * case this performs as if the pointer was not maintained.
>
> Hm, so that'd be eliminated if we maintained a linked list of chunks (by
> chunk number) and a free_chunk_cnt or such.
>

As I explained above, I don't think a linked list is a good solution.
IIRC I initially did that, and ended up using the bitmap instead. If you
have an idea how to do it better, feel free to implement it and then we
can do some measurements and compare the patches.

>
>> +
>> +#include "postgres.h"
>> +
>> +#include "utils/memdebug.h"
>> +#include "utils/memutils.h"
>> +#include "lib/ilist.h"
>
> Move ilist up, above memdebug, so the list is alphabetically ordered.
>

OK

>
>> +/*
>> + * SlabPointer
>> + * Aligned pointer which may be a member of an allocation set.
>> + */
>> +typedef void *SlabPointer;
>> +typedef SlabContext *Slab;
>
> I personally won't commit this with pointer-hiding typedefs. If
> somebody else does, I can live with it, but for me it's in bad enough
> taste that I won't.
>

Meh.

>
>> +/*
>> + * SlabContext is a specialized implementation of MemoryContext.
>> + */
>> +typedef struct SlabContext
>> +{
>> + MemoryContextData header; /* Standard memory-context fields */
>> + /* Allocation parameters for this context: */
>> + Size chunkSize; /* chunk size */
>> + Size fullChunkSize; /* chunk size including header and alignment */
>> + Size blockSize; /* block size */
>> + int chunksPerBlock; /* number of chunks per block */
>> + int minFreeChunks; /* min number of free chunks in any block */
>> + int nblocks; /* number of blocks allocated */
>> + /* blocks with free space, grouped by number of free chunks: */
>> + dlist_head freelist[FLEXIBLE_ARRAY_MEMBER];
>> +} SlabContext;
>> +
>
> Why aren't these ints something unsigned?
>

Yeah, some of those could be unsigned. Will check.

>
>> +/*
>> + * SlabIsValid
>> + * True iff set is valid allocation set.
>> + */
>> +#define SlabIsValid(set) PointerIsValid(set)
>
> It's not your fault, but this "iff" is obviously a lot stronger than the
> actual test ;). I seriously doubt this macro is worth anything...
>

Yeah.

>
>> +/*
>> + * SlabReset
>> + * Frees all memory which is allocated in the given set.
>> + *
>> + * The code simply frees all the blocks in the context - we don't keep any
>> + * keeper blocks or anything like that.
>> + */
>
> Why don't we? Seems quite worthwhile? Thinking about this, won't this
> result in a drastic increase of system malloc/mmap/brk traffic when
> there are lots of short transactions in reorderbuffer?
>

I haven't seen any significant impact of that during the tests I've done
(even with many tiny transactions), but adding keeper blocks should
be trivial.

>
>> +static void
>> +SlabReset(MemoryContext context)
>> +{
>> + /* walk over freelists and free the blocks */
>> + for (i = 0; i <= set->chunksPerBlock; i++)
>> + {
>> + dlist_mutable_iter miter;
>> +
>> + dlist_foreach_modify(miter, &set->freelist[i])
>> + {
>> + SlabBlock block = dlist_container(SlabBlockData, node, miter.cur);
>> +
>> + dlist_delete(miter.cur);
>> +
>> + /* Normal case, release the block */
>
> What does "normal case" refer to here? Given that there's no alternative
> case...
>

Meh, bogus comment.

>
>> + /*
>> + * We need to update index of the next free chunk on the block. If we used
>> + * the last free chunk on this block, set it to chunksPerBlock (which is
>> + * not a valid chunk index). Otherwise look for the next chunk - we know
>> + * that it has to be above the current firstFreeChunk value, thanks to how
>> + * we maintain firstFreeChunk here and in SlabFree().
>> + */
>> + if (block->nfree == 0)
>> + block->firstFreeChunk = set->chunksPerBlock;
>> + else
>> + {
>> + /* look for the next free chunk in the block, after the first one */
>> + while ((++block->firstFreeChunk) < set->chunksPerBlock)
>> + {
>> + int byte = block->firstFreeChunk / 8;
>> + int bit = block->firstFreeChunk % 8;
>> +
>> + /* stop when you find 0 (unused chunk) */
>> + if (!(block->bitmapptr[byte] & (0x01 << bit)))
>> + break;
>> + }
>> +
>> + /* must have found the free chunk */
>> + Assert(block->firstFreeChunk != set->chunksPerBlock);
>> + }
>
> This and previous code just re-affirms my opinion that a bitmap is not
> the best structure here.
>

It'd be great if you could explain why, instead of just making such
claims ...

>
>> + /* move the whole block to the right place in the freelist */
>> + dlist_delete(&block->node);
>> + dlist_push_head(&set->freelist[block->nfree], &block->node);
>
> Hm. What if we, instead of the array of doubly linked lists, just kept
> a single linked list of blocks, and kept that list sorted by number of
> free chunks? Given that freeing / allocation never changes the number
> of allocated chunks by more than 1, we'll never have to move an entry
> far in that list to keep it sorted.
>

Only assuming that there'll be only a few blocks with the same number of
free chunks. If that's not the case, you'll have to walk many blocks to
move the block to the right place in the list. The array of lists
handles such cases way more efficiently, and I think we should keep it.
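
With the array, repositioning a block after an alloc/free is constant
time, because nfree changes by exactly one - it's just the two calls
quoted above:

    /* block->nfree was already updated by +/-1; unlink the block and
     * push it onto the list matching its new free count */
    dlist_delete(&block->node);
    dlist_push_head(&set->freelist[block->nfree], &block->node);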

>
>> +/*
>> + * SlabRealloc
>> + * As Slab is designed for allocating equally-sized chunks of memory, it
>> + * can't really do an actual realloc.
>> + *
>> + * We try to be gentle and allow calls with exactly the same size as in that
>> + * case we can simply return the same chunk. When the size differs, we fail
>> + * with assert failure or return NULL.
>> + *
>> + * We might be even support cases with (size < chunkSize). That however seems
>> + * rather pointless - Slab is meant for chunks of constant size, and moreover
>> + * realloc is usually used to enlarge the chunk.
>> + *
>> + * XXX Perhaps we should not be gentle at all and simply fails in all cases,
>> + * to eliminate the (mostly pointless) uncertainty.
>
> I think I'm in favor of that. This seems more likely to hide a bug than
> to be actually helpful.
>

OK.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-11 01:33:37
Message-ID: 20170211013337.iw32bvfqwta634wy@alap3.anarazel.de
Lists: pgsql-hackers

On 2017-02-11 02:13:59 +0100, Tomas Vondra wrote:
> On 02/09/2017 10:37 PM, Andres Freund wrote:
> > Hi,
> >
> > On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote:
> > > src/backend/utils/mmgr/Makefile | 2 +-
> > > src/backend/utils/mmgr/aset.c | 115 +--------------------------------
> > > src/backend/utils/mmgr/memdebug.c | 131 ++++++++++++++++++++++++++++++++++++++
> > > src/include/utils/memdebug.h | 22 +++++++
> > > 4 files changed, 156 insertions(+), 114 deletions(-)
> > > create mode 100644 src/backend/utils/mmgr/memdebug.c
> >
> > I'm a bit loath to move these to a .c file - won't this likely make
> > these debugging tools even slower? Seems better to put some of them
> > in a header as static inlines (not randomize, but the rest).
> >
>
> Do you have any numbers to support that? AFAICS compilers got really good in
> inlining stuff on their own.

Unless you use LTO, they can't inline across translation units. And
using LTO is slow enough for linking that it's not that much fun to use,
as it makes compile-edit-compile cycles essentially take as long as a
full rebuild.

> > > From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001
> > > From: Tomas Vondra <tomas(at)2ndquadrant(dot)com>
> > > Date: Wed, 30 Nov 2016 15:36:23 +0100
> > > Subject: [PATCH 2/3] slab allocator
> > >
> > > ---
> > > src/backend/replication/logical/reorderbuffer.c | 82 +--
> > > src/backend/utils/mmgr/Makefile | 2 +-
> > > src/backend/utils/mmgr/slab.c | 803 ++++++++++++++++++++++++
> > > src/include/nodes/memnodes.h | 2 +-
> > > src/include/nodes/nodes.h | 1 +
> > > src/include/replication/reorderbuffer.h | 15 +-
> > > src/include/utils/memutils.h | 9 +
> >
> > I'd like to see the reorderbuffer changes split into a separate commit
> > from the slab allocator introduction.
> >
>
> I rather dislike patches that only add a bunch of code, without actually
> using it anywhere.

> But if needed, this is trivial to do at commit time - just don't
> commit the reorderbuffer bits.

Meh.

> > > + * Each block includes a simple bitmap tracking which chunks are used/free.
> > > + * This makes it trivial to check if all chunks on the block are free, and
> > > + * eventually free the whole block (which is almost impossible with a global
> > > + * freelist of chunks, storing chunks from all blocks).
> >
> > Why is checking a potentially somewhat long-ish bitmap better than a
> > simple counter, or a "linked list" of "next free chunk-number" or such
> > (where free chunks simply contain the id of the subsequent chunk)?
> > Using a list instead of a bitmap would also make it possible to get
> > 'lifo' behaviour, which is good for cache efficiency. A simple
> > chunk-number based singly linked list would only imply a minimum
> > allocation size of 4 - that seems perfectly reasonable?
> >
>
> A block-level counter would be enough to decide if all chunks on the block
> are free, but it's not sufficient to identify which chunks are free /
> available for reuse.
>
> The bitmap only has a single bit per chunk, so I find "potentially long-ish"
> is a bit misleading. Any linked list implementation will require much more
> per-chunk overhead - as the chunks are fixed-length, it's possible to use
> chunk index (instead of 64-bit pointers), to save some space. But with large
> blocks / small chunks that's still at least 2 or 4 bytes per index, and
> you'll need two (to implement doubly-linked list, to make add/remove
> efficient).

> For example assume 8kB block and 64B chunks, i.e. 128 chunks. With bitmap
> that's 16B to track all free space on the block. Doubly linked list would
> require 1B per chunk index, 2 indexes per chunk. That's 128*2 = 256B.

> I have a hard time believing that the cache efficiency of linked lists
> (which may or may not be real in this case) outweighs this, but if you
> want to try, be my guest.

I'm not following - why would there be overhead in anything for
allocations bigger than 4 (or maybe 8) bytes? You can store the list
(via chunk ids, not pointers) inside the chunks themselves, where the data
otherwise would be. And I don't see why you'd need a doubly linked
list, as the only operations that are needed are to push to the front of
the list, and to pop from the front of the list - and both operations
are simple to do with a singly linked list?
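
A sketch of that scheme (names made up; the chunk indexes live in the
free chunks themselves):

    /* per-block state: only a head index and a counter are needed */
    int32   firstFreeChunk;   /* index of the first free chunk, or -1 */
    int32   nfree;            /* number of free chunks on the block */

    /* pop from the front (palloc); SlabChunkPtr/SlabChunkIdx are
     * hypothetical helpers mapping chunk index <-> address */
    idx = block->firstFreeChunk;
    block->firstFreeChunk = *(int32 *) SlabChunkPtr(block, idx);
    block->nfree--;

    /* push to the front (pfree) */
    *(int32 *) chunk = block->firstFreeChunk;
    block->firstFreeChunk = SlabChunkIdx(block, chunk);
    block->nfree++;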

> > Thirdly, isn't that approach going to result in a quite long freelists
> > array, when you have small items and a decent blocksize? That seems like
> > a fairly reasonable thing to do?
> >
>
> I'm confused. Why wouldn't that be reasonable? Or rather, what would be a
> more reasonable way?

If I understood correctly, you have an array of doubly linked lists.
A block is stored in the list at the index #block's-free-elements. Is that
right?

If so, if you have e.g. 8 byte allocations and 64kb sized blocks, you
end up with an array of 1024 doubly linked lists, which'll take up 64kb
on its own. And there are certainly scenarios where even bigger block
sizes could make sense. That's both memory overhead, and runtime
overhead, because at reset-time we'll have to check the whole array
(which'll presumably largely be empty lists). Resetting is a pretty
common path...

> > > + /*
> > > + * We need to update index of the next free chunk on the block. If we used
> > > + * the last free chunk on this block, set it to chunksPerBlock (which is
> > > + * not a valid chunk index). Otherwise look for the next chunk - we know
> > > + * that it has to be above the current firstFreeChunk value, thanks to how
> > > + * we maintain firstFreeChunk here and in SlabFree().
> > > + */
> > > + if (block->nfree == 0)
> > > + block->firstFreeChunk = set->chunksPerBlock;
> > > + else
> > > + {
> > > + /* look for the next free chunk in the block, after the first one */
> > > + while ((++block->firstFreeChunk) < set->chunksPerBlock)
> > > + {
> > > + int byte = block->firstFreeChunk / 8;
> > > + int bit = block->firstFreeChunk % 8;
> > > +
> > > + /* stop when you find 0 (unused chunk) */
> > > + if (!(block->bitmapptr[byte] & (0x01 << bit)))
> > > + break;
> > > + }
> > > +
> > > + /* must have found the free chunk */
> > > + Assert(block->firstFreeChunk != set->chunksPerBlock);
> > > + }
> >
> > This and previous code just re-affirms my opinion that a bitmap is not
> > the best structure here.
> >
>
> It'd be great if you could explain why, instead of just making such claims
> ...

Because it's complicated. This is a fair bit of code and branches to
run in a pretty hot path.

- Andres


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-11 01:41:29
Message-ID: 20170211014129.zriop3nw7rbevcqq@alap3.anarazel.de
Lists: pgsql-hackers

On 2017-02-11 02:13:59 +0100, Tomas Vondra wrote:
> > > + /* move the whole block to the right place in the freelist */
> > > + dlist_delete(&block->node);
> > > + dlist_push_head(&set->freelist[block->nfree], &block->node);
> >
> > Hm. What if we, instead of the array of doubly linked lists, just kept
> > a single linked list of blocks, and kept that list sorted by number of
> > free chunks? Given that freeing / allocation never changes the number
> > of allocated chunks by more than 1, we'll never have to move an entry
> > far in that list to keep it sorted.
> >
>
> Only assuming that there'll be only a few blocks with the same number of free
> chunks. If that's not the case, you'll have to walk many blocks to move the
> block to the right place in the list. The array of lists handles such cases
> way more efficiently, and I think we should keep it.

The proper datastructure would probably be a heap. Right now
binaryheap.h is fixed-size - probably not too hard to change.

Greetings,

Andres Freund


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-11 13:40:18
Message-ID: 53def906-2197-bc06-b987-d237f9574800@2ndquadrant.com
Lists: pgsql-hackers

On 02/11/2017 02:33 AM, Andres Freund wrote:
> On 2017-02-11 02:13:59 +0100, Tomas Vondra wrote:
>> On 02/09/2017 10:37 PM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote:
>>>> src/backend/utils/mmgr/Makefile | 2 +-
>>>> src/backend/utils/mmgr/aset.c | 115 +--------------------------------
>>>> src/backend/utils/mmgr/memdebug.c | 131 ++++++++++++++++++++++++++++++++++++++
>>>> src/include/utils/memdebug.h | 22 +++++++
>>>> 4 files changed, 156 insertions(+), 114 deletions(-)
>>>> create mode 100644 src/backend/utils/mmgr/memdebug.c
>>>
>>> I'm a bit loath to move these to a .c file - won't this likely make
>>> these debugging tools even slower? Seems better to put some of them
>>> in a header as static inlines (not randomize, but the rest).
>>>
>>
>> Do you have any numbers to support that? AFAICS compilers got really good in
>> inlining stuff on their own.
>
> Unless you use LTO, they can't inline across translation units. And
> using LTO is slow enough for linking that it's not that much fun to use,
> as it makes compile-edit-compile cycles essentially take as long as a
> full rebuild.
>
>
>>>> From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001
>>>> From: Tomas Vondra <tomas(at)2ndquadrant(dot)com>
>>>> Date: Wed, 30 Nov 2016 15:36:23 +0100
>>>> Subject: [PATCH 2/3] slab allocator
>>>>
>>>> ---
>>>> src/backend/replication/logical/reorderbuffer.c | 82 +--
>>>> src/backend/utils/mmgr/Makefile | 2 +-
>>>> src/backend/utils/mmgr/slab.c | 803 ++++++++++++++++++++++++
>>>> src/include/nodes/memnodes.h | 2 +-
>>>> src/include/nodes/nodes.h | 1 +
>>>> src/include/replication/reorderbuffer.h | 15 +-
>>>> src/include/utils/memutils.h | 9 +
>>>
>>> I'd like to see the reorderbuffer changes split into a separate commit
>>> from the slab allocator introduction.
>>>
>>
>> I rather dislike patches that only add a bunch of code, without actually
>> using it anywhere.
>
>> But if needed, this is trivial to do at commit time - just don't
>> commit the reorderbuffer bits.
>
> Meh.
>
>
>>>> + * Each block includes a simple bitmap tracking which chunks are used/free.
>>>> + * This makes it trivial to check if all chunks on the block are free, and
>>>> + * eventually free the whole block (which is almost impossible with a global
>>>> + * freelist of chunks, storing chunks from all blocks).
>>>
>>> Why is checking a potentially somewhat long-ish bitmap better than a
>>> simple counter, or a "linked list" of "next free chunk-number" or such
>>> (where free chunks simply contain the id of the subsequent chunk)?
>>> Using a list instead of a bitmap would also make it possible to get
>>> 'lifo' behaviour, which is good for cache efficiency. A simple
>>> chunk-number based singly linked list would only imply a minimum
>>> allocation size of 4 - that seems perfectly reasonable?
>>>
>>
>> A block-level counter would be enough to decide if all chunks on the block
>> are free, but it's not sufficient to identify which chunks are free /
>> available for reuse.
>>
>> The bitmap only has a single bit per chunk, so I find "potentially long-ish"
>> is a bit misleading. Any linked list implementation will require much more
>> per-chunk overhead - as the chunks are fixed-length, it's possible to use
>> chunk index (instead of 64-bit pointers), to save some space. But with large
>> blocks / small chunks that's still at least 2 or 4 bytes per index, and
>> you'll need two (to implement doubly-linked list, to make add/remove
>> efficient).
>
>> For example assume 8kB block and 64B chunks, i.e. 128 chunks. With bitmap
>> that's 16B to track all free space on the block. Doubly linked list would
>> require 1B per chunk index, 2 indexes per chunk. That's 128*2 = 256B.
>
>> I have a hard time believing that the cache efficiency of linked lists
>> (which may or may not be real in this case) outweighs this, but if you
>> want to try, be my guest.
>
> I'm not following - why would there be overhead in anything for
> allocations bigger than 4 (or maybe 8) bytes? You can store the list
>>> (via chunk ids, not pointers) inside the chunks themselves, where data
> otherwise would be. And I don't see why you'd need a doubly linked
> list, as the only operations that are needed are to push to the front of
> the list, and to pop from the front of the list - and both operations
> are simple to do with a singly linked list?
>

Oh! I have not considered storing the chunk indexes (for linked lists)
in the chunk itself, which obviously eliminates the overhead concerns,
and you're right a singly-linked list should be enough.

There will be some minimum-chunk-size requirement, depending on the
block size/chunk size. I wonder whether it makes sense to try to be
smart and make it dynamic, so that we only require 1B or 2B for cases
when only that many chunks fit into a block, or just say that it's 4B
and be done with it.

I mean 2^32 chunks ought to be enough for anyone, right?

I'm still not buying the cache efficiency argument, though. One of the
reasons is that the implementation prefers blocks with fewer free chunks
when handling palloc(), so pfree() is making the block less likely to be
chosen by the next palloc().

>
>>> Thirdly, isn't that approach going to result in a quite long freelists
>>> array, when you have small items and a decent blocksize? That seems like
>>> a fairly reasonable thing to do?
>>>
>>
>> I'm confused. Why wouldn't that be reasonable? Or rather, what would be a
>> more reasonable way?
>
> If I understood correctly, you have an array of doubly linked lists.
> A block is stored in the list at the index #block's-free-elements. Is that
> right?
>
> If so, if you have e.g. 8 byte allocations and 64kb sized blocks, you
> end up with an array of 1024 doubly linked lists, which'll take up 64kb
> on its own. And there are certainly scenarios where even bigger block
> sizes could make sense. That's both memory overhead, and runtime
> overhead, because at reset-time we'll have to check the whole array
> (which'll presumably largely be empty lists). Resetting is a pretty
> common path...
>

True, but it's not entirely clear if resetting is common for the paths
where we use those new allocators.

Also, if we accept that it might be a problem, what other solution do you
propose? I don't think just merging everything into a single list is a
good idea, for the reasons I explained before (it might make the resets
somewhat less expensive, but it'll make pfree() more expensive).

What might work is replacing the array with a list, though. So we'd have
a list of lists, which would eliminate the array overhead.

>
>>>> + /*
>>>> + * We need to update index of the next free chunk on the block. If we used
>>>> + * the last free chunk on this block, set it to chunksPerBlock (which is
>>>> + * not a valid chunk index). Otherwise look for the next chunk - we know
>>>> + * that it has to be above the current firstFreeChunk value, thanks to how
>>>> + * we maintain firstFreeChunk here and in SlabFree().
>>>> + */
>>>> + if (block->nfree == 0)
>>>> + block->firstFreeChunk = set->chunksPerBlock;
>>>> + else
>>>> + {
>>>> + /* look for the next free chunk in the block, after the first one */
>>>> + while ((++block->firstFreeChunk) < set->chunksPerBlock)
>>>> + {
>>>> + int byte = block->firstFreeChunk / 8;
>>>> + int bit = block->firstFreeChunk % 8;
>>>> +
>>>> + /* stop when you find 0 (unused chunk) */
>>>> + if (!(block->bitmapptr[byte] & (0x01 << bit)))
>>>> + break;
>>>> + }
>>>> +
>>>> + /* must have found the free chunk */
>>>> + Assert(block->firstFreeChunk != set->chunksPerBlock);
>>>> + }
>>>
>>> This and previous code just re-affirms my opinion that a bitmap is not
>>> the best structure here.
>>>
>>
>> It'd be great if you could explain why, instead of just making such claims
>> ...
>
> Because it's complicated. This is a fair bit of code and branches to
> run in a pretty hot path.
>

Hmm. I admit updating the index of the first free chunk is a bit
cumbersome, and the linked list would make it unnecessary.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-14 02:22:21
Message-ID: 20170214022221.obhdeyfn34h6hkwj@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2017-02-11 14:40:18 +0100, Tomas Vondra wrote:
> On 02/11/2017 02:33 AM, Andres Freund wrote:
> > > I have a hard time believing that the cache efficiency of linked lists
> > > (which may or may not be real in this case) outweighs this, but if you
> > > want to try, be my guest.
> >
> > I'm not following - why would there be overhead in anything for
> > allocations bigger than 4 (or maybe 8) bytes? You can store the list
> > (via chunk ids, not pointers) inside the chunks themselves, where data
> > otherwise would be. And I don't see why you'd need a doubly linked
> > list, as the only operations that are needed are to push to the front of
> > the list, and to pop from the front of the list - and both operations
> > are simple to do with a singly linked list?
> >
>
> Oh! I have not considered storing the chunk indexes (for linked lists) in
> the chunk itself, which obviously eliminates the overhead concerns, and
> you're right a singly-linked list should be enough.
>
> There will be some minimum-chunk-size requirement, depending on the block
> size/chunk size. I wonder whether it makes sense to try to be smart and make
> it dynamic, so that we only require 1B or 2B for cases when only that many
> chunks fit into a block, or just say that it's 4B and be done with it.

I doubt it's worth it - it seems likely that the added branch is more
noticeable overall than the possible savings of 3 bytes. Also, won't the
space be lost due to alignment *anyway*?
+ /* chunk, including SLAB header (both addresses nicely aligned) */
+ fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));

In that case I'd just Assert(MAXIMUM_ALIGNOF >= sizeof(slist_head)) and
use a plain slist - no point in being more careful than that.
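
I.e. something like this sketch (assuming a per-block slist_head named
freehead; a free chunk is at least MAXALIGN bytes, so its own storage
can hold the link):

    Assert(MAXIMUM_ALIGNOF >= sizeof(slist_head));

    /* pfree: store the freelist link directly in the freed chunk */
    slist_push_head(&block->freehead, (slist_node *) chunk);

    /* palloc: pop the first free chunk off the block's list */
    chunk = (void *) slist_pop_head_node(&block->freehead);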

> I mean 2^32 chunks ought to be enough for anyone, right?

Yea, that seems enough; but given the alignment thing pointed out above,
I think we can just use plain pointers - and that definitely should be
enough :P

> I'm still not buying the cache efficiency argument, though. One of the
> reasons is that the implementation prefers blocks with fewer free chunks
> when handling palloc(), so pfree() is making the block less likely to be
> chosen by the next palloc().

That'll possibly de-optimize L1, but for L2 usage the higher density
seems like it'll be a win. All this memory is only accessed by a single
backend, so packing as densely as possible is good.

> > If so, if you have e.g. 8 byte allocations and 64kb sized blocks, you
> > end up with an array of 1024 doubly linked lists, which'll take up 64kb
> > on its own. And there are certainly scenarios where even bigger block
> > sizes could make sense. That's both memory overhead, and runtime
> > overhead, because at reset-time we'll have to check the whole array
> > (which'll presumably largely be empty lists). Resetting is a pretty
> > common path...
> >
>
> True, but it's not entirely clear if resetting is common for the paths where
> we use those new allocators.

That's fair enough. There's still the memory overhead, but I guess we
can also live with that.

> Also, if we accept that it might be a problem, what other solution do you
> propose? I don't think just merging everything into a single list is a good
> idea, for the reasons I explained before (it might make the resets somewhat
> less expensive, but it'll make pfree() more expensive).

Now that I think about it, a binary heap, as suggested elsewhere, isn't
entirely trivial to use for this - it's more or less trivial to "fix"
the heap after changing an element's value, but it's harder to find that
element first.

But a two-level list approach seems like it could work quite well -
basically a simplified skip-list. A top-level list has the property
that all its elements have a distinct #free, and hanging off each of
those are sub-lists for all the other blocks with the same number of
free chunks.

I think that'd be a better implementation, but I can understand if you
don't immediately want to go there.
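
Purely to illustrate the shape of that idea, here's a rough sketch with
made-up names (none of this is from the patch):

struct SlabBlock;                       /* the per-block struct */

typedef struct FreeCountEntry
{
    int         nfree;                  /* distinct number of free chunks */
    struct SlabBlock *blocks;           /* blocks with exactly nfree free */
    struct FreeCountEntry *next;        /* next distinct free count */
} FreeCountEntry;

A palloc()/pfree() only ever moves a block between the entries for
nfree and nfree -/+ 1, so neither operation has to scan past the
neighboring top-level entry.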

> What might work is replacing the array with a list, though. So we'd have a
> list of lists, which would eliminate the array overhead.

That seems likely to be significantly worse, because a) iteration is
more expensive b) accessing the relevant list to move between two
different "freecount" lists would be O(n).

- Andres


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-14 19:05:20
Message-ID: 19464ac8-e66b-3598-858f-29c749bec35a@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/14/2017 03:22 AM, Andres Freund wrote:
> Hi,
>
> On 2017-02-11 14:40:18 +0100, Tomas Vondra wrote:
>> On 02/11/2017 02:33 AM, Andres Freund wrote:
>>>> I have a hard time believing that the cache efficiency of linked lists
>>>> (which may or may not be real in this case) outweighs this, but if you
>>>> want to try, be my guest.
>>>
>>> I'm not following - why would there be overhead in anything for
>>> allocations bigger than 4 (or maybe 8) bytes? You can store the list
>>> (via chunk ids, not pointers) inside the chunks themselves, where data
>>> otherwise would be. And I don't see why you'd need a doubly linked
>>> list, as the only operations that are needed are to push to the front of
>>> the list, and to pop from the front of the list - and both operations
>>> are simple to do with a singly linked list?
>>>
>>
>> Oh! I had not considered storing the chunk indexes (for the linked lists)
>> in the chunk itself, which obviously eliminates the overhead concerns, and
>> you're right that a singly-linked list should be enough.
>>
>> There will be some minimum-chunk-size requirement, depending on the block
>> size/chunk size. I wonder whether it makes sense to try to be smart and make
>> it dynamic, so that we only require 1B or 2B for cases when only that many
>> chunks fit into a block, or just say that it's 4B and be done with it.
>
> I doubt it's worth it - it seems likely that the added branch is more
> noticeable overall than the possible savings of 3 bytes. Also, won't the
> space be lost due to alignment *anyway*?
> + /* chunk, including SLAB header (both addresses nicely aligned) */
> + fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));
>
> In that case I'd just Assert(MAXIMUM_ALIGNOF >= sizeof(slist_head)) and
> use a plain slist - no point in being more careful than that.
>

Hmm, I think you're right.

>
>> I mean 2^32 chunks ought to be enough for anyone, right?
>
> Yea, that seems enough; but given the alignment thing pointed out above,
> I think we can just use plain pointers - and that definitely should be
> enough :P
>

People in year 2078: Why the hell did they only use 32 bits? Wasn't it
obvious we'll have tiny computers with 32EB of RAM? ;-)

>
>> I'm still not buying the cache efficiency argument, though. One of
>> the reasons is that the implementation prefers blocks with fewer
>> free chunks when handling palloc(), so pfree() is making the block
>> less likely to be chosen by the next palloc().
>
> That'll possibly de-optimize L1, but for L2 usage the higher density
> seems like it'll be a win. All this memory is only accessed by a
> single backend, so packing as densely as possible is good.
>
>
>>> If so, if you have e.g. 8 byte allocations and 64kb sized blocks,
>>> you end up with an array of 1024 doubly linked lists, which'll
>>> take up 64kb on its own. And there are certainly scenarios where
>>> even bigger block sizes could make sense. That's both memory
>>> overhead, and runtime overhead, because at reset-time we'll have
>>> to check the whole array (which'll presumably largely be empty
>>> lists). Resetting is a pretty common path...
>>>
>>
>> True, but it's not entirely clear if resetting is common for the
>> paths where we use those new allocators.
>
> That's fair enough. There's still the memory overhead, but I guess
> we can also live with that.
>

Right. My ambition was not to develop another general-purpose memory
context that would work perfectly for everything, but something that
works (better than the current code) for places like reorderbuffer.

>
>> Also, if we accept that it might be a problem, what other solution do you
>> propose? I don't think just merging everything into a single list is a good
>> idea, for the reasons I explained before (it might make the resets somewhat
>> less expensive, but it'll make pfree() more expensive).
>>
>
> Now that I think about it, a binary heap, as suggested elsewhere, isn't
> entirely trivial to use for this - it's more or less trivial to "fix"
> the heap after changing an element's value, but it's harder to find that
> element first.
>
> But a two-level list approach seems like it could work quite well -
> basically a simplified skip-list. A top-level list has the property
> that all its elements have a distinct #free, and hanging off each of
> those are sub-lists for all the other blocks with the same number of
> free chunks.
>
> I think that'd be a better implementation, but I can understand if you
> don't immediately want to go there.
>

I don't want to go there. I'm not all that interested in reorderbuffer,
to be honest, and this started more as a "Hold my beer!" hack, after a
midnight discussion with Petr, than a seriously meant patch. I've
already spent like 100x time on it than I expected.

>
>> What might work is replacing the array with a list, though. So we'd have a
>> list of lists, which would eliminate the array overhead.
>
> That seems likely to be significantly worse, because a) iteration is
> more expensive b) accessing the relevant list to move between two
> different "freecount" lists would be O(n).
>

Oh, right, I hadn't realized we wouldn't know the current head of the
list, so we'd have to search for it. OTOH, we could replace it with a
small hash table, which would reduce the lookup time because we'd have
to search only in a single bin.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-21 00:43:46
Message-ID: b3b2245c-b37a-e1e5-ebc4-857c914bc747@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Attached is v9 of this patch series. This addresses most of the points
raised in the review, namely:

1) change most 'debug' stuff to 'static inline' in memdebug.h

2) fixed and reworded a bunch of comments

3) get rid of the block-level bitmap tracking free chunks

Instead of the bitmap, I've used a simple singly-linked list, using
int32 chunk indexes. Perhaps it could use the slist instead, but I'm not
quite sure MAXALIGN is guaranteed to be at least the size of a pointer.
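
To sketch the idea (illustrative only - SlabChunkIndexGetPointer() is a
made-up helper mapping a chunk index within a block to the chunk's data
area, and -1 terminates the list):

static inline void
SlabPushFreeChunk(SlabBlock *block, int32 idx)
{
    /* a free chunk's data area stores the index of the next free chunk */
    *(int32 *) SlabChunkIndexGetPointer(block, idx) = block->firstFreeChunk;
    block->firstFreeChunk = idx;
}

static inline int32
SlabPopFreeChunk(SlabBlock *block)
{
    int32       idx = block->firstFreeChunk;

    if (idx >= 0)
        block->firstFreeChunk =
            *(int32 *) SlabChunkIndexGetPointer(block, idx);
    return idx;
}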

In any case, this seems to be working reasonably well - it saves a bit
of code (but also made some code slightly more complex). Also, it seems
to be a tad faster than v8 - after repeating the same benchmark as
before, I get these numbers (timings in milliseconds):

      N |  master | slab-v8 | slab-v9
---------------------------------------
  10000 |      50 |      28 |      25
  50000 |   17500 |     180 |     160
 100000 |  150000 |     380 |     330
 200000 |       ? |     750 |     670

Although the results are quite noisy.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-move-common-bits-to-memdebug-v9.patch text/x-diff 10.8 KB
0002-slab-allocator-v9.patch text/x-diff 32.1 KB
0003-generational-context-v9.patch text/x-diff 53.6 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-24 22:10:38
Message-ID: 20170224221038.43xf2cyonczzr6hb@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-02-21 01:43:46 +0100, Tomas Vondra wrote:
> Attached is v9 of this patch series. This addresses most of the points
> raised in the review, namely:

Cool, thanks.

> 3) get rid of the block-level bitmap tracking free chunks
>
> Instead of the bitmap, I've used a simple singly-linked list, using int32
> chunk indexes. Perhaps it could use the slist instead, but I'm not quite
> sure MAXALIGN is guaranteed to be at least the size of a pointer.

I'm pretty sure it's guaranteed to be >= sizeof(void*). But this seems
ok, so ...

Attached are changes I made on the path to committing the patch. These
look large enough that I wanted to give you a chance to comment:

- The AllocFreeInfo changes aren't correct for 32bit systems with 64bit
  longs (like linux, unlike windows) - using %zu is better (z is the
  length modifier for size_t)
- Split off the reorderbuffer changes
- Removed SlabBlock/SlabChunk / renamed SlabBlockData
-
+#ifdef MEMORY_CONTEXT_CHECKING
+#define SLAB_CHUNK_USED \
+ (offsetof(SlabChunkData, requested_size) + sizeof(Size))
+#else
doesn't work anymore, because requested_size isn't a top-level
field. I first redefined it as (without surrounding ifdef)
#define SLAB_CHUNK_USED \
(offsetof(SlabChunkData, header) + sizeof(StandardChunkHeader))
but I'm not really sure there's a whole lot of point in the define
rather than just using sizeof() on the whole thing - there's no trailing
padding?
- SLAB_CHUNK_PUBLIC and SLAB_BLOCKHDRSZ are unused?
- renamed 'set' variables (and comments) to slab.
- used castNode(SlabContext, context) instead of manual casts
- I rephrased
+ *
+ * We cache indexes of the first empty chunk on each block (firstFreeChunk),
+ * and freelist index for blocks with least free chunks (minFreeChunks), so
+ * that we don't have to search the freelist and block on every SlabAlloc()
+ * call, which is quite expensive.
so it's not referencing firstFreeChunk anymore, since that seems to
make less sense now that firstFreeChunk is essentially just the head
of the list of free chunks.
- added typedefs.list entries and pgindented slab.c
- "mark the chunk as unused (zero the bit)" referenced non-existant
bitmap afaics.
- valgrind was triggering on
  block->firstFreeChunk = *(int32 *) SlabChunkGetPointer(chunk);
  because that was previously marked as NOACCESS (via
  wipe_mem). Explicitly marking it as DEFINED solves that (see the
  sketch below this list).
- removed
* XXX Perhaps we should not be gentle at all and simply fails in all cases,
* to eliminate the (mostly pointless) uncertainty.
- you'd included MemoryContext tup_context; in 0002, but it's not really
useful yet (and the comments above it in reorderbuffer.h don't apply)
- SlabFreeInfo/SlabAllocInfo didn't actually compile w/ HAVE_ALLOCINFO
defined (pre StandardChunkHeader def)
- some minor stuff.
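
For the valgrind item, the fix looks roughly like this (sketch; the
memdebug.h wrappers are no-ops unless built with USE_VALGRIND):

/* the freed chunk's data area was wiped and marked NOACCESS by
 * wipe_mem(), so declare the stored freelist index readable before
 * fetching it */
VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32));
block->firstFreeChunk = *(int32 *) SlabChunkGetPointer(chunk);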

I'm kinda inclined to drop SlabFreeInfo/SlabAllocInfo.

I've not yet looked a lot at the next type of context - I want to get
this much committed first...

I plan to give this another pass sometime this weekend and then push
soon.

- Andres

Attachment Content-Type Size
0001-Make-useful-infrastructure-from-aset.c-generally-ava.patch text/x-patch 11.2 KB
0002-Add-Slab-MemoryContext-implementation-for-efficient-.patch text/x-patch 26.9 KB
0003-Use-the-new-Slab-context-for-some-allocations-in-reo.patch text/x-patch 5.5 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 11:17:32
Message-ID: 20170227111732.vrx5v72ighehwpkf@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-02-24 14:10:38 -0800, Andres Freund wrote:
> I've not yet looked a lot at the next type of context - I want to get
> this much committed first...
>
> I plan to give this another pass sometime this weekend and then push
> soon.

Before committing I wanted to make sure that
http://archives.postgresql.org/message-id/32354.1487977458%40sss.pgh.pa.us
isn't a sufficient fix.

With the test of N=1000000 from this thread I measured both runtime and
memory usage (note that's peak virtual memory which includes 2GB of
shared_buffers and such), in assert enabled builds.

master: doesn't finish reasonably
master+doubly linked list fix: 9390.805 ms VmPeak: 10,969,424 kB
master+this thread: 6500.293 ms VmPeak: 2,969,528 kB

So the doubly-linked-list fix is great (and much more backpatchable),
but the patches in this thread are both better runtime *and* peak memory
usage wise. So that seems like a clear call.

I've not yet reviewed the generational allocator, but during these
measurements I get:
postgres[3970][1]=# select count(*) FROM pg_logical_slot_get_changes('ttt', NULL, NULL);
WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d011ef10f0 exceeds 7234 allocated
LOCATION: GenerationCheck, generation.c:693
WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d01023eba0 exceeds 65532 allocated
LOCATION: GenerationCheck, generation.c:693
WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d00d7fb870 exceeds 65532 allocated
LOCATION: GenerationCheck, generation.c:693
WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d00cde17b0 exceeds 65531 allocated
LOCATION: GenerationCheck, generation.c:693

that seems to occur when there are still in-progress transactions when
decoding finishes:

#0 GenerationCheck (context=0x5629129407c8)
at /home/andres/src/postgresql/src/backend/utils/mmgr/generation.c:692
#1 0x00005629105d92db in GenerationReset (context=0x5629129407c8)
at /home/andres/src/postgresql/src/backend/utils/mmgr/generation.c:255
#2 0x00005629105d93c6 in GenerationDelete (context=0x5629129407c8)
at /home/andres/src/postgresql/src/backend/utils/mmgr/generation.c:287
#3 0x00005629105e1e12 in MemoryContextDelete (context=0x5629129407c8)
at /home/andres/src/postgresql/src/backend/utils/mmgr/mcxt.c:225
#4 0x00005629105e1ee3 in MemoryContextDeleteChildren (context=0x562912940008)
at /home/andres/src/postgresql/src/backend/utils/mmgr/mcxt.c:245
#5 0x00005629105e1de0 in MemoryContextDelete (context=0x562912940008)
at /home/andres/src/postgresql/src/backend/utils/mmgr/mcxt.c:208
#6 0x00005629103d5451 in ReorderBufferFree (rb=0x562912906320)
at /home/andres/src/postgresql/src/backend/replication/logical/reorderbuffer.c:278
#7 0x00005629103cea4f in FreeDecodingContext (ctx=0x562912904310)
at /home/andres/src/postgresql/src/backend/replication/logical/logical.c:462
#8 0x00005629103d03f0 in pg_logical_slot_get_changes_guts (fcinfo=0x7fffc2042e50, confirm=0 '\000',

could it be that the test's condition is inverted?

I'll work on getting slab committed first, and then review / edit /
commit generation.c later. One first note there is that I'm wondering
if generation.c is a too generic filename.

- Andres


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 12:02:00
Message-ID: 20170227120200.5jc3ubg5jrqaznd7@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-02-27 03:17:32 -0800, Andres Freund wrote:
> I'll work on getting slab committed first, and then review / edit /
> commit generation.c later. One first note there is that I'm wondering
> if generation.c is a too generic filename.

And pushed slab and its usage. Will have a look at generation.c
tomorrow.

- Andres


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 14:11:40
Message-ID: f33180b4-2396-68ab-4bfd-cd6fe5578f15@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/27/2017 12:17 PM, Andres Freund wrote:
> Hi,
>
> On 2017-02-24 14:10:38 -0800, Andres Freund wrote:
>> I've not yet looked a lot at the next type of context - I want to get
>> this much committed first...
>>
>> I plan to give this another pass sometime this weekend and then push
>> soon.
>
> Before committing I wanted to make sure that
> http://archives.postgresql.org/message-id/32354.1487977458%40sss.pgh.pa.us
> isn't a sufficient fix.
>
> With the test of N=1000000 from this thread I measured both runtime and
> memory usage (note that's peak virtual memory which includes 2GB of
> shared_buffers and such), in assert enabled builds.
>
> master: doesn't finish reasonably
> master+doubly linked list fix: 9390.805 ms VmPeak: 10,969,424 kB
> master+this thread: 6500.293 ms VmPeak: 2,969,528 kB
>
> So the doubly-linked-list fix is great (and much more backpatchable),
> but the patches in this thread are both better runtime *and* peak memory
> usage wise. So that seems like a clear call.
>

Nice, thanks for doing the test.

> I've not yet reviewed the generational allocator, but during these
> measurements I get:
> postgres[3970][1]=# select count(*) FROM pg_logical_slot_get_changes('ttt', NULL, NULL);
> WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d011ef10f0 exceeds 7234 allocated
> LOCATION: GenerationCheck, generation.c:693
> WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d01023eba0 exceeds 65532 allocated
> LOCATION: GenerationCheck, generation.c:693
> WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d00d7fb870 exceeds 65532 allocated
> LOCATION: GenerationCheck, generation.c:693
> WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d00cde17b0 exceeds 65531 allocated
> LOCATION: GenerationCheck, generation.c:693
>
> that seems to occur when there are still in-progress transactions when
> decoding finishes:
>
...
>
> could it be that the test's condition is inverted?
>

Yeah, that seems like the culprit - the condition seems wrong. I wonder
why I haven't seen it during my tests, though ...

> I'll work on getting slab committed first, and then review / edit /
> commit generation.c later. One first note there is that I'm wondering
> if generation.c is a too generic filename.
>

Naming things is hard.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 14:14:20
Message-ID: 2474a17d-9255-11a4-be40-15175aa8e168@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/27/2017 01:02 PM, Andres Freund wrote:
> Hi,
>
> On 2017-02-27 03:17:32 -0800, Andres Freund wrote:
>> I'll work on getting slab committed first, and then review / edit /
>> commit generation.c later. One first note there is that I'm wondering
>> if generation.c is a too generic filename.
>
> And pushed slab and its usage. Will have a look at generation.c
> tomorrow.
>
> - Andres
>

Gah. I don't want to annoy the person who just committed my patch, but can
you give more time when asking for feedback? I mean, sending a modified
patch on Friday midnight, and committing on Monday noon does not really
give much time to look at it.

The changes seem fine to me, thanks for spending time on this.

Thanks

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org,Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>,Petr Jelinek <petr(at)2ndquadrant(dot)com>,Robert Haas <robertmhaas(at)gmail(dot)com>,John Gorman <johngorman2(at)gmail(dot)com>,pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 15:07:28
Message-ID: E0049147-F780-4BBB-B6DD-82944FC5A218@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On February 27, 2017 6:14:20 AM PST, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>On 02/27/2017 01:02 PM, Andres Freund wrote:
>> Hi,
>>
>> On 2017-02-27 03:17:32 -0800, Andres Freund wrote:
>>> I'll work on getting slab committed first, and then review / edit /
>>> commit generation.c later. One first note there is that I'm
>wondering
>>> if generation.c is a too generic filename.
>>
>> And pushed slab and its usage. Will have a look at generation.c
>> tomorrow.
>>
>> - Andres
>>
>
>Gah. I don't want to annoy the person who just committed my patch, but
>can you give more time when asking for feedback? I mean, sending a
>modified patch on Friday midnight, and committing on Monday noon does
>not really give much time to look at it.

Hm. The changes IMO weren't controversial (or surprising - most of them I had announced previously); I announced, when posting the review, that I'd push the patch later that weekend. If I hadn't been tired after doing the review/editing I'd have just pushed right then and there. It's hard to find time and attention, so not introducing a week of feedback time is quite worthwhile. I listed the changes I made primarily for posterity's sake. Most if not all committers make editorializing changes around commit, so that's not just me.

If you specifically want, I can try to give you more time to look at an edited patch, but that'll mean things move slower. I won't promise not to make minor changes just before commit either way - I always do another round of review just before push.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 15:32:25
Message-ID: 26061.1488209545@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> And pushed slab and its usage. Will have a look at generation.c
> tomorrow.

Perhaps first you need to find out why so much of the buildfarm
is unhappy.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 15:55:32
Message-ID: 20170227155532.o7casczq5parlmz4@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-02-27 10:32:25 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > And pushed slab and its usage. Will have a look at generation.c
> > tomorrow.
>
> Perhaps first you need to find out why so much of the buildfarm
> is unhappy.

Will do, after a morning coffee.

- Andres


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 16:48:28
Message-ID: 20170227164828.akgzeg6ggl2v4x7s@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-02-27 07:55:32 -0800, Andres Freund wrote:
> On 2017-02-27 10:32:25 -0500, Tom Lane wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > And pushed slab and its usage. Will have a look at generation.c
> > > tomorrow.
> >
> > Perhaps first you need to find out why so much of the buildfarm
> > is unhappy.
>
> Will do, after a morning coffee.

Hm. Not entirely clear on what's going on yet. I've run the tests on
hydra (community ppc 64 machine), which is pretty similar to termite
which failed [1] with:
TRAP: BadArgument("!(((context) != ((void *)0) && (((((const Node*)((context)))->type) == T_AllocSetContext) || ((((const Node*)((context)))->type) == T_SlabContext))))", File: "/home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/backend/utils/mmgr/mcxt.c", Line: 1010)

The best theory I have so far is that slab.c's idea of
StandardChunkHeader's size doesn't match what mcxt.c think it is
(because slab.c simply embeds StandardChunkHeader, but mcxt uses
MAXALIGN(sizeof(StandardChunkHeader))). That's not good, but I don't
quite see how that'd cause the issue, since StandardChunkHeader's size
should always be properly sized.

Tomas, do you have access to termite (which appears to be run by Craig,
under company mail)?

If not, I can push a "blind" fix, but I'd rather have more information.

Greetings,

Andres Freund

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2017-02-27%2014%3A00%3A06


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 16:51:38
Message-ID: 20170227165138.h4tlklpjhiexotgo@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-02-27 15:11:40 +0100, Tomas Vondra wrote:
> > I've not yet reviewed the generational allocator, but during these
> > measurements I get:
> > postgres[3970][1]=# select count(*) FROM pg_logical_slot_get_changes('ttt', NULL, NULL);
> > WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d011ef10f0 exceeds 7234 allocated
> > LOCATION: GenerationCheck, generation.c:693
> > WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d01023eba0 exceeds 65532 allocated
> > LOCATION: GenerationCheck, generation.c:693
> > WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d00d7fb870 exceeds 65532 allocated
> > LOCATION: GenerationCheck, generation.c:693
> > WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d00cde17b0 exceeds 65531 allocated
> > LOCATION: GenerationCheck, generation.c:693
> >
> > that seems to occur when there are still in-progress transactions when
> > decoding finishes:
> >
> ...
> >
> > could it be that the test's condition is inverted?
> >
>
> Yeah, that seems like the culprit - the condition seems wrong. I wonder why
> I haven't seen it during my tests, though ...

I suspect it's because your tests only triggered a memory context reset
when it was empty... But I ran decoding while a concurrent write
transaction was ongoing...

> > I'll work on getting slab committed first, and then review / edit /
> > commit generation.c later. One first note there is that I'm wondering
> > if generation.c is a too generic filename.

> Naming things is hard.

Indeed. I was thinking of genalloc, but that might be understood as
general, rather than generational...

Greetings,

Andres Freund


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 16:56:08
Message-ID: 1632374b-7ef6-0105-0fdc-d750987dc272@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/27/2017 05:48 PM, Andres Freund wrote:
> On 2017-02-27 07:55:32 -0800, Andres Freund wrote:
>> On 2017-02-27 10:32:25 -0500, Tom Lane wrote:
>>> Andres Freund <andres(at)anarazel(dot)de> writes:
>>>> And pushed slab and its usage. Will have a look at generation.c
>>>> tomorrow.
>>>
>>> Perhaps first you need to find out why so much of the buildfarm
>>> is unhappy.
>>
>> Will do, after a morning coffee.
>
> Hm. Not entirely clear on what's going on yet. I've run the tests on
> hydra (community ppc 64 machine), which is pretty similar to termite
> which failed [1] with:
> TRAP: BadArgument("!(((context) != ((void *)0) && (((((const Node*)((context)))->type) == T_AllocSetContext) || ((((const Node*)((context)))->type) == T_SlabContext))))", File: "/home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/backend/utils/mmgr/mcxt.c", Line: 1010)
>
> The best theory I have so far is that slab.c's idea of
> StandardChunkHeader's size doesn't match what mcxt.c think it is
> (because slab.c simply embeds StandardChunkHeader, but mcxt uses
> MAXALIGN(sizeof(StandardChunkHeader))). That's not good, but I don't
> quite see how that'd cause the issue, since StandardChunkHeader's size
> should always be properly sized.
>
> Tomas, do you have access to termite (which appears to be run by Craig,
> under company mail)?
>

No, I don't, but I'll ping Craig. It's ~4AM in Australia, though, so
it'll take time.

FWIW I think the ppc64 machines are failing because of an unrelated issue
(changes to integer timestamps). We should probably look at 32bit
machines first.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 17:00:09
Message-ID: 20170227170009.pr2hlyyf4yjp4kjg@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-02-27 17:56:08 +0100, Tomas Vondra wrote:
> No, I don't, but I'll ping Craig. It's ~4AM in Australia, though, so
> it'll take time.

Did the same... ;)

> FWIW I think the ppc64 machines are failing because of an unrelated issue
> (changes to integer timestamps). We should probably look at 32bit machines
> first.

Don't think so - termite is ppc64 afaics, and the failure doesn't look
integer timestamp related (assert failure is clearly about this, and set
of changed commits *only* include slab related commits).

Greetings,

Andres Freund


From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 17:04:41
Message-ID: 04add4c6-c33e-359c-f244-e68f2a1bd488@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 27/02/17 18:00, Andres Freund wrote:
>
>> FWIW I think the ppc64 machines are failing because of an unrelated issue
>> (changes to integer timestamps). We should probably look at 32bit machines
>> first.
>
> Don't think so - termite is ppc64 afaics, and the failure doesn't look
> integer timestamp related (assert failure is clearly about this, and set
> of changed commits *only* include slab related commits).
>

termite is ppc64 but with 4 byte pointer size according to configure, so
perhaps it's related to that?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 17:19:16
Message-ID: 12223.1488215956@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
>> FWIW I think the ppc64 machines are failing because of an unrelated issue
>> (changes to integer timestamps). We should probably look at 32bit machines
>> first.

> Don't think so - termite is ppc64 afaics, and the failure doesn't look
> integer timestamp related (assert failure is clearly about this, and set
> of changed commits *only* include slab related commits).

There are a couple of animals that have --disable-integer-datetimes,
but those are failing at the configure stage, and were doing so
before today.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 17:27:48
Message-ID: 12622.1488216468@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> > The best theory I have so far is that slab.c's idea of
> StandardChunkHeader's size doesn't match what mcxt.c think it is
> (because slab.c simply embeds StandardChunkHeader, but mcxt uses
> MAXALIGN(sizeof(StandardChunkHeader))). That's not good, but I don't
> quite see how that'd cause the issue, since StandardChunkHeader's size
> should always be properly sized.

Uh, wrong. On a 32-bit machine with debug enabled, StandardChunkHeader
will contain 3 4-byte fields. However, there are some such machines on
which MAXALIGN is 8. For example, looking at termite's configure
output:

checking size of void *... 4
checking size of size_t... 4
checking size of long... 4
checking alignment of short... 2
checking alignment of int... 4
checking alignment of long... 4
checking alignment of long long int... 8
checking alignment of double... 8

axolotl's output looks similar. I expect my old HPPA dinosaur
will show the failure as well.
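
To spell the arithmetic out, a standalone illustration (not PostgreSQL
code - MAXALIGN is re-derived here just for the example):

#include <stdio.h>
#include <stddef.h>

typedef struct MemoryContextData *MemoryContext;
typedef size_t Size;

typedef struct StandardChunkHeader
{
    MemoryContext context;          /* 4 bytes on such a machine */
    Size        size;               /* 4 bytes */
    Size        requested_size;     /* 4 bytes; debug builds only */
} StandardChunkHeader;

#define MAXIMUM_ALIGNOF 8           /* alignment of double, per configure */
#define MAXALIGN(LEN) \
    (((size_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & \
     ~((size_t) (MAXIMUM_ALIGNOF - 1)))

int
main(void)
{
    /* prints "12 vs 16" on a 32-bit pointer target: slab.c embeds 12
     * bytes, while mcxt.c steps back 16 bytes from the chunk data */
    printf("%zu vs %zu\n", sizeof(StandardChunkHeader),
           (size_t) MAXALIGN(sizeof(StandardChunkHeader)));
    return 0;
}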

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 17:40:25
Message-ID: 20170227174025.uqduky7o4rkt7hvq@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-02-27 18:04:41 +0100, Petr Jelinek wrote:
> On 27/02/17 18:00, Andres Freund wrote:
> >
> >> FWIW I think the ppc64 machines are failing because of an unrelated issue
> >> (changes to integer timestamps). We should probably look at 32bit machines
> >> first.
> >
> > Don't think so - termite is ppc64 afaics, and the failure doesn't look
> > integer timestamp related (assert failure is clearly about this, and set
> > of changed commits *only* include slab related commits).
> >
>
> termite is ppc64 but with 4 byte pointer size according to configure, so
> perhaps it's related to that?

Uh, ok. I checked the --configure options, but not the actual configure
output (blame -ENOCOFFEE and jetlag). The output makes it fairly likely
that my StandardChunkHeader theory is valid, so I'll work on a patch to
clean that up.

Greetings,

Andres Freund


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 17:42:01
Message-ID: 20170227174201.yrofai7neyp6shb2@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-02-27 12:27:48 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > The best theory I have so far is that slab.c's idea of
> > StandardChunkHeader's size doesn't match what mcxt.c think it is
> > (because slab.c simply embeds StandardChunkHeader, but mcxt uses
> > MAXALIGN(sizeof(StandardChunkHeader))). That's not good, but I don't
> > quite see how that'd cause the issue, since StandardChunkHeader's size
> > should always be properly sized.
>
> Uh, wrong. On a 32-bit machine with debug enabled, StandardChunkHeader
> will contain 3 4-byte fields. However, there are some such machines on
> which MAXALIGN is 8. For example, looking at termite's configure
> output:
>
> checking size of void *... 4
> checking size of size_t... 4
> checking size of long... 4
> checking alignment of short... 2
> checking alignment of int... 4
> checking alignment of long... 4
> checking alignment of long long int... 8
> checking alignment of double... 8
>
> axolotl's output looks similar. I expect my old HPPA dinosaur
> will show the failure as well.

Yea, I hadn't yet realized when writing that, that termite actually,
despite running on ppc64, compiles a 32bit postgres. Will thus
duplicate StandardChunkHeader's contents into slab.c :( - I don't see
an easy way around that...


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 17:46:06
Message-ID: 66e4c315-27a5-b3ce-8435-fcd666b04bb3@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/27/2017 06:40 PM, Andres Freund wrote:
> On 2017-02-27 18:04:41 +0100, Petr Jelinek wrote:
>> On 27/02/17 18:00, Andres Freund wrote:
>>>
>>>> FWIW I think the ppc64 machines are failing because of an unrelated issue
>>>> (changes to integer timestamps). We should probably look at 32bit machines
>>>> first.
>>>
>>> Don't think so - termite is ppc64 afaics, and the failure doesn't look
>>> integer timestamp related (assert failure is clearly about this, and set
>>> of changed commits *only* include slab related commits).
>>>
>>
>> termite is ppc64 but with 4 byte pointer size according to configure, so
>> perhaps it's related to that?
>
> Uh, ok. I checked the --configure options, but not the actual configure
> output (blame -ENOCOFFEE and jetlag). The output makes it fairly likely
> that my StandardChunkHeader theory is valid, so I'll work on a patch to
> clean that up.
>

Thanks. I set up an rpi3 machine (armv7l) that fails with the same issue,
so if you need to test the patch, let me know.

While building, I've also noticed a bunch of warnings about string
formatting; attached is a patch that fixes those.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
slab-warnings.patch text/x-diff 887 bytes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-27 17:55:21
Message-ID: 03cc6ca8-bdd8-b53e-93e4-757cd80e1ce8@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/27/2017 04:07 PM, Andres Freund wrote:
>
>
> On February 27, 2017 6:14:20 AM PST, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> On 02/27/2017 01:02 PM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2017-02-27 03:17:32 -0800, Andres Freund wrote:
>>>> I'll work on getting slab committed first, and then review /
>>>> edit / commit generation.c later. One first note there is that
>>>> I'm
>> wondering
>>>> if generation.c is a too generic filename.
>>>
>>> And pushed slab and its usage. Will have a look at generation.c
>>> tomorrow.
>>>
>>> - Andres
>>>
>>
>> Gah. I don't want to annoy the person who just committed my patch, but
>> can you give more time when asking for feedback? I mean, sending a
>> modified patch on Friday midnight, and committing on Monday noon does
>> not really give much time to look at it.
>
> Hm. The changes IMO weren't controversial (or surprising - most of
> them I had announced previously); I announced, when posting the
> review, that I'd push the patch later that weekend. If I hadn't been
> tired after doing the review/editing I'd have just pushed right then
> and there. It's hard to find time and attention, so not introducing a
> week of feedback time is quite worthwhile. I listed the changes I
> made primarily for posterity's sake. Most if not all committers make
> editorializing changes around commit, so that's not just me.
>
> If you specifically want, I can try to give you more time to look at
> an edited patch, but that'll mean things move slower. I won't promise
> not to make minor changes just before commit either way - I always do
> another round of review just before push.
>

I also agree the changes are not particularly controversial, but then
why ask for comments at all? I'm OK with a committer making final
tweaks and pushing it without asking, but if you ask for comments, let's
give people time to actually respond.

I agree introducing weeks of delays would be silly, but I'm not asking
for that. I'm perfectly fine with two days for feedback, as long as it's
not a weekend + half of Monday.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-28 00:44:42
Message-ID: c7c1bb85-d1d0-05a5-4d4c-3047d6f053e7@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/27/2017 06:42 PM, Andres Freund wrote:
> On 2017-02-27 12:27:48 -0500, Tom Lane wrote:
>> Andres Freund <andres(at)anarazel(dot)de> writes:
>>> The best theory I have so far is that slab.c's idea of
>>> StandardChunkHeader's size doesn't match what mcxt.c think it is
>>> (because slab.c simply embeds StandardChunkHeader, but mcxt uses
>>> MAXALIGN(sizeof(StandardChunkHeader))). That's not good, but I don't
>>> quite see how that'd cause the issue, since StandardChunkHeader's size
>>> should always be properly sized.
>>
>> Uh, wrong. On a 32-bit machine with debug enabled, StandardChunkHeader
>> will contain 3 4-byte fields. However, there are some such machines on
>> which MAXALIGN is 8. For example, looking at termite's configure
>> output:
>>
>> checking size of void *... 4
>> checking size of size_t... 4
>> checking size of long... 4
>> checking alignment of short... 2
>> checking alignment of int... 4
>> checking alignment of long... 4
>> checking alignment of long long int... 8
>> checking alignment of double... 8
>>
>> axolotl's output looks similar. I expect my old HPPA dinosaur
>> will show the failure as well.
>
> Yea, I hadn't yet realized when writing that, that termite actually,
> despite running on ppc64, compiles a 32bit postgres. Will thus
> duplicate StandardChunkHeader's contents into slab.c :( - I don't
> see an easy way around that...
>

I've tried this - essentially copying the StandardChunkHeader's contents
into SlabChunk, but that does not seem to do the trick, sadly. Per
pahole, the structures then (at least on armv7l) look like this:

struct SlabChunk {
        void *             block;                /*     0     4 */
        MemoryContext      context;              /*     4     4 */
        Size               size;                 /*     8     4 */
        Size               requested_size;       /*    12     4 */

        /* size: 16, cachelines: 1, members: 4 */
        /* last cacheline: 16 bytes */
};

struct StandardChunkHeader {
        MemoryContext      context;              /*     0     4 */
        Size               size;                 /*     4     4 */
        Size               requested_size;       /*     8     4 */

        /* size: 12, cachelines: 1, members: 3 */
        /* last cacheline: 12 bytes */
};

So SlabChunk happens to be perfectly aligned (MAXIMUM_ALIGNOF=8), and so
pfree() grabs the block pointer but thinks it's the context :-(

Not sure what to do about this - the only thing I can think about is
splitting SlabChunk into two separate structures, and align them
independently.

The attached patch does that - it probably needs a bit more work on the
comments to make it commit-ready, but it fixes the test_decoding tests
on the rpi3 board I'm using for testing.
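
Roughly, the layout the patch aims for (sketch only; names here are
illustrative rather than taken from the patch):

/*
 * Pad the slab-specific part and the standard header to MAXALIGN
 * independently, so that StandardChunkHeader always starts exactly
 * MAXALIGN(sizeof(StandardChunkHeader)) bytes before the user data -
 * which is where pfree() expects to find it.
 */
typedef struct SlabChunkPrefix
{
    void       *block;              /* block owning the chunk */
} SlabChunkPrefix;

#define SLAB_CHUNKHDRSZ \
    (MAXALIGN(sizeof(SlabChunkPrefix)) + \
     MAXALIGN(sizeof(StandardChunkHeader)))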

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
slab-fix.patch binary/octet-stream 7.3 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-28 03:42:32
Message-ID: 20170228034232.4h7ouyhxq4lbbao2@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-02-28 01:44:42 +0100, Tomas Vondra wrote:
> On 02/27/2017 06:42 PM, Andres Freund wrote:
> > Yea, I hadn't yet realized when writing that that termite actually,
> > despite running on ppc64, compiles a 32bit postgres. Will thus
> > duplicate StandardChunkHeader's contents in to slab.c :( - I don't
> > see an easy way around that...
>
> I've tried this - essentially copying the StandardChunkHeader's contents
> into SlabChunk, but that does not seem to do the trick, sadly. Per pahole,
> the structures then (at least on armv7l) look like this:
>
> struct SlabChunk {
>         void *             block;                /*     0     4 */
>         MemoryContext      context;              /*     4     4 */
>         Size               size;                 /*     8     4 */
>         Size               requested_size;       /*    12     4 */
>
>         /* size: 16, cachelines: 1, members: 4 */
>         /* last cacheline: 16 bytes */
> };
>
> struct StandardChunkHeader {
>         MemoryContext      context;              /*     0     4 */
>         Size               size;                 /*     4     4 */
>         Size               requested_size;       /*     8     4 */
>
>         /* size: 12, cachelines: 1, members: 3 */
>         /* last cacheline: 12 bytes */
> };

> So SlabChunk happens to be perfectly aligned (MAXIMUM_ALIGNOF=8), and so
> pfree() grabs the block pointer but thinks it's the context :-(

Hm. The only way I can think of to achieve the right thing here would
be something like:

typedef struct StandardChunkHeader
{
    MemoryContext context;      /* owning context */
    Size        size;           /* size of data space allocated in chunk */
#ifdef MEMORY_CONTEXT_CHECKING
    /* when debugging memory usage, also store actual requested size */
    Size        requested_size;
#endif
    union
    {
        char       *data;
        /* ensure MAXALIGNed */
        int64       alignof_int64;
        double      alignof_double;
    } d;
} StandardChunkHeader;

typedef struct SlabChunk
{
    void       *block;
    StandardChunkHeader header;
} SlabChunk;

That's not overly pretty, but also not absolutely disgusting. Unifying
the padding calculations between allocators would be a nice side-effect.
Note we at least previously had such union/double tricks in the tree, via
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e1a11d93111ff3fba7a91f3f2ac0b0aca16909a8

It might be a good idea to have configure define maxaligned_type instead
of including both int64/double (although it'll IIRC practically always
be double that's maxaligned).

Independently of this, we really should redefine StandardChunkHeader to
be only the MemoryContext. There's no need to have size/requested_size
part of StandardChunkHeader, given there's
MemoryContextMethods->get_chunk_space().

> Not sure what to do about this - the only thing I can think about is
> splitting SlabChunk into two separate structures, and align them
> independently.
>
> The attached patch does that - it probably needs a bit more work on the
> comments to make it commit-ready, but it fixes the test_decoding tests on
> the rpi3 board I'm using for testing.

That'd work as well, although at the very least I'd want to add a
comment explaining the actual memory layout somewhere - this is a bit
too finicky to expect to get right the next time round.

Any preferences?

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-28 03:57:24
Message-ID: 1240.1488254244@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> Independently of this, we really should redefine StandardChunkHeader to
> be only the MemoryContext. There's no need to have size/requested_size
> part of StandardChunkHeader, given there's
> MemoryContextMethods->get_chunk_space().

Yeah, perhaps. The trick would be to get things laid out so that the
MemoryContext pointer is always immediately adjacent to the chunk data
(no padding between).

One could imagine redefining aset.c's chunk header along the lines of

typedef struct AllocSetChunkHeader
{
    Size        size;           /* size of data space allocated in chunk */
#ifdef MEMORY_CONTEXT_CHECKING
    Size        requested_size; /* original request size */
#if 32-bit-but-maxalign-is-8
    Size        padding;        /* needed to avoid padding below */
#endif
#endif
    MemoryContext context;      /* owning context */
    /* there must not be any padding to reach a MAXALIGN boundary here! */
} AllocSetChunkHeader;

where we'd possibly need some help from configure to implement that inner
#if condition, but it seems doable enough.

If the slab allocator would be happier with just a MemoryContext pointer
as chunk header, I think we should push in this direction rather than
invent some short-term hack.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-28 05:16:30
Message-ID: 20170228051630.3qcevakcaaa6b4ef@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-02-27 22:57:24 -0500, Tom Lane wrote:
> If the slab allocator would be happier with just a MemoryContext pointer
> as chunk header, I think we should push in this direction rather than
> invent some short-term hack.

It would - it really doesn't need the size, because it's the same for
the whole context, and thereby is a waste of space. Still wondering if
we should band-aid this till that's done.

> One could imagine redefining aset.c's chunk header along the lines of
>
> typedef struct AllocSetChunkHeader
> {
>     Size        size;           /* size of data space allocated in chunk */
> #ifdef MEMORY_CONTEXT_CHECKING
>     Size        requested_size; /* original request size */
> #if 32-bit-but-maxalign-is-8
>     Size        padding;        /* needed to avoid padding below */
> #endif
> #endif
>     MemoryContext context;      /* owning context */
>     /* there must not be any padding to reach a MAXALIGN boundary here! */
> } AllocSetChunkHeader;
>
> where we'd possibly need some help from configure to implement that inner
> #if condition, but it seems doable enough.

Hm, that should be doable with something like
#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4

which'd probably be better documentation than a macro that hides this
(arguing internally whether SIZEOF_VOID_P or SIZEOF_SIZE_T is better).
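
Concretely, plugging that condition into the header quoted above would
give roughly the following (a sketch only, not the final patch):

typedef struct AllocSetChunkHeader
{
    Size        size;           /* size of data space allocated in chunk */
#ifdef MEMORY_CONTEXT_CHECKING
    Size        requested_size; /* original request size */
#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
    Size        padding;        /* pad so 'context' ends exactly on the
                                 * MAXALIGN boundary before the chunk data */
#endif
#endif
    MemoryContext context;      /* owning context */
    /* there must not be any padding to reach a MAXALIGN boundary here! */
} AllocSetChunkHeader;

Both MAXIMUM_ALIGNOF and SIZEOF_VOID_P are already provided by
configure, so no new machinery should be needed.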

Working on a patch now, will post but not push tonight.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-28 05:29:44
Message-ID: 1671.1488259784@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> Hm, that should be doable with something like
> #if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
> which'd probably be better documentation than a macro that hides this
> (arguing internally whether SIZEOF_VOID_P or SIZEOF_SIZE_T is better).

Not sure either, but suggest we add a StaticAssert asserting there's no
padding; something along the lines of
offsetof(AllocSetChunkHeader, context) + sizeof(MemoryContext) == MAXALIGN(sizeof(AllocSetChunkHeader))
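
Spelled out with the existing StaticAssertStmt() macro, that could look
roughly like this (a sketch; the message text is of course arbitrary):

StaticAssertStmt(offsetof(AllocSetChunkHeader, context) + sizeof(MemoryContext) ==
                 MAXALIGN(sizeof(AllocSetChunkHeader)),
                 "padding calculation in AllocSetChunkHeader is wrong");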

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-28 07:44:20
Message-ID: 20170228074420.aazv4iw6k562mnxg@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

*preliminary* patch attached. This needs a good bit of polishing
(primarily comment work, verifying that valgrind works), but I'm too
tired now.

I'm not quite sure how to deal with mmgr/README - it's written as kind
of a historical document, and the "Mechanisms to Allow Multiple Types of
Contexts" is already quite out of date. I think it'd be good to rip out
all the historical references and just describe the current state, but
I'm not really enthusiastic about tackling that :/

On 2017-02-28 00:29:44 -0500, Tom Lane wrote:
> Not sure either, but suggest we add a StaticAssert asserting there's no
> padding; something along the lines of
> offsetof(AllocSetChunkHeader, context) + sizeof(MemoryContext) == MAXALIGN(sizeof(AllocSetChunkHeader))

Included.

With Craig's help I've verified that the patch as attached works on a
platform that's currently failing. Thanks!

- Andres

Attachment Content-Type Size
0001-Remove-StandardChunkHeader-for-Slab-s-benefit.patch text/x-patch 19.0 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-02-28 18:41:22
Message-ID: 20170228184122.extbigaer5ryrt3z@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-02-27 23:44:20 -0800, Andres Freund wrote:
> *preliminary* patch attached. This needs a good bit of polishing
> (primarily comment work, verifying that valgrind works), but I'm too
> tired now.
>
> I'm not quite sure how to deal with mmgr/README - it's written as kind
> of a historical document, and the "Mechanisms to Allow Multiple Types of
> Contexts" is already quite out of date. I think it'd be good to rip out
> all the historical references and just describe the current state, but
> I'm not really enthusiastic about tackling that :/

While still not enthusiastic, I took a stab at doing so. While still
not perfect, I do think this is an improvement.

Is anybody uncomfortable going away from the current historical account
style?

- Andres

Attachment Content-Type Size
0002-Overhaul-mmgr-s-README.patch text/x-patch 19.7 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-01 04:18:35
Message-ID: 20170301041835.wli3viwnxr5wonfv@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-02-28 10:41:22 -0800, Andres Freund wrote:
> Hi,
>
> On 2017-02-27 23:44:20 -0800, Andres Freund wrote:
> > *preliminary* patch attached. This needs a good bit of polishing
> > (primarily comment work, verifying that valgrind works), but I'm too
> > tired now.
> >
> > I'm not quite sure how to deal with mmgr/README - it's written as kind
> > of a historical document, and the "Mechanisms to Allow Multiple Types of
> > Contexts" is already quite out of date. I think it'd be good to rip out
> > all the historical references and just describe the current state, but
> > I'm not really enthusiastic about tackling that :/
>
> While still not enthusiastic, I took a stab at doing so. While still
> not perfect, I do think this is an improvement.
>
> Is anybody uncomfortable going away from the current historical account
> style?

I've pushed these now. I'm not claiming that the README revision is
perfect, but we can incrementally improve it further...

- Andres, hoping the buildfarm turns greener


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-01 04:29:36
Message-ID: 20170301042936.usv23f4iaumeyptf@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-02-28 20:18:35 -0800, Andres Freund wrote:
> - Andres, hoping the buildfarm turns greener

Oh well, that didn't work. Investigating.


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-01 22:55:45
Message-ID: 20170301225545.eoevemua234g4fzq@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-02-28 20:29:36 -0800, Andres Freund wrote:
> On 2017-02-28 20:18:35 -0800, Andres Freund wrote:
> > - Andres, hoping the buildfarm turns greener
>
> Oh well, that didn't work. Investigating.

The fix for that was fairly trivial, and the buildfarm has cooled down.

The issue was that on 32bit platforms the Datum returned by some
functions (int2int4_sum in this case) isn't actually a separately
allocated Datum, but rather just something embedded in a larger
struct. That, combined with the following code:
if (!peraggstate->resulttypeByVal && !*isnull &&
    !MemoryContextContains(CurrentMemoryContext,
                           DatumGetPointer(*result)))
seems somewhat problematic to me. MemoryContextContains() can give
false positives when used on memory that's not a distinctly allocated
chunk, and if so, we violate memory lifetime rules. It's quite
unlikely, given the required bit patterns, but nonetheless it's making
me somewhat uncomfortable.

Do others think this isn't an issue and we can just live with it?

Regards,

Andres


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-02 03:36:23
Message-ID: 11adca69-be28-44bc-a801-64e6d53851e3@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/01/2017 05:18 AM, Andres Freund wrote:
> On 2017-02-28 10:41:22 -0800, Andres Freund wrote:
>> Hi,
>>
>> On 2017-02-27 23:44:20 -0800, Andres Freund wrote:
>>> *preliminary* patch attached. This needs a good bit of polishing
>>> (primarily comment work, verifying that valgrind works), but I'm too
>>> tired now.
>>>
>>> I'm not quite sure how to deal with mmgr/README - it's written as kind
>>> of a historical document, and the "Mechanisms to Allow Multiple Types of
>>> Contexts" is already quite out of date. I think it'd be good to rip out
>>> all the historical references and just describe the current state, but
>>> I'm not really enthusiastic about tackling that :/
>>
>> While still not enthusiastic, I took a stab at doing so. While still
>> not perfect, I do think this is an improvement.
>>
>> Is anybody uncomfortable going away from the current historical account
>> style?
>
> I've pushed these now. I'm not claiming that the README revision is
> perfect, but we can incrementally improve it further...
>

Thanks. I went through the README and it definitely looks better now.

I've noticed two minor typos:

1) That is solved this by creating ...
- extra "this"

2) Given this, routines like pfree their corresponding context ...
- missing "find" or "determine"

I also see you've explicitly mentioned the callbacks were added in 9.5.
Doesn't that somewhat reintroduce the historical account?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-02 03:47:13
Message-ID: c5cce00e-8a10-449e-33be-67c0ea1009ef@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/01/2017 11:55 PM, Andres Freund wrote:
> On 2017-02-28 20:29:36 -0800, Andres Freund wrote:
>> On 2017-02-28 20:18:35 -0800, Andres Freund wrote:
>>> - Andres, hoping the buildfarm turns greener
>>
>> Oh well, that didn't work. Investigating.
>
> The fix for that was fairly trivial, and the buildfarm has cooled down.
>
> The issue was that on 32bit platforms the Datum returned by some
> functions (int2int4_sum in this case) isn't actually a separately
> allocated Datum, but rather just something embedded in a larger
> struct. That, combined with the following code:
> if (!peraggstate->resulttypeByVal && !*isnull &&
> !MemoryContextContains(CurrentMemoryContext,
> DatumGetPointer(*result)))
> seems somewhat problematic to me. MemoryContextContains() can give
> false positives when used on memory that's not a distinctly allocated
> chunk, and if so, we violate memory lifetime rules. It's quite
> unlikely, given the required bit patterns, but nonetheless it's making
> me somewhat uncomfortable.
>

I assume you're only using that code snippet as an example of code that
might be broken by MemoryContextContains() false positives, right?

(I don't see how the slab allocator could interfere with aggregates, as
it's only used for reorderbuffer.c).

>
> Do others think this isn't an issue and we can just live with it?
>

My understanding is all the places calling MemoryContextContains()
assume they can't receive memory not allocated as a simple chunk by
palloc(). If that's not the case, it's likely broken.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-02 06:19:30
Message-ID: 20170302061930.ofjfbecdbdchn76s@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-03-02 04:36:23 +0100, Tomas Vondra wrote:
> I've noticed two minor typos:
>
> 1) That is solved this by creating ...
> - extra "this"
>
> 2) Given this, routines like pfree their corresponding context ...
> - missing "find" or "determine"

Will fix.

> I also see you've explicitly mentioned the callbacks were added in 9.5.
> Doesn't that somewhat reintroduce the historical account?

That section I just moved up, the version reference was there before. I
left it in, because it seemed new enough to still be somewhat
relevant; removed and re-added it, not sure what's better.

Greetings,

Andres Freund


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-02 06:25:16
Message-ID: 20170302062516.7iig7yqh4atrul6b@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-03-02 04:47:13 +0100, Tomas Vondra wrote:
> On 03/01/2017 11:55 PM, Andres Freund wrote:
> > On 2017-02-28 20:29:36 -0800, Andres Freund wrote:
> > > On 2017-02-28 20:18:35 -0800, Andres Freund wrote:
> > > > - Andres, hoping the buildfarm turns greener
> > >
> > > Oh well, that didn't work. Investigating.
> >
> > The fix for that was fairly trivial, and the buildfarm has cooled down.
> >
> > The issue was that on 32bit platforms the Datum returned by some
> > functions (int2int4_sum in this case) isn't actually a separately
> > allocated Datum, but rather just something embedded in a larger
> > struct. That, combined with the following code:
> > if (!peraggstate->resulttypeByVal && !*isnull &&
> > !MemoryContextContains(CurrentMemoryContext,
> > DatumGetPointer(*result)))
> > seems somewhat problematic to me. MemoryContextContains() can give
> > false positives when used on memory that's not a distinctly allocated
> > chunk, and if so, we violate memory lifetime rules. It's quite
> > unlikely, given the required bit patterns, but nonetheless it's making
> > me somewhat uncomfortable.
> >
>
> I assume you're only using that code snippet as an example of code that
> might be broken by MemoryContextContains() false positives, right?

I'm mentioning that piece of code because it's what temporarily caused
all 32bit animals to fail, when I had made MemoryContextContains() less
forgiving.

> (I don't see how the slab allocator could interfere with aggregates, as it's
> only used for reorderbuffer.c).

Indeed, this is independent of slab.c. I just came across it because I
triggered crashes when shrinking the StandardChunkHeader to be just the
chunk's MemoryContext.

> > Do others think this isn't an issue and we can just live with it?
> >
>
> My understanding is all the places calling MemoryContextContains() assume
> they can't receive memory not allocated as a simple chunk by palloc(). If
> that's not the case, it's likely broken.

Yea, that's my conclusion too. Which means nodeAgg.c and nodeWindowAgg.c
are broken afaics, because of e.g. int2int4_sum()'s use of
Int64GetDatumFast() on sub-parts of larger allocations.

- Andres


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-02 21:51:09
Message-ID: fa4b397e-b454-d109-990f-2f05f7e47f15@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/01/2017 05:29 AM, Andres Freund wrote:
> On 2017-02-28 20:18:35 -0800, Andres Freund wrote:
>> - Andres, hoping the buildfarm turns greener
>
> Oh well, that didn't work. Investigating.
>

Attached is the last part of the patch series, rebased to current master
and adopting the new chunk header approach. Unlike Slab, this context
needs the whole AllocSet header (size, requested_size), and also the
block pointer, so no padding seems to be needed.
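
From that description, the chunk header presumably ends up along these
lines (my own reading of the description, not copied from the patch):

typedef struct GenerationChunk
{
    Size        size;           /* size of data space allocated in chunk */
    Size        requested_size; /* original request size */
    GenerationBlock *block;     /* block owning this chunk */
    GenerationContext *context; /* owning context */
    /*
     * four word-sized fields, so the header is a multiple of MAXALIGN
     * on both 32-bit and 64-bit platforms - hence no explicit padding
     */
} GenerationChunk;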

I've tested this on x86-64 and armv7l, and the test_decoding test suite
passes on both.

FWIW, I'm still not entirely happy with the name "Generation". I agree
with Andres that it's perhaps a bit too generic, but more importantly
the name might actually be a bit obsolete. There used to be generations
of chunks, but that's gone. Now it simply does not reuse the chunks at
all, and frees the blocks when they get empty.
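
A condensed sketch of the free path shows why (paraphrasing the patch,
names and details simplified):

static void
GenerationFree(MemoryContext context, void *pointer)
{
    GenerationChunk *chunk = GenerationPointerGetChunk(pointer);
    GenerationBlock *block = chunk->block;

    /* chunks are never reused; just remember that one more was freed */
    block->nfree += 1;
    Assert(block->nfree <= block->nchunks);

    /* once every chunk on the block is freed, release the whole block */
    if (block->nfree == block->nchunks)
    {
        dlist_delete(&block->node);
        free(block);
    }
}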

It's not entirely FIFO though, because the transactions interleave, so
later blocks may be released first. But the "allocated close, freed
close" is still there. So perhaps something like "TemporalSet" or
something like that would be a better name?

Man, naming things is hard ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
gen-context.patch binary/octet-stream 0 bytes

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-04 01:58:49
Message-ID: 20170304015849.azxrq5zevzrkvirx@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-03-01 22:19:30 -0800, Andres Freund wrote:
> On 2017-03-02 04:36:23 +0100, Tomas Vondra wrote:
> > I've noticed two minor typos:
> >
> > 1) That is solved this by creating ...
> > - extra "this"
> >
> > 2) Given this, routines like pfree their corresponding context ...
> > - missing "find" or "determine"
>
> Will fix.

And done.

> > I also see you've explicitly mentioned the callbacks were added in 9.5.
> > Doesn't that somewhat reintroduce the historical account?
>
> That section I just moved up, the version reference was there before. I
> left it in, because it seemed new enough to still be somewhat
> relevant; removed and re-added it, not sure what's better.

Left that in place for now.


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-04 04:02:29
Message-ID: 70336470-6428-cfc1-eb15-d94e74d321c9@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/04/2017 02:58 AM, Andres Freund wrote:
> On 2017-03-01 22:19:30 -0800, Andres Freund wrote:
>> On 2017-03-02 04:36:23 +0100, Tomas Vondra wrote:
>>> I've noticed two minor typos:
>>>
>>> 1) That is solved this by creating ...
>>> - extra "this"
>>>
>>> 2) Given this, routines like pfree their corresponding context ...
>>> - missing "find" or "determine"
>>
>> Will fix.
>
> And done.
>
>>> I also see you've explicitly mentioned the callbacks were added
>>> in 9.5. Doesn't that somewhat reintroduce the historical
>>> account?
>>
>> That section I just moved up, the version reference was there
>> before. I left it in, because it seemed new enough to still be
>> somewhat relevant; removed and re-added it, not sure what's better.
>
> Left that in place for now.
>

Yeah. I hadn't realized it was just moved a bit, and moreover it
probably makes sense to have some comments regarding differences between
current versions. So +1 to that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-06 17:40:18
Message-ID: CA+TgmoY6F5ry7hiabiWFvU40bVgVruLB9YGP0D2O4Tb_zNXOUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> The issue was that on 32bit platforms the Datum returned by some
> functions (int2int4_sum in this case) isn't actually a separately
> allocated Datum, but rather just something embedded in a larger
> struct. That, combined with the following code:
> if (!peraggstate->resulttypeByVal && !*isnull &&
> !MemoryContextContains(CurrentMemoryContext,
> DatumGetPointer(*result)))
> seems somewhat problematic to me. MemoryContextContains() can give
> false positives when used on memory that's not a distinctly allocated
> chunk, and if so, we violate memory lifetime rules. It's quite
> unlikely, given the required bit patterns, but nonetheless it's making
> me somewhat uncomfortable.
>
> Do others think this isn't an issue and we can just live with it?

I think it's 100% broken to call MemoryContextContains() on something
that's not guaranteed to be a palloc'd chunk.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-06 17:44:06
Message-ID: 20170306174406.pfs3as44gc7a5ps2@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-03-06 12:40:18 -0500, Robert Haas wrote:
> On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > The issue was that on 32bit platforms the Datum returned by some
> > functions (int2int4_sum in this case) isn't actually a separately
> > allocated Datum, but rather just something embedded in a larger
> > struct. That, combined with the following code:
> > if (!peraggstate->resulttypeByVal && !*isnull &&
> > !MemoryContextContains(CurrentMemoryContext,
> > DatumGetPointer(*result)))
> > seems somewhat problematic to me. MemoryContextContains() can give
> > false positives when used on memory that's not a distinctly allocated
> > chunk, and if so, we violate memory lifetime rules. It's quite
> > unlikely, given the required bit patterns, but nonetheless it's making
> > me somewhat uncomfortable.
> >
> > Do others think this isn't an issue and we can just live with it?
>
> I think it's 100% broken to call MemoryContextContains() on something
> that's not guaranteed to be a palloc'd chunk.

I agree, but to me it seems the only fix would be to just yank out the
whole optimization?

- Andres


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-06 18:05:44
Message-ID: CA+TgmoaQ34gXj3G96xqU6chxTOHSHxip07+GySRxS-5FgHs+NQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 6, 2017 at 12:44 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-03-06 12:40:18 -0500, Robert Haas wrote:
>> On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > The issue was that on 32bit platforms the Datum returned by some
>> > functions (int2int4_sum in this case) isn't actually a separately
>> > allocated Datum, but rather just something embedded in a larger
>> > struct. That, combined with the following code:
>> > if (!peraggstate->resulttypeByVal && !*isnull &&
>> > !MemoryContextContains(CurrentMemoryContext,
>> > DatumGetPointer(*result)))
>> > seems somewhat problematic to me. MemoryContextContains() can give
>> > false positives when used on memory that's not a distinctly allocated
>> > chunk, and if so, we violate memory lifetime rules. It's quite
>> > unlikely, given the required bit patterns, but nonetheless it's making
>> > me somewhat uncomfortable.
>> >
>> > Do others think this isn't an issue and we can just live with it?
>>
>> I think it's 100% broken to call MemoryContextContains() on something
>> that's not guaranteed to be a palloc'd chunk.
>
> I agree, but to me it seems the only fix would be to just yank out the
> whole optimization?

Dunno, haven't looked into it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-06 18:49:56
Message-ID: b09976b6-b662-da59-3f65-a4396e8de5b3@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/06/2017 07:05 PM, Robert Haas wrote:
> On Mon, Mar 6, 2017 at 12:44 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2017-03-06 12:40:18 -0500, Robert Haas wrote:
>>> On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>>> The issue was that on 32bit platforms the Datum returned by some
>>>> functions (int2int4_sum in this case) isn't actually a separately
>>>> allocated Datum, but rather just something embedded in a larger
>>>> struct. That, combined with the following code:
>>>> if (!peraggstate->resulttypeByVal && !*isnull &&
>>>> !MemoryContextContains(CurrentMemoryContext,
>>>> DatumGetPointer(*result)))
>>>> seems somewhat problematic to me. MemoryContextContains() can give
>>>> false positives when used on memory that's not a distinctly allocated
>>>> chunk, and if so, we violate memory lifetime rules. It's quite
>>>> unlikely, given the required bit patterns, but nonetheless it's making
>>>> me somewhat uncomfortable.
>>>>
>>>> Do others think this isn't an issue and we can just live with it?
>>>
>>> I think it's 100% broken to call MemoryContextContains() on something
>>> that's not guaranteed to be a palloc'd chunk.
>>
>> I agree, but to me it seems the only fix would be to just yank out the
>> whole optimization?
>
> Dunno, haven't looked into it.
>

I think it might be fixable by adding a flag into the chunk, with 'true'
for regular allocations, and 'false' for the optimized ones. And then
only use MemoryContextContains() for 'flag=true' chunks.

The question however is whether this won't make the optimization
pointless. I also, wonder how much we save by this optimization and how
widely it's used? Can someone point me to some numbers?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-06 19:08:57
Message-ID: 20170306190857.3pef5twyhnhxoqjs@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-03-06 19:49:56 +0100, Tomas Vondra wrote:
> On 03/06/2017 07:05 PM, Robert Haas wrote:
> > On Mon, Mar 6, 2017 at 12:44 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > On 2017-03-06 12:40:18 -0500, Robert Haas wrote:
> > > > On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > > > The issue was that on 32bit platforms the Datum returned by some
> > > > > functions (int2int4_sum in this case) isn't actually a separately
> > > > > allocated Datum, but rather just something embedded in a larger
> > > > > struct. That, combined with the following code:
> > > > > if (!peraggstate->resulttypeByVal && !*isnull &&
> > > > > !MemoryContextContains(CurrentMemoryContext,
> > > > > DatumGetPointer(*result)))
> > > > > seems somewhat problematic to me. MemoryContextContains() can give
> > > > > false positives when used on memory that's not a distinctly allocated
> > > > > chunk, and if so, we violate memory lifetime rules. It's quite
> > > > > unlikely, given the required bit patterns, but nonetheless it's making
> > > > > me somewhat uncomfortable.
> > > > >
> > > > > Do others think this isn't an issue and we can just live with it?
> > > >
> > > > I think it's 100% broken to call MemoryContextContains() on something
> > > > that's not guaranteed to be a palloc'd chunk.
> > >
> > > I agree, but to me it seems the only fix would be to just yank out the
> > > whole optimization?
> >
> > Dunno, haven't looked into it.
> >
>
> I think it might be fixable by adding a flag into the chunk, with 'true' for
> regular allocations, and 'false' for the optimized ones. And then only use
> MemoryContextContains() for 'flag=true' chunks.

I'm not quite following here. We only get a Datum and the knowledge
that it's a pass-by-ref argument, so we really don't know that much. We
could create an "EmbeddedDatum" type that has a preceding chunk header
(appropriate for the version), which just gets zeroed out at start. Is
that what you mean?
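
Something along these lines, perhaps - purely a hypothetical sketch,
nothing like this exists in the tree:

/*
 * Hypothetical: a Datum embedded in a larger struct, prefixed by a fake
 * owning-context field that is zeroed at initialization, so that
 * MemoryContextContains() reads a NULL context instead of whatever
 * struct member happens to precede the value.  (With
 * MEMORY_CONTEXT_CHECKING the fake header would need to be wider.)
 */
typedef struct EmbeddedDatum
{
    MemoryContext fake_context; /* always NULL, never a real context */
    int64         value;        /* the actual pass-by-ref payload */
} EmbeddedDatum;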

> The question however is whether this won't make the optimization pointless.
> I also wonder how much we save by this optimization and how widely it's
> used? Can someone point me to some numbers?

I don't recall any recent numbers. I'm more than a bit doubtful that it
really matters - it's only used for the results of aggregate/window
functions, and surely they've a good chunk of their own overhead...

Greetings,

Andres Freund


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-06 22:32:30
Message-ID: 699073c9-17ca-68fb-9463-17b89b72dc22@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/06/2017 08:08 PM, Andres Freund wrote:
> Hi,
>
> On 2017-03-06 19:49:56 +0100, Tomas Vondra wrote:
>> On 03/06/2017 07:05 PM, Robert Haas wrote:
>>> On Mon, Mar 6, 2017 at 12:44 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>>> On 2017-03-06 12:40:18 -0500, Robert Haas wrote:
>>>>> On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>>>>> The issue was that on 32bit platforms the Datum returned by some
>>>>>> functions (int2int4_sum in this case) isn't actually a separately
>>>>>> allocated Datum, but rather just something embedded in a larger
>>>>>> struct. That, combined with the following code:
>>>>>> if (!peraggstate->resulttypeByVal && !*isnull &&
>>>>>> !MemoryContextContains(CurrentMemoryContext,
>>>>>> DatumGetPointer(*result)))
>>>>>> seems somewhat problematic to me. MemoryContextContains() can give
>>>>>> false positives when used on memory that's not a distinctly allocated
>>>>>> chunk, and if so, we violate memory lifetime rules. It's quite
>>>>>> unlikely, given the required bit patterns, but nonetheless it's making
>>>>>> me somewhat uncomfortable.
>>>>>>
>>>>>> Do others think this isn't an issue and we can just live with it?
>>>>>
>>>>> I think it's 100% broken to call MemoryContextContains() on something
>>>>> that's not guaranteed to be a palloc'd chunk.
>>>>
>>>> I agree, but to me it seems the only fix would be to just yank out the
>>>> whole optimization?
>>>
>>> Dunno, haven't looked into it.
>>>
>>
>> I think it might be fixable by adding a flag into the chunk, with 'true' for
>> regular allocations, and 'false' for the optimized ones. And then only use
>> MemoryContextContains() for 'flag=true' chunks.
>
> I'm not quite following here. We only get a Datum and the knowledge
> that it's a pass-by-ref argument, so we really don't know that much. We
> could create an "EmbeddedDatum" type that has a preceding chunk header
>> (appropriate for the version), which just gets zeroed out at start. Is
> that what you mean?
>

Yes, that's roughly what I meant.

>
>> The question however is whether this won't make the optimization pointless.
>> I also wonder how much we save by this optimization and how widely it's
>> used? Can someone point me to some numbers?
>
> I don't recall any recent numbers. I'm more than a bit doubtful that it
> really matters - it's only used for the results of aggregate/window
> functions, and surely they've a good chunk of their own overhead...
>

And if the benefit is negligible, trying to keep the optimization might
easily result in slowdown (compared to non-optimized code).

But I'm puzzled why we haven't seen any reports of failures? I mean,
doing sum(int4) is not a particularly extravagant thing; if there really
is an issue, shouldn't we see a lot of reports? What are we missing?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-06 22:42:00
Message-ID: 20170306224200.6eanapty52itdfj6@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-03-06 23:32:30 +0100, Tomas Vondra wrote:
> > > The question however is whether this won't make the optimization pointless.
> > > I also wonder how much we save by this optimization and how widely it's
> > > used? Can someone point me to some numbers?
> >
> > I don't recall any recent numbers. I'm more than a bit doubtful that it
> > really matters - it's only used for the results of aggregate/window
> > functions, and surely they've a good chunk of their own overhead...
> >
>
> And if the benefit is negligible, trying to keep the optimization might
> easily result in slowdown (compared to non-optimized code).

I doubt the branch is noticeable here, given that we're doing a memory
allocation otherwise. Should also be decently predictable.

> But I'm puzzled why we haven't seen any reports of failures? I mean, doing
> sum(int4) is not a particularly extravagant thing; if there really is an
> issue, shouldn't we see a lot of reports? What are we missing?

Reports about what? False positives causing crashes / wrong results? I
think it's quite unlikely to actually trigger this in practice, because
you need a properly aligned pointer, and then the preceding header has
to point to a bit pattern that's equal to the context - that's
presumably quite unlikely in practice.
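
For context, the check itself boils down to this (condensed from mcxt.c
as it stands after the header shrink; comments mine):

bool
MemoryContextContains(MemoryContext context, void *pointer)
{
    MemoryContext ptr_context;

    /* the only sanity check available: chunk pointers are MAXALIGNed */
    if (pointer == NULL || pointer != (void *) MAXALIGN(pointer))
        return false;

    /*
     * Read what would be the chunk's owning-context field.  If the
     * pointer is not actually a palloc'd chunk, this reads arbitrary
     * neighboring memory - hence the possibility of false positives.
     */
    ptr_context = *(MemoryContext *) (((char *) pointer) - sizeof(void *));

    return ptr_context == context;
}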

Regards,

Andres


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-06 23:19:59
Message-ID: 20170306231959.435oxevpnv3gukzw@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-03-02 22:51:09 +0100, Tomas Vondra wrote:
> Attached is the last part of the patch series, rebased to current master and
> adopting the new chunk header approach.

Something seems to have gone awry while sending that - the attachment
is a whopping 0 bytes...


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-07 14:46:57
Message-ID: ca5479fb-3129-c1e6-48e8-cd6933ea6ad0@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/07/2017 12:19 AM, Andres Freund wrote:
> On 2017-03-02 22:51:09 +0100, Tomas Vondra wrote:
>> Attached is the last part of the patch series, rebased to current master and
>> adopting the new chunk header approach.
>
> Something seems to have gone awry while sending that - the attachment
> is a whopping 0 bytes...
>

Why are you complaining? That makes it much easier to review and commit!

But if you insist, here is the actual patch ;-)

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
gen-context.patch text/x-diff 29.6 KB

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-03-08 02:19:34
Message-ID: CAKJS1f_WrsaKb4h=LrtLHosY0KYyQeP+KY9YYw1Ky0XrsFpkBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 28 February 2017 at 01:02, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2017-02-27 03:17:32 -0800, Andres Freund wrote:
>> I'll work on getting slab committed first, and then review / edit /
>> commit generation.c later. One first note there is that I'm wondering
>> if generation.c is too generic a filename.
>
> And pushed slab and its usage. Will have a look at generation.c
> tomorrow.

Attached is a patch to fix the compiler warning for compilers that
don't understand elog(ERROR) does not return.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
slaballoc_warning_fix.patch application/octet-stream 390 bytes

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: two slab-like memory allocators
Date: 2017-04-02 22:42:58
Message-ID: abaa1b67-5360-3cf0-96e0-fddfea5756ca@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

this patch is marked as committed in the CF application but the second part
(generational allocator) was AFAICS never committed.

Does anybody plan to push this forward?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)
Date: 2017-08-14 00:35:42
Message-ID: fe2e6aba-991e-27e3-eb06-1429db17e200@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Attached is a rebased version of the Generational context, originally
submitted with SlabContext (which was already committed into Pg 10).

The main change is that I've abandoned the pattern of defining a Data
structure and then a pointer typedef, i.e.

typedef struct GenerationContextData { ... } GenerationContextData;
typedef struct GenerationContextData *GenerationContext;

Now it's just

typedef struct GenerationContext { ... } GenerationContext;

mostly because SlabContext was committed like that, and because Andres
was complaining about this code pattern ;-)

Otherwise the design is the same as repeatedly discussed before.

To show that this is still a valuable change (even after SlabContext and
adding a doubly-linked list to AllocSet), I've repeated the test done by
Andres in [1] using the test case described in [2], that is

-- generate data
SELECT COUNT(*) FROM (SELECT test1()
FROM generate_series(1, 50000)) foo;

-- benchmark (measure time and VmPeak)
SELECT COUNT(*) FROM (SELECT *
FROM pg_logical_slot_get_changes('test', NULL,
NULL, 'include-xids', '0')) foo;

with different values passed to the first step (instead of the 50000).
The VmPeak numbers look like this:

N master patched
--------------------------------------
100000 1155220 kB 361604 kB
200000 2020668 kB 434060 kB
300000 2890236 kB 502452 kB
400000 3751592 kB 570816 kB
500000 4621124 kB 639168 kB

and the timing (on assert-enabled build):

N master patched
--------------------------------------
100000 1103.182 ms 412.734 ms
200000 2216.711 ms 820.438 ms
300000 3320.095 ms 1223.576 ms
400000 4584.919 ms 1621.261 ms
500000 5590.444 ms 2113.820 ms

So it seems it's still a significant improvement, both in terms of
memory usage and timing. Admittedly, this is a single test, so ideas of
other useful test cases are welcome.

regards

[1]
https://www.postgresql.org/message-id/20170227111732.vrx5v72ighehwpkf%40alap3.anarazel.de

[2]
https://www.postgresql.org/message-id/20160706185502.1426.28143%40wrigleys.postgresql.org

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Generational-memory-allocator.patch text/x-patch 31.4 KB

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)
Date: 2017-09-14 14:21:27
Message-ID: CANP8+jJUKeWXRVYvsJXzHu=+eW=sc=5QkaAReQYE3Wr86+02WQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14 August 2017 at 01:35, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> Attached is a rebased version of the Generational context, originally
> submitted with SlabContext (which was already committed into Pg 10).
>
> The main change is that I've abandoned the pattern of defining a Data
> structure and then a pointer typedef, i.e.
>
> typedef struct GenerationContextData { ... } GenerationContextData;
> typedef struct GenerationContextData *GenerationContext;
>
> Now it's just
>
> typedef struct GenerationContext { ... } GenerationContext;
>
> mostly because SlabContext was committed like that, and because Andres was
> complaining about this code pattern ;-)
>
> Otherwise the design is the same as repeatedly discussed before.
>
> To show that this is still a valuable change (even after SlabContext and
> adding a doubly-linked list to AllocSet), I've repeated the test done by
> Andres in [1] using the test case described in [2], that is
>
> -- generate data
> SELECT COUNT(*) FROM (SELECT test1()
> FROM generate_series(1, 50000)) foo;
>
> -- benchmark (measure time and VmPeak)
> SELECT COUNT(*) FROM (SELECT *
> FROM pg_logical_slot_get_changes('test', NULL,
> NULL, 'include-xids', '0')) foo;
>
> with different values passed to the first step (instead of the 50000). The
> VmPeak numbers look like this:
>
> N master patched
> --------------------------------------
> 100000 1155220 kB 361604 kB
> 200000 2020668 kB 434060 kB
> 300000 2890236 kB 502452 kB
> 400000 3751592 kB 570816 kB
> 500000 4621124 kB 639168 kB
>
> and the timing (on assert-enabled build):
>
> N master patched
> --------------------------------------
> 100000 1103.182 ms 412.734 ms
> 200000 2216.711 ms 820.438 ms
> 300000 3320.095 ms 1223.576 ms
> 400000 4584.919 ms 1621.261 ms
> 500000 5590.444 ms 2113.820 ms
>
> So it seems it's still a significant improvement, both in terms of memory
> usage and timing. Admittedly, this is a single test, so ideas of other
> useful test cases are welcome.

This all looks good.

What I think this needs is changes to
src/backend/utils/mmgr/README
which describe the various options that now exist (normal?, slab) and
will exist (generational)

Don't really care about the name, as long as it's clear when to use it
and when not to use it.

This description of generational seems wrong: "When the allocated
chunks have similar lifespan, this works very well and is extremely
cheap."
They don't need the same lifespan; they just need to be freed in groups
and in the order they were allocated.

For this patch specifically, we need additional comments in
reorderbuffer.c to describe the memory allocation pattern in that
module so that it is clear that the choice of allocator is useful and
appropriate, possibly with details of how that testing was performed,
so it can be re-tested later or tested on a variety of platforms.

Particularly in reorderbuffer, surely we will almost immediately reuse
chunks again, so is it worth issuing free() and then malloc() again
soon after? Does that cause additional overhead we could also avoid?
Could we possibly keep the last/few free'd chunks around rather than
re-malloc?

Seems like we should commit this soon.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)
Date: 2017-09-15 01:07:25
Message-ID: 5ef0d99b-d154-1222-684d-6ca435901c85@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/14/2017 04:21 PM, Simon Riggs wrote:
> On 14 August 2017 at 01:35, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> Hi,
>>
>> Attached is a rebased version of the Generational context, originally
>> submitted with SlabContext (which was already committed into Pg 10).
>>
>> The main change is that I've abandoned the pattern of defining a Data
>> structure and then a pointer typedef, i.e.
>>
>> typedef struct GenerationContextData { ... } GenerationContextData;
>> typedef struct GenerationContextData *GenerationContext;
>>
>> Now it's just
>>
>> typedef struct GenerationContext { ... } GenerationContext;
>>
>> mostly because SlabContext was committed like that, and because Andres was
>> complaining about this code pattern ;-)
>>
>> Otherwise the design is the same as repeatedly discussed before.
>>
>> To show that this is still a valuable change (even after SlabContext and
>> adding a doubly-linked list to AllocSet), I've repeated the test done by
>> Andres in [1] using the test case described in [2], that is
>>
>> -- generate data
>> SELECT COUNT(*) FROM (SELECT test1()
>> FROM generate_series(1, 50000)) foo;
>>
>> -- benchmark (measure time and VmPeak)
>> SELECT COUNT(*) FROM (SELECT *
>> FROM pg_logical_slot_get_changes('test', NULL,
>> NULL, 'include-xids', '0')) foo;
>>
>> with different values passed to the first step (instead of the 50000). The
>> VmPeak numbers look like this:
>>
>> N master patched
>> --------------------------------------
>> 100000 1155220 kB 361604 kB
>> 200000 2020668 kB 434060 kB
>> 300000 2890236 kB 502452 kB
>> 400000 3751592 kB 570816 kB
>> 500000 4621124 kB 639168 kB
>>
>> and the timing (on assert-enabled build):
>>
>> N master patched
>> --------------------------------------
>> 100000 1103.182 ms 412.734 ms
>> 200000 2216.711 ms 820.438 ms
>> 300000 3320.095 ms 1223.576 ms
>> 400000 4584.919 ms 1621.261 ms
>> 500000 5590.444 ms 2113.820 ms
>>
>> So it seems it's still a significant improvement, both in terms of memory
>> usage and timing. Admittedly, this is a single test, so ideas of other
>> useful test cases are welcome.
>
> This all looks good.
>
> What I think this needs is changes to
> src/backend/utils/mmgr/README
> which describe the various options that now exist (normal?, slab) and
> will exist (generational)
>
> Don't really care about the name, as long as it's clear when to use it
> and when not to use it.
>
> This description of generational seems wrong: "When the allocated
> chunks have similar lifespan, this works very well and is extremely
> cheap."
> They don't need the same lifespan; they just need to be freed in groups
> and in the order they were allocated.
>

Agreed, should be described differently. What matters is the (mostly) FIFO
pattern of the palloc/pfree requests, which is what allows us to release the memory.

>
> For this patch specifically, we need additional comments in
> reorderbuffer.c to describe the memory allocation pattern in that
> module so that it is clear that the choice of allocator is useful
> and appropriate, possibly with details of how that testing was
> performed, so it can be re-tested later or tested on a variety of
> platforms.
>

Including details about the testing into reorderbuffer.c does not seem
very attractive to me. I don't recall any other place describing the
tests in detail. Why not discuss the details here, and then include a
link to this thread in the commit message?

In any case, I doubt anyone will repeat the testing on a variety of
platforms (and I don't have any such comprehensive test suite anyway).
What will likely happen is someone bumping into a poorly performing
corner case, which we will then analyze and fix as usual.

>
> Particularly in reorderbuffer, surely we will almost immediately
> reuse chunks again, so is it worth issuing free() and then malloc()
> again soon after? Does that cause additional overhead we could also
> avoid? Could we possibly keep the last/few free'd chunks around
> rather than re-malloc?
>

I haven't seen anything like that in tests I've done. The malloc/free
overhead is negligible, as our allocators significantly reduce the
number of calls to those functions. If we happen to run into such a case,
it shouldn't be difficult to keep a few empty blocks. But perhaps we can
leave that as a future optimization.

>
> Seems like we should commit this soon.
>

Thanks.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)
Date: 2017-09-24 20:32:09
Message-ID: 224c71cc-46d3-1f5f-0447-8999b048cbd1@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Attached is an updated version of the patch, tweaking the comments.

1) I've added a section at the end of src/backend/utils/mmgr/README,
briefly explaining the alternative memory allocators we have. I don't
think we should get into too much low-level detail here; that's more
appropriate for the .c file for each context.

2) I've slightly reworded a paragraph in generation.c describing what
use cases are suitable for the memory context. It used to say:

This memory context is based on the assumption that the allocated
chunks have similar lifespan, i.e. that chunks allocated close from
each other (by time) will also be freed in close proximity, and
mostly in the same order. This is typical for various queue-like use
cases, i.e. when tuples are constructed, processed and then thrown
away.

and now it says:

This memory context is based on the assumption that the chunks are
freed roughly in the same order as they were allocated (FIFO), or in
groups with similar lifespan (generations - hence the name of the
context). This is typical for various queue-like use cases, i.e. when
tuples are constructed, processed and then thrown away.

3) I've also added a brief note into reorderbuffer.c mentioning that it
uses SlabContext and GenerationContext. As I explained, I don't think we
should include details about how we tested the patch or whatever here.
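
For the record, the hookup in reorderbuffer.c amounts to roughly the
following (based on the posted patch, so treat the exact signatures as
approximate):

buffer->change_context = SlabContextCreate(new_ctx,
                                           "Change",
                                           SLAB_DEFAULT_BLOCK_SIZE,
                                           sizeof(ReorderBufferChange));

buffer->txn_context = SlabContextCreate(new_ctx,
                                        "TXN",
                                        SLAB_DEFAULT_BLOCK_SIZE,
                                        sizeof(ReorderBufferTXN));

/* tuple data has varying size, hence Generation rather than Slab */
buffer->tup_context = GenerationContextCreate(new_ctx,
                                              "Tuples",
                                              SLAB_LARGE_BLOCK_SIZE);

The fixed-size ReorderBufferChange and ReorderBufferTXN structs go into
Slab contexts, while the variable-size tuple data goes into the new
Generation context.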

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Generational-memory-allocator-v2.patch text/x-patch 33.3 KB

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)
Date: 2017-09-24 21:08:02
Message-ID: CANP8+jK8un7OBd7oeJufr8KSNKgdE5546_NBAG+q_Z-O1r8cdw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24 September 2017 at 21:32, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

> Attached is an updated version of the patch, tweaking the comments.

That looks good, thanks. Marking Ready for Committer to give notice
before commit.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services