Re: PATCH: two slab-like memory allocators

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: PATCH: two slab-like memory allocators
Date: 2016-09-25 20:17:41
Message-ID: 68b837a3-ce55-f9a2-f685-9eafac536f21@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/25/2016 08:48 PM, Petr Jelinek wrote:
> Hi Tomas,
>
> On 02/08/16 17:44, Tomas Vondra wrote:
>>
>> This patch actually includes two new memory allocators (not one). Very
>> brief summary (for more detailed explanation of the ideas, see comments
>> at the beginning of slab.c and genslab.c):
>>
>> Slab
>> ----
>> * suitable for fixed-length allocations (other pallocs fail)
>> * much simpler than AllocSet (no global freelist management etc.)
>> * free space is tracked per block (using a simple bitmap)
>> * which allows freeing the block once all chunks are freed (AllocSet
>> will hold the memory forever, in the hope of reusing it)
>>
>> GenSlab
>> -------
>> * suitable for non-fixed-length allocations, but with chunks of mostly
>> the same size (initially unknown, the context will tune itself)
>> * a combination AllocSet and Slab (or a sequence of Slab allocators)
>> * the goal is to do most allocations in Slab context
>> * there's always a single 'current' Slab context, and every now and and
>> then it's replaced with a new generation (with the chunk size computed
>> from recent requests)
>> * the AllocSet context is used for chunks too large for current Slab
>>
>
> So it's just wrapper around the other two allocators to make this
> usecase easier to handle. Do you expect there will be use for the
> GenSlab eventually outside of the reoderbuffer?
>

Yes, you might say it's just a wrapper around the other two allocators,
but it *also* includes the logic of recomputing chunk size etc.

I haven't thought about other places that might benefit from these new
allocators very much - in general, it's useful for places that produce a
stream of equally-sized items (GenSlab relaxes this), that are pfreed()
in ~FIFO manner (i.e. roughly in the order of allocation).

>> So none of this is meant as a universal replacement of AllocSet,
>> but in the suitable cases the results seem really promising. For
>> example for the simple test query in [1], the performance
>> improvement is this:
>>
>> N | master | patched
>> -----------------------------
>> 10000 | 100ms | 100ms
>> 50000 | 15000ms | 350ms
>> 100000 | 146000ms | 700ms
>> 200000 | ? | 1400ms
>>
>> That's a fairly significant improvement, and the submitted version
>> of the patches should perform even better (~2x, IIRC).
>>
>
> I agree that it improves performance quite nicely and that
> reoderbuffer could use this.
>
> About the code. I am not quite sure that this needs to be split into
> two patches especially if 1/3 of the second patch is the removal of
> the code added by the first one and otherwise it's quite small and
> straightforward. That is unless you expect the GenSlab to not go in.
>

I don't know - it seemed natural to first introduce the Slab, as it's
easier to discuss it separately, and it works for 2 of the 3 contexts
needed in reorderbuffer.

GenSlab is an improvement of Slab, or rather based on it, so that it
works for the third context. And it introduces some additional ideas
(particularly the generational design, etc.)

Of course, none of this means it has to be committed in two separate
chunks, or that I don't expect GenSlab to get committed ...

> Slab:
> In general it seems understandable, the initial description helps to
> understand what's happening well enough.
>
> One thing I don't understand however is why the freelist is both
> array and doubly linked list and why there is new implementation of
> said doubly linked list given that we have dlist.
>

Hmm, perhaps that should be explained better.

In AllocSet, we only have a global freelist of chunks, i.e. we have a
list of free chunks for each possible size (there's 11 sizes, starting
with 8 bytes and then doubling the size). So freelist[0] is a list of
free 8B chunks, freelist[1] is a list of free 16B chunks, etc.

In Slab, the freelist has two levels - first there's a bitmap on each
block (which is possible, as the chunks have constant size), tracking
which chunks of that particular block are free. This makes it trivial to
check that all chunks on the block are free, and free the whole block
(which is impossible with AllocSet).

Second, the freelist at the context level tracks blocks with a given
number of free chunks - so freelist[0] tracks completely full blocks,
freelist[1] is a list of blocks with 1 free chunk, etc. This is used to
reuse space on almost full blocks first, in the hope that some of the
less full blocks will get completely empty (and freed to the OS).

Is that clear?

>> +/*
>> + * SlabContext is our standard implementation of MemoryContext.
>> + *
>
> Really?
>

Meh, that's clearly a bogus comment.

>> +/*
>> + * SlabChunk
>> + * The prefix of each piece of memory in an SlabBlock
>> + *
>> + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
>> + */
>
> Is this true? Why? And if it is then why doesn't the SlabChunk
> actually match the StandardChunkHeader?

It is true, a lot of stuff in MemoryContext infrastructure relies on
that. For example when you do pfree(ptr), we actually do something like

header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader))

to get the chunk header - which includes pointer to the memory context
and other useful stuff.

This also means we can put additional fields before StandardChunkHeader
as that does not break this pointer arithmetic, i.e. SlabChunkData is
effectively defined like this:

typedef struct SlabChunkData
{
/* block owning this chunk */
void *block;

/* standard header */
StandardChunkHeader header;
} SlabChunkData;

>
>> +#define SlabPointerIsValid(pointer) PointerIsValid(pointer)
>
> What's the point of this given that it's defined in the .c file?
>

Meh, I've copied this from aset.c, but I see it's useless there too.

>
>> +static void *
>> +SlabAlloc(MemoryContext context, Size size)
>> +{
>> + Slab set = (Slab) context;
>> + SlabBlock block;
>> + SlabChunk chunk;
>> + int idx;
>> +
>> + AssertArg(SlabIsValid(set));
>> +
>> + Assert(size == set->chunkSize);
>
> I wonder if there should be stronger protection (ie, elog) for the size
> matching.
>

Perhaps, I'm not opposed to that.

>
>> +static void *
>> +SlabRealloc(MemoryContext context, void *pointer, Size size)
>> +{
>> + Slab set = (Slab)context;
>> +
>> + /* can't do actual realloc with slab, but let's try to be gentle */
>> + if (size == set->chunkSize)
>> + return pointer;
>> +
>> + /* we can't really do repalloc with this allocator */
>> + Assert(false);
>
> This IMHO should definitely be elog.
>

Yeah, you're probably right.

>
>> +static void
>> +SlabCheck(MemoryContext context)
>> +{
>> + /* FIXME */
>> +}
>
> Do you plan to implement this interface?
>

Yes, although I'm not sure what checks should go there. The only thing I
can think of right now is checking that the number of free chunks on a
block (according to the bitmap) matches the freelist index.

>> +#define SLAB_DEFAULT_BLOCK_SIZE 8192
>> +#define SLAB_LARGE_BLOCK_SIZE (8 * 1024 * 1024)
>
> I am guessing this is based on max_cached_tuplebufs? Maybe these
> could be written with same style?
>

Not sure I understand what you mean by "based on"? I don't quite
remember how I came up with those constants, but I guess 8kB and 8MB
seemed like good values.

Also, what style you mean? I've used the same style as for ALLOCSET_*
constants in the same file.

> GenSlab:
>
> Since this is relatively simple wrapper it looks mostly ok to me. The
> only issue I have here is that I am not quite sure about those FIXME
> functions (Free, Realloc, GetChunkSpace). It's slightly weird to call to
> mcxt but I guess the alternative there is to error out so this is
> probably preferable. Would want to hear other opinions here.
>

Yeah, I'd like to get some opinions on that too - that's why I left
there the FIXMEs, actually.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rahila Syed 2016-09-25 20:26:58 Re: Optimization for lazy_scan_heap
Previous Message Enrique Meneses 2016-09-25 19:37:11 Re: Allowing GIN array_ops to work on anyarray