Pre-alloc ListCell's optimization

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Pre-alloc ListCell's optimization
Date: 2011-05-25 02:56:21
Message-ID: 20110525025621.GH4548@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

Someone (*cough*Haas*cough) made a claim over beers at PGCon that it
would be very difficult to come up with a way to pre-allocate List
entries and maintain the current List API. I'll admit that it wasn't
quite as trivial as I had *hoped*, but attached is a proof-of-concept
patch which does it.

A couple of notes regarding the patch:

First, it uses ffs(), which might not be fully portable.. We could
certainly implement the same thing in userspace and use ffs() when
it's available.

Second, there are a couple of bugs (or at least, I'll characterize
them that way) where we're pfree'ing a list which has been passed to
list_concat(). That's not strictly legal as either argument passed to
list_concat() could be returned and so one might end up free'ing the
list pointer that was just returned. Those pfree's are commented out
in this patch, but really should probably just be removed or replaced
with 'right' code (if it's critical to free this stuff).

Third, COPY_NODE_CELL() wasn't used anywhere other than _copyList and
would be difficult to keep as a macro given the way allocations happen
now for lists. It's no longer being used at all and therefore should
really be removed.

Fourth, the current implementation goes ahead and allocates 8
ListCell's, which quadruples the size of a List (40 bytes to 168
bytes, assuming 64bit ints). I don't see that as really being an
issue, but perhaps others would disagree.

Fifth, list_concat() has become more like list_concat_unique() and
friends (and hence slower). Instead of being able to just tack one
list on to the end of the other, we have to do an actual copy of the
list. This is due to having to allow callers to list_free(), which
will call pfree(), and we don't have any way to know which ListCell's
from the *old* list were pre-allocated and which weren't, which can
end up causing pfree() to crash when it's passed an invalid pointer.
In general, I'd hope that not having to palloc() for a small list
might make up for a lot of that slowdown, but you can't really argue
that anything O(n) can compete with the previous O(1) approach.

Finally, sorry it's kind of a fugly patch, it's just a proof of
concept and I'd be happy to clean it up if others feel it's worthwhile
and a reasonable approach, but I really need to get it out there and
take a break from it (I've been a bit obsessive-compulsive about it
since PGCon.. :D).

Thanks!

Stephen

Attachment Content-Type Size
prealloc_list_20110524.patch text/x-diff 11.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2011-05-25 03:03:05 Re: Pre-alloc ListCell's optimization
Previous Message Jeff Davis 2011-05-25 02:52:41 Re: tackling full page writes