tbm_lossify causes "unbalanced" hashtables / bitmaps

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: tbm_lossify causes "unbalanced" hashtables / bitmaps
Date: 2016-09-23 20:58:43
Message-ID: 20160923205843.zcs533sqdzlh4cpo@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

Playing around with my hashtable implementation [1] and using it for
bitmap scans / tidbitmap.c, I noticed something curious. When using a
small work_mem (4MB in my case) for large-ish tables (~5GB), performance
tanks. That's primarily caused by some issues in the hashtable code
(since fixed), but it made me look more closely at the relevant
tidbitmap code, namely tbm_lossify():

static void
tbm_lossify(TIDBitmap *tbm)
{
    ...
    pagetable_start_iterate_random(tbm->pagetable, &i);
    while ((page = pagetable_iterate(tbm->pagetable, &i)) != NULL)
    {
        if (page->ischunk)
            continue;           /* already a chunk header */

        /*
         * If the page would become a chunk header, we won't save anything by
         * converting it to lossy, so skip it.
         */
        if ((page->blockno % PAGES_PER_CHUNK) == 0)
            continue;

        /* This does the dirty work ... */
        tbm_mark_page_lossy(tbm, page->blockno);

        if (tbm->nentries <= tbm->maxentries / 2)
        {
            /* we have done enough */
            break;
        }
    }

tbm_mark_page_lossy() in turn deletes the individual page entry, and
adds one for the chunk (spanning several pages).
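
For reference, this is roughly the shape of that conversion - a
simplified paraphrase of the current dynahash-based code, not a verbatim
copy; the forced switch into hashtable mode and the case where the
chunk-header page itself already has a non-lossy entry are elided:

/*
 * Simplified paraphrase of tbm_mark_page_lossy(), not the verbatim
 * source: the chunk-header-page special case and the forced switch into
 * hashtable mode are elided.
 */
static void
tbm_mark_page_lossy(TIDBitmap *tbm, BlockNumber pageno)
{
    PagetableEntry *page;
    bool        found;
    int         bitno = pageno % PAGES_PER_CHUNK;
    BlockNumber chunk_pageno = pageno - bitno;

    /* remove the exact (per-page) entry, if there is one */
    if (bitno != 0 &&
        hash_search(tbm->pagetable, (void *) &pageno,
                    HASH_REMOVE, NULL) != NULL)
    {
        tbm->nentries--;
        tbm->npages--;
    }

    /* look up or create the chunk-header entry ... */
    page = (PagetableEntry *) hash_search(tbm->pagetable,
                                          (void *) &chunk_pageno,
                                          HASH_ENTER, &found);
    if (!found)
    {
        MemSet(page, 0, sizeof(PagetableEntry));
        page->blockno = chunk_pageno;
        page->ischunk = true;
        tbm->nentries++;
        tbm->nchunks++;
    }

    /* ... and set the bit for the original page within the chunk */
    page->words[WORDNUM(bitno)] |= ((bitmapword) 1 << BITNUM(bitno));
}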

The relevant part is that the loop stops as soon as enough pages have
been lossified. This leads to some problems:

Because iteration (both in my implementation and dynahash) is
essentially in bucket order, the front of the hashtable will be mostly
empty, whereas later parts of the hashtable will be full (i.e. a lot of
collisions). The reason for that is that we'll find all parts of the
hashtable that are uncompressed and "early" in the hashspace, and
they'll possibly hash to something later in the table. As the number of
entries in the hashtable doesn't increase, we'll never grow the
hashtable to decrease the number of collisions. I've seen this cause a
quite noticeable number of collisions (which of course are worse in an
open-addressing table than in a separately chained one).
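
To make that concrete, here's a toy standalone simulation (plain C, not
PostgreSQL code; the table size, fill factor, hash function and the
"one chunk key per 16 deletions" ratio are all made up). It fills a
small linear-probing table, deletes entries in bucket order with the
same early break while occasionally inserting a replacement "chunk"
key, and then reports how many occupied buckets remain in the first
vs. the second half of the bucket array - the first half ends up nearly
empty while the second half keeps almost all of its entries:

#include <stdio.h>
#include <stdint.h>

#define NBUCKETS 1024
#define NKEYS    700

static uint32_t table[NBUCKETS];    /* 0 = empty, otherwise the key */

static uint32_t
hash_key(uint32_t k)
{
    /* cheap integer hash, standing in for the real hash function */
    k ^= k >> 16;
    k *= 0x85ebca6b;
    k ^= k >> 13;
    return k;
}

static void
insert_key(uint32_t k)
{
    uint32_t    b = hash_key(k) % NBUCKETS;

    while (table[b] != 0)
        b = (b + 1) % NBUCKETS;     /* linear probing */
    table[b] = k;
}

int
main(void)
{
    int         deleted = 0;
    int         first = 0;
    int         second = 0;

    /* fill the table to ~68% */
    for (uint32_t k = 1; k <= NKEYS; k++)
        insert_key(k);

    /*
     * "Lossify": walk the buckets in order, deleting entries and
     * occasionally inserting a replacement "chunk" key that hashes to an
     * arbitrary position, stopping once half the entries are gone --
     * mirroring the early break in tbm_lossify().  Deletion is simplified
     * (no tombstones) since we never look keys up again.
     */
    for (int b = 0; b < NBUCKETS && deleted < NKEYS / 2; b++)
    {
        if (table[b] == 0)
            continue;
        table[b] = 0;
        deleted++;
        if (deleted % 16 == 0)
            insert_key(100000 + deleted);
    }

    /* compare occupancy of the two halves of the bucket array */
    for (int b = 0; b < NBUCKETS; b++)
    {
        if (table[b] == 0)
            continue;
        if (b < NBUCKETS / 2)
            first++;
        else
            second++;
    }
    printf("occupied buckets: first half %d, second half %d\n",
           first, second);
    return 0;
}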

Secondly, it'll lead to pretty random parts of the bitmap being
compressed. For performance it'd be good to compress "heavily used"
areas of the bitmap, instead of just whatever happens to be early in the
hash space.

I'm not entirely sure how to resolve that best, but it does seem worth
addressing. One thing that might be nice is to record the last N
insertions once the table is ~90% full or so, and then lossify in a more
targeted manner using those, e.g. lossify block ranges (currently 256
blocks long) that contain a lot of individual entries, or something
along those lines.
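
Very roughly, and purely hypothetically (none of these names exist in
tidbitmap.c; this assumes being compiled as part of tidbitmap.c so that
BlockNumber and PAGES_PER_CHUNK are in scope), that could look like:

/* Hypothetical sketch only -- none of these names exist in tidbitmap.c. */
#define RECENT_INSERTS 64

typedef struct LossifyHint
{
    BlockNumber recent[RECENT_INSERTS]; /* last N inserted block numbers */
    int         next;                   /* ring-buffer position */
} LossifyHint;

/* would be called on each insertion once the table is ~90% full */
static void
lossify_hint_record(LossifyHint *hint, BlockNumber blockno)
{
    hint->recent[hint->next] = blockno;
    hint->next = (hint->next + 1) % RECENT_INSERTS;
}

/*
 * Pick the chunk-aligned block range (PAGES_PER_CHUNK blocks) that shows
 * up most often among the recent insertions; assumes the ring buffer has
 * already been filled.
 */
static BlockNumber
lossify_hint_choose(const LossifyHint *hint)
{
    BlockNumber best = 0;
    int         bestcount = -1;

    for (int i = 0; i < RECENT_INSERTS; i++)
    {
        BlockNumber chunk = hint->recent[i] - hint->recent[i] % PAGES_PER_CHUNK;
        int         count = 0;

        for (int j = 0; j < RECENT_INSERTS; j++)
        {
            if (hint->recent[j] - hint->recent[j] % PAGES_PER_CHUNK == chunk)
                count++;
        }
        if (count > bestcount)
        {
            bestcount = count;
            best = chunk;
        }
    }
    return best;
}

tbm_lossify() could then call tbm_mark_page_lossy() for each entry in
the chosen range before falling back to the plain scan.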

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20160727004333.r3e2k2y6fvk2ntup%40alap3.anarazel.de


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tbm_lossify causes "unbalanced" hashtables / bitmaps
Date: 2016-09-23 21:25:58
Message-ID: 20160923212558.pvtkh3bhw7gujhrk@alap3.anarazel.de
Lists: pgsql-hackers

On 2016-09-23 13:58:43 -0700, Andres Freund wrote:
> static void
> tbm_lossify(TIDBitmap *tbm)
> {
>     ...
>     pagetable_start_iterate_random(tbm->pagetable, &i);

Uh, obviously that's from one of my attempts to address the problem.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tbm_lossify causes "unbalanced" hashtables / bitmaps
Date: 2016-09-23 21:40:13
Message-ID: 30437.1474666813@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> Because iteration (both in my implementation and dynahash) is
> essentially in bucket order, the front of the hashtable will be mostly
> empty, whereas later parts of the hashtable will be full (i.e. a lot of
> collisions). The reason for that is that we'll find all parts of the
> hashtable that are uncompressed and "early" in the hashspace, and
> they'll possibly hash to something later in the table.

Hm, yeah, I had supposed that this would hit a random part of the key
space, but you're right that over successive calls it's not good.

My idea of an appropriate fix would be to resume the scan at the same
point where the last scan stopped, and work circularly around the table
when necessary. But I'm not sure there is any really good way to do that
in the dynahash infrastructure. Maybe it'd work to keep the iteration
state around, but I don't remember how well that interacts with other
insertions/deletions. What about in your implementation?

There's also the point mentioned in the existing comment, that it'd be
better to go after pages with more bits set first. Not sure of an
inexpensive way to do that (ie, one that doesn't involve multiple
scans of the hashtable). But your results suggest that maybe it'd
be worth making tbm_lossify slower in order to get better results.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tbm_lossify causes "unbalanced" hashtables / bitmaps
Date: 2016-09-23 22:14:49
Message-ID: 20160923221449.4fvtxhu654k6n3uh@alap3.anarazel.de
Lists: pgsql-hackers

On 2016-09-23 17:40:13 -0400, Tom Lane wrote:
> My idea of an appropriate fix would be to resume the scan at the same
> point where the last scan stopped, and work circularly around the table
> when necessary.

I've played with that idea, and it does help a great deal. Not that
surprisingly, it's better than starting at a random point (which in turn
is better than starting at one end all the time).

> But I'm not sure there is any really good way to do that
> in the dynahash infrastructure. Maybe it'd work to keep the iteration
> state around, but I don't remember how well that interacts with other
> insertions/deletions. What about in your implementation?

It's easy enough to specify a start point. It requires exposing some
things that I don't necessarily want to expose - that's why I played
around with a random start point - but other than that it's easy to
implement. Internally growing the hashtable would be somewhat of an
issue, since that invalidates the saved point, but a) by the time we're
lossifying we mostly aren't growing the table anymore, and b) it'd be
quite harmless to start at the wrong point.
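
Concretely, with such a start-at variant the loop could look something
like this - a sketch under those assumptions, not an actual patch;
pagetable_start_iterate_at, the lossify_start field and peeking at i.cur
are exactly the things that would need to be exposed:

/*
 * Sketch only: assumes the pagetable iterator grows a variant that starts
 * at a caller-supplied bucket (pagetable_start_iterate_at), that the
 * iterator wraps around to cover the whole table, and that TIDBitmap
 * gains a lossify_start field remembering where the previous pass stopped.
 */
static void
tbm_lossify(TIDBitmap *tbm)
{
    pagetable_iterator i;
    PagetableEntry *page;

    /* resume where the previous lossify pass left off */
    pagetable_start_iterate_at(tbm->pagetable, &i, tbm->lossify_start);
    while ((page = pagetable_iterate(tbm->pagetable, &i)) != NULL)
    {
        if (page->ischunk)
            continue;           /* already a chunk header */

        if ((page->blockno % PAGES_PER_CHUNK) == 0)
            continue;           /* would just become a chunk header */

        tbm_mark_page_lossy(tbm, page->blockno);

        if (tbm->nentries <= tbm->maxentries / 2)
        {
            /* enough freed; remember the stop point for the next pass */
            tbm->lossify_start = i.cur;
            break;
        }
    }
}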

> There's also the point mentioned in the existing comment, that it'd be
> better to go after pages with more bits set first. Not sure of an
> inexpensive way to do that (ie, one that doesn't involve multiple
> scans of the hashtable). But your results suggest that maybe it'd
> be worth making tbm_lossify slower in order to get better results.

It's not easy, I agree.
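
For what it's worth, the per-entry measurement itself would be cheap - a
hypothetical helper like the one below (the name is made up;
PagetableEntry, WORDS_PER_PAGE and bitmapword are the existing
definitions) just counts the tuple bits in a candidate entry. The hard
part is deciding which entries to score without additional full scans of
the table.

/*
 * Hypothetical helper (name made up): number of tuple bits set in a
 * non-chunk page entry, usable as a "density" score when deciding what
 * to lossify first.
 */
static int
tbm_page_popcount(const PagetableEntry *page)
{
    int         count = 0;
    int         wordnum;

    for (wordnum = 0; wordnum < WORDS_PER_PAGE; wordnum++)
    {
        bitmapword  w = page->words[wordnum];

        while (w != 0)
        {
            w &= w - 1;         /* clear the lowest set bit */
            count++;
        }
    }
    return count;
}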

Greetings,

Andres Freund