Re: tweaking NTUP_PER_BUCKET

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tweaking NTUP_PER_BUCKET
Date: 2014-07-03 18:03:08
Message-ID: 53B59ADC.5060709@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3.7.2014 02:13, Tomas Vondra wrote:
> Hi,
>
> while hacking on the 'dynamic nbucket' patch, scheduled for the next CF
> (https://commitfest.postgresql.org/action/patch_view?id=1494) I was
> repeatedly stumbling over NTUP_PER_BUCKET. I'd like to propose a change
> in how we handle it.
>
>
> TL;DR; version
> --------------
>
> I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1)
> once the table is built and there's free space in work_mem. The patch
> mentioned above makes implementing this possible / rather simple.

Attached is v1 of this experimental patch. It's supposed to be applied
on top of v7 of this patch

http://www.postgresql.org/message-id/53B59498.3000800@fuzzy.cz

as it shared most of the implementation. So apply it like this:

patch -p1 < hashjoin-nbuckets-growth-v7.patch
patch -p1 < hashjoin-dynamic-ntup-v1.patch

It implements the ideas outlined in the previous message, most importantly:

(a) decreases NTUP_PER_BUCKET to 4

(b) checks free work_mem when deciding whether to add batch

(c) after building the batches, increases the number of buckets as
much as possible, i.e. up to the number of batch tuples, but not
exceeding work_mem

The improvements for the queries I tried so far are quite significant
(partially due to the NTUP_PER_BUCKET decrease, partially due to the
additional bucket count increase).

The rebuild is quite fast - the patch measures and reports this as a
WARNING, and the timings I've seen are ~12ms per 7MB (i.e. ~120ms for
70MB and ~1200ms for 700MB). Of course, this only makes sense when
compared to how much time it saved, but for the queries I tested so far
this was a good investment.

However it's likely there are queries where this may not be the case,
i.e. where rebuilding the hash table is not worth it. Let me know if you
can construct such query (I wasn't).

regards
Tomas

Attachment Content-Type Size
hashjoin-dynamic-ntup-v1.patch text/x-diff 5.9 KB
hashjoin-nbuckets-growth-v7.patch text/x-diff 14.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-07-03 18:10:13 Re: tweaking NTUP_PER_BUCKET
Previous Message Tomas Vondra 2014-07-03 17:36:24 Re: bad estimation together with large work_mem generates terrible slow hash joins