Re: bad estimation together with large work_mem generates terrible slow hash joins

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Date: 2014-08-16 13:31:08
Message-ID: 53EF5D1C.3060505@fuzzy.cz
Lists: pgsql-hackers

On 12.8.2014 00:30, Tomas Vondra wrote:
> On 11.8.2014 20:25, Robert Haas wrote:
>> It also strikes me that when there's only 1 batch, the set of bits
>> that map onto the batch number is zero-width, and one zero-width bit
>> range is as good as another. In other words, if you're only planning
>> to do one batch, you can easily grow the number of buckets on the fly.
>> Growing the number of buckets only becomes difficult once you have
>> more than one batch.

...

> I was considering reversing the bits of the hash, because that's
> pretty much the simplest solution. But I think you're right it might
> actually work like this:
>
> * Are more batches needed?
>
> (yes) => just use nbuckets = my_log2(work_mem / tuple_size)
>
> (no) => go ahead, until processing all tuples or hitting work_mem
>
> (work_mem) => meh, use the same nbuckets as above
>
> (all tuples) => compute optimal nbuckets / resize
>
>
> But I need to think about this a bit. So far it seems to me there's no
> way additional batches might benefit from increasing nbuckets further.

I think this is a simple and solid solution, solving the batchno
computation issues quite nicely. Attached is the v10 patch (bare, and
combined with the dense allocation patch), which does this:

1) when we know we'll need batching, buckets are sized for full work_mem
(using the estimated tuple width, etc.)

2) without batching, we estimate the 'right number of buckets' for
   the estimated number of tuples, and keep track of the optimal number
   as tuples are added to the hash table (see the sketch after this list)

- if we discover we need to start batching, we keep the current
optimal value (which should be the same as the max number of
buckets) and don't mess with it anymore (making it possible to
compute batch IDs just like before)

   - also, on the first rebatch (nbatch=1 => nbatch=2) the hash table
     is resized as part of the rebatch

- if the hash build completes without batching, we do the resize
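
To make (2) a bit more concrete, here's a rough standalone sketch of the
bookkeeping (again just an illustration with made-up names and a made-up
load factor, not the patch code):

    /* target number of tuples per bucket (value only for illustration) */
    #define NTUP_PER_BUCKET_TARGET 10

    typedef struct
    {
        int     nbuckets;           /* current size of the bucket array */
        int     nbuckets_optimal;   /* size we'd like it to have */
        int     nbatch;             /* number of batches (1 until we spill) */
        double  ntuples;            /* tuples inserted so far */
    } HashTableSketch;

    static int
    next_power_of_two(int n)
    {
        int     v = 1;

        while (v < n)
            v <<= 1;
        return v;
    }

    /* called for every tuple added to the hash table */
    static void
    note_tuple_inserted(HashTableSketch *ht)
    {
        ht->ntuples += 1;

        /* once batching has started, the bucket count stays frozen */
        if (ht->nbatch > 1)
            return;

        ht->nbuckets_optimal =
            next_power_of_two((int) (ht->ntuples / NTUP_PER_BUCKET_TARGET) + 1);
    }

    /* called at most once: at the nbatch=1 => nbatch=2 split, or at end of build */
    static void
    resize_buckets_if_needed(HashTableSketch *ht)
    {
        if (ht->nbuckets_optimal > ht->nbuckets)
        {
            ht->nbuckets = ht->nbuckets_optimal;
            /* ... reallocate the bucket array and re-link the tuples ... */
        }
    }

The per-tuple overhead is just updating a counter; the bucket array
itself gets rebuilt at most once.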

I believe the patch is pretty much perfect. I plan to do more thorough
testing on a wide range of queries in the next few days.

I also removed the 'enable_hash_resize' GUC, because implementing it
properly would be more complex now that the resize happens as part of
the rebatch. So it would either make the patch more complex, or it
wouldn't do what the name promises.

regards
Tomas

Attachment Content-Type Size
hashjoin-nbuckets-growth-v10-combined.patch text/x-diff 22.6 KB
hashjoin-nbuckets-growth-v10.patch text/x-diff 15.3 KB
