Re: bad estimation together with large work_mem generates terrible slow hash joins

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Date: 2014-07-03 18:35:54
Message-ID: 53B5A28A.1080803@fuzzy.cz
Lists: pgsql-hackers

On 3.7.2014 15:42, Atri Sharma wrote:
>
>
>
> On Tue, Jul 1, 2014 at 4:54 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>
> On 30.6.2014 23:12, Tomas Vondra wrote:
> > Hi,
> >
> > attached is v5 of the patch. The main change is that scaling the
> > number of buckets is done only once, after the initial hash table is
> > built. The main advantage of this is lower cost. This also allowed
> > some cleanup of unnecessary code.
> >
> > However, this new patch causes a warning like this:
> >
> > WARNING: temporary file leak: File 231 still referenced
> >
> > I've been eyeballing the code for a while now, but I still fail to see
> > where this comes from :-( Any ideas?
>
> Meh, the patches are wrong - I hadn't realized the tight coupling
> between buckets/batches in ExecHashGetBucketAndBatch:
>
>     *bucketno = hashvalue & (nbuckets - 1);
>     *batchno = (hashvalue >> hashtable->log2_nbuckets) & (nbatch - 1);
>
> The previous patches worked mostly by pure luck (the nbuckets was set
> only in the first batch), but once I moved the code to the end, it
> started failing. And by "worked" I mean "didn't throw an error, but
> probably returned bogus results with (nbatch>1)".
>
> However, ISTM this nbuckets-nbatch coupling is not really necessary -
> it's only constructed this way so that batchno and bucketno come from
> independent bits of the hash. So I went and changed the code like this:
>
>     *bucketno = hashvalue & (nbuckets - 1);
>     *batchno = (hashvalue >> (32 - hashtable->log2_nbatch));
>
> I.e. using the top bits for batchno, low bits for bucketno (as before).
>
> Hopefully I got it right this time. At least it seems to be working for
> cases that failed before (no file leaks, proper rowcounts so far).
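
For illustration, here's a minimal standalone sketch of that split (simplified
names and a toy main(), not the actual ExecHashGetBucketAndBatch code): the
low log2_nbuckets bits of the 32-bit hash pick the bucket and the top
log2_nbatch bits pick the batch, so the two never overlap and resizing
nbuckets can't move a tuple to a different batch.

    #include <stdio.h>
    #include <stdint.h>

    /* Toy version of the proposed split: low bits -> bucket, top bits ->
     * batch. Not the real nodeHash.c code, just an illustration. */
    static void
    get_bucket_and_batch(uint32_t hashvalue,
                         int log2_nbuckets, int log2_nbatch,
                         int *bucketno, int *batchno)
    {
        uint32_t nbuckets = 1U << log2_nbuckets;

        *bucketno = (int) (hashvalue & (nbuckets - 1));
        /* guard the nbatch == 1 case to avoid shifting by 32 */
        *batchno = (log2_nbatch == 0) ? 0
                                      : (int) (hashvalue >> (32 - log2_nbatch));
    }

    int
    main(void)
    {
        uint32_t h = 0xDEADBEEFu;
        int bucket, batch;

        /* same hash value and same nbatch, two different bucket counts */
        get_bucket_and_batch(h, 10, 4, &bucket, &batch);
        printf("log2_nbuckets=10: bucket=%d batch=%d\n", bucket, batch);

        get_bucket_and_batch(h, 12, 4, &bucket, &batch);
        printf("log2_nbuckets=12: bucket=%d batch=%d\n", bucket, batch);

        /* the batch number stays the same, only the bucket changes */
        return 0;
    }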
>
>
> Hi,
>
> I had a look at the patch and I was wondering if there is a way we can
> set a cap on the resized size and increase the number of batches
> instead? Since this patch evaluates the growth of tuples vs the
> increase of space, we could look at increasing the number of batches
> rather than the number of buckets if the resized number of buckets
> exceeds a threshold. It shouldn't be too hard; AIUI it would involve a
> call in MultiExecHash right after the new cost evaluation the patch adds.

So you propose to fill the hash table, and when hitting NTUP_PER_BUCKET
(i.e. after adding 'nbuckets * NTUP_PER_BUCKET' tuples) increase the
number of batches?

Hmmm, that'd be easy to implement. I don't think it's possible to do
that in MultiExecHash (that's too late). But it's a matter of changing
this condition in ExecHashTableInsert:

    if (hashtable->spaceUsed > hashtable->spaceAllowed)
        ExecHashIncreaseNumBatches(hashtable);

to something like this

    if ((hashtable->spaceUsed > hashtable->spaceAllowed) ||
        (hashtable->totalTuples / hashtable->nbatch
             == NTUP_PER_BUCKET * hashtable->nbuckets))
        ExecHashIncreaseNumBatches(hashtable);

But I don't think increasing the number of batches is a good approach, as
it means more rescans. I think using memory is more efficient (after all,
that's what work_mem is for, right?).
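
Just to make that concrete, here's a rough standalone sketch of the modified
check (a simplified stand-in struct and made-up numbers, not the actual
ExecHashTableInsert code): with nbuckets = 16384 and the default
NTUP_PER_BUCKET = 10, it would start adding batches after ~160k tuples in
the current batch even though work_mem is almost entirely unused.

    #include <stdbool.h>
    #include <stdio.h>

    #define NTUP_PER_BUCKET 10          /* the current default */

    /* Simplified stand-in for the few HashJoinTable fields the check uses. */
    typedef struct
    {
        double totalTuples;     /* tuples inserted so far (all batches) */
        long   spaceUsed;       /* bytes currently used by the hash table */
        long   spaceAllowed;    /* the work_mem limit, in bytes */
        int    nbuckets;
        int    nbatch;
    } FakeHashTable;

    static bool
    needs_more_batches(const FakeHashTable *ht)
    {
        /* existing rule: we ran out of work_mem */
        if (ht->spaceUsed > ht->spaceAllowed)
            return true;

        /* proposed extra rule: average tuples per bucket reached the limit
         * (using >= instead of the exact equality above, just to keep this
         * standalone check robust) */
        return ht->totalTuples / ht->nbatch >=
               (double) NTUP_PER_BUCKET * ht->nbuckets;
    }

    int
    main(void)
    {
        FakeHashTable ht = {
            .totalTuples = 170000,
            .spaceUsed = 8L * 1024 * 1024,          /* 8 MB used ... */
            .spaceAllowed = 256L * 1024 * 1024,     /* ... of 256 MB work_mem */
            .nbuckets = 16384,
            .nbatch = 1
        };

        /* 170000 / 1 >= 10 * 16384, so this prints "yes" and we'd start
         * adding batches despite most of work_mem being free */
        printf("grow nbatch? %s\n", needs_more_batches(&ht) ? "yes" : "no");
        return 0;
    }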

regards
Tomas
