Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)
Date: 2016-01-19 21:49:07
Message-ID: 569EAF53.30202@2ndquadrant.com
Lists: pgsql-hackers

On 01/19/2016 08:34 PM, Robert Haas wrote:
> On Mon, Jan 18, 2016 at 10:57 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>> If this doesn't regress performance in the case where the number of
>>>> buckets is estimated accurately to begin with, then I think this is
>>>> a great idea. Can you supply some performance tests results for that
>>>> case, and maybe some of the other cases also?
>>>
>>> I don't see how it could regress performance, and the benchmarks I've
>>> done confirm that. I'll do more thorough benchmarking and post the
>>> results here, but not now as this patch is in 2016-01 CF and I want to
>>> put all my time into reviewing patches from the open commitfest.
>>
>> I've finally got to do more thorough benchmarks, and surprisingly there
>> seems to be a minor regression.
> ...
>>
>> So even if we entirely skipped the bucket build, we would not recover the
>> difference. Not sure what's going on here.
>>
>> I've also done some profiling using perf, but I haven't really found
>> anything suspicious there.
>>
>> Any ideas what might be the cause of this?
>
> My guess is that memory locality is not as good with this patch. Consider this:
>
> copyTuple->next = hashtable->buckets[bucketno];
>
> We've only just populated copyTuple via memcpy, so that cache line is
> presumably in whatever place cache lines go when they are really hot.
> We basically make one pass over the allocated space for the hash
> table, filling in the hash tuples and linking things into bucket
> chains. OTOH, with the patch, we don't do that on the first pass,
> just writing the tuples. Then when we come back through we still have
> to do that part:
>
> hashTuple->next = hashtable->buckets[bucketno];
>
> By the time we're doing this, especially at larger work_mem settings,
> we've probably pushed those cache lines out of the faster levels of
> the CPU cache, and we have to pull them back in again, and that takes
> time. If the tuples are small, we'll dirty every cache line twice;
> if they're larger, we might not dirty every cache line on the second
> pass, but just some fraction of them.
>
> I could be totally off base, but this is why I was concerned about the patch.
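
To make the two access patterns concrete, here is a minimal sketch of
the difference (simplified structures and names, not the actual
nodeHash.c code; the bucket is computed with a plain modulo just for
illustration):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct Tuple
    {
        struct Tuple *next;       /* next tuple in the same bucket chain */
        uint32_t      hashvalue;  /* precomputed hash of the join key */
        /* ... tuple payload follows ... */
    } Tuple;

    /*
     * Unpatched: link the tuple into its bucket right after the memcpy,
     * while its cache line is still hot.
     */
    static void
    insert_immediately(Tuple **buckets, uint32_t nbuckets, Tuple *copyTuple)
    {
        uint32_t bucketno = copyTuple->hashvalue % nbuckets;

        copyTuple->next = buckets[bucketno];
        buckets[bucketno] = copyTuple;
    }

    /*
     * Patched: a separate pass over the already-stored tuples; by now
     * their cache lines have likely been evicted, so each link-in has
     * to fetch and re-dirty a line.
     */
    static void
    build_buckets(Tuple **buckets, uint32_t nbuckets,
                  Tuple **tuples, size_t ntuples)
    {
        for (size_t i = 0; i < ntuples; i++)
        {
            Tuple   *hashTuple = tuples[i];
            uint32_t bucketno = hashTuple->hashvalue % nbuckets;

            hashTuple->next = buckets[bucketno];
            buckets[bucketno] = hashTuple;
        }
    }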

I can totally see why this would slow down the BuildBuckets function,
but I don't quite see why it should make the other code significantly
slower. Yet BuildBuckets takes just ~25ms while the total duration
increases by ~200ms (for the 1x10 dataset).

I do understand that calling BuildBuckets may affect the code executed
after it, as the pass fills the cache with hash table data and evicts
whatever the subsequent code needs, but I would not expect that to
account for the remaining ~175ms.

Yet another thing is that BuildBuckets accesses the tuples in a mostly
sequential way, so I'd expect hardware prefetching to take care of that.
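
For illustration, here is a rough sketch of what I mean (hypothetical
ChunkTuple/MemoryChunk layouts, not the actual structures from the
patch): the walk over the densely packed tuples is sequential and
prefetch-friendly; only the stores into the bucket array jump around.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct ChunkTuple
    {
        struct ChunkTuple *next;      /* bucket chain link */
        uint32_t           hashvalue; /* precomputed hash */
        uint32_t           len;       /* total size of this tuple */
        /* ... payload follows ... */
    } ChunkTuple;

    typedef struct MemoryChunk
    {
        struct MemoryChunk *next;   /* next chunk in the list */
        size_t              used;   /* bytes occupied in data[] */
        char                data[]; /* densely packed tuples */
    } MemoryChunk;

    static void
    build_buckets_from_chunks(ChunkTuple **buckets, uint32_t nbuckets,
                              MemoryChunk *chunks)
    {
        for (MemoryChunk *chunk = chunks; chunk != NULL; chunk = chunk->next)
        {
            size_t offset = 0;

            while (offset < chunk->used)
            {
                /* sequential read: tuples laid out back to back
                 * (assumes tuples are stored at aligned offsets) */
                ChunkTuple *hashTuple =
                    (ChunkTuple *) (chunk->data + offset);
                uint32_t    bucketno = hashTuple->hashvalue % nbuckets;

                /* the only random access: the bucket array stores */
                hashTuple->next = buckets[bucketno];
                buckets[bucketno] = hashTuple;

                offset += hashTuple->len;
            }
        }
    }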

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
