From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Date: 2014-09-10 18:59:52
Message-ID: 54109FA8.5000706@fuzzy.cz
Lists: pgsql-hackers

On 10.9.2014 20:25, Heikki Linnakangas wrote:
> On 09/10/2014 01:49 AM, Tomas Vondra wrote:
>> I also did a few 'minor' changes to the dense allocation patch, most
>> notably:
>>
>> * renamed HashChunk/HashChunkData to MemoryChunk/MemoryChunkData
>> The original naming seemed a bit awkward.
>
> That's too easy to confuse with regular MemoryContext and AllocChunk
> stuff. I renamed it to HashMemoryChunk.

OK.

>
>> * the chunk size is 32kB (instead of 16kB), and we're using a 1/4
>> threshold for 'oversized' items
>>
>> We need the threshold to be >=8kB, to trigger the special case
>> within AllocSet. The 1/4 rule is consistent with ALLOC_CHUNK_FRACTION.
>
> Should we care about the fact that if there are only a few tuples, we
> will nevertheless waste 32kB of memory for the chunk? I guess not, but I
> thought I'd mention it. The smallest allowed value for work_mem is 64kB.

I don't think that's a problem.
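For anyone following the thread without the patch at hand, here is a minimal,
self-contained sketch of the dense allocation scheme being discussed. The
identifiers (Chunk, DenseArea, dense_alloc, CHUNK_SIZE, CHUNK_THRESHOLD,
link_dedicated) are made up for the sketch, plain malloc stands in for the
memory-context machinery, and error handling is omitted; the point is just the
32kB chunk size and the 1/4 "oversized" threshold:

#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE      (32 * 1024)        /* 32kB dense-allocation chunks */
#define CHUNK_THRESHOLD (CHUNK_SIZE / 4)   /* 8kB: larger tuples are "oversized" */

typedef struct Chunk
{
    int           ntuples;  /* number of tuples stored in this chunk */
    size_t        maxlen;   /* usable bytes in data[] */
    size_t        used;     /* bytes handed out so far */
    struct Chunk *next;     /* chunks form a simple linked list */
    char          data[];   /* tuple data is packed densely in here */
} Chunk;

typedef struct
{
    Chunk *chunks;          /* head = chunk currently being filled */
} DenseArea;

/*
 * Link a dedicated (oversized) chunk into the area.  It goes right after
 * the head, so the free space in the currently filled chunk is not lost.
 */
static void
link_dedicated(DenseArea *area, Chunk *chunk)
{
    if (area->chunks != NULL)
    {
        chunk->next = area->chunks->next;
        area->chunks->next = chunk;
    }
    else
    {
        chunk->next = NULL;
        area->chunks = chunk;
    }
}

/*
 * Allocate space for one tuple: small tuples are packed densely into 32kB
 * chunks, while tuples above 1/4 of the chunk size get a chunk of their
 * own (so the underlying request is >= 8kB).
 */
static void *
dense_alloc(DenseArea *area, size_t size)
{
    Chunk *chunk;

    if (size > CHUNK_THRESHOLD)
    {
        chunk = malloc(offsetof(Chunk, data) + size);
        chunk->ntuples = 1;
        chunk->maxlen = size;
        chunk->used = size;
        link_dedicated(area, chunk);
        return chunk->data;
    }

    chunk = area->chunks;
    if (chunk == NULL || chunk->maxlen - chunk->used < size)
    {
        /* current chunk is full (or missing): start a new 32kB chunk */
        chunk = malloc(offsetof(Chunk, data) + CHUNK_SIZE);
        chunk->ntuples = 0;
        chunk->maxlen = CHUNK_SIZE;
        chunk->used = 0;
        chunk->next = area->chunks;
        area->chunks = chunk;
    }

    void *ptr = chunk->data + chunk->used;

    chunk->used += size;
    chunk->ntuples++;
    return ptr;
}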

>> I also considered Heikki's suggestion that while rebatching, we can
>> reuse chunks with a single large tuple, instead of copying it
>> needlessly. My attempt, however, made the code much uglier (I almost
>> used a GOTO, but my hands refused to type it ...). I'll look into that.
>
> I tried constructing a test case where the rehashing time would be
> significant enough for this to matter, but I couldn't find one. Even
> if the original estimate on the number of batches is way too small,
> we ramp up quickly to a reasonable size and the rehashing time is
> completely insignificant compared to all the other work. So I think
> we can drop that idea.

I don't think that had anything to do with rehashing, though. What you
pointed out is that when rebatching, we keep roughly 50% of the tuples in
memory, which currently means copying each of them into a new chunk and
then throwing the old chunks away.

For large tuples (you mentioned 100MB tuples, IIRC), that means we
needlessly allocate a new chunk of that size only to copy the single
tuple into it and then throw the old chunk away. So (a) there's an extra
memcpy, and (b) we hold an additional 100MB of memory for a short period
of time.

Now, I'd guess that with tuples this large there will be more serious
trouble elsewhere, but I don't want to dismiss the issue outright. Maybe
it's worth optimizing?
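
To make the trade-off concrete, here is a hypothetical sketch of what the
"reuse the chunk" variant of rebatching could look like, building on the
illustrative Chunk/DenseArea code above. The tuple layout (a size_t length
header in front of the payload) and the tuple_stays() callback are
assumptions of the sketch, not something the patch actually does:

#include <string.h>

/*
 * Rebatching sketch, reusing the Chunk/DenseArea/dense_alloc definitions
 * above.  Tuples are assumed to be stored as a size_t length header
 * followed by the payload; tuple_stays() decides whether a tuple remains
 * in the in-memory batch after the number of batches doubles.
 */
static void
rebatch(DenseArea *new_area, DenseArea *old_area,
        int (*tuple_stays)(const char *payload, size_t len))
{
    Chunk *chunk = old_area->chunks;

    while (chunk != NULL)
    {
        Chunk *next = chunk->next;

        /* A dedicated chunk holds exactly one oversized tuple. */
        if (chunk->ntuples == 1 && chunk->used > CHUNK_THRESHOLD)
        {
            size_t len;

            memcpy(&len, chunk->data, sizeof(len));
            if (tuple_stays(chunk->data + sizeof(len), len))
            {
                /*
                 * Reuse the chunk as-is: no memcpy, and no window where
                 * two copies of a very large tuple exist at once.
                 */
                link_dedicated(new_area, chunk);
            }
            else
            {
                /* tuple goes to a batch file (not shown); drop the chunk */
                free(chunk);
            }
            chunk = next;
            continue;
        }

        /* Normal chunk: copy surviving tuples, then free the old chunk. */
        for (size_t pos = 0; pos < chunk->used; )
        {
            size_t len;

            memcpy(&len, chunk->data + pos, sizeof(len));
            if (tuple_stays(chunk->data + pos + sizeof(len), len))
            {
                char *dst = dense_alloc(new_area, sizeof(len) + len);

                memcpy(dst, chunk->data + pos, sizeof(len) + len);
            }
            pos += sizeof(len) + len;
        }
        free(chunk);
        chunk = next;
    }
    old_area->chunks = NULL;
}

Relinking the dedicated chunk avoids both the extra memcpy and the
transient extra allocation, at the price of a special case in the
rebatching loop, which is pretty much the ugliness mentioned above.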

Tomas
