Re: bad estimation together with large work_mem generates terrible slow hash joins

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Date: 2014-09-09 14:09:59
Message-ID: CA+TgmoaRB4r6norb1P-2xFXavwLjd4EUwi_ERnNfQ5PQq0WOYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 8, 2014 at 5:53 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> So I only posted the separate patch for those who want to do a review,
> and then a "complete patch" with both parts combined. But it sure may be
> a bit confusing.

Let's do this: post each new version of the patches only on this
thread, as a series of patches that can be applied one after another.

>> In ExecChooseHashTableSize(), if buckets_bytes is independent of
>> nbatch, could we just compute it before working out dbatch, and just
>> deduct it from hash_table_bytes? That seems like it'd do the same
>> thing for less work.
>
> Well, we can do that. But I really don't think that's going to make
> measurable difference. And I think the code is more clear this way.

Really? It seems to me that you're doing more or less the same
calculation that's already been done a second time. It took me 20
minutes of staring to figure out what it was really doing. Now
admittedly I was a little low on caffeine, but...

>> Either way, what happens if buckets_bytes is already bigger than
>> hash_table_bytes?
>
> Hm, I don't see how that could happen.

How about an Assert() and a comment, then?

>> A few more stylistic issues that I see:
>>
>> + if ((hashtable->nbatch == 1) &&
>> + if (hashtable->nbuckets_optimal <= (INT_MAX/2))
>> + if (size > (HASH_CHUNK_SIZE/8))
>>
>> While I'm all in favor of using parentheses generously where it's
>> needed to add clarity, these and similar usages seem over the top to
>> me.
>
> It seemed more readable for me at the time of coding it, and it still
> seems better this way, but ...
>
> https://www.youtube.com/watch?v=CxK_nA2iVXw

Heh. Well, at any rate, I think PostgreSQL style is to not use
parentheses in that situation.

>> +char * chunk_alloc(HashJoinTable hashtable, int size) {
>>
>> Eh... no.
>>
>> + /* XXX maybe we should use MAXALIGN(size) here ... */
>>
>> +1.
>
> I'm not sure what's the 'no' pointing at? Maybe that the parenthesis
> should be on the next line? And is the +1 about doing MAXALING?

The "no" is about the fact that you have the return type, the function
name, and the opening brace on one line instead of three lines as is
project style. The +1 is for applying MAXALIGN to the size.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Albe Laurenz 2014-09-09 14:16:13 Re: Optimization for updating foreign tables in Postgres FDW
Previous Message Arthur Silva 2014-09-09 14:08:05 Memory Alignment in Postgres