Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

From: b8flowerfire <b8flowerfire(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
Date: 2014-06-10 05:13:08
Message-ID: 1402377188082-5806617.post@n5.nabble.com
Lists: pgsql-hackers

Robert Haas wrote
> On Mon, Jun 9, 2014 at 4:06 AM, b8flowerfire <b8flowerfire@> wrote:
>
> This has come up before. Basically, the problem is that if you reduce
> NTUP_PER_BUCKET, the bucket array gets larger, which might reduce the
> amount of space available for tuples to the point where the hash join
> overflows to multiple batches. That will be more expensive than
> performing the hash join with a higher NTUP_PER_BUCKET.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Thanks for the explanation, but I don't find it entirely convincing.
Simply reducing the value of NTUP_PER_BUCKET enlarges the pointer array
and reduces the number of tuples per batch, but is that effect significant
for performance?
The utilization of work_mem, I think, is determined by the ratio of the
pointer size to the tuple size.
Let's assume the tuple size is 28 bytes, which is reasonable because it is
the sum of HJTUPLE_OVERHEAD (at least 8 bytes), MinimalTupleData (at least
10 bytes), and the tuple contents (assume 10 bytes). The pointer size is 4
bytes.
So, if NTUP_PER_BUCKET is set to 10, about 28 * 10 / (28 * 10 + 4), or
roughly 98.6%, of work_mem is used to store tuples. If NTUP_PER_BUCKET is
set to 1, about 28 / (28 + 4) = 87.5% of work_mem is used to store tuples,
which is roughly 90% of the original.
As a result, changing NTUP_PER_BUCKET to 1 may increase the number of
batches by only about 10%. Is that enough to affect performance? Or maybe
the calculation cannot be done this simply.

Besides, we have larger main memory now. If we set work_mem larger,
wouldn't the extra-batches effect introduced by a smaller NTUP_PER_BUCKET
value be reduced?

I have read earlier discussion about NTUP_PER_BUCKET. It seems that if we
changed NTUP_PER_BUCKET to 50 or even larger, the performance would not be
much worse, because every tuple in a bucket's chain carries its hash value,
so having more tuples in a bucket only adds some comparisons of two
integers. Is the same true in the other direction, that making it smaller
would not gain us much? Is that one of the reasons it is defined as 10?
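The point about cheap chain scans can be illustrated with a simplified sketch (this is not PostgreSQL's actual code; the function and data layout are hypothetical): a mismatched hash is rejected with one integer comparison, and the expensive key comparison only runs when the cached hash matches.

```python
# Simplified model of scanning one hash-bucket chain where each entry
# caches its hash value alongside the key and payload.

def scan_bucket(chain, probe_hash, probe_key):
    """chain: list of (hashvalue, key, payload) entries in one bucket.
    Returns payloads of entries whose key truly matches the probe."""
    matches = []
    for hashvalue, key, payload in chain:
        if hashvalue != probe_hash:   # cheap integer comparison
            continue
        if key == probe_key:          # expensive full-key comparison
            matches.append(payload)
    return matches

# Example: three entries share one bucket, only one truly matches.
bucket = [(0x1A, "apple", 1), (0x2B, "pear", 2), (0x1A, "plum", 3)]
print(scan_bucket(bucket, 0x1A, "plum"))  # -> [3]
```

Under this model, a longer chain mostly adds integer comparisons, which is why a large NTUP_PER_BUCKET may not hurt much, and symmetrically why a small one may not help much.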

After all, this is my first time discussing in an open source community.
Thank you all for the replies and the discussion. Thanks.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/why-postgresql-define-NTUP-PER-BUCKET-as-10-not-other-numbers-smaller-tp5806472p5806617.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
