Re: A better way than tweaking NTUP_PER_BUCKET

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A better way than tweaking NTUP_PER_BUCKET
Date: 2014-01-27 17:36:10
Message-ID: 20140127173610.GG31026@tamriel.snowman.net
Lists: pgsql-hackers

* Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
> I don't see anything for 9.4 in here now.

Attached is what I was toying with (thought I had attached it previously
somewhere.. perhaps not), but in re-testing, it doesn't appear to do
enough to move things in the right direction in all cases. I did play
with this a fair bit yesterday, and while it improved some cases by 20%
(e.g. a simple join between pgbench_accounts and pgbench_history), when
we decide to *still* hash the larger side (as in my 'test_case2.sql'),
it can cause a similarly-sized decrease in performance. Of course, if
we can push that case to hash the smaller side (which I did by hand with
cpu_tuple_cost), then it goes back to being a win to use a larger number
of buckets.

I definitely feel that there's room for improvement here, but it's not
an easy thing to do, unfortunately. To be honest, I was pretty surprised
when I saw that the larger number of buckets performed worse, even though
it was only when we picked the "wrong" side to hash, and I plan to look
into that more closely to try and understand what's happening. My first
guess would be what Tom had mentioned over the summer: if the size of the
bucket array ends up being larger than the CPU cache, we can end up
paying a great deal more to build the hash table than it costs to scan
through the deeper buckets that we end up with as a result (particularly
when we're scanning a smaller table). Of course, choosing to hash the
larger table makes that more likely.

Thanks,

Stephen
