Re: tweaking NTUP_PER_BUCKET

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tweaking NTUP_PER_BUCKET
Date: 2014-07-19 21:07:41
Message-ID: 53CADE1D.3010601@fuzzy.cz
Lists: pgsql-hackers

On 19.7.2014 20:28, Tomas Vondra wrote:
> On 19.7.2014 20:24, Tom Lane wrote:
>> Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>>> I've reviewed the two test cases mentioned here, and sadly there's
>>> nothing that can be 'fixed' by this patch. The problem here lies in the
>>> planning stage, which decides to hash the large table - we can't fix
>>> that in the executor.
>>
>> We've heard a couple reports before of the planner deciding to hash a
>> larger table rather than a smaller one. The only reason I can think of
>> for that is if the smaller table has many more duplicates, so that the
>> planner thinks the executor might end up traversing long hash chains.
>> The planner's estimates could easily be off in this area, of course.
>> estimate_hash_bucketsize() is the likely culprit if it's wrong.
>>
>> Which test case are you seeing this in, exactly?
>
> The two reported by Stephen here:
>
>
> http://www.postgresql.org/message-id/20130328235627.GV4361@tamriel.snowman.net
>
> Just download this (I had to gunzip it):
>
> http://snowman.net/~sfrost/test_case.sql
> http://snowman.net/~sfrost/test_case2.sql

Anyway, looking a bit at the distributions, I don't think they are
significantly skewed. In the first test case, I get this:

test=# select cnt, count(*) from (select id_short, count(*) AS cnt from
small_table group by 1) foo group by cnt order by cnt;
 cnt | count
-----+-------
   1 |    23
   2 | 18795
   3 |    13
   4 |   726
   5 |     4
   6 |    93
   8 |     4
  10 |     1
(8 rows)

and in the second one I get this:

test=# select cnt, count(*) from (select id_short, count(*) AS cnt from
small_table group by 1) foo group by cnt order by cnt;
 cnt | count
-----+--------
   1 | 182161
   2 |   5582
   3 |    371
   4 |     28
   5 |      2
   6 |      3
   9 |      1
(7 rows)

So while in #1 most values appear twice, #2 is almost perfectly distinct.
In big_table, the values are unique.
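
A quick sanity check of that claim might look like this (this query is not
part of the test cases, just the obvious check against the same id_short
column; if the values are unique, both counts come out equal):

test=# select count(*) AS total, count(distinct id_short) AS distinct_vals
       from big_table;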

For the first case, a WARNING at the end of estimate_hash_bucketsize
says this:

WARNING: nbuckets=8388608.00 estfract=0.000001
WARNING: nbuckets=65536.00 estfract=0.000267

There are 4.3M rows in the big_table, so that means ~4 tuples/bucket
(selectivity >= 1e-6), and ~10 tuples/bucket for the small_table (42k rows).

For the second case, I get this:

WARNING: nbuckets=16777216.00 estfract=0.000001
WARNING: nbuckets=262144.00 estfract=0.000100

i.e. ~11 tuples/bucket for big_table and ~20 tuples/bucket for small_table.
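
To spell out the arithmetic (assuming estfract is the expected fraction of
the inner relation falling into a single bucket): tuples per bucket is just
the row count times estfract, with the row counts taken from the plans below:

test=# select  4272271 * 0.000001 AS case1_big,
                 41176 * 0.000267 AS case1_small,
              11755475 * 0.000001 AS case2_big,
                194587 * 0.000100 AS case2_small;

which gives roughly 4.3, 11, 11.8 and 19.5, i.e. the tuples/bucket numbers
quoted above.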

That sounds really strange, because small_table in the second test case
is almost perfectly unique. And in the first test case, there are only
very few keys with >2 occurrences.

By explicitly setting the selectivity in estimate_hash_bucketsize to
1e-6, I got these plans:

Test case #1

                              QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=1229.46..79288.95 rows=41176 width=24) (actual time=50.397..634.986 rows=13731 loops=1)
   Hash Cond: (big_table.id_short = small_table.id_short)
   ->  Seq Scan on big_table  (cost=0.00..61626.71 rows=4272271 width=4) (actual time=0.018..229.597 rows=4272271 loops=1)
   ->  Hash  (cost=714.76..714.76 rows=41176 width=24) (actual time=31.653..31.653 rows=41176 loops=1)
         Buckets: 65536  Batches: 1  Memory Usage: 3086kB
         ->  Seq Scan on small_table  (cost=0.00..714.76 rows=41176 width=24) (actual time=0.012..13.341 rows=41176 loops=1)
 Planning time: 0.597 ms
 Execution time: 635.498 ms
(8 rows)

Without EXPLAIN ANALYZE: 486 ms (original plan: >850 ms).

Test case #2

                              QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=5240.21..220838.44 rows=194587 width=4) (actual time=88.252..2143.457 rows=149540 loops=1)
   Hash Cond: (big_table.id_short = small_table.id_short)
   ->  Seq Scan on big_table  (cost=0.00..169569.72 rows=11755372 width=4) (actual time=0.018..533.955 rows=11755475 loops=1)
   ->  Hash  (cost=2807.87..2807.87 rows=194587 width=4) (actual time=87.548..87.548 rows=194587 loops=1)
         Buckets: 262144  Batches: 1  Memory Usage: 8889kB
         ->  Seq Scan on small_table  (cost=0.00..2807.87 rows=194587 width=4) (actual time=0.012..29.929 rows=194587 loops=1)
 Planning time: 0.438 ms
 Execution time: 2146.818 ms
(8 rows)

Without EXPLAIN ANALYZE: 1600 ms (original plan: >2300 ms).

The differences are fairly small - well, it's 30-40% faster, but the
absolute difference is less than 1 s in both cases. I'm wondering whether
we could run into similar problems with much longer queries and still see
a 30-40% improvement there.

Tomas
