BUG #6668: hashjoin cost problem

Lists: pgsql-bugs
From: postgresuser(at)yahoo(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6668: hashjoin cost problem
Date: 2012-05-31 00:24:49
Message-ID: E1SZtC5-00028E-DI@wrigleys.postgresql.org

The following bug has been logged on the website:

Bug reference: 6668
Logged by: Postgres User
Email address: postgresuser(at)yahoo(dot)com
PostgreSQL version: 9.1.3
Operating system: Ubuntu
Description:

work_mem 1MB

create table small(i) as select (g/1000) * 1000 from generate_series(1,10000) g;

create table large(i) as select generate_series(1,100000000);

vacuum;
vacuum;
vacuum analyze;

explain analyze select * from small inner join large using (i);

                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3083103.00..3475328.00 rows=10000 width=4) (actual time=84079.899..84989.419 rows=9001 loops=1)
   Hash Cond: (small.i = large.i)
   ->  Seq Scan on small  (cost=0.00..145.00 rows=10000 width=4) (actual time=0.008..0.588 rows=10000 loops=1)
   ->  Hash  (cost=1442478.00..1442478.00 rows=100000000 width=4) (actual time=84079.741..84079.741 rows=100000000 loops=1)
         Buckets: 4096  Batches: 4096  Memory Usage: 853kB
         ->  Seq Scan on large  (cost=0.00..1442478.00 rows=100000000 width=4) (actual time=0.005..59011.443 rows=100000000 loops=1)
 Total runtime: 84990.270 ms
(7 rows)

It doesn't matter how big the big table is; for this distribution, the large
table is the one that gets hashed.
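The skew is visible in the planner's own statistics; a quick sanity check
against the standard pg_stats view (assuming the tables above have been
analyzed) looks roughly like this:

select most_common_vals, most_common_freqs
from pg_stats
where tablename = 'small' and attname = 'i';
-- each frequency is about 0.1, i.e. roughly 1000 of the 10000 rows share a
-- single key, which is presumably what pushes innerbucketsize up whenever
-- small is chosen as the hashed side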

Forcing the cost to 0 in one of the cost_hashjoin calls (via gdb), the planner
chooses to hash the smaller table instead, with better results:

explain analyze select * from small inner join large using (i);
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=270.00..0.00 rows=10000 width=4) (actual time=14028.034..16510.598 rows=9001 loops=1)
   Hash Cond: (large.i = small.i)
   ->  Seq Scan on large  (cost=0.00..1442478.00 rows=100000000 width=4) (actual time=0.010..5893.344 rows=100000000 loops=1)
   ->  Hash  (cost=145.00..145.00 rows=10000 width=4) (actual time=3.854..3.854 rows=10000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 352kB
         ->  Seq Scan on small  (cost=0.00..145.00 rows=10000 width=4) (actual time=0.005..1.585 rows=10000 loops=1)
 Total runtime: 16510.962 ms
(7 rows)

Poking around further in gdb, essentially all of the cost seems to come from:

run_cost += hash_qual_cost.per_tuple * outer_path_rows *
clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

(outer_path_rows * clamp_row_est(inner_path_rows * innerbucketsize) * 0.5)
comes out to 50 billion, leading to a wildly inflated cost. The join's output
is rightly estimated at 10000 rows, so 50 billion comparisons is obviously a
bad estimate.
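Plugging in the numbers from this setup (my reading of the variables; the
innerbucketsize of ~0.1 is an assumption based on the ~1000 duplicates per key
in small) reproduces that figure:

select 100000000::numeric      -- outer_path_rows: seq scan on large
     * round(10000 * 0.1)      -- clamp_row_est(inner_path_rows * innerbucketsize)
     * 0.5                     -- the planner's derating factor
  as estimated_probes;
-- 50000000000, i.e. the 50 billion comparisons charged for hashing small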


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: postgresuser(at)yahoo(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6668: hashjoin cost problem
Date: 2012-05-31 05:03:24
Message-ID: 28259.1338440604@sss.pgh.pa.us

postgresuser(at)yahoo(dot)com writes:
> create table small(i) as select (g/1000) * 1000 from
> generate_series(1,10000) g;
> create table large(i) as select generate_series(1,100000000);

> It doesn't matter how big the big table is... for this distribution large
> table is hashed.

I don't think that's wrong. If it hashes the small table, there cannot
be less than 1000 entries on each populated hash chain; adding more
work_mem doesn't help. The planner is designed to avoid hashing such
unfriendly distributions as that. The fact that you can get a somewhat
smaller runtime by forcing hashing in the other direction suggests that
its cost factors are not quite right for your specific case --- but it's
a long way from that observation to deciding that we should change the
cost factors for everyone. In any case, the sizes of the tables are not
the only determinant of which one should be hashed.

regards, tom lane


From: Postgres User <postgresuser(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #6668: hashjoin cost problem
Date: 2012-05-31 05:57:17
Message-ID: 1338443837.61025.YahooMailNeo@web121105.mail.ne1.yahoo.com

Why is cost_hashjoin estimating 50 billion tuple comparisons for 10K rows of output though?

________________________________
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: postgresuser(at)yahoo(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Sent: Wednesday, May 30, 2012 10:03 PM
Subject: Re: [BUGS] BUG #6668: hashjoin cost problem

postgresuser(at)yahoo(dot)com writes:
> create table small(i) as select (g/1000) * 1000 from
> generate_series(1,10000) g;
> create table large(i) as select generate_series(1,100000000);

> It doesn't matter how big the big table is... for this distribution large
> table is hashed.

I don't think that's wrong.  If it hashes the small table, there cannot
be less than 1000 entries on each populated hash chain; adding more
work_mem doesn't help.  The planner is designed to avoid hashing such
unfriendly distributions as that.  The fact that you can get a somewhat
smaller runtime by forcing hashing in the other direction suggests that
its cost factors are not quite right for your specific case --- but it's
a long way from that observation to deciding that we should change the
cost factors for everyone.  In any case, the sizes of the tables are not
the only determinant of which one should be hashed.

            regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Postgres User <postgresuser(at)yahoo(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #6668: hashjoin cost problem
Date: 2012-05-31 22:48:01
Message-ID: 23375.1338504481@sss.pgh.pa.us

Postgres User <postgresuser(at)yahoo(dot)com> writes:
> Why is cost_hashjoin estimating 50 billion tuple comparisons for 10K rows of output though?

Well, if it hashes the smaller table, there's 100 million rows on the
outside, and each of them will probe one hash chain in the hash table.
If you're unlucky, each of those probes will hit a populated hash chain
with at least 1000 entries, leading to 100 billion comparisons. I think
it might derate that worst-case by a factor of 2. Now if you're lucky,
a lot of the outer tuples hit unpopulated hash chains and so the number
of comparisons is a lot less --- but in non-artificial examples,
that's not a very good bet to make. The conservative assumption is that
both sides of the join have similar key distributions, so that the more
populated hash chains are also more likely to be probed. The cost
estimate is therefore designed to discriminate against using an inner
relation with a non-flat distribution.
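In rough numbers (my arithmetic, assuming every outer probe lands on a fully
populated 1000-entry chain):

select 100000000::numeric * 1000       as worst_case_comparisons,  -- 100 billion
       100000000::numeric * 1000 * 0.5 as derated_estimate;        -- 50 billion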

regards, tom lane