Re: costing of hash join

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: costing of hash join
Date: 2014-01-03 22:50:55
Message-ID: 27396.1388789455@sss.pgh.pa.us
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> I'm trying to figure out why hash joins seem to be systematically underused
> in my hands. In the case I'm looking at right now, the planner prefers a
> merge join with both inputs getting seq scanned and sorted, despite the
> hash join actually being 2 to 3 times faster, with inputs and intermediate
> working sets all in memory. I normally wouldn't worry about a factor-of-3
> error, but I see this a lot in many different situations. The row estimates
> are very close to actual; the error is only in the CPU estimates.

Can you produce a test case for other people to look at?

What datatype(s) are the join keys?

> A hash join is charged cpu_tuple_cost for each inner tuple, to cover
> inserting it into the hash table:

Doesn't seem like monkeying with that is going to account for a 3x error.
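
For anyone following along, the charge in question is the one applied in
initial_cost_hashjoin() in src/backend/optimizer/path/costsize.c. Roughly
paraphrased (a sketch, not verbatim source):

    /*
     * Each inner row pays one cpu_operator_cost per hash clause to
     * compute its hash value, plus one cpu_tuple_cost to model the
     * insertion of the row into the hash table.  Outer rows pay only
     * the hashing cost.
     */
    startup_cost += (cpu_operator_cost * num_hashclauses + cpu_tuple_cost)
        * inner_path_rows;
    run_cost += cpu_operator_cost * num_hashclauses * outer_path_rows;

Since cpu_tuple_cost enters that formula just once per inner row, even
doubling it would move the total estimate by far less than the 3x
discrepancy being reported.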

Have you tried using perf or oprofile or similar to see where the time is
actually, rather than theoretically, going?

regards, tom lane
