Re: costing of hash join

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: costing of hash join
Date: 2014-01-03 22:50:55
Message-ID: 27396.1388789455@sss.pgh.pa.us
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> I'm trying to figure out why hash joins seem to be systematically underused
> in my hands. In the case I'm looking at right now, the planner prefers a
> merge join with both inputs getting seq scanned and sorted, despite the
> hash join actually being 2 to 3 times faster, with inputs and intermediate
> working sets all in memory. I normally wouldn't worry about a factor-of-3
> error, but I see this a lot in many different situations. The row estimates
> are very close to actual; the error is only in the CPU estimates.

Can you produce a test case for other people to look at?

What datatype(s) are the join keys?

> A hash join is charged cpu_tuple_cost for each inner tuple, to cover
> inserting it into the hash table:

Doesn't seem like monkeying with that is going to account for a 3x error.
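
For anyone following along, the charge in question is the one applied in
initial_cost_hashjoin() in src/backend/optimizer/path/costsize.c. Roughly
paraphrased (a sketch, not verbatim source):

    /*
     * Each inner row pays one cpu_operator_cost per hash clause to
     * compute its hash value, plus one cpu_tuple_cost to model the
     * insertion of the row into the hash table.  Outer rows pay only
     * the hashing cost.
     */
    startup_cost += (cpu_operator_cost * num_hashclauses + cpu_tuple_cost)
        * inner_path_rows;
    run_cost += cpu_operator_cost * num_hashclauses * outer_path_rows;

Since cpu_tuple_cost enters that formula just once per inner row, even
doubling it would move the total estimate by far less than the 3x
discrepancy being reported.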

Have you tried using perf or oprofile or similar to see where the time is
actually, rather than theoretically, going?

regards, tom lane
