Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From: "Bryce Cutt" <pandasuit(at)gmail(dot)com>
To: "Joshua Tolley" <eggyknap(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date: 2008-11-06 00:06:11
Message-ID: 1924d1180811051606w19aaf30du589e8ea10ea5534d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The error is causes by me Asserting against the wrong variable. I
never noticed this as I apparently did not have assertions turned on
on my development machine. That is fixed now and with the new patch
version I have attached all assertions are passing with your query and
my test queries. I added another assertion to that section of the
code so that it is a bit more vigorous in confirming the hash table
partition is correct. It does not change the operation of the code.

There are two partition counts. One holds the maximum number of
buckets in the hash table and the other counts the number of actual
buckets created for hash values. I was incorrectly testing against
the second one because that was valid before I started using a hash
table to store the buckets.

The enable_hashjoin_usestatmcvs flag was valuable for my own research
and tests and likely useful for your review but Tom is correct that it
can be removed in the final version.

- Bryce Cutt

On Wed, Nov 5, 2008 at 7:22 AM, Joshua Tolley <eggyknap(at)gmail(dot)com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Wed, Nov 5, 2008 at 8:20 AM, Tom Lane wrote:
>> Joshua Tolley writes:
>>> On Mon, Oct 20, 2008 at 03:42:49PM -0700, Lawrence, Ramon wrote:
>>>> We propose a patch that improves hybrid hash join's performance for large
>>>> multi-batch joins where the probe relation has skew.
>>
>>> I also recommend modifying docs/src/sgml/config.sgml to include the
>>> enable_hashjoin_usestatmcvs option.
>>
>> If the patch is actually a win, why would we bother with such a GUC
>> at all?
>>
>> regards, tom lane
>
> Good point. Leaving it in place for patch review purposes is useful,
> but we can probably lose it in the end.
>
> - - Josh / eggyknap
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
> Comment: http://getfiregpg.org
>
> iEYEARECAAYFAkkRujsACgkQRiRfCGf1UMNSTACfbpDSQn0HGSVr3jI30GJApcRD
> YbQAn2VZdI/aIalGBrbn1hlRWPEvbgV5
> =LKZ3
> -----END PGP SIGNATURE-----
>

Attachment Content-Type Size
histojoin_v2.patch application/octet-stream 17.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthew T. O'Connor 2008-11-06 00:06:35 Re: RAM-only temporary tables
Previous Message Alex Hunsaker 2008-11-05 23:57:07 Re: RAM-only temporary tables