Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From: "Joshua Tolley" <eggyknap(at)gmail(dot)com>
To: "Bryce Cutt" <pandasuit(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date: 2008-11-06 22:33:09
Message-ID: e7e0a2570811061433p34733d1fs2f94f2c508b84e5b@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 5, 2008 at 5:06 PM, Bryce Cutt <pandasuit(at)gmail(dot)com> wrote:
> The error is caused by me asserting against the wrong variable. I
> never noticed this as I apparently did not have assertions turned on
> on my development machine. That is fixed now and with the new patch
> version I have attached all assertions are passing with your query and
> my test queries. I added another assertion to that section of the
> code so that it is a bit more rigorous in confirming the hash table
> partition is correct. It does not change the operation of the code.
>
> There are two partition counts. One holds the maximum number of
> buckets in the hash table and the other counts the number of actual
> buckets created for hash values. I was incorrectly testing against
> the second one because that was valid before I started using a hash
> table to store the buckets.
>
> The enable_hashjoin_usestatmcvs flag was valuable for my own research
> and tests and likely useful for your review but Tom is correct that it
> can be removed in the final version.
>
> - Bryce Cutt
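
For readers following along, the two counts described above can be illustrated with a minimal C sketch. This is not code from the patch: the struct, field, and function names (SkewTable, max_buckets, used_buckets, slot_for_hash, add_bucket) are hypothetical, and the power-of-two masking is an assumption made only to keep the example short. It shows why a slot index in a sparsely filled hash table should be checked against the table's capacity rather than against the number of buckets created so far.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_BUCKETS 8    /* assumed capacity; must be a power of two */

/* Hypothetical structure, invented for illustration only. */
typedef struct SkewTable
{
    int  max_buckets;                   /* capacity of the hash table */
    int  used_buckets;                  /* buckets actually created so far */
    bool occupied[SKETCH_MAX_BUCKETS];  /* which slots hold a bucket */
} SkewTable;

/* Map a hash value to a slot index in the table. */
static int
slot_for_hash(const SkewTable *t, unsigned int hash)
{
    int slot = (int) (hash & (unsigned int) (t->max_buckets - 1));

    /*
     * The bound that holds for a hash table: slots range over the capacity.
     * Checking "slot < t->used_buckets" would only be valid if buckets were
     * stored densely in creation order, and fails once slots are sparse.
     */
    assert(slot >= 0 && slot < t->max_buckets);
    return slot;
}

/* Create a bucket for this hash value if its slot is still empty. */
static void
add_bucket(SkewTable *t, unsigned int hash)
{
    int slot = slot_for_hash(t, hash);

    if (!t->occupied[slot])
    {
        t->occupied[slot] = true;
        t->used_buckets++;
    }
}
```

With a capacity of 8 and a single bucket created for hash value 5, the slot index is 5 while used_buckets is 1, so an assertion against used_buckets would fail even though the table is in a valid state.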

Well, that builds nicely, lets me import the data, and I've seen a
performance improvement with enable_hashjoin_usestatmcvs on vs. off. I
plan to test that more formally (though probably not fully to the
extent you did in your paper; just enough to feel comfortable that I'm
getting similar results). Then I'll spend some time poking in the
code, for the relatively little good I feel I can do in that capacity,
and I'll also investigate scenarios with particularly inaccurate
statistics. Stay tuned.

- Josh
