Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>
Cc: pgsql-hackers(at)postgresql(dot)org, "Bryce Cutt" <pandasuit(at)gmail(dot)com>
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date: 2008-11-21 00:44:06
Message-ID: 22901.1227228246@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca> writes:
> We propose a patch that improves hybrid hash join's performance for
> large multi-batch joins where the probe relation has skew.
> ...
> The basic idea
> is to keep build relation tuples in a small in-memory hash table that
> have join values that are frequently occurring in the probe relation.

I looked at this patch a little.

I'm a tad worried about what happens when the values that are frequently
occurring in the outer relation are also frequently occurring in the
inner (which hardly seems an improbable case). Don't you stand a severe
risk of blowing out the in-memory hash table? It doesn't appear to me
that the code has any way to back off once it's decided that a certain
set of join key values are to be treated in-memory. Splitting the main
join into more batches certainly doesn't help with that.

Also, AFAICS the benefit of this patch comes entirely from avoiding dump
and reload of tuples bearing the most common values, which means it's a
significant waste of cycles when there's only one batch. It'd be better
to avoid doing any of the extra work in the single-batch case.

One thought that might address that point as well as the difficulty of
getting stats in nontrivial cases is to wait until we've overrun memory
and are forced to start batching, and at that point determine on-the-fly
which are the most common hash values from inspection of the hash table
as we dump it out. This would amount to optimizing on the basis of
frequency in the *inner* relation not the outer, but offhand I don't see
any strong theoretical basis why that wouldn't be just as good. It
could lose if the first work_mem worth of inner tuples isn't
representative of what follows; but this hardly seems more dangerous
than depending on MCV stats that are for the whole outer relation rather
than the portion of it being selected.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ron Mayer 2008-11-21 01:07:40 Re: Re: Updated interval patches - ECPG [was, intervalstyle....]
Previous Message Bruce Momjian 2008-11-21 00:21:00 Re: [HACKERS] [FINALLY] the TODO list has migrated to Wiki