Re: A better way than tweaking NTUP_PER_BUCKET

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A better way than tweaking NTUP_PER_BUCKET
Date: 2013-06-22 20:40:36
Message-ID: CAOuzzgo5F-t0R=9YoDGZN5=-PaQM9pUHEm3HvN0A-oX_MPSXFw@mail.gmail.com
Lists: pgsql-hackers

On Saturday, June 22, 2013, Simon Riggs wrote:

> On 22 June 2013 14:48, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> > Based on your argument that we want to have a bucket load which is, on
> > average, the size of NTUP_PER_BUCKET, I have to '-1' on this.
>
> That is not my argument. I am pointing out that the comments claim the
> code does that, and yet the code does not.

To be honest, I don't read that comment quite the same way as you do.

> Tom's wish for an average tuples per bucket of 10 may be connected to
> that error; I can't say. But we definitely don't end up with an
> average of 10 in many cases, which is the basis for previous poor
> performance reports.

I don't believe Tom wishes for an average of 10; that's just what
NTUP_PER_BUCKET has always been set to.

> The bottom line is that you're hoping for perfection. If we knew
> exactly how many tuples were in the hash table then we could size it
> precisely for the hash join we are about to perform. We don't know
> that, so we can't size it precisely. Worse than that is that we know
> we badly estimate the number of buckets on a regular basis.

I'm actually not trying for perfection. What I think we should start with
is to at least not shoot ourselves in the foot by cutting the number of
buckets to one-tenth of our distinct-tuple estimate, virtually guaranteeing
lots of true collisions, which are expensive.
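
To make that concrete, the sizing today boils down to roughly the sketch
below (a simplified illustration with made-up names, not the actual
ExecChooseHashTableSize() code), along with the direction I'd rather see:

    #include <math.h>

    #define NTUP_PER_BUCKET 10      /* ~10 build tuples per bucket today */

    static int
    choose_nbuckets(double ntuples_estimate)
    {
        /*
         * Current behaviour, roughly: buckets = estimate / 10, so even a
         * perfect estimate leaves ~10 tuples chained on each bucket and a
         * string of real comparisons on every probe.
         */
        double      dbuckets = ceil(ntuples_estimate / NTUP_PER_BUCKET);

        /*
         * What I'm arguing for instead: aim for ~1 tuple per bucket by
         * sizing the bucket array to the distinct-tuple estimate itself,
         * paying for a larger (cheap) pointer array to avoid expensive
         * true collisions.
         *
         * dbuckets = ceil(ntuples_estimate);
         */

        return (int) dbuckets;
    }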

> That gives us three choices: if we have a hash table sized without
> prior knowledge then either (0) we continue to underallocate buckets
> in many cases, or (1) we potentially overallocate them. Or (2) we learn
> to cope better with the variation in the number of entries. If we rule
> out the first two we have only the last one remaining.
>
> (1) will always be a heuristic, and as you point out could itself be
> an extreme setting

A heuristic based on our estimates is a good choice, imv. I do think we
can do better than we do now, though.

> So I think that (2) is the best route: Given that we know with much
> better certainty the number of rows in the scanned-relation, we should
> be able to examine our hash table after it has been built and decide
> whether it would be cheaper to rebuild the hash table with the right
> number of buckets, or continue processing with what we have now. Which
> is roughly what Heikki proposed already, in January.
>

I'm actually not a huge fan of this, as it's certainly not cheap to do. If
it can be shown to be better than an improved heuristic then perhaps it
would work, but I'm not convinced.
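
For what it's worth, the decision being weighed there is roughly the shape
below (a hedged sketch; the struct, the names and the 2x threshold are mine
for illustration, not Heikki's patch):

    #include <stdbool.h>

    typedef struct HashTableStats
    {
        int         nbuckets;       /* buckets allocated before the build */
        double      ntuples;        /* tuples actually loaded */
    } HashTableStats;

    static bool
    worth_rebuilding(const HashTableStats *stats, int target_per_bucket)
    {
        double      actual_per_bucket = stats->ntuples / stats->nbuckets;

        /*
         * Rebuilding costs one extra pass to re-link every stored tuple
         * into a larger bucket array; not rebuilding costs extra
         * comparisons on every probe.  The 2x threshold is a placeholder,
         * not a measured break-even point.
         */
        return actual_per_bucket > 2.0 * target_per_bucket;
    }

My concern is exactly that extra pass over the stored tuples; it has to beat
simply having picked a better nbuckets up front.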

One other way to address having a smaller-than-ideal hash table is to come
up with a cheap way to detect empty parts of the hash space, perhaps by
using a bitmap to represent the hash space (obviously the individual bits
would represent some chunk of the 32-bit hash space, perhaps as much as
1/64th, allowing our bitmap to fit into a 64-bit int; that may end up being
too small to actually help, though). This would mean that when we did get
to scanning a bucket we would at least be much more likely to find a match
instead of scanning it and finding nothing.
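
Roughly what I have in mind, as a minimal sketch assuming a single 64-bit
word and hypothetical helper names:

    #include <stdbool.h>
    #include <stdint.h>

    /* One bit per 1/64th of the 32-bit hash space. */
    static uint64_t hash_space_map = 0;

    /* The top 6 bits of the hash pick one of the 64 regions. */
    static inline int
    hash_region(uint32_t hashvalue)
    {
        return (int) (hashvalue >> 26);
    }

    /* Build side: note which region each inserted tuple's hash falls into. */
    static inline void
    note_build_tuple(uint32_t hashvalue)
    {
        hash_space_map |= UINT64_C(1) << hash_region(hashvalue);
    }

    /* Probe side: skip the bucket scan entirely if the region is untouched. */
    static inline bool
    region_maybe_populated(uint32_t hashvalue)
    {
        return (hash_space_map & (UINT64_C(1) << hash_region(hashvalue))) != 0;
    }

With only 64 regions it would only pay off when the build side leaves large
stretches of the hash space untouched, which is the "too small to actually
help" worry above.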

Thanks,

Stephen
