Re: A better way than tweaking NTUP_PER_BUCKET

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A better way than tweaking NTUP_PER_BUCKET
Date: 2013-06-22 23:13:05
Message-ID: CAOuzzgqwT3jwjSE8=npDQ5+why2KDMpDZKND4GsG+vjfgrCCHg@mail.gmail.com
Lists: pgsql-hackers

On Saturday, June 22, 2013, Heikki Linnakangas wrote:

> On 22.06.2013 19:19, Simon Riggs wrote:
>
>> So I think that (2) is the best route: Given that we know with much
>> better certainty the number of rows in the scanned-relation, we should
>> be able to examine our hash table after it has been built and decide
>> whether it would be cheaper to rebuild the hash table with the right
>> number of buckets, or continue processing with what we have now. Which
>> is roughly what Heikki proposed already, in January.
>>
>
> Back in January, I wrote a quick patch to experiment with rehashing when
> the hash table becomes too full. It was too late to make it into 9.3 so I
> didn't pursue it further back then, but IIRC it worked. If we have the
> capability to rehash, the accuracy of the initial guess becomes much less
> important.
>

What we're hashing isn't going to change mid-way through or be updated
after we've started doing lookups against it.

Why not simply scan and queue the data and then build the hash table right
the first time? Also, this patch doesn't appear to address dups and
therefore would rehash unnecessarily. There's no point rehashing into more
buckets if the buckets are only deep due to lots of dups. Figuring out how
many distinct values there are, in order to build the best hash table, is
actually pretty expensive compared to how quickly we can build the table
today. Lastly, this still encourages collisions due to too few buckets. If
we simply started with more buckets outright, we'd reduce the need to
rehash.
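
To make the scan-then-build idea and the dups point concrete, here is a rough
sketch in C. It is not taken from the patch or from PostgreSQL's hash join
code: Tuple, HashTable, hash_key, build_table and longest_chain are all
hypothetical names, and sizing at about one bucket per tuple is an assumption
made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types for illustration; not PostgreSQL's structures. */
typedef struct Tuple {
    uint32_t key;
    struct Tuple *next;          /* link in the bucket chain */
} Tuple;

typedef struct {
    Tuple **buckets;
    size_t nbuckets;             /* always a power of two */
} HashTable;

/* Simple 32-bit integer mixer (assumed hash function for this sketch). */
static uint32_t hash_key(uint32_t k)
{
    k ^= k >> 16;
    k *= 0x85ebca6bu;
    k ^= k >> 13;
    k *= 0xc2b2ae35u;
    k ^= k >> 16;
    return k;
}

/* Smallest power of two that is >= n. */
static size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Build the table once, after all tuples have been queued, so the bucket
 * count comes from the real tuple count rather than from an estimate.
 */
static HashTable build_table(Tuple *queued, size_t ntuples)
{
    HashTable ht;

    ht.nbuckets = next_pow2(ntuples > 0 ? ntuples : 1);
    ht.buckets = calloc(ht.nbuckets, sizeof(Tuple *));
    assert(ht.buckets != NULL);

    for (size_t i = 0; i < ntuples; i++) {
        size_t b = hash_key(queued[i].key) & (ht.nbuckets - 1);
        queued[i].next = ht.buckets[b];
        ht.buckets[b] = &queued[i];
    }
    return ht;
}

/* Length of the deepest bucket chain. */
static size_t longest_chain(const HashTable *ht)
{
    size_t longest = 0;

    for (size_t b = 0; b < ht->nbuckets; b++) {
        size_t len = 0;
        for (const Tuple *t = ht->buckets[b]; t != NULL; t = t->next)
            len++;
        if (len > longest)
            longest = len;
    }
    return longest;
}
```

If every queued tuple carries the same key, longest_chain() returns the full
tuple count no matter how large nbuckets is, since equal keys always hash to
the same bucket. That is the case where rehashing into more buckets buys
nothing.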

Thanks,

Stephen
