Re: A better way than tweaking NTUP_PER_BUCKET

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A better way than tweaking NTUP_PER_BUCKET
Date: 2013-06-23 02:16:25
Message-ID: CAOuzzgqdpX-m1R+ZvLsPYgxqGz1EFViWjQzzbkUquSpnzmFiUQ@mail.gmail.com
Lists: pgsql-hackers

On Saturday, June 22, 2013, Simon Riggs wrote:

> On 22 June 2013 21:40, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> > I'm actually not a huge fan of this as it's certainly not cheap to do. If it
> > can be shown to be better than an improved heuristic then perhaps it would
> > work but I'm not convinced.
>
> We need two heuristics, it would seem:
>
> * an initial heuristic to overestimate the number of buckets when we
> have sufficient memory to do so
>
> * a heuristic to determine whether it is cheaper to rebuild a dense
> hash table into a better one.
>
> Although I like Heikki's rebuild approach we can't do this every x2
> overstretch. Given large underestimates exist we'll end up rehashing
> 5-12 times, which seems bad. Better to let the hash table build and
> then re-hash once, if we can see it will be useful.
>
> OK?
>

I've been thinking a bit more about your notion of simply using as much
memory as we're permitted, but perhaps adjusting it down based on how big
the input to the hash table is. I think we have better stats on that, and
even if we don't, we could easily keep track of how many tuples we've seen
and consider rehashing as we go. That still doesn't really address the issue
of dups, though. The table may still be much larger than it should be if
there are a lot of duplicates in the input, since those hash into a much
smaller set of buckets.
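
To make that concrete, here is a rough sketch of the sizing idea in Python.
It is an illustration only, not code from the executor: the function names,
the 8-byte bucket pointer, the one-tuple-per-bucket target and the 2x
overload threshold are all assumptions picked for the example.

```python
from math import ceil


def next_pow2(n):
    """Smallest power of two >= n (treats n < 1 as 1)."""
    n = max(1, n)
    return 1 << (n - 1).bit_length()


def prev_pow2(n):
    """Largest power of two <= n (treats n < 1 as 1)."""
    n = max(1, n)
    return 1 << (n.bit_length() - 1)


def choose_nbuckets(mem_budget_bytes, ntuples,
                    bucket_ptr_bytes=8, ntup_per_bucket=1):
    """Pick a bucket count: as many as memory permits, adjusted down
    to what the (estimated or observed) input size actually needs.

    bucket_ptr_bytes and ntup_per_bucket are hypothetical defaults.
    """
    by_memory = prev_pow2(mem_budget_bytes // bucket_ptr_bytes)
    by_input = next_pow2(ceil(ntuples / ntup_per_bucket))
    return min(by_memory, by_input)


def should_rehash(nbuckets, tuples_seen, mem_budget_bytes,
                  overload_factor=2):
    """After the build, rehash once if the bucket count we would now
    choose is at least overload_factor times the current one.

    overload_factor is an assumed threshold, not a measured one.
    """
    target = choose_nbuckets(mem_budget_bytes, tuples_seen)
    return target >= nbuckets * overload_factor
```

The dups problem shows up if you feed this the tuple count rather than the
number of distinct hash values. With a 64MB budget, 1,000,000 tuples ask for
1,048,576 buckets, but if those tuples carry only 1,000 distinct hash values
then 1,024 buckets would have done, and the sketch has no way to know that
from the tuple count alone.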

Will think on it more.

Thanks,

Stephen
