Re: A better way than tweaking NTUP_PER_BUCKET

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A better way than tweaking NTUP_PER_BUCKET
Date: 2014-01-25 22:33:46
Message-ID: 20140125223346.GT31026@tamriel.snowman.net
Lists: pgsql-hackers

Tom,

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > In the end, I believe we absolutely should do something about this.
> > Hashing a 64M-row table (requiring upwards of 8G) instead of hashing
> > a 2M-row table is really bad of us.
>
> Huh? I don't see anything in the thread suggesting that we're doing that,
> or that changing NTUP_PER_BUCKET would fix it if we do. Are you thinking
> of some other discussion?

As I recall, this thread sprang from, or was at least related to, my
issues with NTUP_PER_BUCKET over the summer. Perhaps I'm wrong, but
here's the thread I was referring to:

http://www.postgresql.org/message-id/20130404201612.GM4361@tamriel.snowman.net

There I demonstrated that we decide to hash the much larger table,
rather than the smaller one, based on the estimated depth of the
buckets and the costing derived from it. That costing is driven by how
big we decide to make the hash table, and NTUP_PER_BUCKET leads us to
pick a table size much smaller than we really should.

> AFAICT, there was no consensus in this thread on what to do, which
> probably has something to do with the lack of concrete performance
> tests presented to back up any particular proposal.

This I entirely agree with: more testing and more information on how
such a change impacts other workloads would be great. Unfortunately,
while I've provided a couple of test cases and seen similar situations
on IRC, this issue is very data-dependent, which makes it difficult to
have concrete answers for every workload.

Still, I'll try to spend some time with pgbench's schema definition and
write up some larger queries to run through it (as I understand it, the
default set of queries won't typically result in a hash join) and see
what happens there.

Thanks,

Stephen
