Re: Poor memory context performance in large hash joins

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Poor memory context performance in large hash joins
Date: 2017-02-24 19:57:17
Message-ID: CAMkU=1xdSg0g8JuwUq=zro=PG508_0qKaKFLQbEcGciT4hAYLw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 24, 2017 at 10:59 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> > On Thu, Feb 23, 2017 at 2:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Maybe it's time to convert that to a doubly-linked list.
>
> > I don't think that would help. You would need some heuristic to guess
> > whether the chunk you are looking for is near the front, or near the end.
>
> Uh, what? In a doubly-linked list, you can remove an element in O(1)
> time, you don't need any searching. It basically becomes
> item->prev->next = item->next;
> item->next->prev = item->prev;
> modulo possible special cases for the head and tail elements.
>

Currently it is walking the chain to identify which block holds the chunk
in the first place, not just to get a pointer to the previous block. But
I see that this could be fixed by pointer arithmetic once there is a reason
to fix it, and there isn't a reason while you still need to walk the chain
to get the prev pointer anyway. Something like this?

targetblock = (AllocBlock) (((char*)chunk) - ALLOC_BLOCKHDRSZ);

> >> Although if the
> >> hash code is producing a whole lot of requests that are only a bit
> bigger
> >> than the separate-block threshold, I'd say It's Doing It Wrong. It
> should
> >> learn to aggregate them into larger requests.
>
> > Right now it is using compiled-in 32KB chunks. Should it use something
> > like max(32KB, work_mem/128) instead?
>
> I'd say it should double the size of the request each time. That's what
> we do in most places.
>

I thought the design goal there was that the space in old chunks could be
reused for new chunks in a reasonably fine-grained way. If the last
chunk contains just over half of all the data, that couldn't happen.

Cheers,

Jeff
