Re: Poor memory context performance in large hash joins

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Poor memory context performance in large hash joins
Date: 2017-02-24 18:47:14
Message-ID: CAMkU=1wCoM1bO3G2mNnEMWgGhcHqH_M5O3VuwUQZD8zRYDdK0g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 23, 2017 at 10:47 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2017-02-23 17:28:26 -0500, Tom Lane wrote:
> > Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> > > The number of new chunks can be almost as large as the number of old
> > > chunks, especially if there is a very popular value. The problem is
> > > that every time an old chunk is freed, the code in aset.c around line
> > > 968 has to walk over all the newly allocated chunks in the linked
> > > list before it can find the old one being freed. This is an N^2
> > > operation, and I think it has horrible CPU cache hit rates as well.
> >
> > Maybe it's time to convert that to a doubly-linked list.
>
> Yes, I do think so. Given that we only have that for full blocks, not
> for small chunks, the cost seems negligible.
>
> That would also, partially, address the performance issue
> http://archives.postgresql.org/message-id/d15dff83-0b37-28ed-0809-95a5cc7292ad%402ndquadrant.com
> addresses, in a more realistically backpatchable manner.
>
> Jeff, do you have a handy demonstrator?

Not exactly "handy", as it takes about 45 minutes to set up and uses >50GB
of disk and 16GB of RAM, but here is a demonstration.

-- ~78% of the 540 million rows (about 420 million) get titleid = 1,
-- making one value extremely popular and one hash bucket degenerate.
create table foobar as
  select case when random() < (420.0/540.0) then 1
              else floor(random()*880000)::int
         end as titleid
  from generate_series(1,540000000);

create table foobar2 as select distinct titleid from foobar ;

analyze;

set work_mem TO '13GB';

select count(*) from foobar2
  where not exists (select 1 from foobar t where t.titleid = foobar2.titleid);

This will run effectively forever, unless it gets killed by the OOM
killer first. If I have other things consuming some of my 16GB of RAM,
it does get OOM-killed (which is not a complaint: that is what one
should expect, given that I told it work_mem was 13GB). If I have no
other demands on RAM, I can't tell whether it would eventually OOM,
because of how long it would take to get that far.
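
To be concrete about where the time goes, the free path for oversized
chunks boils down to something like the following (a paraphrase of the
logic around aset.c line 968, not the verbatim source):

/*
 * Each chunk bigger than allocChunkLimit lives in its own block, and
 * freeing it means scanning the context's singly linked block list
 * from the head to find that block and its predecessor.
 */
AllocBlock  block = set->blocks;
AllocBlock  prevblock = NULL;

while (block != NULL)
{
    if (chunk == (AllocChunk) (((char *) block) + ALLOC_BLOCKHDRSZ))
        break;              /* found the containing block */
    prevblock = block;
    block = block->next;    /* O(N) scan, repeated for every pfree */
}

if (block == NULL)
    elog(ERROR, "could not find block containing chunk %p", chunk);

/* unlink the block from the singly linked list, then give it back */
if (prevblock == NULL)
    set->blocks = block->next;
else
    prevblock->next = block->next;
free(block);

Since newly allocated blocks sit at the head of the list, every free of
an old chunk has to walk past all of them first, which is where the N^2
comes from.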

This is inspired by the thread
https://www.postgresql.org/message-id/flat/CACw4T0p4Lzd6VpwptxgPgoTMh2dEKTQBGu7NTaJ1%2BA0PRx1BGg%40mail(dot)gmail(dot)com#CACw4T0p4Lzd6VpwptxgPgoTMh2dEKTQBGu7NTaJ1+A0PRx1BGg(at)mail(dot)gmail(dot)com

I ran into this while evaluating Tom's patch in response to that
thread, but you don't need to apply that patch to run this example and
see the effect. I don't have an example of a case that demonstrates the
problem in the absence of a degenerate hash bucket. I think there
should be non-degenerate cases that trigger it, but I haven't been able
to produce one yet.

> > Although if the hash code is producing a whole lot of requests that
> > are only a bit bigger than the separate-block threshold, I'd say
> > It's Doing It Wrong. It should learn to aggregate them into larger
> > requests.
>
> That's probably right, but we can't really address that in the
> back-branches. And to me this sounds like something we should address
> in the branches, not just in master. Even if we'd also fix the
> hash-aggregation logic, I think such an O(n^2) behaviour in the
> allocator is a bad idea in general, and we should fix it anyway.
>

I don't know how important a back-patch would be. This is a toy case
for me. Presumably it isn't one for David Hinkle, except that he
doesn't want a hash join in the first place, whereas I want one that
finishes in a reasonable time. It might depend on what happens to Tom's
OOM patch.

It would be great if the allocator were made bullet-proof, but I don't
think adding reverse links (or anything else back-patchable) is going
to be enough to do that.
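
To be concrete about what "reverse links" would mean here, something
like this (a hypothetical sketch on my part, not a patch; the prev
field is assumed):

typedef struct AllocBlockData
{
    AllocSet    aset;               /* aset that owns this block */
    struct AllocBlockData *prev;    /* assumed new field: back link */
    struct AllocBlockData *next;    /* next block in aset's blocks list */
    char       *freeptr;            /* start of free space in this block */
    char       *endptr;             /* end of space in this block */
} AllocBlockData;

/* ...which would turn the list walk in AllocSetFree into an O(1) unlink: */
if (block->prev != NULL)
    block->prev->next = block->next;
else
    set->blocks = block->next;
if (block->next != NULL)
    block->next->prev = block->prev;
free(block);

That would address this particular walk, though as I said, I doubt it
is enough to make the allocator bullet-proof overall.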

Cheers,

Jeff
