Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
Date: 2013-10-08 09:42:00
Message-ID: 1D681ED2-9418-45E7-8090-FB66BE5D799F@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sent from my iPad

>
> However I'm unable to make it at least comparable to chaining. The trick
> is that the hash table performs reasonably only until it's ~ 65-70% full,
> it gets really slow after that. So to maintain performance comparable to
> chaining, I'd have to keep the table below this threshold, effectively
> wasting ~30% of memory.
I expected that. AFAIK, open addressing gets slow when the load factor approaches 1. I feel this is what is happening here.
>
> And the topic of this thread was about decreasing the memory consumptions,
> so it seems to me open-addressing is not a good match here. I'll try a few
> more things but I don't think it's going to fly.
>
Yeah, I also feel that open addressing isn't the way to go for the problem here.

> I've made some significant improvements in the chaining version (in the
> master branch), now getting about the memory consumption I've estimated.
>
I agree, we can hope to reduce the memory consumption by making changes in the current chaining implementation. I would like to look into changing the data structure used for chaining from singly linked list to maybe skip list or something else.

Regards,

Atri

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-10-08 09:46:10 Re: Patch for fail-back without fresh backup
Previous Message Marko Tiikkaja 2013-10-08 09:41:47 Re: plpgsql.print_strict_params