Re: 9.5: Memory-bounded HashAgg

From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.5: Memory-bounded HashAgg
Date: 2014-08-14 18:11:50
Message-ID: CAOeZVieK0jCE88ZpUH1aoJNJ-+c_c7u5virW8fayc0F-3s3ZRg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 14, 2014 at 10:21 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> On 14 Srpen 2014, 18:02, Atri Sharma wrote:
> > On Thursday, August 14, 2014, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> >
> >> On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote:
> >> > If you're following the HashJoin model, then what you do is the same
> >> thing
> >> > it does: you write the input tuple back out to the pending batch file
> >> for
> >> > the hash partition that now contains key 1001, whence it will be
> >> processed
> >> > when you get to that partition. I don't see that there's any special
> >> case
> >> > here.
> >>
> >> HashJoin only deals with tuples. With HashAgg, you have to deal with a
> >> mix of tuples and partially-computed aggregate state values. Not
> >> impossible, but it is a little more awkward than HashJoin.
> >>
> >>
> > +1
> >
> > Not to mention future cases if we start maintaining multiple state
> > values, e.g. with regard to grouping sets.
>
> So what would you do for aggregates where the state is growing quickly?
> Say, things like median() or array_agg()?
>
> I think that "we can't do that for all aggregates" does not imply "we must
> not do that at all."
>
> There will always be aggregates not implementing dumping state for various
> reasons, and in those cases the proposed approach is certainly a great
> improvement. I like it, and I hope it will get committed.
>
> But maybe for aggregates supporting serialize/deserialize of the state
> (including all aggregates using known types, not just fixed-size types) a
> hashjoin-like batching would be better? I can name a few custom aggregates
> that'd benefit tremendously from this.
>

Yeah, that could work, but is it worth adding additional paths (assuming
this patch gets committed) for only some aggregates? I think we should
analyze the use cases further.
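To make the hashjoin-like batching Tomas describes concrete, here is a toy
sketch. It is in Python purely for brevity (PostgreSQL itself is C), all
names are invented, and it ignores disk I/O: the "batch file" is just a
list. It uses sum() as the aggregate, whose transition state is a plain
number, so a spilled partial state looks exactly like an input tuple and
re-combining it on the next pass is the same operation. That is the easy
case; as Jeff notes, aggregates with richer state need explicit
serialize/deserialize support.

```python
# Illustrative sketch only: a memory-bounded hash aggregation that, like
# hash join, writes overflow out to a pending batch and processes it in a
# later pass. Not PostgreSQL code; real batches would live on disk.

def hash_agg(tuples, max_groups):
    """Aggregate (key, value) pairs with sum(), keeping at most
    max_groups groups resident in the hash table at once."""
    pending = list(tuples)  # current pass's input ("batch file")
    results = {}
    while pending:
        table = {}
        overflow = []  # tuples spilled for the next pass
        for key, value in pending:
            if key in table:
                table[key] += value      # fold into the transition state
            elif len(table) < max_groups:
                table[key] = value       # new group still fits in memory
            else:
                # Hash table is full: spill the tuple to the pending
                # batch, as hash join does. For sum(), an evicted partial
                # state could be spilled the same way, since state and
                # input have the same shape.
                overflow.append((key, value))
        results.update(table)
        pending = overflow
    return results
```

For example, `hash_agg([("a", 1), ("b", 2), ("a", 3), ("c", 4)], max_groups=2)`
finishes group "c" in a second pass rather than growing the table past its
limit.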

>
> Just to be clear - this is certainly non-trivial to implement, and I'm not
> trying to force anyone (e.g. Jeff) to implement the ideas I proposed. I'm
> ready to spend time on reviewing the current patch, implement the approach
> I proposed and compare the behaviour.
>

Totally agreed. It would be a different approach, albeit, as you said, one
that can be built on top of the current patch.

>
> Kudos to Jeff for working on this.
>

Agreed :)
--
Regards,

Atri
*l'apprenant*
