Re: 9.5: Memory-bounded HashAgg

From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Tomas Vondra" <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.5: Memory-bounded HashAgg
Date: 2014-08-14 16:37:13
Message-ID: 0412b787d00a8b16811c8d05c42b8c4a.squirrel@sq.gransy.com
Lists: pgsql-hackers

On 14 August 2014, 18:12, Tom Lane wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
>> HashJoin only deals with tuples. With HashAgg, you have to deal with a
>> mix of tuples and partially-computed aggregate state values. Not
>> impossible, but it is a little more awkward than HashJoin.
>
> Not sure that I follow your point. You're going to have to deal with that
> no matter what, no?

That is not how the patch works. Once memory consumption hits work_mem,
it keeps the already existing groups in memory and simply stops creating
new ones. For each tuple, hashagg does a lookup - if the group is
already in memory, it performs the transition; otherwise it writes the
tuple to disk (and does some batching, but that's mostly irrelevant here).

This way it's not necessary to dump the partially-computed states, and for
fixed-size states it actually limits the amount of consumed memory. For
variable-length aggregate states (array_agg et al.) not so much.
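
Just to make that concrete, here's a minimal toy sketch of that behaviour
(not the actual patch code - COUNT(*) as the aggregate, a tiny array
standing in for work_mem and another one for the batch temp file):

    #include <stdio.h>

    #define MAX_GROUPS 4                 /* stand-in for the work_mem limit */

    typedef struct Group { int key; long count; } Group;

    static Group groups[MAX_GROUPS];
    static int   ngroups = 0;

    static int   spilled[64];            /* stand-in for the batch temp file */
    static int   nspilled = 0;

    static Group *lookup_group(int key)
    {
        int i;

        for (i = 0; i < ngroups; i++)
            if (groups[i].key == key)
                return &groups[i];
        return NULL;
    }

    static void process_tuple(int key)
    {
        Group *g = lookup_group(key);

        /* still below the limit, so we may create a new group */
        if (g == NULL && ngroups < MAX_GROUPS)
        {
            g = &groups[ngroups++];
            g->key = key;
            g->count = 0;
        }

        if (g != NULL)
            g->count++;                  /* transition function for COUNT(*) */
        else
            spilled[nspilled++] = key;   /* group not in memory: spill tuple */
    }

    int main(void)
    {
        int input[] = {1, 2, 3, 4, 5, 1, 6, 2, 7, 3};
        int i;

        for (i = 0; i < (int) (sizeof(input) / sizeof(input[0])); i++)
            process_tuple(input[i]);

        for (i = 0; i < ngroups; i++)
            printf("group %d: count = %ld\n", groups[i].key, groups[i].count);
        printf("%d tuple(s) spilled for later batches\n", nspilled);

        return 0;
    }

The in-memory groups keep advancing their transition values until the end
of the scan; only tuples belonging to not-yet-seen groups get written out.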

> I guess in principle you could avoid the need to dump agg state to disk.
> What you'd have to do is write out tuples to temp files even when you
> think you've processed them entirely, so that if you later realize you
> need to split the current batch, you can recompute the states of the
> postponed aggregates from scratch (ie from the input tuples) when you get
> around to processing the batch they got moved to. This would avoid
> confronting the how-to-dump-agg-state problem, but it seems to have little
> else to recommend it. Even if splitting a batch is a rare occurrence,
> the killer objection here is that even a totally in-memory HashAgg would
> have to write all its input to a temp file, on the small chance that it
> would exceed work_mem and need to switch to batching.

Yeah, I think putting this burden on each hashagg is not a good thing.

What I was thinking about is an automatic fall-back - try to do an
in-memory hash-agg, and when you hit the work_mem limit, see how far into
the input we are (have we scanned 10% or 90% of the tuples?) and decide
whether to restart with batching.
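
Something along these lines - the 0.5 threshold is entirely made up, and
the real decision would presumably also have to look at the estimated
number of groups:

    #include <stdio.h>
    #include <stdbool.h>

    /*
     * Decide whether to throw away the partial in-memory hash table and
     * restart the scan with batching enabled.
     */
    static bool
    restart_with_batching(double tuples_scanned, double tuples_total)
    {
        double fraction = tuples_scanned / tuples_total;

        /* early in the scan, restarting throws away little work */
        return fraction < 0.5;
    }

    int main(void)
    {
        printf("hit limit at 10%%: restart = %d\n",
               restart_with_batching(100000, 1000000));
        printf("hit limit at 90%%: restart = %d\n",
               restart_with_batching(900000, 1000000));
        return 0;
    }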

But I don't think there's a single solution that fixes all the possible
cases. The patch proposed here is a solid starting point that may be
improved and extended by further patches. Eventually, what I think might
work is this combination of approaches (a rough sketch of the choice
follows the list):

1) fixed-size states and states with serialize/deserialize methods

=> hashjoin-like batching (i.e. dumping both tuples and states)

2) variable-size states without serialize/deserialize

=> Jeff's approach (keep states in memory, dump tuples)
=> possibly with the rescan fall-back, for quickly growing states
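
Something like this - the struct fields are hypothetical and don't match
any actual catalog or executor structures, it's only meant to illustrate
the decision:

    #include <stdio.h>
    #include <stdbool.h>

    typedef enum SpillStrategy
    {
        SPILL_TUPLES_AND_STATES,    /* hashjoin-like batching (case 1) */
        SPILL_TUPLES_ONLY           /* keep states in memory (case 2)  */
    } SpillStrategy;

    typedef struct AggStateInfo
    {
        bool fixed_size;            /* e.g. sum(int), count(*) */
        bool has_serialize;         /* serialize/deserialize methods exist */
    } AggStateInfo;

    static SpillStrategy
    choose_spill_strategy(const AggStateInfo *agg)
    {
        if (agg->fixed_size || agg->has_serialize)
            return SPILL_TUPLES_AND_STATES;
        return SPILL_TUPLES_ONLY;
    }

    int main(void)
    {
        AggStateInfo count_star = {true, false};    /* fixed-size state */
        AggStateInfo array_agg  = {false, false};   /* variable, no serialize */

        printf("count(*):  strategy %d\n", choose_spill_strategy(&count_star));
        printf("array_agg: strategy %d\n", choose_spill_strategy(&array_agg));
        return 0;
    }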

Tomas
