Re: Using 128-bit integers for sum, avg and statistics aggregates

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using 128-bit integers for sum, avg and statistics aggregates
Date: 2014-12-31 14:00:50
Message-ID: CAApHDvoshk1L1r8fCJjf_=OuCwaycnZp4W3C2R-js-sJsFr+Uw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 31 December 2014 at 18:20, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Dec 26, 2014 at 7:57 AM, David Rowley <dgrowleyml(at)gmail(dot)com>
> wrote:
> > 1. Do we need to keep the 128 byte aggregate state size for machines
> without
> > 128 bit ints? This has been reduced to 48 bytes in the patch, which is in
> > favour code being compiled with a compiler which has 128 bit ints. I
> kind
> > of think that we need to keep the 128 byte estimates for compilers that
> > don't support int128, but I'd like to hear any counter arguments.
>
> I think you're referring to the estimated state size in pg_aggregate
> here, and I'd say it's probably not a big deal one way or the other.
> Presumably, at some point, 128-bit arithmetic will become common
> enough that we'll want to change that estimate, but I don't know
> whether we've reached that point or not.
>
>
Yeah, that's what I was talking about.
I'm just looking at the code which uses this size estimate
in choose_hashed_grouping(). I'd be a bit worried giving the difference
between 48 and 128 that we'd under estimate the hash size too much and end
up going to disk during hashagg.

I think the patch should be modified so that it uses 128 bytes for the size
estimate on machines that don't support int128, but I'm not quite sure at
this stage if that causes issues for replication, if 1 machine had support
and one didn't, would it matter if the pg_aggregate row on the replica was
48 bytes instead of 128? Is this worth worrying about?

> > 2. References to int16 meaning 16 bytes. I'm really in two minds about
> this,
> > it's quite nice to keep the natural flow, int4, int8, int16, but I can't
> > help think that this will confuse someone one day. I think it'll be a
> long
> > time before it confused anyone if we called it int128 instead, but I'm
> not
> > that excited about seeing it renamed either. I'd like to hear what others
> > have to say... Is there a chance that some sql standard in the distant
> > future will have HUGEINT and we might regret not getting the internal
> names
> > nailed down?
>
> Yeah, I think using int16 to mean 16-bytes will be confusing to
> someone almost immediately.
>
>
hmm, I think it should be changed to int128 then. Pitty int4 was selected
as a name instead of int32 back in the day...

I'm going to mark the patch as waiting on author, pending those two changes.

My view with the size estimates change is that if a committer deems it
overkill, then they can rip it out, but at least it's been thought of and
considered rather than forgotten about.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-12-31 14:08:42 Re: Additional role attributes && superuser review
Previous Message David Rowley 2014-12-31 13:47:49 Performance improvement for joins where outer side is unique