Re: First Aggregate Funtion?

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, sudalai <sudalait2(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: First Aggregate Funtion?
Date: 2015-07-20 16:02:34
Message-ID: CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 20, 2015 at 11:06 AM, Paul A Jungwirth <
pj(at)illuminatedcomputing(dot)com> wrote:

> > The above implementation of "first" aggregate returns the first non-NULL
> item
> > value.
>
> I'm curious what advantages this approach has over these FIRST/LAST
> functions from the Wiki?:
>
> https://wiki.postgresql.org/wiki/First/last_%28aggregate%29
>
> Also to get the "first non-null value" you can apply an ordering to
> just the aggregate function, e.g.:
>
> select first(id order by start_time nulls last) from events;
>
> If you want speed you should probably write a C version.
>
> Is there something I'm missing?
>
> Also since we're on the hackers list is this a proposal to add these
> functions to core Postgres?
>
> Yours,
> Paul
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

If it is a proposal to add to core, I'd like to suggest a close cousin
function of first()/last(): only(). [1]

It would behave like first() but would throw an error if it encountered
more than one distinct value in the window.

This would be helpful in dependent grouping situations like this:
select a.keyval, a.name_of_the thing, sum(b.metric_value) as
metric_value
from a
join b on b.a_keyval = a.keyval
group by a.keyval, a.name_of_the_thing

Now, everyone's made this optimization to reduce group-by overhead:
select a.keyval, min(a.name_of_the_thing) as name_of_the_thing,
sum(b.metric_value) as metric_value
from a
join b on b.a_keyval = a.keyval
group by a.keyval

Which works fine, but it's self-anti-documenting:
- it implies that name of the thing *could* be different across rows
with the same keyval
- it implies we have some business preference for names that are first
in alphabetical order.
- it implies that the string has more in common with the summed metrics
(imagine this query has dozens of them) than the key values to the left.

Using first(a.name_of_the_thing) is less overhead than min()/max(), but has
the same issues listed above.

By using only(a.name_of_the_thing) we'd have a bit more clarity that the
author expected all of those values to be the same across the aggregate
window, and discovering otherwise was reason enough to fail the query.

*IF* we're considering adding these to core, I think that only() would be
just a slight modification of the last() implementation, and could be done
at the same time.

[1] I don't care what it gets named. I just want the functionality.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Tiikkaja 2015-07-20 16:07:00 Re: First Aggregate Funtion?
Previous Message Emre Hasegeli 2015-07-20 15:39:19 Re: TABLESAMPLE patch is really in pretty sad shape