Re: First Aggregate Funtion?

Lists: pgsql-hackers
From: Tony Caduto <tony_caduto(at)amsoftwaredesign(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: First Aggregate Funtion?
Date: 2006-03-31 21:02:47
Message-ID: 442D98F7.4020403@amsoftwaredesign.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Has there ever been any talk of adding a first aggregate function?
It would make porting from Oracle and Access much easier.

Or is there something in the contrib modules that I might have missed?

Thanks,

--
Tony Caduto
AM Software Design
http://www.amsoftwaredesign.com
Home of PG Lightning Admin for Postgresql
Your best bet for Postgresql Administration


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tony Caduto <tony_caduto(at)amsoftwaredesign(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: First Aggregate Funtion?
Date: 2006-03-31 21:09:43
Message-ID: 20060331210943.GA5660@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 31, 2006 at 03:02:47PM -0600, Tony Caduto wrote:
> Has there ever been any talk of adding a first aggregate function?
> It would make porting from Oracle and Access much easier.
>
> Or is there something in the contrib modules that I might have missed?

There are several oracle compatability modules:

http://pgfoundry.org/projects/oracompat/
http://pgfoundry.org/projects/orafce/

I'm sure there's many more if you look...
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: "Mike Rylander" <mrylander(at)gmail(dot)com>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc: "Tony Caduto" <tony_caduto(at)amsoftwaredesign(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First Aggregate Funtion?
Date: 2006-04-01 01:35:35
Message-ID: b918cf3d0603311735v5dde21d4pfcd613c09d80cf57@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3/31/06, Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
> On Fri, Mar 31, 2006 at 03:02:47PM -0600, Tony Caduto wrote:
> > Has there ever been any talk of adding a first aggregate function?
> > It would make porting from Oracle and Access much easier.
> >
> > Or is there something in the contrib modules that I might have missed?
>
> There are several oracle compatability modules:
>
> http://pgfoundry.org/projects/oracompat/
> http://pgfoundry.org/projects/orafce/
>
> I'm sure there's many more if you look...

If all you want is FIRST() and LAST() then:

-- Create a function that always returns the first non-NULL item
CREATE OR REPLACE FUNCTION public.first_agg ( anyelement, anyelement )
RETURNS anyelement AS $$
SELECT CASE WHEN $1 IS NULL THEN $2 ELSE $1 END;
$$ LANGUAGE SQL STABLE;

-- And then wrap an aggreagate around it
CREATE AGGREGATE public.first (
sfunc = public.first_agg,
basetype = anyelement,
stype = anyelement
);

-- Create a function that always returns the last non-NULL item
CREATE OR REPLACE FUNCTION public.last_agg ( anyelement, anyelement )
RETURNS anyelement AS $$
SELECT $2;
$$ LANGUAGE SQL STABLE;

-- And then wrap an aggreagate around it
CREATE AGGREGATE public.last (
sfunc = public.last_agg,
basetype = anyelement,
stype = anyelement
);

Hope that helps!

--
Mike Rylander
mrylander(at)gmail(dot)com
GPLS -- PINES Development
Database Developer
http://open-ils.org


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Tony Caduto <tony_caduto(at)amsoftwaredesign(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: First Aggregate Funtion?
Date: 2006-04-05 20:19:59
Message-ID: 20060405201959.GA17070@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 31, 2006 at 15:02:47 -0600,
Tony Caduto <tony_caduto(at)amsoftwaredesign(dot)com> wrote:
> Has there ever been any talk of adding a first aggregate function?
> It would make porting from Oracle and Access much easier.

Note, that without special support those functions aren't going to run
very fast. So you are still probably going to want to go back and
rewrite them to use something like DISTINCT ON anyway.


From: sudalai <sudalait2(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: First Aggregate Funtion?
Date: 2015-07-14 13:23:25
Message-ID: 1436880205628-5857866.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The above implementation of "first" aggregate returns the first non-NULL item
value.

To get *first row item value* for a column use the below implementation.

-- create a function that push at most two element on given array
-- push the first row value at second index of the array
CREATE OR REPLACE FUNCTION two_value_holder(anyarray, anyelement)
returns anyarray as $$
select case when array_length($1,1) < 2 then array_append($1,$2) else
$1 end ;
$$ language sql immutable;

-- create a function that returns second element of an array
CREATE OR replace FUNCTION second_element (ANYARRAY)
RETURNS ANYELEMENT AS $$ SELECT $1[2]; $$ LANGUAGE SQL;

-- create first aggregate function that return first_row item value
CREATE AGGREGATE first(anyelement)(
SFUNC=two_value_holder,
STYPE=ANYARRAY,
INITCOND='{NULL}',
FINALFUNC=second_element
);

I hope this work..
--
Sudalai

-----
sudalai
--
View this message in context: http://postgresql.nabble.com/First-Aggregate-Funtion-tp1943031p5857866.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: sudalai <sudalait2(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: First Aggregate Funtion?
Date: 2015-07-15 15:01:48
Message-ID: CA+TgmoZ9Osy54hRezZ-JbCYZ-PR0is6bp+pJ9zETH1n2PqHjeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 14, 2015 at 9:23 AM, sudalai <sudalait2(at)gmail(dot)com> wrote:
> The above implementation of "first" aggregate returns the first non-NULL item
> value.
>
> To get *first row item value* for a column use the below implementation.
>
> -- create a function that push at most two element on given array
> -- push the first row value at second index of the array
> CREATE OR REPLACE FUNCTION two_value_holder(anyarray, anyelement)
> returns anyarray as $$
> select case when array_length($1,1) < 2 then array_append($1,$2) else
> $1 end ;
> $$ language sql immutable;
>
> -- create a function that returns second element of an array
> CREATE OR replace FUNCTION second_element (ANYARRAY)
> RETURNS ANYELEMENT AS $$ SELECT $1[2]; $$ LANGUAGE SQL;
>
> -- create first aggregate function that return first_row item value
> CREATE AGGREGATE first(anyelement)(
> SFUNC=two_value_holder,
> STYPE=ANYARRAY,
> INITCOND='{NULL}',
> FINALFUNC=second_element
> );
>
> I hope this work..

I don't think so, because arrays can contain duplicates.

rhaas=# select coalesce(first(x.column1), 'wrong') from (values
(null), ('correct')) x;
coalesce
----------
wrong
(1 row)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: sudalai <sudalait2(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: First Aggregate Funtion?
Date: 2015-07-20 13:40:22
Message-ID: 1437399622145-5858584.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>I don't think so, because arrays can contain duplicates.

I just add two element to the array. One for INITCOND value NULL, second
for first row value.
So Array size is always 2. So no duplicates.

>rhaas=# select coalesce(first(x.column1), 'wrong') from (values
>(null), ('correct')) x;
>coalesce
>----------
> wrong
>(1 row)
It works correct..
I didn't said it returns, first non-null value for a column from aggregate
window.
I said my implementation returns first row value for a column.
Here first row element is "null ", hence it returns null.

check this....
db=# select
db-# coalesce(first(x.column1),'null') as col1 ,
db-# coalesce(first(x.column2),'null') as col2,
db-# coalesce(first(x.column3),'null') as col3
db-# from (values (null,'abc',null), ('correct','wrong','notsure'),
('second','second1','second3')) x
db-# ;
col1 | col2 | col3
------+------+------
null | abc | null
(1 row)

Its work correct. It returns first row value for a column.

--Sudalai

-----
sudalai
--
View this message in context: http://postgresql.nabble.com/First-Aggregate-Funtion-tp1943031p5858584.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: sudalai <sudalait2(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: First Aggregate Funtion?
Date: 2015-07-20 14:53:29
Message-ID: CAHyXU0zWC7BNi4WouFwSVdp=jyj9EzthoWv+rb3QSsw2_f-bFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 20, 2015 at 8:40 AM, sudalai <sudalait2(at)gmail(dot)com> wrote:
>
>>I don't think so, because arrays can contain duplicates.
>
> I just add two element to the array. One for INITCOND value NULL, second
> for first row value.
> So Array size is always 2. So no duplicates.
>
>>rhaas=# select coalesce(first(x.column1), 'wrong') from (values
>>(null), ('correct')) x;
> >coalesce
>>----------
>> wrong
>>(1 row)
> It works correct..
> I didn't said it returns, first non-null value for a column from aggregate
> window.
> I said my implementation returns first row value for a column.
> Here first row element is "null ", hence it returns null.
>
>
> check this....
> db=# select
> db-# coalesce(first(x.column1),'null') as col1 ,
> db-# coalesce(first(x.column2),'null') as col2,
> db-# coalesce(first(x.column3),'null') as col3
> db-# from (values (null,'abc',null), ('correct','wrong','notsure'),
> ('second','second1','second3')) x
> db-# ;
> col1 | col2 | col3
> ------+------+------
> null | abc | null
> (1 row)
>
> Its work correct. It returns first row value for a column.

I was able to get ~45% runtime reduction by simply converting
"two_value_holder" from sql to plpgsql. SQL functions (unlike
pl/pgsql) are parsed and planned every time they are run unless they
are inlined. Our aggregation API unfortunately is a hard fence
against inlining; solving this is a major optimization target IMO.

merlin


From: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: sudalai <sudalait2(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: First Aggregate Funtion?
Date: 2015-07-20 15:06:23
Message-ID: CA+renyURWvNyYiR++4zbzuXyzrY0jBgS0ViQYM2Lj0A8B7M=YQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> The above implementation of "first" aggregate returns the first non-NULL item
> value.

I'm curious what advantages this approach has over these FIRST/LAST
functions from the Wiki?:

https://wiki.postgresql.org/wiki/First/last_%28aggregate%29

Also to get the "first non-null value" you can apply an ordering to
just the aggregate function, e.g.:

select first(id order by start_time nulls last) from events;

If you want speed you should probably write a C version.

Is there something I'm missing?

Also since we're on the hackers list is this a proposal to add these
functions to core Postgres?

Yours,
Paul


From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, sudalai <sudalait2(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: First Aggregate Funtion?
Date: 2015-07-20 16:02:34
Message-ID: CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 20, 2015 at 11:06 AM, Paul A Jungwirth <
pj(at)illuminatedcomputing(dot)com> wrote:

> > The above implementation of "first" aggregate returns the first non-NULL
> item
> > value.
>
> I'm curious what advantages this approach has over these FIRST/LAST
> functions from the Wiki?:
>
> https://wiki.postgresql.org/wiki/First/last_%28aggregate%29
>
> Also to get the "first non-null value" you can apply an ordering to
> just the aggregate function, e.g.:
>
> select first(id order by start_time nulls last) from events;
>
> If you want speed you should probably write a C version.
>
> Is there something I'm missing?
>
> Also since we're on the hackers list is this a proposal to add these
> functions to core Postgres?
>
> Yours,
> Paul
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

If it is a proposal to add to core, I'd like to suggest a close cousin
function of first()/last(): only(). [1]

It would behave like first() but would throw an error if it encountered
more than one distinct value in the window.

This would be helpful in dependent grouping situations like this:
select a.keyval, a.name_of_the thing, sum(b.metric_value) as
metric_value
from a
join b on b.a_keyval = a.keyval
group by a.keyval, a.name_of_the_thing

Now, everyone's made this optimization to reduce group-by overhead:
select a.keyval, min(a.name_of_the_thing) as name_of_the_thing,
sum(b.metric_value) as metric_value
from a
join b on b.a_keyval = a.keyval
group by a.keyval

Which works fine, but it's self-anti-documenting:
- it implies that name of the thing *could* be different across rows
with the same keyval
- it implies we have some business preference for names that are first
in alphabetical order.
- it implies that the string has more in common with the summed metrics
(imagine this query has dozens of them) than the key values to the left.

Using first(a.name_of_the_thing) is less overhead than min()/max(), but has
the same issues listed above.

By using only(a.name_of_the_thing) we'd have a bit more clarity that the
author expected all of those values to be the same across the aggregate
window, and discovering otherwise was reason enough to fail the query.

*IF* we're considering adding these to core, I think that only() would be
just a slight modification of the last() implementation, and could be done
at the same time.

[1] I don't care what it gets named. I just want the functionality.


From: Marko Tiikkaja <marko(at)joh(dot)to>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, sudalai <sudalait2(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: First Aggregate Funtion?
Date: 2015-07-20 16:07:00
Message-ID: 55AD1CA4.5040002@joh.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7/20/15 6:02 PM, Corey Huinker wrote:
> By using only(a.name_of_the_thing) we'd have a bit more clarity that the
> author expected all of those values to be the same across the aggregate
> window, and discovering otherwise was reason enough to fail the query.
>
> *IF* we're considering adding these to core, I think that only() would be
> just a slight modification of the last() implementation, and could be done
> at the same time.
>
> [1] I don't care what it gets named. I just want the functionality.

A big +1 from me. In fact, I wrote a patch implementing this for 9.5
but never got around to finishing it.

.m


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
Cc: sudalai <sudalait2(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: First Aggregate Funtion?
Date: 2015-07-20 18:28:55
Message-ID: CAHyXU0xKQhrqwimGvQ8b3CvrZjWMye0r+pay_50RUBc+SFfbMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 20, 2015 at 10:06 AM, Paul A Jungwirth
<pj(at)illuminatedcomputing(dot)com> wrote:
>> The above implementation of "first" aggregate returns the first non-NULL item
>> value.
>
> I'm curious what advantages this approach has over these FIRST/LAST
> functions from the Wiki?:
>
> https://wiki.postgresql.org/wiki/First/last_%28aggregate%29
>
> Also to get the "first non-null value" you can apply an ordering to
> just the aggregate function, e.g.:
>
> select first(id order by start_time nulls last) from events;
>
> If you want speed you should probably write a C version.

C functions come with a lot of administration headaches, and the
performance gain will probably not be significant unless you totally
bypass the SPI interface. Even then, I suspect (vs the pl/pgsql
variant which caches plan) the majority of overhead is is in calling
the function, not the actual implementation. It's be interesting to
see the results though.

merlin


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Marko Tiikkaja <marko(at)joh(dot)to>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, sudalai <sudalait2(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: First Aggregate Funtion?
Date: 2015-07-21 18:11:27
Message-ID: 55AE8B4F.3050207@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7/20/15 11:07 AM, Marko Tiikkaja wrote:
> On 7/20/15 6:02 PM, Corey Huinker wrote:
>> By using only(a.name_of_the_thing) we'd have a bit more clarity that the
>> author expected all of those values to be the same across the aggregate
>> window, and discovering otherwise was reason enough to fail the query.
>>
>> *IF* we're considering adding these to core, I think that only() would be
>> just a slight modification of the last() implementation, and could be
>> done
>> at the same time.
>>
>> [1] I don't care what it gets named. I just want the functionality.
>
> A big +1 from me. In fact, I wrote a patch implementing this for 9.5
> but never got around to finishing it.

A big +1 here too; I've wanted this many times in the past.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com