Re: array_agg and array_accum (patch)

Lists: pgsql-hackers
From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: array_agg and array_accum (patch)
Date: 2008-10-26 18:32:17
Message-ID: 1225045937.4434.21.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is a patch to support two aggregate functions:

1) ARRAY_AGG() -- SQL 2008 standard behavior, returns NULL on no input,
and skips NULL inputs.

2) ARRAY_ACCUM() -- Returns empty array on no input, and includes NULL
inputs.

These accumulate the result in a memory context that lives across calls
to the state function, so it's reasonably efficient. On my old laptop it
takes about 5s to generate an array of 1M elements -- not great, but at
least it's linear.

Although array_agg is the standard behavior, array_accum is important
because otherwise you always lose the NULLs, and that's difficult to
work around even with COALESCE.

I added them as new native functions because ARRAY_AGG is in the
standard, but if others think they should live elsewhere that's fine. I
think that they are generally pretty useful functions for people using
arrays.

This patch is contributed by Truviso.

Regards,
Jeff Davis

Attachment Content-Type Size
array_agg.diff.gz application/x-gzip 1.8 KB

From: "Ian Caulfield" <ian(dot)caulfield(at)gmail(dot)com>
To:
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-10-26 20:08:51
Message-ID: 27bbfebe0810261308m3496c484q62cc327214bdcc4c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I think array_agg also keeps nulls - although the draft standard I
have seems to contradict itself about this...

Ian

2008/10/26 Jeff Davis <pgsql(at)j-davis(dot)com>:
> Here is a patch to support two aggregate functions:
>
> 1) ARRAY_AGG() -- SQL 2008 standard behavior, returns NULL on no input,
> and skips NULL inputs.
>
> 2) ARRAY_ACCUM() -- Returns empty array on no input, and includes NULL
> inputs.
>
> These accumulate the result in a memory context that lives across calls
> to the state function, so it's reasonably efficient. On my old laptop it
> takes about 5s to generate an array of 1M elements -- not great, but at
> least it's linear.
>
> Although array_agg is the standard behavior, array_accum is important
> because otherwise you always lose the NULLs, and that's difficult to
> work around even with COALESCE.
>
> I added them as new native functions because ARRAY_AGG is in the
> standard, but if others think they should live elsewhere that's fine. I
> think that they are generally pretty useful functions for people using
> arrays.
>
> This patch is contributed by Truviso.
>
> Regards,
> Jeff Davis
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-10-27 00:24:17
Message-ID: 20081027002417.GN4452@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff,

* Jeff Davis (pgsql(at)j-davis(dot)com) wrote:
> 2) ARRAY_ACCUM() -- Returns empty array on no input, and includes NULL
> inputs.

Excellent.. I added it the easy way (from the online docs), but that's
clearly not at all efficient and was going to try and fix it, for psql
to use with the column-level privs patch. It'd be great to use a more
efficient mechanism like this, and to remove adding it from my patch
(erm, it's only one line currently, but it would have been alot more
eventually :).

I havn't actually reviewed the code at all, but +1 in general to adding
this to core.

Thanks,

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ian Caulfield" <ian(dot)caulfield(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: array_agg and array_accum (patch)
Date: 2008-10-27 03:02:59
Message-ID: 29750.1225076579@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Ian Caulfield" <ian(dot)caulfield(at)gmail(dot)com> writes:
> I think array_agg also keeps nulls - although the draft standard I
> have seems to contradict itself about this...

The SQL:2008 draft I have says, in 10.9 <aggregate function> general
rule 8g

NOTE 267 - Null values are not eliminated when computing <array
aggregate function>. This, plus the optional <sort specification
list>, sets <array aggregate function> apart from <general set
function>s.

So that seems to make it perfectly clear that nulls aren't eliminated,
and furthermore to be an intentional override of any other part of the
spec that you might think says nulls should be eliminated. If you have
an argument to read it otherwise, please say exactly what.

A larger objection to Jeff's draft patch is that it doesn't implement
the <sort specification list>. I'm entirely happy about not doing that
--- the current SQL committee's willingness to invent random new syntax
and nonorthogonal behavior for every function they can think of will be
the death of SQL yet --- but it's something that we at least need to
document the workaround for.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ian Caulfield <ian(dot)caulfield(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-10-27 15:47:23
Message-ID: 1225122443.1375.23.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2008-10-26 at 23:02 -0400, Tom Lane wrote:
> "Ian Caulfield" <ian(dot)caulfield(at)gmail(dot)com> writes:
> > I think array_agg also keeps nulls - although the draft standard I
> > have seems to contradict itself about this...
>
> The SQL:2008 draft I have says, in 10.9 <aggregate function> general
> rule 8g
>

I apologize, clearly I skimmed the standard too fast.

I'll review the standard, allow array_agg() to collect NULLs, perhaps
drop array_accum (if the only difference is the return value on no
input), and resubmit with docs.

Regards,
Jeff Davis


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ian Caulfield <ian(dot)caulfield(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: array_agg and array_accum (patch)
Date: 2008-10-27 16:47:50
Message-ID: 4905F0B6.4070505@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> A larger objection to Jeff's draft patch is that it doesn't implement
> the <sort specification list>. I'm entirely happy about not doing that
> --- the current SQL committee's willingness to invent random new syntax
> and nonorthogonal behavior for every function they can think of will be
> the death of SQL yet --- but it's something that we at least need to
> document the workaround for.

How else will you tell an aggregate function whose result depends on the
input order which order you want? The only aggregates defined in the
standard where this matters are array_agg, array_accum, and xmlagg, but
it would also be useful in other cases such as a text concatenation
aggregate function or an aggregate function to calculate the correlation
(or whatever alternative metric we come up with). Given that the
standard does not provide for user-defined aggregates, I think the way
it's specified is perfectly OK.

Without a way to control the order, how useful are these array
aggregates really?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Ian Caulfield <ian(dot)caulfield(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: array_agg and array_accum (patch)
Date: 2008-10-27 17:14:56
Message-ID: 20365.1225127696@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> How else will you tell an aggregate function whose result depends on the
> input order which order you want?

You feed it from a subquery that has ORDER BY. The only reason the spec
needs this kluge is their insistence that ORDER BY not be used in
subqueries. Now I grant that there's some basis in relational theory
for that stand, but they certainly feel free to ignore academic notions
of cleanliness everywhere else in the spec.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ian Caulfield <ian(dot)caulfield(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-10-27 17:41:26
Message-ID: 1225129286.25425.81.camel@dell.linuxdev.us.dell.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2008-10-27 at 18:47 +0200, Peter Eisentraut wrote:
> How else will you tell an aggregate function whose result depends on the
> input order which order you want? The only aggregates defined in the
> standard where this matters are array_agg, array_accum, and xmlagg, but

I don't see array_accum() in the standard, I wrote it just as an
alternative to array_agg() because I thought array_agg() ignored NULLs.

Regards,
Jeff Davis


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Ian Caulfield" <ian(dot)caulfield(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-10-29 04:08:50
Message-ID: 603c8f070810282108m3f4dc1d3qe4e127b5268ecb81@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

It's worth noting that this is the third version of this idea that has
been submitted. Ian Caulfield submitted a patch to add this, and so
did I. Someone should probably look at all three of them and compare.

...Robert

On Mon, Oct 27, 2008 at 1:41 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Mon, 2008-10-27 at 18:47 +0200, Peter Eisentraut wrote:
>> How else will you tell an aggregate function whose result depends on the
>> input order which order you want? The only aggregates defined in the
>> standard where this matters are array_agg, array_accum, and xmlagg, but
>
> I don't see array_accum() in the standard, I wrote it just as an
> alternative to array_agg() because I thought array_agg() ignored NULLs.
>
> Regards,
> Jeff Davis
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ian Caulfield <ian(dot)caulfield(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-10-31 06:19:15
Message-ID: 1225433955.1375.136.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2008-10-29 at 00:08 -0400, Robert Haas wrote:
> It's worth noting that this is the third version of this idea that has
> been submitted. Ian Caulfield submitted a patch to add this, and so
> did I. Someone should probably look at all three of them and compare.
>

If we include a function named array_accum(), it should return an empty
array on no input to match the function in the docs:

http://www.postgresql.org/docs/8.3/static/xaggr.html

Your function returns NULL on no input, which seems more like
array_agg().

Aside from that, I'm pretty open to anything, as long as one of our
patches makes it. If there are potential problems with the standard
(where we don't want to implement a violation), we should just do
array_accum(). If not, we might as well do the standard array_agg(),
perhaps without the ORDER BY clause.

We could also do both, because it is a little annoying to coalesce the
result or array_agg().

Regards,
Jeff Davis


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-10-31 12:19:54
Message-ID: 20081031121954.GG2459@frubble.xen.chris-lamb.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 30, 2008 at 11:19:15PM -0700, Jeff Davis wrote:
> If there are potential problems with the standard
> (where we don't want to implement a violation), we should just do
> array_accum(). If not, we might as well do the standard array_agg(),
> perhaps without the ORDER BY clause.

I've wanted an array_sort() function before; having this functionality
as a separate function also seems considerably prettier than some ad
hoc grammar, it also generalizes nicely to cases where the array isn't
coming from an aggregate.

Sam


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-11-03 03:22:06
Message-ID: 1225682527.1375.150.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here's an updated patch for just array_accum() with some simple docs. If
I should document this in more places, let me know.

I decided not to include array_agg() in this patch because it doesn't
support the standard's ORDER BY clause.

My reasoning is that, if someone is using the standard array_agg() and
porting to PostgreSQL, there's a fairly high chance they would be using
the ORDER BY clause as well, due to the nature of the function. If not,
and they really want a function called array_agg that returns NULL on no
input, it would be trivial to just create an extra final function that
behaved that way and create a new aggregate.

However, if people want me to put array_agg() back in I will.

Regards,
Jeff Davis

Attachment Content-Type Size
array_accum.patch.gz application/x-gzip 1.8 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-11-13 16:07:14
Message-ID: 491C50B2.10506@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis wrote:
> Here's an updated patch for just array_accum() with some simple docs.

I have committed a "best of Robert Haas and Jeff Davis" array_agg()
function with standard SQL semantics. I believe this gives the best
consistency with other aggregate functions for the no-input-rows case.
If some other behavior is wanted, it is a coalesce() away, as the
documentation states.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-11-13 23:12:00
Message-ID: 3517.1226617920@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Jeff Davis wrote:
>> Here's an updated patch for just array_accum() with some simple docs.

> I have committed a "best of Robert Haas and Jeff Davis" array_agg()
> function with standard SQL semantics. I believe this gives the best
> consistency with other aggregate functions for the no-input-rows case.
> If some other behavior is wanted, it is a coalesce() away, as the
> documentation states.

The original reason for doing this work, I think, was to let us
deprecate contrib/intagg, or at least turn it into a thin wrapper
around core-provided functionality. We now have the means to do that
for int_array_aggregate, but what about int_array_enum?

It seems that it would be an easy evening's work to implement unnest(),
at least in the simple form
function unnest(anyarray) returns setof anyelement

without the WITH ORDINALITY syntax proposed by the SQL spec. Then
we could eliminate intagg's C code altogether, and just write it
as a couple of wrapper functions.

Does anyone have an objection to doing that?

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-11-14 00:08:44
Message-ID: 20081114000844.GO4062@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

> The original reason for doing this work, I think, was to let us
> deprecate contrib/intagg, or at least turn it into a thin wrapper
> around core-provided functionality. We now have the means to do that
> for int_array_aggregate, but what about int_array_enum?

And what about the patch to add sorted-array versions of some routines?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-11-14 00:37:46
Message-ID: 603c8f070811131637n1c82422bv6737fcb7650f089b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> It seems that it would be an easy evening's work to implement unnest(),
> at least in the simple form
> function unnest(anyarray) returns setof anyelement
>
> without the WITH ORDINALITY syntax proposed by the SQL spec. Then
> we could eliminate intagg's C code altogether, and just write it
> as a couple of wrapper functions.
>
> Does anyone have an objection to doing that?

I think it would be great.

...Robert


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Cc: "Jeff Davis" <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-11-14 03:40:26
Message-ID: 603c8f070811131940n4e71cf4oabdee2154da6b95f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

It looks to me like section 34.10 of the docs might benefit from some
sort of update in light of this patch, since the builtin array_agg now
does the same thing as the proposed user-defined array_accum, only
better. Presumably we should either pick a different example, or add
a note that a builtin is available that does the same thing more
efficiently.

...Robert

On Thu, Nov 13, 2008 at 11:07 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> Jeff Davis wrote:
>>
>> Here's an updated patch for just array_accum() with some simple docs.
>
> I have committed a "best of Robert Haas and Jeff Davis" array_agg() function
> with standard SQL semantics. I believe this gives the best consistency with
> other aggregate functions for the no-input-rows case. If some other behavior
> is wanted, it is a coalesce() away, as the documentation states.
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-11-20 21:11:50
Message-ID: 7986.1227215510@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
> It looks to me like section 34.10 of the docs might benefit from some
> sort of update in light of this patch, since the builtin array_agg now
> does the same thing as the proposed user-defined array_accum, only
> better. Presumably we should either pick a different example, or add
> a note that a builtin is available that does the same thing more
> efficiently.

I did the latter. If you can think of an equally plausible and short
example of a polymorphic aggregate, we could certainly replace the
example instead ...

regards, tom lane


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: array_agg and array_accum (patch)
Date: 2008-11-20 21:44:46
Message-ID: b42b73150811201344s2e1a0380n7842889a86a9f259@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Nov 20, 2008 at 4:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
>> It looks to me like section 34.10 of the docs might benefit from some
>> sort of update in light of this patch, since the builtin array_agg now
>> does the same thing as the proposed user-defined array_accum, only
>> better. Presumably we should either pick a different example, or add
>> a note that a builtin is available that does the same thing more
>> efficiently.
>
> I did the latter. If you can think of an equally plausible and short
> example of a polymorphic aggregate, we could certainly replace the
> example instead ...

maybe show how to stack arrays?
see: http://www.nabble.com/text-array-accumulate-to-multidimensional-text-array-td20098591.html

IMO a good example of how you can write aggregates in a language other
than C, which is IMO an underutilized technique.

CREATE OR REPLACE FUNCTION array_cat1(p1 anyarray, p2 anyarray)
RETURNS anyarray AS
$$
SELECT CASE WHEN $1 = '{}'::text[] THEN ARRAY[p2] ELSE ARRAY_CAT(p1, p2) END;
$$ LANGUAGE sql;

CREATE AGGREGATE array_stack(anyarray)
(
sfunc = array_cat1,
stype = anyarray,
initcond = '{}'
);

merlin