Re: array_agg() NULL Handling

Lists: pgsql-hackers
From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: array_agg() NULL Handling
Date: 2010-09-01 05:45:05
Message-ID: 770F1292-EC1C-4EEA-A315-186BC5AF3A40@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The aggregate docs say:

> The first form of aggregate expression invokes the aggregate across all input rows for which the given expression(s) yield non-null values. (Actually, it is up to the aggregate function whether to ignore null values or not — but all the standard ones do.)

-- http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES

That, however, is not true of array_agg():

try=# CREATE TABLE foo(id int);
CREATE TABLE
try=# INSERT INTO foo values(1), (2), (NULL), (3);
INSERT 0 4
try=# select array_agg(id) from foo;
array_agg
──────────────
{1,2,NULL,3}
(1 row)

So are the docs right, or is array_agg() right?

Best,

David


From: Thom Brown <thom(at)linux(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 06:56:58
Message-ID: AANLkTinWJ7w-w6K1Y156=gja25D2aNaLqURoMHcXQkPJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1 September 2010 06:45, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> The aggregate docs say:
>
>> The first form of aggregate expression invokes the aggregate across all input rows for which the given expression(s) yield non-null values. (Actually, it is up to the aggregate function whether to ignore null values or not — but all the standard ones do.)
>
> -- http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES
>
> That, however, is not true of array_agg():
>
> try=# CREATE TABLE foo(id int);
> CREATE TABLE
> try=# INSERT INTO foo values(1), (2), (NULL), (3);
> INSERT 0 4
> try=# select array_agg(id) from foo;
>  array_agg
> ──────────────
>  {1,2,NULL,3}
> (1 row)
>
> So are the docs right, or is array_agg() right?

I think it might be both. array_agg doesn't return NULL, it returns
an array which contains NULL.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 07:03:41
Message-ID: 63C3C7B1-5C2A-4EC4-A4F6-856FD2949434@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Aug 31, 2010, at 11:56 PM, Thom Brown wrote:

>>> The first form of aggregate expression invokes the aggregate across all input rows for which the given expression(s) yield non-null values. (Actually, it is up to the aggregate function whether to ignore null values or not — but all the standard ones do.)
>>
>> -- http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES
>>
>> That, however, is not true of array_agg():
>>
>> try=# CREATE TABLE foo(id int);
>> CREATE TABLE
>> try=# INSERT INTO foo values(1), (2), (NULL), (3);
>> INSERT 0 4
>> try=# select array_agg(id) from foo;
>> array_agg
>> ──────────────
>> {1,2,NULL,3}
>> (1 row)
>>
>> So are the docs right, or is array_agg() right?
>
> I think it might be both. array_agg doesn't return NULL, it returns
> an array which contains NULL.

No, string_agg() doesn't work this way, for example:

select string_agg(id::text, ',') from foo;
string_agg
────────────
1,2,3
(1 row)

Note that it's not:

select string_agg(id::text, ',') from foo;
string_agg
────────────
1,2,,3
(1 row)

Best,

David


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 07:30:08
Message-ID: AANLkTinQbAptPRsfO8KED2kxpZN=ePNw30AMxLRm14DS@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/9/1 David E. Wheeler <david(at)kineticode(dot)com>:
> The aggregate docs say:
>
>> The first form of aggregate expression invokes the aggregate across all input rows for which the given expression(s) yield non-null values. (Actually, it is up to the aggregate function whether to ignore null values or not — but all the standard ones do.)
>
> -- http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES
>
> That, however, is not true of array_agg():
>
> try=# CREATE TABLE foo(id int);
> CREATE TABLE
> try=# INSERT INTO foo values(1), (2), (NULL), (3);
> INSERT 0 4
> try=# select array_agg(id) from foo;
>  array_agg
> ──────────────
>  {1,2,NULL,3}
> (1 row)
>
> So are the docs right, or is array_agg() right?

Docs is wrong :) I like current implementation. You can remove a NULLs
from aggregation very simply, but different direction isn't possible

Regards
Pavel Stehule
>
> Best,
>
> David
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Thom Brown <thom(at)linux(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 08:06:48
Message-ID: AANLkTi=mmrk8txGTWPdqa-A6aw8L62yZDXAiLW71LXZ0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1 September 2010 07:56, Thom Brown <thom(at)linux(dot)com> wrote:
> On 1 September 2010 06:45, David E. Wheeler <david(at)kineticode(dot)com> wrote:
>> The aggregate docs say:
>>
>>> The first form of aggregate expression invokes the aggregate across all input rows for which the given expression(s) yield non-null values. (Actually, it is up to the aggregate function whether to ignore null values or not — but all the standard ones do.)
>>
>> -- http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES
>>
>> That, however, is not true of array_agg():
>>
>> try=# CREATE TABLE foo(id int);
>> CREATE TABLE
>> try=# INSERT INTO foo values(1), (2), (NULL), (3);
>> INSERT 0 4
>> try=# select array_agg(id) from foo;
>>  array_agg
>> ──────────────
>>  {1,2,NULL,3}
>> (1 row)
>>
>> So are the docs right, or is array_agg() right?
>
> I think it might be both.  array_agg doesn't return NULL, it returns
> an array which contains NULL.

The second I wrote that, I realised it was b*ll%$ks, as I was still in
the process of waking up.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 15:16:41
Message-ID: CCAA9583-4F07-4B66-BE2E-8ADE59993180@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:

> Docs is wrong :) I like current implementation. You can remove a NULLs
> from aggregation very simply, but different direction isn't possible

Would appreciate the recipe for removing the NULLs.

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 16:02:12
Message-ID: 32C7C1B2-8B42-4D14-B34F-9777BFAAD253@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sep 1, 2010, at 1:06 AM, Thom Brown wrote:

>> I think it might be both. array_agg doesn't return NULL, it returns
>> an array which contains NULL.
>
> The second I wrote that, I realised it was b*ll%$ks, as I was still in
> the process of waking up.

I know that feeling.

/me sips his coffee

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 16:08:04
Message-ID: F9D43A22-AB3F-4329-BF65-C9E7CD4A3400@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:

>> So are the docs right, or is array_agg() right?
>
> Docs is wrong :) I like current implementation. You can remove a NULLs
> from aggregation very simply, but different direction isn't possible

Patch:

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 9f91939..e301019 100644
*** a/doc/src/sgml/syntax.sgml
--- b/doc/src/sgml/syntax.sgml
*************** sqrt(2)
*** 1543,1549 ****
The first form of aggregate expression invokes the aggregate
across all input rows for which the given expression(s) yield
non-null values. (Actually, it is up to the aggregate function
! whether to ignore null values or not &mdash; but all the standard ones do.)
The second form is the same as the first, since
<literal>ALL</literal> is the default. The third form invokes the
aggregate for all distinct values of the expressions found
--- 1543,1550 ----
The first form of aggregate expression invokes the aggregate
across all input rows for which the given expression(s) yield
non-null values. (Actually, it is up to the aggregate function
! whether to ignore null values or not &mdash; but all the standard
! ones except <function>array_agg</> do.)
The second form is the same as the first, since
<literal>ALL</literal> is the default. The third form invokes the
aggregate for all distinct values of the expressions found

Best,

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 17:12:25
Message-ID: 27165.1283361145@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> *** 1543,1549 ****
> The first form of aggregate expression invokes the aggregate
> across all input rows for which the given expression(s) yield
> non-null values. (Actually, it is up to the aggregate function
> ! whether to ignore null values or not &mdash; but all the standard ones do.)
> The second form is the same as the first, since
> <literal>ALL</literal> is the default. The third form invokes the
> aggregate for all distinct values of the expressions found
> --- 1543,1550 ----
> The first form of aggregate expression invokes the aggregate
> across all input rows for which the given expression(s) yield
> non-null values. (Actually, it is up to the aggregate function
> ! whether to ignore null values or not &mdash; but all the standard
> ! ones except <function>array_agg</> do.)
> The second form is the same as the first, since
> <literal>ALL</literal> is the default. The third form invokes the
> aggregate for all distinct values of the expressions found

I think when that text was written, it was meant to imply "all the
aggregates defined in SQL92". There seems to be a lot of confusion
in this thread about whether "standard" means "defined by SQL spec"
or "built-in in Postgres". Should we try to refine the wording to
clarify that?

Even more to the point, should we deliberately make this vaguer so that
we aren't finding ourselves with obsolete text again and again? You can
bet that people adding new aggregates in the future aren't going to
think to update this sentence, any more than happened with array_agg.

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 17:16:05
Message-ID: 1CCE16A9-D6B8-4251-B388-1373AFB6D0C7@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sep 1, 2010, at 10:12 AM, Tom Lane wrote:

> I think when that text was written, it was meant to imply "all the
> aggregates defined in SQL92". There seems to be a lot of confusion
> in this thread about whether "standard" means "defined by SQL spec"
> or "built-in in Postgres". Should we try to refine the wording to
> clarify that?

Yes please.

> Even more to the point, should we deliberately make this vaguer so that
> we aren't finding ourselves with obsolete text again and again? You can
> bet that people adding new aggregates in the future aren't going to
> think to update this sentence, any more than happened with array_agg.

Perhaps “consult the docs for each aggregate to determine how it handles NULLs.”

Best,

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 17:30:37
Message-ID: 27470.1283362237@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Sep 1, 2010, at 10:12 AM, Tom Lane wrote:
>> Even more to the point, should we deliberately make this vaguer so that
>> we aren't finding ourselves with obsolete text again and again? You can
>> bet that people adding new aggregates in the future aren't going to
>> think to update this sentence, any more than happened with array_agg.

> Perhaps consult the docs for each aggregate to determine how it handles NULLs.

Hm, actually the whole para needs work. It was designed at a time when
DISTINCT automatically discarded nulls, which isn't true anymore, and
that fact was patched-in in a very awkward way too. Perhaps something
like

The first form of aggregate expression invokes the aggregate
once for each input row.
The second form is the same as the first, since
<literal>ALL</literal> is the default.
The third form invokes the aggregate once for each distinct value,
or set of values, of the expression(s) found in the input rows.
The last form invokes the aggregate once for each input row; since no
particular input value is specified, it is generally only useful
for the <function>count(*)</function> aggregate function.

Most aggregate functions ignore null inputs, so that rows in which
one or more of the expression(s) yield null are discarded. (This
can be assumed to be true, unless otherwise specified, for all
built-in aggregates.)

Then we have to make sure array_agg is properly documented, but we
don't have to insert something into the description of every single
aggregate, which is what your proposal would require.

regards, tom lane


From: David Fetter <david(at)fetter(dot)org>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 17:47:21
Message-ID: 20100901174721.GA10987@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 01, 2010 at 08:16:41AM -0700, David Wheeler wrote:
> On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:
>
> > Docs is wrong :) I like current implementation. You can remove a
> > NULLs from aggregation very simply, but different direction isn't
> > possible
>
> Would appreciate the recipe for removing the NULLs.

WHERE clause :P

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Thom Brown <thom(at)linux(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 17:52:35
Message-ID: AANLkTikOynn0Hh-19+H6MeR0MsT1h3Yk3vHCfZDXQZBV@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1 September 2010 18:47, David Fetter <david(at)fetter(dot)org> wrote:
> On Wed, Sep 01, 2010 at 08:16:41AM -0700, David Wheeler wrote:
>> On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:
>>
>> > Docs is wrong :) I like current implementation.  You can remove a
>> > NULLs from aggregation very simply, but different direction isn't
>> > possible
>>
>> Would appreciate the recipe for removing the NULLs.
>
> WHERE clause :P

There may be cases where that's undesirable, such as there being more
than one aggregate in the SELECT list, or the column being grouped on
needing to return rows regardless as to whether there's NULLs in the
column being targeted by array_agg() or not.
--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 17:59:42
Message-ID: F9B5FA27-B90B-4D4A-A3A4-DB85BE96D2BE@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sep 1, 2010, at 10:52 AM, Thom Brown wrote:

>>> ould appreciate the recipe for removing the NULLs.
>>
>> WHERE clause :P
>
> There may be cases where that's undesirable, such as there being more
> than one aggregate in the SELECT list, or the column being grouped on
> needing to return rows regardless as to whether there's NULLs in the
> column being targeted by array_agg() or not.

Exactly the issue I ran into:

SELECT name AS distribution,
array_agg(
CASE relstatus WHEN 'stable'
THEN version
ELSE NULL
END ORDER BY version) AS stable,
array_agg(
CASE relstatus
WHEN 'testing'
THEN version
ELSE NULL
END ORDER BY version) AS testing
FROM distributions
GROUP BY name;

distribution │ stable │ testing
──────────────┼───────────────────┼────────────────────
pair │ {NULL,1.0.0,NULL} │ {0.0.1,NULL,1.2.0}
pgtap │ {NULL} │ {0.0.1}
(2 rows)

Annoying.

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 18:06:51
Message-ID: 15DE4E41-24DD-4ABC-9FFE-620F6BD9E4B4@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sep 1, 2010, at 10:30 AM, Tom Lane wrote:

> Hm, actually the whole para needs work. It was designed at a time when
> DISTINCT automatically discarded nulls, which isn't true anymore, and
> that fact was patched-in in a very awkward way too. Perhaps something
> like
>
> The first form of aggregate expression invokes the aggregate
> once for each input row.
> The second form is the same as the first, since
> <literal>ALL</literal> is the default.
> The third form invokes the aggregate once for each distinct value,
> or set of values, of the expression(s) found in the input rows.
> The last form invokes the aggregate once for each input row; since no
> particular input value is specified, it is generally only useful
> for the <function>count(*)</function> aggregate function.
>
> Most aggregate functions ignore null inputs, so that rows in which
> one or more of the expression(s) yield null are discarded. (This
> can be assumed to be true, unless otherwise specified, for all
> built-in aggregates.)

I don't think you need the parentheses, though without them, "This" might be better written as "The ignoring of NULLs".

Just my $0.02.

Best,

David


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 18:09:48
Message-ID: AANLkTik4Q8_Ddjwk8BLRUu51mRHShG5JG4QB5vP7Pbt-@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/9/1 Thom Brown <thom(at)linux(dot)com>:
> On 1 September 2010 18:47, David Fetter <david(at)fetter(dot)org> wrote:
>> On Wed, Sep 01, 2010 at 08:16:41AM -0700, David Wheeler wrote:
>>> On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:
>>>
>>> > Docs is wrong :) I like current implementation.  You can remove a
>>> > NULLs from aggregation very simply, but different direction isn't
>>> > possible
>>>
>>> Would appreciate the recipe for removing the NULLs.
>>
>> WHERE clause :P
>
> There may be cases where that's undesirable, such as there being more
> than one aggregate in the SELECT list, or the column being grouped on
> needing to return rows regardless as to whether there's NULLs in the
> column being targeted by array_agg() or not.

Then you can eliminate NULLs with simple function

CREATE OR REPLACE FUNCTION remove_null(anyarray)
RETURNS anyarray AS $$
SELECT ARRAY(SELECT x FROM unnest($1) g(x) WHERE x IS NOT NULL)
$$ LANGUAGE sql;
> --
> Thom Brown
> Twitter: @darkixion
> IRC (freenode): dark_ixion
> Registered Linux user: #516935
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Thom Brown <thom(at)linux(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 18:12:37
Message-ID: BDD5C37A-0294-487E-9B7A-337C40489B2C@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sep 1, 2010, at 11:09 AM, Pavel Stehule wrote:

> Then you can eliminate NULLs with simple function
>
> CREATE OR REPLACE FUNCTION remove_null(anyarray)
> RETURNS anyarray AS $$
> SELECT ARRAY(SELECT x FROM unnest($1) g(x) WHERE x IS NOT NULL)
> $$ LANGUAGE sql;

Kind of defeats the purpose of the efficiency of the aggregate.

Best,

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 18:23:32
Message-ID: 28324.1283365412@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Sep 1, 2010, at 10:30 AM, Tom Lane wrote:
>> Most aggregate functions ignore null inputs, so that rows in which
>> one or more of the expression(s) yield null are discarded. (This
>> can be assumed to be true, unless otherwise specified, for all
>> built-in aggregates.)

> I don't think you need the parentheses, though without them, "This" might be better written as "The ignoring of NULLs".

Done, without the parentheses. I didn't add "The ignoring of NULLs",
it seemed a bit too verbose.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 18:25:51
Message-ID: 28391.1283365551@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Sep 1, 2010, at 11:09 AM, Pavel Stehule wrote:
>> Then you can eliminate NULLs with simple function

> Kind of defeats the purpose of the efficiency of the aggregate.

Well, you can build your own version of array_agg with the same
implementation, except you mark the transition function as strict ...

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Thom Brown <thom(at)linux(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 18:27:42
Message-ID: AANLkTi=vAhu6Q+579pXWaXe-6qyXvTA0+F7t3UA447MU@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/9/1 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> "David E. Wheeler" <david(at)kineticode(dot)com> writes:
>> On Sep 1, 2010, at 11:09 AM, Pavel Stehule wrote:
>>> Then you can eliminate NULLs with simple function
>
>> Kind of defeats the purpose of the efficiency of the aggregate.
>
> Well, you can build your own version of array_agg with the same
> implementation, except you mark the transition function as strict ...
>

I am checking this now, and it is not possible - it needs a some
initial value and there isn't possible to set a "internal" value.
probably some C coding is necessary.

Regards

Pavel

>                        regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Thom Brown <thom(at)linux(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 18:46:00
Message-ID: 28700.1283366760@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> 2010/9/1 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> Well, you can build your own version of array_agg with the same
>> implementation, except you mark the transition function as strict ...

> I am checking this now, and it is not possible - it needs a some
> initial value and there isn't possible to set a "internal" value.

Well, you can cheat a bit ...

regression=# create or replace function array_agg_transfn_strict(internal, anyelement) returns internal as 'array_agg_transfn' language internal immutable;
CREATE FUNCTION
regression=# create aggregate array_agg_strict(anyelement) (stype = internal,
sfunc = array_agg_transfn_strict, finalfunc = array_agg_finalfn);
CREATE AGGREGATE
regression=# create or replace function array_agg_transfn_strict(internal, anyelement) returns internal as 'array_agg_transfn' language internal strict immutable;
CREATE FUNCTION

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Thom Brown <thom(at)linux(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-01 18:52:40
Message-ID: AANLkTi=vjPvU4Y9AxmgDwr7r07ZmijjTEYdmLwBS4H_5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/9/1 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> 2010/9/1 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>>> Well, you can build your own version of array_agg with the same
>>> implementation, except you mark the transition function as strict ...
>
>> I am checking this now, and it is not possible - it needs a some
>> initial value and there isn't possible to set a "internal" value.
>
> Well, you can cheat a bit ...
>
> regression=# create or replace function array_agg_transfn_strict(internal, anyelement) returns internal as 'array_agg_transfn' language internal immutable;
> CREATE FUNCTION
> regression=# create aggregate array_agg_strict(anyelement) (stype = internal,
> sfunc = array_agg_transfn_strict, finalfunc = array_agg_finalfn);
> CREATE AGGREGATE
> regression=# create or replace function array_agg_transfn_strict(internal, anyelement) returns internal as 'array_agg_transfn' language internal strict immutable;
> CREATE FUNCTION
>

nice dark trick :) - but it doesn't work

ERROR: aggregate 16395 needs to have compatible input type and transition type
postgres=#

Pavel

>                        regards, tom lane
>


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Thom Brown <thom(at)linux(dot)com>, David Fetter <david(at)fetter(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-09-02 09:47:14
Message-ID: 87occgxysd.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Sep 1, 2010, at 10:52 AM, Thom Brown wrote:
>
>>>> ould appreciate the recipe for removing the NULLs.
>>>
>>> WHERE clause :P
>>
>> There may be cases where that's undesirable, such as there being more
>> than one aggregate in the SELECT list, or the column being grouped on
>> needing to return rows regardless as to whether there's NULLs in the
>> column being targeted by array_agg() or not.
>
> Exactly the issue I ran into:
>
> SELECT name AS distribution,
> array_agg(
> CASE relstatus WHEN 'stable'
> THEN version
> ELSE NULL
> END ORDER BY version) AS stable,
> array_agg(
> CASE relstatus
> WHEN 'testing'
> THEN version
> ELSE NULL
> END ORDER BY version) AS testing
> FROM distributions
> GROUP BY name;

What about adding WHERE support to aggregates, adding to the ORDER BY
capability they already have?

SELECT array_agg(version WHERE relstatus = 'stable' ORDER BY version)

The current way to do that is using a subquery and unnest() and where
clause there, but that's not a good way to avoid to process stored data
in the aggregate / in the query.

Regards,
--
dim


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: array_agg() NULL Handling
Date: 2010-10-02 02:57:50
Message-ID: 98DC2665-75D3-4B2F-BACA-7DEED9A0C118@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sep 1, 2010, at 11:52 AM, Pavel Stehule wrote:

>> regression=# create or replace function array_agg_transfn_strict(internal, anyelement) returns internal as 'array_agg_transfn' language internal immutable;
>> CREATE FUNCTION
>> regression=# create aggregate array_agg_strict(anyelement) (stype = internal,
>> sfunc = array_agg_transfn_strict, finalfunc = array_agg_finalfn);
>> CREATE AGGREGATE
>> regression=# create or replace function array_agg_transfn_strict(internal, anyelement) returns internal as 'array_agg_transfn' language internal strict immutable;
>> CREATE FUNCTION
>>
>
> nice dark trick :) - but it doesn't work
>
> ERROR: aggregate 16395 needs to have compatible input type and transition type
> postgres=#

I could use this trick now. Anyone got any bright ideas how to fix it?

Thanks,

David