Re: Logical Aggregate Functions (eg ANY())

Lists: pgsql-general
From: Robert James <srobertjames(at)gmail(dot)com>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Logical Aggregate Functions (eg ANY())
Date: 2011-12-15 16:10:35
Message-ID: CAGYyBggtk3Lc9q51N7zbyHxFVA2nHhVTt03OG+KEMc9R390QtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I see Postgres (I'm using 8.3) has bitwise aggregate functions
(bit_or), but doesn't seem to have logical aggregate functions.

How do I do the equivalent of an ANY() or ALL() in PG Aggregate SQL?


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Logical Aggregate Functions (eg ANY())
Date: 2011-12-15 17:05:23
Message-ID: CAHyXU0w-JHEPxj6EZ4QHEDcknKpV6QmdN8MA_K1c884OZT46AQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Dec 15, 2011 at 10:10 AM, Robert James <srobertjames(at)gmail(dot)com> wrote:
> I see Postgres (I'm using 8.3) has bitwise aggregate functions
> (bit_or), but doesn't seem to have logical aggregate functions.
>
> How do I do the equivalent of an ANY() or ALL() in PG Aggregate SQL?

CREATE OR REPLACE FUNCTION OrAgg(bool, bool) RETURNS BOOL AS
$$
SELECT COALESCE($1 or $2, false);
$$ LANGUAGE SQL IMMUTABLE;

create aggregate "any"(bool)
(
sfunc=OrAgg,
stype=bool
);

postgres=# select "any"(v) from (values (false), (true)) q(v);
any
-----
t
(1 row)

etc

note:, I don't like the use of double quoted "any" -- but I'm too lazy
to come up with a better name. :-)

merlin


From: Kirill Simonov <xi(at)resolvent(dot)net>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Logical Aggregate Functions (eg ANY())
Date: 2011-12-15 18:52:00
Message-ID: 4EEA41D0.1050707@resolvent.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 12/15/2011 11:10 AM, Robert James wrote:
> I see Postgres (I'm using 8.3) has bitwise aggregate functions
> (bit_or), but doesn't seem to have logical aggregate functions.
>

They are called BOOL_AND and BOOL_OR, see
http://www.postgresql.org/docs/8.3/interactive/functions-aggregate.html

Thanks,
Kirill


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Logical Aggregate Functions (eg ANY())
Date: 2011-12-15 19:08:48
Message-ID: CABRT9RBN_LabYc+GGBD=3uCZPz70SyuiMHk2iDd0mG_vayiLYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Dec 15, 2011 at 18:10, Robert James <srobertjames(at)gmail(dot)com> wrote:
> How do I do the equivalent of an ANY() or ALL() in PG Aggregate SQL?

Note that in many cases, writing an EXISTS(SELECT ...) or NOT
EXISTS(...) subquery is faster, since the planner can often optimize
those to a single index access -- whereas an aggregate function would
necessarily need to walk through and evaluate all potential rows.

Regards,
Marti


From: Robert James <srobertjames(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Logical Aggregate Functions (eg ANY())
Date: 2011-12-18 00:06:21
Message-ID: CAGYyBgie=9a3JaM9tYORVYhtwybW_Er+v+B+fY2Bof377xgkvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 12/15/11, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> On Thu, Dec 15, 2011 at 18:10, Robert James <srobertjames(at)gmail(dot)com> wrote:
>> How do I do the equivalent of an ANY() or ALL() in PG Aggregate SQL?
>
> Note that in many cases, writing an EXISTS(SELECT ...) or NOT
> EXISTS(...) subquery is faster, since the planner can often optimize
> those to a single index access -- whereas an aggregate function would
> necessarily need to walk through and evaluate all potential rows.
>

Really? The planner can't tell that, for instance, BOOL_AND (false, *)
is automatically false?

(BTW Thanks for the great responses everyone! On this and other posts
on this list)


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Logical Aggregate Functions (eg ANY())
Date: 2011-12-19 04:32:09
Message-ID: CAHyXU0xU0RZ-3QOt_sPtWubYfaCyXYrmkz30yAzfKyu8EhcvKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sat, Dec 17, 2011 at 6:06 PM, Robert James <srobertjames(at)gmail(dot)com> wrote:
> On 12/15/11, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>> On Thu, Dec 15, 2011 at 18:10, Robert James <srobertjames(at)gmail(dot)com> wrote:
>>> How do I do the equivalent of an ANY() or ALL() in PG Aggregate SQL?
>>
>> Note that in many cases, writing an EXISTS(SELECT ...) or NOT
>> EXISTS(...) subquery is faster, since the planner can often optimize
>> those to a single index access -- whereas an aggregate function would
>> necessarily need to walk through and evaluate all potential rows.
>>
>
> Really? The planner can't tell that, for instance, BOOL_AND (false, *)
> is automatically false?

No (by the way, I really should have known about the bool_x aggregate
functions before suggesting a hand rolled one!), that would require
that the planner have very special understanding of the internal
workings of aggregate functions. There are a couple of cases where
the planner *does* have that function, for example it can convert
max(v) to 'order by v desc limit 1' to bag the index, but that's the
exception rather than the rule.

Most queries that can be equivalently expressed in aggregate and
non-aggregate form are faster without aggregates. However,
aggregation can be a cleaner expression of the problem which is
important as well (performance isn't everything!).

merlin


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Robert James <srobertjames(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Logical Aggregate Functions (eg ANY())
Date: 2011-12-19 09:42:35
Message-ID: CABRT9RAGwQEP+EFhVpZ6=B4cJEcUE2-QCpb_ZdrNPgQNa8xKuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Dec 19, 2011 at 06:32, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> that would require
> that the planner have very special understanding of the internal
> workings of aggregate functions.  There are a couple of cases where
> the planner *does* have that function, for example it can convert
> max(v) to 'order by v desc limit 1'

In fact, there's no reason why bool_or/bool_and couldn't do the same
thing. bool_or() is like the max() for boolean values, and bool_and()
is min().

CREATE AGGREGATE my_bool_or(bool) (sfunc=boolor_statefunc, stype=bool,
sortop= >);
CREATE AGGREGATE my_bool_and(bool) (sfunc=booland_statefunc,
stype=bool, sortop= <);

db=# explain analyze select bool_and(b) from bools;
Aggregate (cost=1693.01..1693.02 rows=1 width=1)
-> Seq Scan on bools (cost=0.00..1443.01 rows=100001 width=1)
Total runtime: 29.736 ms

db=# explain analyze select my_bool_and(b) from bools;
Result (cost=0.03..0.04 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.03 rows=1 width=1)
-> Index Scan using bools_b_idx on bools
(cost=0.00..3300.28 rows=100001 width=1)
Index Cond: (b IS NOT NULL)
Total runtime: 0.109 ms

Now obviously this still has limitations -- it doesn't do index
accesses in a GROUP BY query -- but it's a fairly simple modification.

Regards,
Marti


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Robert James <srobertjames(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Logical Aggregate Functions (eg ANY())
Date: 2011-12-19 13:02:19
Message-ID: CAHyXU0z9pBXkG7nDi+-TyTig0=EZFVr-Ctj66XMh3Kxtm6Lxfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Dec 19, 2011 at 3:42 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> In fact, there's no reason why bool_or/bool_and couldn't do the same
> thing. bool_or() is like the max() for boolean values, and bool_and()
> is min().
>
> CREATE AGGREGATE my_bool_or(bool) (sfunc=boolor_statefunc, stype=bool,
> sortop= >);
> CREATE AGGREGATE my_bool_and(bool) (sfunc=booland_statefunc,
> stype=bool, sortop= <);
>
> db=# explain analyze select bool_and(b) from bools;
>  Aggregate  (cost=1693.01..1693.02 rows=1 width=1)
>   ->  Seq Scan on bools  (cost=0.00..1443.01 rows=100001 width=1)
>  Total runtime: 29.736 ms
>
> db=# explain analyze select my_bool_and(b) from bools;
>  Result  (cost=0.03..0.04 rows=1 width=0)
>   InitPlan 1 (returns $0)
>     ->  Limit  (cost=0.00..0.03 rows=1 width=1)
>           ->  Index Scan using bools_b_idx on bools
> (cost=0.00..3300.28 rows=100001 width=1)
>                 Index Cond: (b IS NOT NULL)
>  Total runtime: 0.109 ms
>
> Now obviously this still has limitations -- it doesn't do index
> accesses in a GROUP BY query -- but it's a fairly simple modification.

That's really clever...bravo.

merlin