Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every

Lists: pgsql-hackers
From: Marti Raudsepp <marti(at)juffo(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: [PATCH] Enable min/max optimization for bool_and/bool_or/every
Date: 2011-12-19 10:16:19
Message-ID: CABRT9RCtunuH5RZB=uy1OcOwZHYLDHshuOc29eexVBmZz7m6PQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi list,

As discussed on the pgsql-general list, the bool_and() and bool_or()
aggregate functions behave exactly like min() and max() would over
booleans. While it's not likely that people would have an appropriate
index on a boolean column, it seems it wouldn't cost us anything to
take advantage of this optimization, as it requires no code changes at
all, simply value changes in the pg_aggregate catalog.

Before:
db=# explain analyze select bool_and(b) from bools;
Aggregate (cost=1693.01..1693.02 rows=1 width=1)
-> Seq Scan on bools (cost=0.00..1443.01 rows=100001 width=1)
Total runtime: 29.736 ms

After:
db=# explain analyze select bool_and(b) from bools;
Result (cost=0.03..0.04 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.03 rows=1 width=1)
-> Index Scan using bools_b_idx on bools
(cost=0.00..3300.28 rows=100001 width=1)
Index Cond: (b IS NOT NULL)
Total runtime: 0.109 ms

Original discussion here:
http://archives.postgresql.org/message-id/CABRT9RAGwQEP+EFhVpZ6=B4cJEcUE2-QCpb_ZdrNPgQNa8xKuA@mail.gmail.com

PS: It seems that the min/max optimization isn't documented in the
manual (apart from release notes), so I didn't include any doc changes
in this patch.

Regards,
Marti


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every
Date: 2011-12-22 16:41:32
Message-ID: CA+TgmoZffQ6J0kCTU-V7ceEoQk8N6AjgqGfZ1NAju8vudcQPFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> PS: It seems that the min/max optimization isn't documented in the
> manual (apart from release notes), so I didn't include any doc changes
> in this patch.

I don't see a patch attached to this email, so either you forgot to
attach it, or the list ate it somehow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every
Date: 2011-12-22 16:52:48
Message-ID: CABRT9RCqCVb9uenzmyxymSEeRH=OpF+BUiXdN7VK0198HZMZkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>> PS: It seems that the min/max optimization isn't documented in the
>> manual (apart from release notes), so I didn't include any doc changes
>> in this patch.
>
> I don't see a patch attached to this email, so either you forgot to
> attach it, or the list ate it somehow.

I forgot to attach it, sorry. Here it is.

Regards,
Marti

Attachment Content-Type Size
bool-minmax-optimization.patch text/x-patch 3.0 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every
Date: 2011-12-22 17:00:52
Message-ID: CA+TgmoagcW7_YY7cnXECmj_+7RzGHvagsR5Jcp+VOqZ_BgXp1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 22, 2011 at 11:52 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>>> PS: It seems that the min/max optimization isn't documented in the
>>> manual (apart from release notes), so I didn't include any doc changes
>>> in this patch.
>>
>> I don't see a patch attached to this email, so either you forgot to
>> attach it, or the list ate it somehow.
>
> I forgot to attach it, sorry. Here it is.

Nice. It doesn't get much simpler than that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every
Date: 2012-02-08 17:48:20
Message-ID: 28056.1328723300@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>>> PS: It seems that the min/max optimization isn't documented in the
>>> manual (apart from release notes), so I didn't include any doc changes
>>> in this patch.

>> I don't see a patch attached to this email, so either you forgot to
>> attach it, or the list ate it somehow.

> I forgot to attach it, sorry. Here it is.

I applied this patch, since I was busy applying catalog changes from you
anyway ;-).

I did think of a possible reason to reject the patch: with this change,
the planner will take longer to plan queries involving bool_and() et al,
since planagg.c will spend time looking (usually fruitlessly) for an
index-based plan. I tried this simple test case:

create table t (f1 bool);
\timing
explain select bool_and(f1) from t;

Best-case timings for the EXPLAIN were about 0.480 ms without the patch
and 0.500 ms with, so about a 4% penalty. On more complicated queries
I think the fractional cost would be less. This seemed acceptable to
me, so I went ahead and applied the change, but if anyone wants to
argue about it now's the time.

regards, tom lane


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every
Date: 2012-02-08 21:13:11
Message-ID: CABRT9RD4J8HW9vZd7NQjcUbt+Ni1g-B-cQtWNFRw_FvOuB=egg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 8, 2012 at 19:48, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I applied this patch, since I was busy applying catalog changes from you
> anyway ;-).

Thanks :)

> I did think of a possible reason to reject the patch: with this change,
> the planner will take longer to plan queries involving bool_and() et al,
> since planagg.c will spend time looking (usually fruitlessly) for an
> index-based plan.

Good point, I should have done those measurements up front. Anyway,
since I've often noticed \timing to be unreliable for short queries, I
decided to retry your test with pgbench.

Long story short, I measured 27% overhead in the un-indexed column
case and 33% overhead for an indexed column. That's a lot more than I
expected. I even rebuilt and retried a few times to make sure I hadn't
botched something. The benchmark script is attached.

UNPATCHED
select bool_and(b) from unindexed;
tps = 13787.023113 (excluding connections establishing)
tps = 13880.484788 (excluding connections establishing)
tps = 13784.654542 (excluding connections establishing)
select bool_and(b) from indexed;
tps = 12536.650703 (excluding connections establishing)
tps = 12647.767993 (excluding connections establishing)
tps = 12500.956407 (excluding connections establishing)

PATCHED
select bool_and(b) from unindexed;
tps = 10096.834678 (excluding connections establishing)
tps = 10110.182425 (excluding connections establishing)
tps = 10103.904500 (excluding connections establishing)
select bool_and(b) from indexed;
tps = 8373.631407 (excluding connections establishing)
tps = 8659.917173 (excluding connections establishing)
tps = 8473.899896 (excluding connections establishing)

Regards,
Marti

Attachment Content-Type Size
bench_bool_and.sh application/x-sh 664 bytes