Lists: | pgsql-hackers |
---|
From: | Marti Raudsepp <marti(at)juffo(dot)org> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | [PATCH] Enable min/max optimization for bool_and/bool_or/every |
Date: | 2011-12-19 10:16:19 |
Message-ID: | CABRT9RCtunuH5RZB=uy1OcOwZHYLDHshuOc29eexVBmZz7m6PQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi list,
As discussed on the pgsql-general list, the bool_and() and bool_or()
aggregate functions behave exactly like min() and max() would over
booleans. While it's not likely that people would have an appropriate
index on a boolean column, it seems it wouldn't cost us anything to
take advantage of this optimization, as it requires no code changes at
all, simply value changes in the pg_aggregate catalog.
Before:
db=# explain analyze select bool_and(b) from bools;
Aggregate (cost=1693.01..1693.02 rows=1 width=1)
-> Seq Scan on bools (cost=0.00..1443.01 rows=100001 width=1)
Total runtime: 29.736 ms
After:
db=# explain analyze select bool_and(b) from bools;
Result (cost=0.03..0.04 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.03 rows=1 width=1)
-> Index Scan using bools_b_idx on bools
(cost=0.00..3300.28 rows=100001 width=1)
Index Cond: (b IS NOT NULL)
Total runtime: 0.109 ms
Original discussion here:
http://archives.postgresql.org/message-id/CABRT9RAGwQEP+EFhVpZ6=B4cJEcUE2-QCpb_ZdrNPgQNa8xKuA@mail.gmail.com
PS: It seems that the min/max optimization isn't documented in the
manual (apart from release notes), so I didn't include any doc changes
in this patch.
Regards,
Marti
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Marti Raudsepp <marti(at)juffo(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every |
Date: | 2011-12-22 16:41:32 |
Message-ID: | CA+TgmoZffQ6J0kCTU-V7ceEoQk8N6AjgqGfZ1NAju8vudcQPFw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> PS: It seems that the min/max optimization isn't documented in the
> manual (apart from release notes), so I didn't include any doc changes
> in this patch.
I don't see a patch attached to this email, so either you forgot to
attach it, or the list ate it somehow.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Marti Raudsepp <marti(at)juffo(dot)org> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every |
Date: | 2011-12-22 16:52:48 |
Message-ID: | CABRT9RCqCVb9uenzmyxymSEeRH=OpF+BUiXdN7VK0198HZMZkg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>> PS: It seems that the min/max optimization isn't documented in the
>> manual (apart from release notes), so I didn't include any doc changes
>> in this patch.
>
> I don't see a patch attached to this email, so either you forgot to
> attach it, or the list ate it somehow.
I forgot to attach it, sorry. Here it is.
Regards,
Marti
Attachment | Content-Type | Size |
---|---|---|
bool-minmax-optimization.patch | text/x-patch | 3.0 KB |
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Marti Raudsepp <marti(at)juffo(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every |
Date: | 2011-12-22 17:00:52 |
Message-ID: | CA+TgmoagcW7_YY7cnXECmj_+7RzGHvagsR5Jcp+VOqZ_BgXp1g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Dec 22, 2011 at 11:52 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>>> PS: It seems that the min/max optimization isn't documented in the
>>> manual (apart from release notes), so I didn't include any doc changes
>>> in this patch.
>>
>> I don't see a patch attached to this email, so either you forgot to
>> attach it, or the list ate it somehow.
>
> I forgot to attach it, sorry. Here it is.
Nice. It doesn't get much simpler than that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Marti Raudsepp <marti(at)juffo(dot)org> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every |
Date: | 2012-02-08 17:48:20 |
Message-ID: | 28056.1328723300@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Marti Raudsepp <marti(at)juffo(dot)org> writes:
> On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>>> PS: It seems that the min/max optimization isn't documented in the
>>> manual (apart from release notes), so I didn't include any doc changes
>>> in this patch.
>> I don't see a patch attached to this email, so either you forgot to
>> attach it, or the list ate it somehow.
> I forgot to attach it, sorry. Here it is.
I applied this patch, since I was busy applying catalog changes from you
anyway ;-).
I did think of a possible reason to reject the patch: with this change,
the planner will take longer to plan queries involving bool_and() et al,
since planagg.c will spend time looking (usually fruitlessly) for an
index-based plan. I tried this simple test case:
create table t (f1 bool);
\timing
explain select bool_and(f1) from t;
Best-case timings for the EXPLAIN were about 0.480 ms without the patch
and 0.500 ms with, so about a 4% penalty. On more complicated queries
I think the fractional cost would be less. This seemed acceptable to
me, so I went ahead and applied the change, but if anyone wants to
argue about it now's the time.
regards, tom lane
From: | Marti Raudsepp <marti(at)juffo(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every |
Date: | 2012-02-08 21:13:11 |
Message-ID: | CABRT9RD4J8HW9vZd7NQjcUbt+Ni1g-B-cQtWNFRw_FvOuB=egg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Feb 8, 2012 at 19:48, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I applied this patch, since I was busy applying catalog changes from you
> anyway ;-).
Thanks :)
> I did think of a possible reason to reject the patch: with this change,
> the planner will take longer to plan queries involving bool_and() et al,
> since planagg.c will spend time looking (usually fruitlessly) for an
> index-based plan.
Good point, I should have done those measurements up front. Anyway,
since I've often noticed \timing to be unreliable for short queries, I
decided to retry your test with pgbench.
Long story short, I measured 27% overhead in the un-indexed column
case and 33% overhead for an indexed column. That's a lot more than I
expected. I even rebuilt and retried a few times to make sure I hadn't
botched something. The benchmark script is attached.
UNPATCHED
select bool_and(b) from unindexed;
tps = 13787.023113 (excluding connections establishing)
tps = 13880.484788 (excluding connections establishing)
tps = 13784.654542 (excluding connections establishing)
select bool_and(b) from indexed;
tps = 12536.650703 (excluding connections establishing)
tps = 12647.767993 (excluding connections establishing)
tps = 12500.956407 (excluding connections establishing)
PATCHED
select bool_and(b) from unindexed;
tps = 10096.834678 (excluding connections establishing)
tps = 10110.182425 (excluding connections establishing)
tps = 10103.904500 (excluding connections establishing)
select bool_and(b) from indexed;
tps = 8373.631407 (excluding connections establishing)
tps = 8659.917173 (excluding connections establishing)
tps = 8473.899896 (excluding connections establishing)
Regards,
Marti
Attachment | Content-Type | Size |
---|---|---|
bench_bool_and.sh | application/x-sh | 664 bytes |