Re: Improving avg performance for numeric

Lists: pgsql-hackers
From: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Subject: Improving avg performance for numeric
Date: 2013-03-16 04:15:11
Message-ID: CAK=1=WrmCkWc_xQXs_bpUyswCPr7O9zkLmm8Oa7_nT2vybvBEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Revisiting:
http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz

I think the reasons which the numeric average was slow were:
(1) Using Numeric for count, which is slower than int8 to increment,
(2) Constructing/deconstructing arrays at each transition step.

This is also discussed at:
http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count

So, I think we can improve the speed of numeric average by keeping the
transition state as an struct in the aggregate context, and just passing
the pointer to that struct from/to the aggregate transition function.

The attached patch uses this method.

I tested it using the data generated using:
CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
generate_series(1, 10000000) s;

After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
improvement.

I think we may also be able to use a similar method to improve performance
of some other numeric aggregates (like stddev). But I want to know your
feedback first.

Is this worth working on?

Thanks,
-- Hadi

Attachment Content-Type Size
numeric-avg-optimize.patch application/octet-stream 9.0 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-03-16 04:56:10
Message-ID: CAFj8pRA6LH+RE7OGsP8VvzTGvT=j3Tg9C8XZWHuhsnz8cSZqLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2013/3/16 Hadi Moshayedi <hadi(at)moshayedi(dot)net>:
> Revisiting:
> http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz
>
> I think the reasons which the numeric average was slow were:
> (1) Using Numeric for count, which is slower than int8 to increment,
> (2) Constructing/deconstructing arrays at each transition step.
>
> This is also discussed at:
> http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
>
> So, I think we can improve the speed of numeric average by keeping the
> transition state as an struct in the aggregate context, and just passing the
> pointer to that struct from/to the aggregate transition function.
>
> The attached patch uses this method.
>
> I tested it using the data generated using:
> CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
> generate_series(1, 10000000) s;
>
> After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
> improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
> improvement.
>
> I think we may also be able to use a similar method to improve performance
> of some other numeric aggregates (like stddev). But I want to know your
> feedback first.
>
> Is this worth working on?

nice

+1

Regards

Pavel

>
> Thanks,
> -- Hadi
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving avg performance for numeric
Date: 2013-03-18 08:36:25
Message-ID: CAFj8pRDphM54dKU8bJWEqUK_kNPrhR2EMi8bWNe-cG_XSuYpEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/3/16 Hadi Moshayedi <hadi(at)moshayedi(dot)net>:
> Revisiting:
> http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz
>
> I think the reasons which the numeric average was slow were:
> (1) Using Numeric for count, which is slower than int8 to increment,
> (2) Constructing/deconstructing arrays at each transition step.
>
> This is also discussed at:
> http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
>
> So, I think we can improve the speed of numeric average by keeping the
> transition state as an struct in the aggregate context, and just passing the
> pointer to that struct from/to the aggregate transition function.
>
> The attached patch uses this method.
>
> I tested it using the data generated using:
> CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
> generate_series(1, 10000000) s;
>
> After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
> improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
> improvement.
>
> I think we may also be able to use a similar method to improve performance
> of some other numeric aggregates (like stddev). But I want to know your
> feedback first.
>
> Is this worth working on?

I checked this patch and it has a interesting speedup - and a price of
this methoud should not be limited to numeric type only

Pavel

>
> Thanks,
> -- Hadi
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, mark(dot)kirkwood(at)catalyst(dot)net(dot)nz
Subject: Re: Improving avg performance for numeric
Date: 2013-03-18 08:55:04
Message-ID: CAK=1=WrD-67MNLksSGauEyREuHbWd_xZqJeANOnmw3GxxGB1RQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Pavel,

Thanks a lot for your feedback.

I'll work more on this patch this week, and will send a more complete patch
later this week.

I'll also try to see how much is the speed up of this method for other
types.

Thanks,
-- Hadi

On Mon, Mar 18, 2013 at 10:36 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>wrote:

> 2013/3/16 Hadi Moshayedi <hadi(at)moshayedi(dot)net>:
> > Revisiting:
> > http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz
> >
> > I think the reasons which the numeric average was slow were:
> > (1) Using Numeric for count, which is slower than int8 to increment,
> > (2) Constructing/deconstructing arrays at each transition step.
> >
> > This is also discussed at:
> >
> http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
> >
> > So, I think we can improve the speed of numeric average by keeping the
> > transition state as an struct in the aggregate context, and just passing
> the
> > pointer to that struct from/to the aggregate transition function.
> >
> > The attached patch uses this method.
> >
> > I tested it using the data generated using:
> > CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
> > generate_series(1, 10000000) s;
> >
> > After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
> > improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
> > improvement.
> >
> > I think we may also be able to use a similar method to improve
> performance
> > of some other numeric aggregates (like stddev). But I want to know your
> > feedback first.
> >
> > Is this worth working on?
>
> I checked this patch and it has a interesting speedup - and a price of
> this methoud should not be limited to numeric type only
>
> Pavel
>
> >
> > Thanks,
> > -- Hadi
> >
> >
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
> >
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, mark(dot)kirkwood(at)catalyst(dot)net(dot)nz
Subject: Re: Improving avg performance for numeric
Date: 2013-03-18 15:25:37
Message-ID: CAFj8pRCNkmt_92BxbMEDV8W+83k4yeCJuAtM4E8vCdv3tUC9oA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

I played with sum(numeric) optimization

Now it is based on generic numeric_add function - this code is
relative old - now we can design aggregates with internal transition
buffers, and probably we can do this work more effective.

I just removed useles palloc/free operations and I got a 30% better
performance! My patch is ugly - because I used a generic add_var
function. Because Sum, Avg and almost all aggregates functions is
limited by speed of sum calculation I thing so we need a new numeric
routines optimized for calculation "sum", that use a only preallocated
buffers. A speed of numeric is more important now, because there are
more and more warehouses, where CPU is botleneck.

Regards

Pavel

2013/3/18 Hadi Moshayedi <hadi(at)moshayedi(dot)net>:
> Hi Pavel,
>
> Thanks a lot for your feedback.
>
> I'll work more on this patch this week, and will send a more complete patch
> later this week.
>
> I'll also try to see how much is the speed up of this method for other
> types.
>
> Thanks,
> -- Hadi
>
>
> On Mon, Mar 18, 2013 at 10:36 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>>
>> 2013/3/16 Hadi Moshayedi <hadi(at)moshayedi(dot)net>:
>> > Revisiting:
>> > http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz
>> >
>> > I think the reasons which the numeric average was slow were:
>> > (1) Using Numeric for count, which is slower than int8 to increment,
>> > (2) Constructing/deconstructing arrays at each transition step.
>> >
>> > This is also discussed at:
>> >
>> > http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
>> >
>> > So, I think we can improve the speed of numeric average by keeping the
>> > transition state as an struct in the aggregate context, and just passing
>> > the
>> > pointer to that struct from/to the aggregate transition function.
>> >
>> > The attached patch uses this method.
>> >
>> > I tested it using the data generated using:
>> > CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
>> > generate_series(1, 10000000) s;
>> >
>> > After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
>> > improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
>> > improvement.
>> >
>> > I think we may also be able to use a similar method to improve
>> > performance
>> > of some other numeric aggregates (like stddev). But I want to know your
>> > feedback first.
>> >
>> > Is this worth working on?
>>
>> I checked this patch and it has a interesting speedup - and a price of
>> this methoud should not be limited to numeric type only
>>
>> Pavel
>>
>> >
>> > Thanks,
>> > -- Hadi
>> >
>> >
>> >
>> > --
>> > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-hackers
>> >
>
>

Attachment Content-Type Size
numeric-sum-optimize.patch application/octet-stream 13.3 KB

From: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, mark(dot)kirkwood(at)catalyst(dot)net(dot)nz
Subject: Re: Improving avg performance for numeric
Date: 2013-03-19 06:43:23
Message-ID: CAK=1=WoQLNiGF0WL9E821iN_m1mAFK0MUJFzuhO_3uCOzJEMDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

I updated the patch by taking ideas from your patch, and unifying the
transition struct and update function for different aggregates. The speed
of avg improved even more. It now has 60% better performance than the
current committed version.

This patch optimizes numeric/int8 sum, avg, stddev_pop, stddev_samp,
var_pop, var_samp.

I also noticed that this patch makes matview test fail. It seems that it
just changes the ordering of rows for queries like "SELECT * FROM tv;".
Does this seem like a bug in my patch, or should we add "ORDER BY" clauses
to this test to make it more deterministic?

I also agree with you that adding sum functions to use preallocated buffers
will make even more optimization. I'll try to see if I can find a simple
way to do this.

Thanks,
-- Hadi

On Mon, Mar 18, 2013 at 5:25 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>wrote:

> Hello
>
> I played with sum(numeric) optimization
>
> Now it is based on generic numeric_add function - this code is
> relative old - now we can design aggregates with internal transition
> buffers, and probably we can do this work more effective.
>
> I just removed useles palloc/free operations and I got a 30% better
> performance! My patch is ugly - because I used a generic add_var
> function. Because Sum, Avg and almost all aggregates functions is
> limited by speed of sum calculation I thing so we need a new numeric
> routines optimized for calculation "sum", that use a only preallocated
> buffers. A speed of numeric is more important now, because there are
> more and more warehouses, where CPU is botleneck.
>
> Regards
>
> Pavel
>
>
> 2013/3/18 Hadi Moshayedi <hadi(at)moshayedi(dot)net>:
> > Hi Pavel,
> >
> > Thanks a lot for your feedback.
> >
> > I'll work more on this patch this week, and will send a more complete
> patch
> > later this week.
> >
> > I'll also try to see how much is the speed up of this method for other
> > types.
> >
> > Thanks,
> > -- Hadi
> >
> >
> > On Mon, Mar 18, 2013 at 10:36 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com
> >
> > wrote:
> >>
> >> 2013/3/16 Hadi Moshayedi <hadi(at)moshayedi(dot)net>:
> >> > Revisiting:
> >> > http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz
> >> >
> >> > I think the reasons which the numeric average was slow were:
> >> > (1) Using Numeric for count, which is slower than int8 to increment,
> >> > (2) Constructing/deconstructing arrays at each transition step.
> >> >
> >> > This is also discussed at:
> >> >
> >> >
> http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
> >> >
> >> > So, I think we can improve the speed of numeric average by keeping the
> >> > transition state as an struct in the aggregate context, and just
> passing
> >> > the
> >> > pointer to that struct from/to the aggregate transition function.
> >> >
> >> > The attached patch uses this method.
> >> >
> >> > I tested it using the data generated using:
> >> > CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d
> FROM
> >> > generate_series(1, 10000000) s;
> >> >
> >> > After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
> >> > improves from 10.701 seconds to 5.204 seconds, which seems to be a
> huge
> >> > improvement.
> >> >
> >> > I think we may also be able to use a similar method to improve
> >> > performance
> >> > of some other numeric aggregates (like stddev). But I want to know
> your
> >> > feedback first.
> >> >
> >> > Is this worth working on?
> >>
> >> I checked this patch and it has a interesting speedup - and a price of
> >> this methoud should not be limited to numeric type only
> >>
> >> Pavel
> >>
> >> >
> >> > Thanks,
> >> > -- Hadi
> >> >
> >> >
> >> >
> >> > --
> >> > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> >> > To make changes to your subscription:
> >> > http://www.postgresql.org/mailpref/pgsql-hackers
> >> >
> >
> >
>

Attachment Content-Type Size
numeric-optimize-v2.patch application/octet-stream 24.6 KB

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-03-19 15:38:10
Message-ID: 1363707490.38208.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hadi Moshayedi <hadi(at)moshayedi(dot)net> wrote:

> I updated the patch by taking ideas from your patch, and unifying
> the transition struct and update function for different
> aggregates. The speed of avg improved even more. It now has 60%
> better performance than the current committed version.

Outstanding!

> I also noticed that this patch makes matview test fail. It seems
> that it just changes the ordering of rows for queries like
> "SELECT * FROM tv;". Does this seem like a bug in my patch, or
> should we add "ORDER BY" clauses to this test to make it more
> deterministic?

I added some ORDER BY clauses.  That is probably a good thing
anyway for purposes of code coverage.  Does that fix it for you?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-03-19 16:12:12
Message-ID: CAFj8pRDjD0Wc+eUaK4H2caOA_VKpG-K+u-UM8hFYfdiRv9HTZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/3/19 Kevin Grittner <kgrittn(at)ymail(dot)com>:
> Hadi Moshayedi <hadi(at)moshayedi(dot)net> wrote:
>
>> I updated the patch by taking ideas from your patch, and unifying
>> the transition struct and update function for different
>> aggregates. The speed of avg improved even more. It now has 60%
>> better performance than the current committed version.
>
> Outstanding!

I did some tests ala OLAP queries and I am thinking so ~ 40% speedup
for queries with AVG is realistic. Depends on other conditions.

But there are lot of situation when data are in shared buffers or file
system memory and then this patch can carry significant speedup - and
probably can be better if some better algorithm for sum two numeric
numbers in aggregate.

Regards

Pavel

>
>> I also noticed that this patch makes matview test fail. It seems
>> that it just changes the ordering of rows for queries like
>> "SELECT * FROM tv;". Does this seem like a bug in my patch, or
>> should we add "ORDER BY" clauses to this test to make it more
>> deterministic?
>
> I added some ORDER BY clauses. That is probably a good thing
> anyway for purposes of code coverage. Does that fix it for you?
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-03-19 16:25:47
Message-ID: 6095.1363710347@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> Hadi Moshayedi <hadi(at)moshayedi(dot)net> wrote:
>> I also noticed that this patch makes matview test fail. It seems
>> that it just changes the ordering of rows for queries like
>> "SELECT * FROM tv;". Does this seem like a bug in my patch, or
>> should we add "ORDER BY" clauses to this test to make it more
>> deterministic?

> I added some ORDER BY clauses. That is probably a good thing
> anyway for purposes of code coverage. Does that fix it for you?

Uh, what? Fooling around with the implementation of avg() should surely
not change any planning decisions.

regards, tom lane


From: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-03-19 16:37:18
Message-ID: CAK=1=Wpj2snL2XUU+15dQ_DozpF8JygjQ3uo_0AJ_A50D4Dxug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I am not sure how this works, but I also changed numeric sum(), and the
views in question had a numeric sum() column. Can that have any impact?

I am going to dig deeper to see why this happens.

On Tue, Mar 19, 2013 at 6:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> > Hadi Moshayedi <hadi(at)moshayedi(dot)net> wrote:
> >> I also noticed that this patch makes matview test fail. It seems
> >> that it just changes the ordering of rows for queries like
> >> "SELECT * FROM tv;". Does this seem like a bug in my patch, or
> >> should we add "ORDER BY" clauses to this test to make it more
> >> deterministic?
>
> > I added some ORDER BY clauses. That is probably a good thing
> > anyway for purposes of code coverage. Does that fix it for you?
>
> Uh, what? Fooling around with the implementation of avg() should surely
> not change any planning decisions.
>
> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-03-19 16:45:48
Message-ID: 6577.1363711548@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ please do not top-reply ]

Hadi Moshayedi <hadi(at)moshayedi(dot)net> writes:
> On Tue, Mar 19, 2013 at 6:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Uh, what? Fooling around with the implementation of avg() should surely
>> not change any planning decisions.

> I am not sure how this works, but I also changed numeric sum(), and the
> views in question had a numeric sum() column. Can that have any impact?

[ looks at patch... ] Oh, I see what's affecting the plan: you changed
the aggtranstypes to internal for a bunch of aggregates. That's not
very good, because right now the planner takes that to mean that the
aggregate could eat a lot of space. We don't want that to happen for
these aggregates, I think.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-03-19 17:58:09
Message-ID: 8165.1363715889@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> [ looks at patch... ] Oh, I see what's affecting the plan: you changed
> the aggtranstypes to internal for a bunch of aggregates. That's not
> very good, because right now the planner takes that to mean that the
> aggregate could eat a lot of space. We don't want that to happen for
> these aggregates, I think.

After thinking about that for awhile: if we pursue this type of
optimization, what would probably be appropriate is to add an aggregate
property (stored in pg_aggregate) that allows direct specification of
the size that the planner should assume for the aggregate's transition
value. We were getting away with a hardwired assumption of 8K for
"internal" because the existing aggregates that used that transtype all
had similar properties, but it was always really a band-aid not a proper
solution. A per-aggregate override could be useful in other cases too.

This was looking like 9.4 material already, but adding such a property
would definitely put it over the top of what we could think about
squeezing into 9.3, IMO.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-03-20 11:02:59
Message-ID: CAFj8pRCpkdU16yZByicA6jhwrv4kyqZVy-=oGanKg=FLg5OoSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2013/3/19 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> I wrote:
>> [ looks at patch... ] Oh, I see what's affecting the plan: you changed
>> the aggtranstypes to internal for a bunch of aggregates. That's not
>> very good, because right now the planner takes that to mean that the
>> aggregate could eat a lot of space. We don't want that to happen for
>> these aggregates, I think.
>
> After thinking about that for awhile: if we pursue this type of
> optimization, what would probably be appropriate is to add an aggregate
> property (stored in pg_aggregate) that allows direct specification of
> the size that the planner should assume for the aggregate's transition
> value. We were getting away with a hardwired assumption of 8K for
> "internal" because the existing aggregates that used that transtype all
> had similar properties, but it was always really a band-aid not a proper
> solution. A per-aggregate override could be useful in other cases too.
>
> This was looking like 9.4 material already, but adding such a property
> would definitely put it over the top of what we could think about
> squeezing into 9.3, IMO.
>

Postgres is not a "in memory" OLAP database, but lot of companies use
it for OLAP queries due pg comfortable usage. This feature can be very
interesting for these users - and can introduce interesting speedup
with relative low price.

Regards

Pavel

> regards, tom lane


From: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-03-20 16:44:54
Message-ID: CAK=1=Wpn=jc3kULtiXXLi6D5nR4P8DvnGb90kXRRdhgMiObu9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Tom,

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> After thinking about that for awhile: if we pursue this type of
> optimization, what would probably be appropriate is to add an aggregate
> property (stored in pg_aggregate) that allows direct specification of
> the size that the planner should assume for the aggregate's transition
> value. We were getting away with a hardwired assumption of 8K for
> "internal" because the existing aggregates that used that transtype all
> had similar properties, but it was always really a band-aid not a proper
> solution. A per-aggregate override could be useful in other cases too.

Cool.

I created a patch which adds an aggregate property to pg_aggregate, so
the transition space is can be overridden. This patch doesn't contain
the numeric optimizations. It uses "0" (meaning not-set) for all
existing aggregates.

I manual-tested it a bit, by changing this value for aggregates and
observing the changes in plan. I also updated some docs and pg_dump.

Does this look like something along the lines of what you meant?

Thanks,
-- Hadi

Attachment Content-Type Size
aggregate-transspace.patch application/octet-stream 29.3 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-07-03 06:45:27
Message-ID: CAFj8pRC44Rw2s88vF=MGdirTpo1GLRnDKZHiVj6=E0iQn+9rJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2013/3/20 Hadi Moshayedi <hadi(at)moshayedi(dot)net>:
> Hi Tom,
>
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> After thinking about that for awhile: if we pursue this type of
>> optimization, what would probably be appropriate is to add an aggregate
>> property (stored in pg_aggregate) that allows direct specification of
>> the size that the planner should assume for the aggregate's transition
>> value. We were getting away with a hardwired assumption of 8K for
>> "internal" because the existing aggregates that used that transtype all
>> had similar properties, but it was always really a band-aid not a proper
>> solution. A per-aggregate override could be useful in other cases too.
>
> Cool.
>
> I created a patch which adds an aggregate property to pg_aggregate, so
> the transition space is can be overridden. This patch doesn't contain
> the numeric optimizations. It uses "0" (meaning not-set) for all
> existing aggregates.
>
> I manual-tested it a bit, by changing this value for aggregates and
> observing the changes in plan. I also updated some docs and pg_dump.
>
> Does this look like something along the lines of what you meant?

please, can you subscribe your patch to next commitfest?

I tested this patch, and it increase performance about 20% what is
interesting. More - it allows more comfortable custom aggregates for
custom types with better hash agg support.

Regards

Pavel

>
> Thanks,
> -- Hadi


From: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-07-08 04:14:28
Message-ID: CAK=1=Wr0_mJoTXgOTHZhnEQCspGihYwZE_ZG9HHo=Oq_tPUYTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I am attaching the updated the patch, which also fixes a bug which
caused one of the regression tests failed.

I'll subscribe this patch to the commitfest in the next hour.

Can you please review the patch?

Thanks,
-- Hadi

Attachment Content-Type Size
numeric-optimize-v3.patch application/octet-stream 47.9 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-07-08 09:13:50
Message-ID: CAFj8pRArJtJGih0MstA-G+Hz-ONRrEDKiwr9rqSQFUm-SdMNJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/7/8 Hadi Moshayedi <hadi(at)moshayedi(dot)net>:
> I am attaching the updated the patch, which also fixes a bug which
> caused one of the regression tests failed.
>
> I'll subscribe this patch to the commitfest in the next hour.
>
> Can you please review the patch?

sure, :)

Pavel

>
> Thanks,
> -- Hadi


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-07-08 14:05:31
Message-ID: CAFj8pRDvpB4NJkUx+iTxVfMcxD3O9KxB6vYTAffh5kdicoDfEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

I am testing your code, and It increase speed of sum about 24% faster
then original implementation.

But I am surprise of AVG speed - it should have same speed like sum in
new implementation, but it is 2x slower, than sum - what signalize
some strange and there is used wrong transition function

I am sending fixed version

postgres=# create table bubu(a int, b float, c numeric);
CREATE TABLE
postgres=# insert into bubu select i, i+1, i+1.122 from
generate_series(1,1000000) g(i);
INSERT 0 1000000

After fixing a speed of sum and avg for numeric is similar

postgres=# select avg(c) from bubu;
avg
---------------------
500001.622000000000
(1 row)

Time: 228.483 ms
postgres=# select sum(c) from bubu;
sum
------------------
500001622000.000
(1 row)

Time: 222.791 ms

Regards

Pavel

2013/7/8 Hadi Moshayedi <hadi(at)moshayedi(dot)net>:
> I am attaching the updated the patch, which also fixes a bug which
> caused one of the regression tests failed.
>
> I'll subscribe this patch to the commitfest in the next hour.
>
> Can you please review the patch?
>
> Thanks,
> -- Hadi

Attachment Content-Type Size
numeric-optimize-v4.patch application/octet-stream 53.9 KB

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-07-08 17:17:56
Message-ID: 51DAF444.5000709@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/07/2013 09:14 PM, Hadi Moshayedi wrote:
> I am attaching the updated the patch, which also fixes a bug which
> caused one of the regression tests failed.
>
> I'll subscribe this patch to the commitfest in the next hour.
>
> Can you please review the patch?

I'm afraid that, since this patch wasn't included anywhere near the
first week of the CommitFest, I can't possibly include it in the June
commitfest now. Accordingly, I have moved it to the September
commitfest. Hopefully someone can look at it before then.

Sorry for missing this in my "patch sweep" at the beginning of the CF.
Searching email for patches is, at best, inexact.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-07-08 17:32:26
Message-ID: CAFj8pRAF5Yx=MzZR=pEv=OD5OSgqoinsuanHnxS8R9zd2o--ZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2013/7/8 Josh Berkus <josh(at)agliodbs(dot)com>:
> On 07/07/2013 09:14 PM, Hadi Moshayedi wrote:
>> I am attaching the updated the patch, which also fixes a bug which
>> caused one of the regression tests failed.
>>
>> I'll subscribe this patch to the commitfest in the next hour.
>>
>> Can you please review the patch?
>
> I'm afraid that, since this patch wasn't included anywhere near the
> first week of the CommitFest, I can't possibly include it in the June
> commitfest now. Accordingly, I have moved it to the September
> commitfest. Hopefully someone can look at it before then.
>
> Sorry for missing this in my "patch sweep" at the beginning of the CF.
> Searching email for patches is, at best, inexact.
>

sure, it should be in September CF. It is relative simple patch
without global impacts. But I like it, it increase speed for
sum(numeric) about 25% and avg(numeric) about 50%

Regards

Pavel

> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-07-08 17:37:40
Message-ID: 51DAF8E4.20402@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel,

>
> sure, it should be in September CF. It is relative simple patch
> without global impacts. But I like it, it increase speed for
> sum(numeric) about 25% and avg(numeric) about 50%

Do you think you could give this a review after CF1 ends, but before
September? I hate to make Hadi wait just because I didn't see his patch.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-07-08 17:54:42
Message-ID: CAFj8pRDQLPuG3_ZOO83bkxfhkwhnmWzemRrR=6JpjAJGE6cMtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/7/8 Josh Berkus <josh(at)agliodbs(dot)com>:
> Pavel,
>
>>
>> sure, it should be in September CF. It is relative simple patch
>> without global impacts. But I like it, it increase speed for
>> sum(numeric) about 25% and avg(numeric) about 50%
>
> Do you think you could give this a review after CF1 ends, but before
> September? I hate to make Hadi wait just because I didn't see his patch.

yes, I can.

Regards

Pavel

>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving avg performance for numeric
Date: 2013-08-26 20:10:47
Message-ID: CAFj8pRDUFoa1wMc7vGFeQogvm3Og4Kq+kku-svtX9ZO8L5Yj_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

here is a rebased patch. Hadi, please, can verify this version?

Regards

Pavel

p.s. Performance tests

postgres=# create table foo(a int, b float, c double precision, d numeric,
gr int);
CREATE TABLE
postgres=#
postgres=# insert into foo select 1, 2.0, 3.0, 3.14, random()*10000 from
generate_series(1,10000000);

postgres=# \d foo
Table "public.foo"
Column | Type | Modifiers
--------+------------------+-----------
a | integer |
b | double precision |
c | double precision |
d | numeric |
gr | integer |

set work_mem to '2MB';

postgres=# show debug_assertions;
debug_assertions
------------------
off
(1 row)

postgres=# explain (analyze, timing off) select sum(a) from foo;
QUERY
PLAN
---------------------------------------------------------------------------------------------------
Aggregate (cost=208332.23..208332.24 rows=1 width=4) (actual rows=1
loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=4) (actual
rows=10000000 loops=1)
Total runtime: 1210.321 ms (1195.117 ms) -- patched (original)
(3 rows)

Time: 1210.709 ms
postgres=# explain (analyze, timing off) select sum(a) from foo group by gr;
QUERY
PLAN
---------------------------------------------------------------------------------------------------
HashAggregate (cost=233331.87..233431.71 rows=9984 width=8) (actual
rows=10001 loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=8) (actual
rows=10000000 loops=1)
Total runtime: 2923.987 ms (2952.292 ms)
(3 rows)

Time: 2924.384 ms

postgres=# explain (analyze, timing off) select avg(a) from foo;
QUERY
PLAN
---------------------------------------------------------------------------------------------------
Aggregate (cost=208332.23..208332.24 rows=1 width=4) (actual rows=1
loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=4) (actual
rows=10000000 loops=1)
Total runtime: 1331.627 ms (1312.140 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(a) from foo group by gr;
QUERY
PLAN
---------------------------------------------------------------------------------------------------
HashAggregate (cost=233331.87..233456.67 rows=9984 width=8) (actual
rows=10001 loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=8) (actual
rows=10000000 loops=1)
Total runtime: 3139.296 ms (3079.479 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(b) from foo;
QUERY
PLAN
---------------------------------------------------------------------------------------------------
Aggregate (cost=208332.23..208332.24 rows=1 width=8) (actual rows=1
loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=8) (actual
rows=10000000 loops=1)
Total runtime: 1327.841 ms (1339.214 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(b) from foo group by gr;
QUERY
PLAN
----------------------------------------------------------------------------------------------------
HashAggregate (cost=233331.87..233431.71 rows=9984 width=12) (actual
rows=10001 loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=12)
(actual rows=10000000 loops=1)
Total runtime: 3047.893 ms (3095.591 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(b) from foo;
QUERY
PLAN
---------------------------------------------------------------------------------------------------
Aggregate (cost=208332.23..208332.24 rows=1 width=8) (actual rows=1
loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=8) (actual
rows=10000000 loops=1)
Total runtime: 1454.665 ms (1471.413 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(b) from foo group by gr;
QUERY
PLAN
----------------------------------------------------------------------------------------------------
HashAggregate (cost=233331.87..233456.67 rows=9984 width=12) (actual
rows=10001 loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=12)
(actual rows=10000000 loops=1)
Total runtime: 3282.838 ms (3187.157 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(c) from foo;
QUERY
PLAN
---------------------------------------------------------------------------------------------------
Aggregate (cost=208332.23..208332.24 rows=1 width=8) (actual rows=1
loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=8) (actual
rows=10000000 loops=1)
Total runtime: 1348.555 ms (1364.585 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(c) from foo group by gr;
QUERY
PLAN
----------------------------------------------------------------------------------------------------
HashAggregate (cost=233331.87..233431.71 rows=9984 width=12) (actual
rows=10001 loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=12)
(actual rows=10000000 loops=1)
Total runtime: 3028.663 ms (3069.710 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(c) from foo;
QUERY
PLAN
---------------------------------------------------------------------------------------------------
Aggregate (cost=208332.23..208332.24 rows=1 width=8) (actual rows=1
loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=8) (actual
rows=10000000 loops=1)
Total runtime: 1488.980 ms (1463.813 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(c) from foo group by gr;
QUERY
PLAN
----------------------------------------------------------------------------------------------------
HashAggregate (cost=233331.87..233456.67 rows=9984 width=12) (actual
rows=10001 loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=12)
(actual rows=10000000 loops=1)
Total runtime: 3252.972 ms (3149.986 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(d) from foo;
QUERY
PLAN
---------------------------------------------------------------------------------------------------
Aggregate (cost=208332.23..208332.24 rows=1 width=7) (actual rows=1
loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=7) (actual
rows=10000000 loops=1)
Total runtime: 2301.769 ms (2784.430 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(d) from foo group by gr;
QUERY
PLAN
----------------------------------------------------------------------------------------------------
HashAggregate (cost=233331.87..233456.67 rows=9984 width=11) (actual
rows=10001 loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=11)
(actual rows=10000000 loops=1)
Total runtime: 4189.272 ms (4440.335 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(d) from foo;
QUERY
PLAN
---------------------------------------------------------------------------------------------------
Aggregate (cost=208332.23..208332.24 rows=1 width=7) (actual rows=1
loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=7) (actual
rows=10000000 loops=1)
Total runtime: 2308.493 ms (5195.970 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(d) from foo group by gr;
QUERY
PLAN
----------------------------------------------------------------------------------------------------
HashAggregate (cost=233331.87..233456.67 rows=9984 width=11) (actual
rows=10001 loops=1)
-> Seq Scan on foo (cost=0.00..183332.58 rows=9999858 width=11)
(actual rows=10000000 loops=1)
Total runtime: 4179.978 ms (6828.398 ms)
(3 rows)

int, float, double 26829 ms (26675 ms) -- 0.5% slower .. statistic error ..
cleaner code
numeric sum 6490 ms (7224 ms) -- 10% faster
numeric avg 6487 ms (12023 ms) -- 46% faster

2013/8/22 Hadi Moshayedi <hadi(at)moshayedi(dot)net>

> Hello Pavel,
>
> > > Do you think you could give this a review after CF1 ends, but before
> > > September? I hate to make Hadi wait just because I didn't see his
> patch.
> >
> > yes, I can.
>
> When do you think you will have time to review this patch?
>
> Thanks,
> -- Hadi
>

Attachment Content-Type Size
numeric-optimize-v5.patch.gz application/x-gzip 9.1 KB

From: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving avg performance for numeric
Date: 2013-08-29 05:09:15
Message-ID: CAK=1=Wqo+6GwEhiiTjkJyrTrRvdj9nEBS+ygWW=i0-a7hmN=uQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

> int, float, double 26829 ms (26675 ms) -- 0.5% slower .. statistic error ..
> cleaner code
> numeric sum 6490 ms (7224 ms) -- 10% faster
> numeric avg 6487 ms (12023 ms) -- 46% faster

I also got very similar results.

On the other hand, initially I was receiving sigsegv's whenever I
wanted to try to run an aggregate function. gdb was telling that this
was happening somewhere in nodeAgg.c at ExecInitAgg. While trying to
find the reason, I had to reboot my computer at some point, after the
reboot the sigsegv's went away. I want to look into this and find the
reason, I think I have missed something here. Any thoughts about why
this would happen?

--Hadi


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving avg performance for numeric
Date: 2013-08-29 05:48:45
Message-ID: CAFj8pRD14a4nW1_5UE2cQt6nSFOkczfpjjxt79rBSzv9Z3TvdQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/8/29 Hadi Moshayedi <hadi(at)moshayedi(dot)net>

> Hello,
>
> > int, float, double 26829 ms (26675 ms) -- 0.5% slower .. statistic error
> ..
> > cleaner code
> > numeric sum 6490 ms (7224 ms) -- 10% faster
> > numeric avg 6487 ms (12023 ms) -- 46% faster
>
> I also got very similar results.
>
> On the other hand, initially I was receiving sigsegv's whenever I
> wanted to try to run an aggregate function. gdb was telling that this
> was happening somewhere in nodeAgg.c at ExecInitAgg. While trying to
> find the reason, I had to reboot my computer at some point, after the
> reboot the sigsegv's went away. I want to look into this and find the
> reason, I think I have missed something here. Any thoughts about why
> this would happen?
>

I found a few bugs, that I fixed. There was a issue with empty sets. Other
issues I didn't find.

Regards

Pavel

>
> --Hadi
>


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-09-04 16:06:35
Message-ID: 52275A8B.7060706@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7/8/13 10:05 AM, Pavel Stehule wrote:
> I am testing your code, and It increase speed of sum about 24% faster
> then original implementation.

This patch needs to be rebased (and/or the later version registered in
the commit fest).


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-09-04 18:26:11
Message-ID: CAFj8pRDMKmJOSOXES9yT_zLEM2xxKh6YxqkAkTxX1Xh06z5gSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/9/4 Peter Eisentraut <peter_e(at)gmx(dot)net>

> On 7/8/13 10:05 AM, Pavel Stehule wrote:
> > I am testing your code, and It increase speed of sum about 24% faster
> > then original implementation.
>
> This patch needs to be rebased (and/or the later version registered in
> the commit fest).
>
>
I updated a commit fest info

Regards

Pavel


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-09-04 21:11:39
Message-ID: 5227A20B.6070603@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9/4/13 2:26 PM, Pavel Stehule wrote:
>
>
>
> 2013/9/4 Peter Eisentraut <peter_e(at)gmx(dot)net <mailto:peter_e(at)gmx(dot)net>>
>
> On 7/8/13 10:05 AM, Pavel Stehule wrote:
> > I am testing your code, and It increase speed of sum about 24% faster
> > then original implementation.
>
> This patch needs to be rebased (and/or the later version registered in
> the commit fest).
>
>
> I updated a commit fest info

The new patch also needs to be rebased.


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "mark(dot)kirkwood(at)catalyst(dot)net(dot)nz" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Subject: Re: Improving avg performance for numeric
Date: 2013-09-05 08:42:29
Message-ID: CAFj8pRDQhG7Pqmf8XqXY0PnHfakkPQLPHnoRLJ_=EKFSbOAWeA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/9/4 Peter Eisentraut <peter_e(at)gmx(dot)net>

> On 9/4/13 2:26 PM, Pavel Stehule wrote:
> >
> >
> >
> > 2013/9/4 Peter Eisentraut <peter_e(at)gmx(dot)net <mailto:peter_e(at)gmx(dot)net>>
> >
> > On 7/8/13 10:05 AM, Pavel Stehule wrote:
> > > I am testing your code, and It increase speed of sum about 24%
> faster
> > > then original implementation.
> >
> > This patch needs to be rebased (and/or the later version registered
> in
> > the commit fest).
> >
> >
> > I updated a commit fest info
>
> The new patch also needs to be rebased.
>
>
>
> rebased

Attachment Content-Type Size
numeric-optimize-v6.patch application/octet-stream 56.8 KB