Re: Improving avg performance for numeric

From: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, mark(dot)kirkwood(at)catalyst(dot)net(dot)nz
Subject: Re: Improving avg performance for numeric
Date: 2013-03-19 06:43:23
Message-ID: CAK=1=WoQLNiGF0WL9E821iN_m1mAFK0MUJFzuhO_3uCOzJEMDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

I updated the patch by taking ideas from your patch, and unifying the
transition struct and update function for different aggregates. The speed
of avg improved even more. It now has 60% better performance than the
current committed version.

This patch optimizes numeric/int8 sum, avg, stddev_pop, stddev_samp,
var_pop, var_samp.

I also noticed that this patch makes matview test fail. It seems that it
just changes the ordering of rows for queries like "SELECT * FROM tv;".
Does this seem like a bug in my patch, or should we add "ORDER BY" clauses
to this test to make it more deterministic?

I also agree with you that adding sum functions to use preallocated buffers
will make even more optimization. I'll try to see if I can find a simple
way to do this.

Thanks,
-- Hadi

On Mon, Mar 18, 2013 at 5:25 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>wrote:

> Hello
>
> I played with sum(numeric) optimization
>
> Now it is based on generic numeric_add function - this code is
> relative old - now we can design aggregates with internal transition
> buffers, and probably we can do this work more effective.
>
> I just removed useles palloc/free operations and I got a 30% better
> performance! My patch is ugly - because I used a generic add_var
> function. Because Sum, Avg and almost all aggregates functions is
> limited by speed of sum calculation I thing so we need a new numeric
> routines optimized for calculation "sum", that use a only preallocated
> buffers. A speed of numeric is more important now, because there are
> more and more warehouses, where CPU is botleneck.
>
> Regards
>
> Pavel
>
>
> 2013/3/18 Hadi Moshayedi <hadi(at)moshayedi(dot)net>:
> > Hi Pavel,
> >
> > Thanks a lot for your feedback.
> >
> > I'll work more on this patch this week, and will send a more complete
> patch
> > later this week.
> >
> > I'll also try to see how much is the speed up of this method for other
> > types.
> >
> > Thanks,
> > -- Hadi
> >
> >
> > On Mon, Mar 18, 2013 at 10:36 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com
> >
> > wrote:
> >>
> >> 2013/3/16 Hadi Moshayedi <hadi(at)moshayedi(dot)net>:
> >> > Revisiting:
> >> > http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz
> >> >
> >> > I think the reasons which the numeric average was slow were:
> >> > (1) Using Numeric for count, which is slower than int8 to increment,
> >> > (2) Constructing/deconstructing arrays at each transition step.
> >> >
> >> > This is also discussed at:
> >> >
> >> >
> http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
> >> >
> >> > So, I think we can improve the speed of numeric average by keeping the
> >> > transition state as an struct in the aggregate context, and just
> passing
> >> > the
> >> > pointer to that struct from/to the aggregate transition function.
> >> >
> >> > The attached patch uses this method.
> >> >
> >> > I tested it using the data generated using:
> >> > CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d
> FROM
> >> > generate_series(1, 10000000) s;
> >> >
> >> > After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
> >> > improves from 10.701 seconds to 5.204 seconds, which seems to be a
> huge
> >> > improvement.
> >> >
> >> > I think we may also be able to use a similar method to improve
> >> > performance
> >> > of some other numeric aggregates (like stddev). But I want to know
> your
> >> > feedback first.
> >> >
> >> > Is this worth working on?
> >>
> >> I checked this patch and it has a interesting speedup - and a price of
> >> this methoud should not be limited to numeric type only
> >>
> >> Pavel
> >>
> >> >
> >> > Thanks,
> >> > -- Hadi
> >> >
> >> >
> >> >
> >> > --
> >> > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> >> > To make changes to your subscription:
> >> > http://www.postgresql.org/mailpref/pgsql-hackers
> >> >
> >
> >
>

Attachment Content-Type Size
numeric-optimize-v2.patch application/octet-stream 24.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2013-03-19 07:54:23 Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
Previous Message Craig Ringer 2013-03-19 06:14:49 Re: [HACKERS] Trust intermediate CA for client certificates