From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Reduce palloc's in numeric operations. |
Date: | 2012-09-19 12:20:10 |
Message-ID: | 5059B87A.2070305@vmware.com |
Lists: | pgsql-hackers |
On 14.09.2012 11:25, Kyotaro HORIGUCHI wrote:
> Hello, I propose reducing pallocs in numeric operations.
>
> Numeric operations are slow by nature, but usually that is not
> a problem for on-disk operations. The slowdown is more
> noticeable for in-memory operations, though.
>
> I inspected them and found some very short-lived pallocs. These
> pallocs are used as temporary storage for the digits of unpacked
> numerics.
>
> The formats of numeric digits in packed and unpacked forms are the
> same, so we can eliminate some of the pallocs by using the digits of
> the packed numeric in place to build the unpacked one.
>
> In this patch, I added a new function, set_var_from_num_nocopy(), to
> do this, and made use of it for operands which won't be modified.
Have to be careful to really not modify the operands. numeric_out() and
numeric_out_sci() are wrong; they call get_str_from_var(), which
modifies the argument. Same with numeric_exp(): it passes the argument
to numericvar_to_double_no_overflow(), which passes it to
get_str_from_var(). numericvar_to_int8() also modifies its argument, so
all the functions that use that, directly or indirectly, must make a copy.
Perhaps get_str_from_var(), and the other functions that currently
scribble on the arguments, should be modified to not do so. They could
easily make a copy of the argument within the function. Then the callers
could safely use set_var_from_num_nocopy(). The performance would be the
same, you would have the same number of pallocs, but you would get rid
of the surprising argument-modifying behavior of those functions.
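To illustrate the idea, here is a minimal standalone sketch, not code from the patch. DemoVar is a simplified stand-in for NumericVar, malloc() stands in for palloc(), and demo_set_var_nocopy() and demo_sum_truncated() are hypothetical names made up for this example. The first function borrows the caller's digit array without copying; the second needs to scribble on the digits, so it makes its own private copy first and leaves the caller's data untouched:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for NumericVar; NOT the real PostgreSQL definition. */
typedef int16_t Digit;

typedef struct DemoVar
{
    int     ndigits;    /* number of entries in digits[] */
    Digit  *buf;        /* owned allocation, or NULL if digits is borrowed */
    Digit  *digits;     /* the digit array actually in use */
} DemoVar;

/*
 * Hypothetical "nocopy" setup: point the variable at the caller's digit
 * array instead of allocating and copying. The digits must then be
 * treated as read-only by everything that receives this variable.
 */
static void
demo_set_var_nocopy(DemoVar *var, const Digit *packed, int ndigits)
{
    var->ndigits = ndigits;
    var->buf = NULL;                    /* nothing to free: storage is borrowed */
    var->digits = (Digit *) packed;
}

/*
 * Hypothetical function that wants to modify its input: it zeroes every
 * digit from position 'keep' onward and returns the sum of what is left.
 * The modification is done on a private copy, so the argument survives.
 * Returns -1 if the allocation fails.
 */
static long
demo_sum_truncated(const DemoVar *var, int keep)
{
    Digit  *copy;
    long    sum = 0;
    int     i;

    if (keep < 0)
        keep = 0;

    /* The one allocation moves from the caller into this function. */
    copy = malloc(sizeof(Digit) * (var->ndigits > 0 ? var->ndigits : 1));
    if (copy == NULL)
        return -1;
    if (var->ndigits > 0)
        memcpy(copy, var->digits, sizeof(Digit) * var->ndigits);

    /* Scribble on the copy only. */
    for (i = keep; i < var->ndigits; i++)
        copy[i] = 0;
    for (i = 0; i < var->ndigits; i++)
        sum += copy[i];

    free(copy);
    return sum;
}
```

A caller can set up a variable with demo_set_var_nocopy() and pass it to demo_sum_truncated() any number of times; the borrowed array is never written to. The number of allocations is the same as if the caller had copied up front, which matches the point above.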
> The performance gain seems quite moderate....
>
> 'SELECT SUM(numeric_column) FROM on_memory_table' for ten million
> rows of about 8-digit numerics runs in 3480 ms against the original
> 3930 ms. That is an 11% gain. 'SELECT SUM(int_column) FROM
> on_memory_table' needed 1570 ms.
>
> Similarly, there is an 8% gain for numerics of about 30 - 50 digits.
> Performance of avg(numeric) showed no gain, in contrast.
>
> Do you think this is worth doing?
Yes, I think this is worthwhile. I'm seeing an even bigger gain, with
smaller numerics. I created a table with this:

CREATE TABLE numtest AS
  SELECT a::numeric AS col FROM generate_series(1, 10000000) a;

And repeated this query with \timing:

SELECT SUM(col) FROM numtest;

The execution time of that query fell from about 5300 ms to 4300 ms, i.e.
about 20%.
- Heikki