Re: Performance Improvement by reducing WAL for Update Operation

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Amit Kapila'" <amit(dot)kapila(at)huawei(dot)com>, "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: "'Craig Ringer'" <craig(at)2ndquadrant(dot)com>, <simon(at)2ndquadrant(dot)com>, "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, <noah(at)leadboat(dot)com>, <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2013-03-13 12:20:26
Message-ID: 004601ce1fe5$257bb840$707328c0$@kapila@huawei.com
Lists: pgsql-hackers

On Friday, March 08, 2013 9:22 PM Amit Kapila wrote:
> On Wednesday, March 06, 2013 2:57 AM Heikki Linnakangas wrote:
> > On 04.03.2013 06:39, Amit Kapila wrote:
> > > On Sunday, March 03, 2013 8:19 PM Craig Ringer wrote:
> > >> On 02/05/2013 11:53 PM, Amit Kapila wrote:
> > >>>> Performance data for the patch is attached with this mail.
> > >>>> Conclusions from the readings (these are the same as for my previous
> > patch):
> > >>>>
> >
> > I've been investigating the pglz option further, and doing
> > performance comparisons of the pglz approach and this patch. I'll
> > begin with some numbers:
> >
>
> Based on your patch, I have tried some more optimizations:
>
> Fixed a bug in your patch (pglz-with-micro-optimizations-2):
> 1. There were some problems in recovery due to the wrong length of the
> old tuple being passed to decode, which I have corrected.
>
> Approach-1 (pglz-with-micro-optimizations-2_roll10_32)
> 1. Moved the strategy minimum-length (32) check into log_heap_update.
> 2. Added the rolling hash over 10 bytes, as suggested by you (sketch
> below).
>
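To show the general shape of the rolling-hash idea, here is a rough
sketch (names and constants are mine for illustration; this is not the
code in the attached patch). The hash covers a window of 10 bytes and
is advanced one byte at a time in O(1), instead of rehashing the whole
window at every input position:

#include <stdint.h>

#define HIST_WINDOW 10                            /* bytes per hash value */
#define ROLL_MASK ((1u << (HIST_WINDOW + 8)) - 1) /* keep only live bits */

/* Hash over the first HIST_WINDOW bytes of s. */
static uint32_t
hash_init(const unsigned char *s)
{
    uint32_t    h = 0;
    int         i;

    for (i = 0; i < HIST_WINDOW; i++)
        h = ((h << 1) ^ s[i]) & ROLL_MASK;
    return h;
}

/*
 * Advance the window one byte: XOR out the byte that leaves the
 * window ('out') and fold in the byte that enters it ('in').
 */
static uint32_t
hash_roll(uint32_t h, unsigned char out, unsigned char in)
{
    return ((h << 1) ^ ((uint32_t) out << HIST_WINDOW) ^ in) & ROLL_MASK;
}

The history-table index is then derived from h, e.g. by masking it
down to the table size.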
> Approach-2 (pglz-with-micro-optimizations-2_roll10_32_1hashkey)
> 1. This is done on top of the Approach-1 changes.
> 2. Used 1 byte of data as the hash key.
>
> Approach-3
> (pglz-with-micro-optimizations-2_roll10_32_1hashkey_batch_literal)
> 1. This is done on top of the Approach-1 and Approach-2 changes.
> 2. Instead of copying each literal byte individually when it is found
> not to match the history, copy the whole literal run in one batch
> (sketch below).
>
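The batching in Approach-3 is essentially the following (an
illustrative sketch only; find_match and emit_match are hypothetical
placeholders standing in for the real pglz match logic):

#include <string.h>

/* Hypothetical helpers standing in for the real match logic. */
extern int  find_match(const unsigned char *p, const unsigned char *end);
extern void emit_match(const unsigned char *p, int len);

static unsigned char *
copy_literals_batched(const unsigned char *input, const unsigned char *end,
                      unsigned char *dest)
{
    const unsigned char *lit_start = input; /* start of pending literal run */

    while (input < end)
    {
        int     mlen = find_match(input, end);

        if (mlen > 0)
        {
            /* flush the whole pending literal run with a single memcpy */
            memcpy(dest, lit_start, input - lit_start);
            dest += input - lit_start;
            emit_match(input, mlen);
            input += mlen;
            lit_start = input;
        }
        else
            input++;                        /* just extend the literal run */
    }
    /* flush any trailing literals */
    memcpy(dest, lit_start, input - lit_start);
    return dest + (input - lit_start);
}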
> Data for all the above approaches is in the attached file
> "test_readings". (Apart from your tests, I have added one more test,
> "hundred tiny fields, first 10 changed".)
>
> Summary -
> After the Approach-1 changes, CPU utilization for all tests except two
> ("hundred tiny fields, all changed" and "hundred tiny fields, half
> changed") is the same or lower. Best-case CPU utilization has
> decreased (which is better), but the amount of WAL has increased a
> little (which is expected, due to the rollup over 10 consecutive
> bytes).
>
> The Approach-2 modification was done to see whether the hash
> calculation itself has any overhead.
> Neither Approach-2 nor Approach-3 results in any improvement.
>
> I have investigated the higher CPU utilization in those two tests: the
> reason is that there is nothing to compress in the new tuple, and the
> algorithm only finds that out after it has processed 75% (the required
> compression ratio) of the tuple bytes; see the sketch below.
> I think any compression algorithm has this drawback: if the data is
> not compressible, it can consume time despite the fact that it will
> not be able to compress the data.
> I think most updates will modify only some part of the tuple, which
> will always yield positive results.
>
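For reference, this is roughly how a pglz-style early abort bounds
that cost (an illustrative sketch, not the patch code; need_rate = 25
corresponds to the 75% figure above):

#include <stdbool.h>
#include <stddef.h>

/*
 * With a required saving of need_rate percent, the compressed output
 * for slen input bytes must stay below slen * (100 - need_rate) / 100.
 * The check itself is cheap, but on incompressible data the output can
 * only cross this bound after roughly 75% of the input has already
 * been scanned (for need_rate = 25), which is the CPU cost observed in
 * the two "all/half changed" tests.
 */
static bool
compression_hopeless(size_t out_bytes, size_t slen, int need_rate)
{
    size_t      result_max = (slen * (100 - need_rate)) / 100;

    return out_bytes >= result_max;
}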
> Apart from the above tests, I have run your patch against my old
> tests; it yields quite positive results. The WAL reduction is greater
> than with my patch, and CPU utilization is almost the same or slightly
> better with my patch.
> The results are in the attached file "pgbench_pg_lz_mod".
>
> All of the above data is for synchronous_commit = off. I can collect
> the data for synchronous_commit = on and for recovery performance.

Data for synchronous_commit = on is as follows:

The data for Heikki's tests is in the attached file "test_readings_on.txt".

The results and observations are the same as for synchronous_commit = off.
In short, Approach-1 as described in the mail above seems to be the best.

The data for the pgbench-based tests used in my previous mails is in
"pgbench_pg_lz_mod_sync_commit_on.htm".
This has been done for Heikki's original patch and Approach-1.
It shows a very minor CPU dip (0.1%) in some cases and a WAL reduction
of 2~3%.
The WAL reduction is not large because fewer operations are performed.

Recovery Performance
----------------------
pgbench org:

./pgbench -i -s 75 -F 80 postgres
./pgbench -c 4 -j 4 -T 600 postgres

pgbench 1800 (record size = 1800):

./pgbench -i -s 10 -F 80 postgres
./pgbench -c 4 -j 4 -T 600 postgres

Recovery benchmark:

                  postgres org      postgres pglz optimization
                  Recovery (sec)    Recovery (sec)
pgbench org            11                 11
pgbench 1800           16                 11

This shows that recovery performance is also improved with your patch.

There is one more recovery defect, which is fixed in the attached patch
pglz-with-micro-optimizations-3.patch.
In pglz_find_match(), the comparison could run beyond maxlen, due to
which the encoded data was not written to WAL correctly.
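In simplified form, the fix is to bound the comparison loop (this is
an illustration of the defect class, not the actual patch hunk):

static int
match_length(const unsigned char *input, const unsigned char *hist,
             const unsigned char *input_end, int maxlen)
{
    int         len = 0;

    /*
     * Stop both at the end of the input and at maxlen.  Without the
     * "len < maxlen" bound, the reported match length can exceed what
     * the encoder is allowed to emit, corrupting the encoded image
     * that gets written to WAL.
     */
    while (input + len < input_end && len < maxlen &&
           input[len] == hist[len])
        len++;
    return len;
}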

Finally, based on my work on top of your patch, the best patch would be
yours with the recovery defects fixed plus the Approach-1 changes.

With Regards,
Amit Kapila.

Attachment Content-Type Size
test_readings_on.txt text/plain 4.4 KB
pgbench_pg_lz_mod_sync_commit_on.htm text/html 82.5 KB
pglz-with-micro-optimizations-3.patch application/octet-stream 38.0 KB
