Re: Performance Improvement by reducing WAL for Update Operation

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: "'Craig Ringer'" <craig(at)2ndquadrant(dot)com>, <simon(at)2ndquadrant(dot)com>, "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, <noah(at)leadboat(dot)com>, <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2013-03-29 11:14:39
Message-ID: 009001ce2c6e$9bea4790$d3bed6b0$@kapila@huawei.com
Lists: pgsql-hackers

On Wednesday, March 13, 2013 5:50 PM Amit Kapila wrote:
> On Friday, March 08, 2013 9:22 PM Amit Kapila wrote:
> > On Wednesday, March 06, 2013 2:57 AM Heikki Linnakangas wrote:
> > > On 04.03.2013 06:39, Amit Kapila wrote:
> > > > On Sunday, March 03, 2013 8:19 PM Craig Ringer wrote:
> > > >> On 02/05/2013 11:53 PM, Amit Kapila wrote:
> > > >>>> Performance data for the patch is attached with this mail.
> > > >>>> Conclusions from the readings (these are same as my previous
> > > patch):
> > > >>>>
> > >
> > > I've been investigating the pglz option further, and doing
> > > performance comparisons of the pglz approach and this patch. I'll
> > > begin with some numbers:
> > >
> >
> > Based on your patch, I have tried some more optimizations:
> >

Based on the numbers Daniel provided for various compression methods, I
tried the Snappy algorithm for encoding, and it addresses most of your
concerns in that it should not degrade performance for the majority of
cases.

postgres original:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1232916160 | 34.0338308811188
 two short fields, one changed           |    1232909704 | 32.8722319602966
 two short fields, both changed          |    1236770128 | 35.445415019989
 one short and one long field, no change |    1053000144 | 23.2983899116516
 ten tiny fields, all changed            |    1397452584 | 40.2718069553375
 hundred tiny fields, first 10 changed   |     622082664 | 21.7642788887024
 hundred tiny fields, all changed        |     626461528 | 20.964781999588
 hundred tiny fields, half changed       |     621900472 | 21.6473519802094
 hundred tiny fields, half nulled        |     557714752 | 19.0088789463043
(9 rows)

postgres encoding WAL using snappy:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1232915128 | 34.6910920143127
 two short fields, one changed           |    1238902520 | 34.2287850379944
 two short fields, both changed          |    1233882056 | 35.3292708396912
 one short and one long field, no change |     733095168 | 20.3494939804077
 ten tiny fields, all changed            |    1314959744 | 38.969575881958
 hundred tiny fields, first 10 changed   |     483275136 | 19.6973309516907
 hundred tiny fields, all changed        |     481755280 | 19.7665288448334
 hundred tiny fields, half changed       |     488693616 | 19.7246761322021
 hundred tiny fields, half nulled        |     483425712 | 18.6299569606781
(9 rows)

The changes call snappy compress and decompress for the encoding and
decoding in the patch. I do the encoding only for tuples whose length is
greater than 32 bytes, as encoding very small tuples is unlikely to be
worthwhile.
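
To illustrate, here is a minimal sketch of that size gate, written against
the snappy-c API (snappy_compress from the C bindings); the threshold
constant and the wrapper name are mine, not code from the attached patch:

#include <stddef.h>
#include <stdbool.h>
#include <snappy-c.h>

#define ENCODE_MIN_TUPLE_LEN 32     /* skip tuples of 32 bytes or less */

/*
 * Try to snappy-encode a tuple for the WAL record. Returns false when
 * the tuple is too small to bother with, so the caller logs it as-is.
 * On entry, *dst_len must hold the destination buffer's capacity,
 * i.e. at least snappy_max_compressed_length(tup_len).
 */
static bool
encode_wal_tuple(const char *tup, size_t tup_len,
                 char *dst, size_t *dst_len)
{
    if (tup_len <= ENCODE_MIN_TUPLE_LEN)
        return false;

    return snappy_compress(tup, tup_len, dst, dst_len) == SNAPPY_OK;
}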

On my machine, snappy compress/decompress was causing stack corruption in
the first 4 bytes, so I put in the below fix to proceed (see the sketch
after this list). I am still looking into the reason for it.
1. snappy_compress - advance the encoded data buffer by 4 bytes before
compression starts.
2. snappy_uncompress - back off the 4-byte offset added during compress.
3. snappy_uncompressed_length - back off the 4-byte offset added during
compress.
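
A sketch of that workaround, again against the snappy-c API; the offset
macro and the wrapper names are illustrative only:

#include <stddef.h>
#include <snappy-c.h>

#define SNAPPY_FIXUP_OFFSET 4   /* bytes left untouched as a workaround */

/* Write the compressed stream starting 4 bytes into the buffer. */
static snappy_status
compress_with_offset(const char *src, size_t src_len,
                     char *buf, size_t *buf_len)
{
    return snappy_compress(src, src_len,
                           buf + SNAPPY_FIXUP_OFFSET, buf_len);
}

/* Skip the 4 bytes that were added at compress time. */
static snappy_status
uncompress_with_offset(const char *buf, size_t buf_len,
                       char *dst, size_t *dst_len)
{
    return snappy_uncompress(buf + SNAPPY_FIXUP_OFFSET,
                             buf_len - SNAPPY_FIXUP_OFFSET,
                             dst, dst_len);
}

/* Same offset adjustment when querying the uncompressed length. */
static snappy_status
uncompressed_length_with_offset(const char *buf, size_t buf_len,
                                size_t *result)
{
    return snappy_uncompressed_length(buf + SNAPPY_FIXUP_OFFSET,
                                      buf_len - SNAPPY_FIXUP_OFFSET,
                                      result);
}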

For the LZ compression patch, there was a small problem in identifying the
maximum length, which I have corrected in the separate patch
'pglz-with-micro-optimizations-4.patch'.

In my opinion, there are the following ways forward for this patch:
1. Use LZ compression, but provide a way for the user to avoid it in cases
where little compression is possible (a hypothetical sketch follows this
list).
I see this as a viable way because most updates modify only a few columns
while the rest of the data stays the same.
2. Use the snappy APIs; does anyone know of a standard snappy library?
3. Provide multiple compression methods, so that users can choose the
appropriate one depending on usage.
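
To make option 1 concrete, one possible shape is a boolean switch checked
before encoding; everything below (the GUC-style variable and the helper
name) is a hypothetical illustration, not code from any attached patch:

#include <stddef.h>
#include <stdbool.h>

/* Hypothetical user-visible switch; could be exposed as a GUC. */
bool wal_update_compression = true;

#define ENCODE_MIN_TUPLE_LEN 32     /* same gate as the earlier sketch */

/* Decide whether an update's WAL record should attempt compression. */
static bool
should_encode_update(size_t new_tup_len)
{
    /* let users opt out on workloads that compress poorly */
    if (!wal_update_compression)
        return false;

    return new_tup_len > ENCODE_MIN_TUPLE_LEN;
}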

Feedback?

With Regards,
Amit Kapila.

Attachment                                          Content-Type              Size
snappy_algo_v1.patch                                application/octet-stream  43.2 KB
wal_update_snappy_concat_oldandnew_tuple_v1.patch   application/octet-stream  20.5 KB
pglz-with-micro-optimizations-4.patch               application/octet-stream  38.0 KB
