Re: Performance Improvement by reducing WAL for Update Operation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2013-12-05 15:45:47
Message-ID: CAA4eK1JeUbY16uwrDA2TaBkk+rLRL3Giyyqy1mVh_6CThmDR8w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 29, 2013 at 3:05 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Nov 27, 2013 at 9:31 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Sure, but to explore (a), the scope is a bit bigger. We have below
>> options to explore (a):
>> 1. try to optimize existing algorithm as used in patch, which we have
>> tried but of course we can spend some more time to see if anything more
>> can be tried out.
>> 2. try fingerprint technique as suggested by you above.
>> 3. try some other standard methods like vcdiff, lz4 etc.
>
> Well, obviously, I'm hot on idea #2 and think that would be worth
> spending some time on. If we can optimize the algorithm used in the
> patch some more (option #1), that would be fine, too, but the code
> looks pretty tight to me, so I'm not sure how successful that's likely
> to be. But if you have an idea, sure.

I have been experimenting with chunk-wise delta encoding (using a
technique similar to the Rabin fingerprint method) for the last few
days; the results of my investigation are below.
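
To make the idea concrete, below is a minimal sketch of this kind of
content-defined chunking. It is illustrative only and not the code from
the attached patch; the window size, multiplier and boundary mask are
made-up values. Chunks cut this way from the old tuple are what get
entered into the hash table that the new tuple is then matched against.

/*
 * Toy content-defined chunking with a rolling fingerprint.  NOT the
 * attached patch's code; all constants here are illustrative.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RB_WINDOW   16          /* bytes covered by the rolling hash */
#define RB_MASK     0x1F        /* boundary when low 5 bits are zero => ~32-byte chunks */
#define RB_MULT     31ULL       /* polynomial base */

static void
rb_chunk(const unsigned char *data, int len)
{
    uint64_t    hash = 0;
    uint64_t    pow_w = 1;      /* RB_MULT ^ RB_WINDOW, to drop old bytes */
    int         chunk_start = 0;
    int         i;

    for (i = 0; i < RB_WINDOW; i++)
        pow_w *= RB_MULT;

    for (i = 0; i < len; i++)
    {
        /* slide the window: bring in data[i], drop data[i - RB_WINDOW] */
        hash = hash * RB_MULT + data[i];
        if (i >= RB_WINDOW)
            hash -= pow_w * data[i - RB_WINDOW];

        /* cut a chunk when the fingerprint's low bits are all zero */
        if (i - chunk_start + 1 >= RB_WINDOW && (hash & RB_MASK) == 0)
        {
            printf("chunk: offset %d, length %d\n",
                   chunk_start, i - chunk_start + 1);
            chunk_start = i + 1;
        }
    }
    if (chunk_start < len)
        printf("tail:  offset %d, length %d\n", chunk_start, len - chunk_start);
}

int
main(void)
{
    const char *tuple = "a stand-in for the old tuple's data, long enough "
                        "to be split into a few content-defined chunks....";

    rb_chunk((const unsigned char *) tuple, (int) strlen(tuple));
    return 0;
}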

Performance Data
----------------------------
Non-default settings:
autovacuum = off
checkpoint_segments = 128
checkpoint_timeout = 10min

unpatched

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 one short and one long field, no change |    1054921328 | 25.5855557918549
 hundred tiny fields, all changed        |     634483328 | 20.8992719650269
 hundred tiny fields, half changed       |     635948640 | 19.8670389652252
 hundred tiny fields, half nulled        |     571388552 | 18.9413228034973

lz-delta-encoding

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 one short and one long field, no change |     662984384 | 21.7335519790649
 hundred tiny fields, all changed        |     633944320 | 24.1207830905914
 hundred tiny fields, half changed       |     633944344 | 24.4657719135284
 hundred tiny fields, half nulled        |     492200208 | 22.0337791442871

rabin-delta-encoding

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 one short and one long field, no change |     662235752 | 20.1823079586029
 hundred tiny fields, all changed        |     633950080 | 22.0473308563232
 hundred tiny fields, half changed       |     633950880 | 21.8351459503174
 hundred tiny fields, half nulled        |     508943072 | 20.9554698467255

Results Summary
-------------------------------------
1. With the chunk-wise approach, the WAL reduction is almost the same as
with LZ, barring the half-nulled case, which can be improved.
2. With the chunk-wise approach, the CPU overhead drops to roughly half
in most cases where there is little or no compression; a 5~10% overhead
remains when the data is not compressible. I think there will always be
a small overhead for building the hash table and scanning it to conclude
that the data is non-compressible.
3. I have not run the other tests, as they would anyway return from the
top of the encoding function because the tuple length is less than 32.

Main reasons for improvement
---------------------------------------------
1. Fewer hash entries for the old tuple and fewer calculations while
compressing the new tuple.
2. The memset of the hash-table data structure covers a smaller size.
3. No data is copied into the output buffer until a match is found (a
toy sketch of this chunk-table lookup follows the list).
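
To make points 1~3 concrete, below is a toy chunk-wise encoder. It is
not the patch's code: the chunk size, table size, names and record
format are made up, and fixed-size chunks stand in for the
fingerprint-derived chunk boundaries the patch uses. It builds a small
hash table over the old tuple's chunks, bails out without touching the
output if the new tuple shares no chunk with the old one, and otherwise
emits copy/literal records.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE  32
#define TABLE_SIZE  64              /* small table => small memset (point 2) */

typedef struct
{
    uint32_t    hash;
    int         offset;             /* chunk start in old tuple, -1 = empty */
} ChunkSlot;

/* FNV-1a over one fixed-size chunk */
static uint32_t
chunk_hash(const unsigned char *p)
{
    uint32_t    h = 2166136261u;
    int         i;

    for (i = 0; i < CHUNK_SIZE; i++)
        h = (h ^ p[i]) * 16777619u;
    return h;
}

/* Return the old-tuple offset of a chunk equal to newp + off, or -1. */
static int
find_chunk(const ChunkSlot *table, const unsigned char *oldp,
           const unsigned char *newp, int off)
{
    uint32_t    h = chunk_hash(newp + off);
    const ChunkSlot *slot = &table[h % TABLE_SIZE];

    if (slot->offset >= 0 && slot->hash == h &&
        memcmp(oldp + slot->offset, newp + off, CHUNK_SIZE) == 0)
        return slot->offset;
    return -1;
}

static bool
toy_delta_encode(const unsigned char *oldp, int oldlen,
                 const unsigned char *newp, int newlen)
{
    ChunkSlot   table[TABLE_SIZE];
    bool        found = false;
    int         off;

    /* one entry per chunk of the old tuple => far fewer entries (point 1) */
    memset(table, -1, sizeof(table));
    for (off = 0; off + CHUNK_SIZE <= oldlen; off += CHUNK_SIZE)
    {
        uint32_t    h = chunk_hash(oldp + off);

        table[h % TABLE_SIZE].hash = h;
        table[h % TABLE_SIZE].offset = off;
    }

    /* first pass: bail out before writing anything if nothing matches (point 3) */
    for (off = 0; off + CHUNK_SIZE <= newlen && !found; off += CHUNK_SIZE)
        found = (find_chunk(table, oldp, newp, off) >= 0);
    if (!found)
        return false;               /* caller WAL-logs the full new tuple */

    /* second pass: emit copy/literal records (tail bytes omitted for brevity) */
    for (off = 0; off + CHUNK_SIZE <= newlen; off += CHUNK_SIZE)
    {
        int         match = find_chunk(table, oldp, newp, off);

        if (match >= 0)
            printf("COPY    old_off=%d len=%d\n", match, CHUNK_SIZE);
        else
            printf("LITERAL new_off=%d len=%d\n", off, CHUNK_SIZE);
    }
    return true;
}

int
main(void)
{
    unsigned char oldt[128], newt[128];

    /* old and new tuples share their first 64 bytes in this toy example */
    memset(oldt, 'A', sizeof(oldt));
    memset(newt, 'A', sizeof(newt));
    memset(newt + 64, 'B', 64);

    if (!toy_delta_encode(oldt, (int) sizeof(oldt), newt, (int) sizeof(newt)))
        printf("not compressible, log the full tuple\n");
    return 0;
}

With the toy input in main(), the first two chunks of the new tuple come
out as copies of old-tuple chunks and the last two as literals.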

Further Actions
------------------------
1. Need to decide whether this reduction in CPU usage is acceptable, or
whether we need an enable/disable flag at the table level.
2. We can do further micro-optimisations in the chunk-wise approach,
such as improving the hash function.
3. Some code improvements are pending, e.g. for cases where the data to
be compressed is non-contiguous.

Attached files
---------------------
1. pgrb_delta_encoding_v1 - In heaptuple.c there is a parameter
rabin_fingerprint_comp; set it to true for chunk-wise delta encoding and
to false for LZ encoding. By default it is true. I wanted to provide a
better way to switch between the two modes and tried a few things, but
ended up with this approach.
2. wal-update-testsuite.sh - test script developed by Heikki to test this patch.

Note -
a. The performance data was taken on my laptop; it needs to be tested on
a better machine.
b. The attached patch is just a prototype of the chunk-wise concept; the
code needs to be improved, and decode handling/testing is pending.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
pgrb_delta_encoding_v1.patch application/octet-stream 48.8 KB
wal-update-testsuite.sh application/x-sh 11.9 KB
