Re: Performance Improvement by reducing WAL for Update Operation

From: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "hlinnakangas(at)vmware(dot)com" <hlinnakangas(at)vmware(dot)com>, "noah(at)leadboat(dot)com" <noah(at)leadboat(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2013-01-09 08:05:02
Message-ID: 6C0B27F7206C9E4CA54AE035729E9C383BEAAE32@szxeml509-mbx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Friday, January 04, 2013 8:03 PM Simon Riggs wrote:
On 4 January 2013 13:53, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Friday, December 28, 2012 3:52 PM Simon Riggs wrote:
>> On 28 December 2012 08:07, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>
>> > Hello, I saw this patch and confirmed that
>> >
>> > - Coding style looks good.
>> > - Appliable onto HEAD.
>> > - Some mis-codings are fixed.
>>
>> I've had a quick review of the patch to see how close we're getting.
>> The perf tests look to me like we're getting what we wanted from this
>> and I'm happy with the recovery performance trade-offs. Well done to
>> both author and testers.
>>
Update patch contains handling of below Comments
>
>* There is a fixed 75% heuristic in the patch. Can we document where
>that came from? Can we have a parameter that sets that please? This
>can be used to have further tests to confirm the useful setting of
>this. I expect it to be removed before we release, but it will help
>during beta.

Added a guc variable wal_update_compression_ratio to set the compression ratio.
It can be removed during beta.

>* The compression algorithm depends completely upon new row length
>savings. If the new row is short, it would seem easier to just skip
>the checks and include it anyway. We can say if old and new vary in
>length by > 50% of each other, just include new as-is, since the rows
>very clearly differ in a big way.

Added a check in heap_delta_encode to identify whether the tuples are differ in length by 50%.

>* If full page writes is on and the page is very old, we are just
>going to copy the whole block. So why not check for that rather than
>do all these push ups and then just copy the page anyway?

Added a function which is used to identify whether the page needs a backup block or not.
based on the result the optimization is applied.

>* I'd like to see a specific test in regression that is designed to
>exercise the code here. That way we will be certain that the code is
>getting regularly tested.

Added the regression tests which covers all the changes done for the optimization except recovery.

>* The internal docs are completely absent. We need at least a whole
>page of descriptive comment, discussing trade-offs and design
>decisions. This is very important because it will help locate bugs
>much faster if these things are clealry documented. It also helps
>reviewers. This is a big timewaster for committers because you have to
>read the whole patch and understand it before you can attempt to form
>opinions. Commits happen quicker and easier with good comments.
>* Need to mention the WAL format change, or include the change within
>the patch so we can see

backend/access/transam/README is updated with details.

>* Lots of typos in comments. Many comments say nothing more than the
>words already used in the function name itself

corrected the typos and removed unnecessary comments.

>* "flags" variables are almost always int or uint in PG source.
>
>* PGLZ_HISTORY_SIZE needs to be documented in the place it is defined,
>not the place its used. The test if (oldtuplen < PGLZ_HISTORY_SIZE)
>really needs to be a test inside the compression module to maintain
>better modularity, so the value itself needn't be exported

(oldtuplen < PGLZ_HISTORY_SIZE) validation is moved inside the heap_delta_encode
and updated the flags variable also.

Test results with modified pgbench (1800 record size) on the latest patch:

-Patch- -tps(at)-c1- -WAL(at)-c1- -tps(at)-c2- -WAL(at)-c2-
Head 831 4.17 GB 1416 7.13 GB
WAL modification 846 2.36 GB 1712 3.31 GB

-Patch- -tps(at)-c4- -WAL(at)-c4- -tps(at)-c8- -WAL(at)-c8-
Head 2196 11.01 GB 2758 13.88 GB
WAL modification 3295 5.87 GB 5472 9.02 GB

With Regards,
Amit Kapila.

Attachment Content-Type Size
wal_update_changes_v7.patch application/octet-stream 60.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2013-01-09 08:18:08 Re: PL/perl should fail on configure, not make
Previous Message Benedikt Grundmann 2013-01-09 07:56:12 Re: Improve compression speeds in pg_lzcompress.c