Re: Performance Improvement by reducing WAL for Update Operation

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Peter Geoghegan <pg(at)heroku(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2014-03-12 21:30:27
Message-ID: 5320D1F3.4050709@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/04/2014 01:58 PM, Amit Kapila wrote:
> On Mon, Mar 3, 2014 at 7:57 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> On 02/16/2014 01:51 PM, Amit Kapila wrote:
>>>
>>> On Wed, Feb 5, 2014 at 5:29 PM, Heikki Linnakangas
>>> <hlinnakangas(at)vmware(dot)com> wrote:
>>
>> Thanks. I have to agree with Robert though that using the pglz encoding when
>> we're just checking for a common prefix/suffix is a pretty crappy way of
>> going about it [1].
>>
>> As the patch stands, it includes the NULL bitmap when checking for a common
>> prefix. That's probably not a good idea, because it defeats the prefix
>> detection in a the common case that you update a field from NULL to not-NULL
>> or vice versa.
>>
>> Attached is a rewritten version, which does the prefix/suffix tests directly
>> in heapam.c, and adds the prefix/suffix lengths directly as fields in the
>> WAL record. If you could take one more look at this version, to check if
>> I've missed anything.
>
> I had verified the patch and found few minor points:
> ...

Fixed those.

> One Question:
> + rdata[1].data = (char *) &xlrec;
> Earlier it seems to store record hearder as first segment rdata[0],
> whats the reason of changing it?

I found the code easier to read that way. The order of rdata entries
used to be:

0: xl_heap_update struct
1: full-page reference to oldbuf (no data)
2: xl_heap_header_len struct for the new tuple
3-7: logical decoding stuff

The prefix/suffix fields made that order a bit awkward, IMHO. They are
logically part of the header, even though they're not part of the struct
(they are documented in comments inside the struct). So they ought to
stay together with the xl_heap_update struct. Another option would've
been to move it after the xl_heap_header_len struct.

Note that this doesn't affect the on-disk format of the WAL record,
because the moved rdata entry is just a full-page reference, with no
payload of its own.

> I have verified the patch by doing crash recovery for below scenario's
> and it worked fine:
> a. no change in old and new tuple
> b. all changed in new tuple
> c. half changed (update half of the values to NULLS) in new tuple
> d. only prefix same in new tuple
> e. only suffix same in new tuple
> f. prefix-suffix same, other columns values changed in new tuple.

Thanks!

> Conclusion is that patch shows good WAL reduction and performance
> improvement for favourable cases without CPU overhead for non-favourable
> cases.

Ok, great. Committed!

I left out the regression tests. It was good to have them while
developing this, but I don't think there's a lot of value in including
them permanently in the regression suite. Low-level things like the
alignment-sensitive test are fragile, and can easily stop testing the
thing it's supposed to test, depending on the platform and future
changes in the code. And the current algorithm doesn't care much about
alignment anyway.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2014-03-12 21:30:54 Re: jsonb and nested hstore
Previous Message Rukh Meski 2014-03-12 21:12:26 Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..