Re: Performance Improvement by reducing WAL for Update Operation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2014-02-10 15:02:21
Message-ID: CAA4eK1LOR9OYiOuDwg0SfgrQ0_Zw1q5MckB6THyUUK=2oY4rKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 6, 2014 at 5:57 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Feb 6, 2014 at 9:13 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Considering above change as correct, I have tried to see the worst
> case overhead for this patch by having new tuple such that after
> 25% or so of suffix/prefix match, there is a small change in tuple
> and kept rest of tuple same as old tuple and it shows overhead
> for this patch as well.
>
> Updated test script is attached.
>
> Unpatched
> testname | wal_generated | duration
> ----------------------------------+---------------+------------------
> ten long fields, 8 bytes changed | 348843824 | 5.56866788864136
> ten long fields, 8 bytes changed | 348844800 | 5.84434294700623
> ten long fields, 8 bytes changed | 350500000 | 5.92329406738281
> (3 rows)
>
>
>
> wal-update-prefix-suffix-encode-1.patch
>
> testname | wal_generated | duration
> ----------------------------------+---------------+------------------
> ten long fields, 8 bytes changed | 348845624 | 6.92243480682373
> ten long fields, 8 bytes changed | 348847000 | 8.35828399658203
> ten long fields, 8 bytes changed | 350204752 | 7.61826491355896
> (3 rows)
>
> One minor point, can we avoid having prefix tag if prefixlen is 0.
>
> + /* output prefix as a tag */
> + pgrb_out_tag(ctrlp, ctrlb, ctrl, bp, prefixlen, hlen);

I think generating out tag for suffix/prefix has one bug i.e it doesn't
consider the max length of 273 bytes (PGLZ_MAX_MATCH ) which
is mandatory for LZ format.

One more point about this patch is that in function pgrb_delta_encode(),
is it mandatory to return at end in below check:

if (result_size > result_max)
return false;

I mean to say as before starting for copying literal bytes we have
already ensured that the compressed data is greater than >25%, so
may be we can avoid this check. I have tried to take the data by
removing this check and found that it reduces overhead and improves
WAL reduction as well. The data is as below (compare this with data
in above mail for unpatched version data):

testname | wal_generated | duration
----------------------------------+---------------+------------------
ten long fields, 8 bytes changed | 300705552 | 6.51416897773743
ten long fields, 8 bytes changed | 300703816 | 6.85267090797424
ten long fields, 8 bytes changed | 300701840 | 7.15832996368408
(3 rows)

If we want to go with this approach, then I think apart from above
points there is no major change required (may be some comments,
function names etc. can be improved).

> But if we really just want to do prefix/suffix compression, this is a
> crappy and expensive way to do it. We needn't force everything
> through the pglz tag format just because we elide a common prefix or
> suffix.

Here are you bothered about below code where the patch is
doing byte-by-byte copy after prefix/suffix match?

/* output bytes between prefix and suffix as literals */
dp = &source[prefixlen];
dend = &source[slen - suffixlen];
while (dp < dend)
{
pgrb_out_literal(ctrlp, ctrlb, ctrl, bp, *dp);
dp++; /* Do not do this ++ in the line above! */
}

I think if we want to change LZ format, it will be bit more work and
verification for decoding has to be done much more strenuously.

Note - During performance test, I have focussed mainly on worst case,
because we already know that this idea is good for best and average cases.
However if we decide that this is better and good to proceed, I can take
the data for other cases as well.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-02-10 15:13:57 Re: jsonb and nested hstore
Previous Message Marios Vodas 2014-02-10 14:59:15 GiST, getting the record in table that the leaf entry points to