Re: Performance Improvement by reducing WAL for Update Operation

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2014-02-06 05:11:44
Message-ID: 52F31990.6060003@archidevsys.co.nz
Lists: pgsql-hackers

On 06/02/14 16:59, Robert Haas wrote:
> On Wed, Feb 5, 2014 at 6:43 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> So, I came up with the attached worst case test, modified from your latest
>> test suite.
>>
>> unpatched:
>>
>>                testname               | wal_generated |     duration
>> --------------------------------------+---------------+------------------
>>  ten long fields, all but one changed |     343385312 | 2.20806908607483
>>  ten long fields, all but one changed |     336263592 | 2.18997097015381
>>  ten long fields, all but one changed |     336264504 | 2.17843413352966
>> (3 rows)
>>
>>
>> pgrb_delta_encoding_v8.patch:
>>
>>                testname               | wal_generated |     duration
>> --------------------------------------+---------------+------------------
>>  ten long fields, all but one changed |     338356944 | 3.33501315116882
>>  ten long fields, all but one changed |     344059272 | 3.37364101409912
>>  ten long fields, all but one changed |     336257840 | 3.36244201660156
>> (3 rows)
>>
>> So with this test, the overhead is very significant.
> Yuck. Well that sucks.
>
>> With the skipping logic, another kind of "worst case" case is that you have
>> a lot of similarity between the old and new tuple, but you miss it because
>> you skip. For example, if you change the first few columns, but leave a
>> large text column at the end of the tuple unchanged.
> I suspect there's no way to have our cake and eat it, too. Most of
> the work that Amit has done on this patch in the last few revs is to
> cut back CPU overhead in the cases where the patch can't help because
> the tuple has been radically modified. If we're trying to get maximum
> compression, we need to go the other way: for example, we could just
> feed both the old and new tuples through pglz (or snappy, or
> whatever). That would allow us to take advantage not only of
> similarity between the old and new tuples but also internal
> duplication within either the old or the new tuple, but it would also
> cost more CPU. The concern with minimizing overhead in cases where
> the compression doesn't help has thus far pushed us in the opposite
> direction, namely passing over compression opportunities that a more
> aggressive algorithm could find in order to keep the overhead low.
>
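For what it's worth, the "feed both tuples through a compressor" idea can be pictured roughly as below. This is only a hand-wavy sketch: compress_fn is a stand-in for pglz/snappy/whatever, not an existing PostgreSQL API, and a real implementation would also have to emit only the portion of the output that covers the new tuple.

/*
 * Hand-wavy illustration of compressing an updated tuple against its old
 * version by handing both to a generic compressor.  "compress_fn" is a
 * stand-in for pglz/snappy/etc., not a real PostgreSQL function.
 */
#include <stdlib.h>
#include <string.h>

typedef int (*compress_fn) (const char *src, int srclen,
                            char *dest, int destlen);

static int
compress_update(compress_fn cf,
                const char *oldtup, int oldlen,
                const char *newtup, int newlen,
                char *dest, int destlen)
{
    char   *buf = malloc(oldlen + newlen);
    int     result;

    if (buf == NULL)
        return -1;

    /* Old tuple first, so back-references into it capture old/new overlap. */
    memcpy(buf, oldtup, oldlen);
    memcpy(buf + oldlen, newtup, newlen);

    /*
     * Internal duplication within either tuple is picked up as well, at the
     * cost of compressing oldlen + newlen bytes instead of scanning for
     * cheap prefix/suffix matches.
     */
    result = cf(buf, oldlen + newlen, dest, destlen);

    free(buf);
    return result;
}
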
> Off-hand, I'm wondering why we shouldn't apply the same skipping
> algorithm that Amit is using at the beginning of the string for the
> rest of it as well. It might be a little too aggressive (maybe the
> skip distance shouldn't increase by quite as much as doubling every
> time, or not beyond 16/32 bytes?) but I don't see why the general
> principle isn't sound wherever we are in the tuple.
>
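To make the doubling-skip idea concrete, here is a rough sketch of the kind of scan being described; it is purely illustrative and not taken from Amit's patch. The skip distance doubles after each mismatch, resets once similarity is found again, and is capped (here at 32 bytes) so that large unchanged regions are not missed entirely.

/*
 * Illustrative sketch only (not code from the patch): walk the old and new
 * tuples in parallel, doubling the skip distance after each mismatch and
 * resetting it after a match, so that a radically modified tuple is given
 * up on quickly while long unchanged stretches are still noticed.
 */
#define MAX_SKIP 32

static int
estimate_similarity(const char *oldp, int oldlen,
                    const char *newp, int newlen)
{
    int     skip = 1;
    int     i = 0;
    int     matched = 0;

    while (i < oldlen && i < newlen)
    {
        if (oldp[i] == newp[i])
        {
            matched++;
            skip = 1;           /* similarity found: look closely again */
            i++;
        }
        else
        {
            i += skip;          /* mismatch: hop ahead */
            if (skip < MAX_SKIP)
                skip *= 2;      /* back off: 1, 2, 4, ... MAX_SKIP */
        }
    }

    return matched;             /* crude measure of old/new similarity */
}

Whether the cap should be 16 or 32 bytes, and whether the growth should be gentler than doubling, is exactly the tuning question raised above.
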
> Unfortunately, despite changing things to make a history entry only
> every 4th character, building the history is still pretty expensive.
> By the time we even begin looking at the tuple we're gonna compress,
> we've already spent something like half the total effort, and of
> course we have to go further than that before we know whether our
> attempt to compress is actually going anywhere. I think that's the
> central problem here. pglz has several safeguards to ensure that it
> doesn't do too much work in vain: we abort if we find nothing
> compressible within first_success_by bytes, or if we emit enough total
> output to be certain that we won't meet the need_rate threshold.
> Those safeguards are a lot less effective here because they can't be
> applied until *after* we've already paid the cost of building the
> history. If we could figure out some way to apply those guards, or
> other guards, earlier in the algorithm, we could do a better job
> mitigating the worst-case scenarios, but I don't have a good idea.
>
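Concretely, the two pglz safeguards mentioned are roughly as sketched below (a paraphrase of the logic in pg_lzcompress.c, not the exact source). result_max is derived from the need_rate / min_comp_rate setting, approximately (slen / 100) * (100 - need_rate).

/*
 * Schematic paraphrase of the pglz worst-case guards (cf. pg_lzcompress.c);
 * simplified, not the exact source.  out_bytes is the amount of output
 * emitted so far, found_match says whether any back-reference has been
 * emitted yet, and result_max is roughly (slen / 100) * (100 - need_rate).
 * Returns true if the compression attempt should be abandoned.
 */
#include <stdbool.h>

static bool
pglz_style_abort(int out_bytes, bool found_match,
                 int first_success_by, int result_max)
{
    /* Nothing compressible within the first chunk of output: give up. */
    if (!found_match && out_bytes >= first_success_by)
        return true;

    /* Output already too large to meet the minimum compression rate. */
    if (out_bytes >= result_max)
        return true;

    return false;
}

The point made above is that both tests run inside the main compression loop, i.e. only after the history table has already been built, which is where most of the delta-encoding cost is paid.
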
Surely the weighting should be done according to the relative scarcity
of processing power versus I/O bandwidth? I get the impression that
different workloads and hardware configurations may favour conserving
either processor or I/O resources. Would it be feasible to have
different logic, depending on the trade-offs identified?

Cheers,
Gavin
