Re: Performance Improvement by reducing WAL for Update Operation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2014-02-12 04:32:32
Message-ID: CAA4eK1LRXQ8TJ2gGUA3kip0gAZsMpu5AR4CpNpP=r1kAvYMMTw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 11, 2014 at 10:07 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Wed, Feb 5, 2014 at 10:57:57AM -0800, Peter Geoghegan wrote:
>> On Wed, Feb 5, 2014 at 12:50 AM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com> wrote:
>> >> I think there's zero overlap. They're completely complimentary features.
>> >> It's not like normal WAL records have an irrelevant volume.
>> >
>> >
>> > Correct. Compressing a full-page image happens on the first update after a
>> > checkpoint, and the diff between old and new tuple is not used in that case.
>>
>> Uh, I really just meant that one thing that might overlap is
>> considerations around the choice of compression algorithm. I think
>> that there was some useful discussion of that on the other thread as
>> well.
>
> Yes, that was my point. I though the compression of full-page images
> was a huge win and that compression was pretty straight-forward, except
> for the compression algorithm. If the compression algorithm issue is
> resolved,

By "issue", I assume you mean which compression algorithm is best for
this patch.
For this patch, results have been posted for two algorithms so far. As far
as I understand, Heikki is fairly sure that the latest algorithm used for
this patch (compression using a prefix-suffix match between the old and new
tuple) is better than the other algorithm in terms of CPU overhead.
However, the performance data I collected for the worst case shows that
this algorithm has a CPU overhead as well.
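To make the prefix-suffix approach concrete, here is a minimal sketch of
the idea (not the patch's actual code; the function name and signature are
hypothetical): find how many leading and trailing bytes the old and new
tuple share, so the WAL record only needs to carry the differing middle
portion of the new tuple.

```c
/* Hypothetical sketch of prefix-suffix matching between old and new
 * tuple data.  Only the bytes between the common prefix and common
 * suffix of the new tuple would need to go into the WAL record. */
static void
find_common_affixes(const char *olddata, int oldlen,
                    const char *newdata, int newlen,
                    int *prefixlen, int *suffixlen)
{
    int minlen = (oldlen < newlen) ? oldlen : newlen;
    int p = 0, s = 0;

    /* Longest common prefix. */
    while (p < minlen && olddata[p] == newdata[p])
        p++;

    /* Longest common suffix, capped so it cannot overlap the prefix. */
    while (s < minlen - p &&
           olddata[oldlen - s - 1] == newdata[newlen - s - 1])
        s++;

    *prefixlen = p;
    *suffixlen = s;
}
```

The CPU cost here is two linear byte scans per update; the worst case is
an update where the tuples share little or nothing, so both scans stop
almost immediately but the record still carries the whole new tuple.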

OTOH, the other algorithm (compression using the old tuple as history) can
be a bigger win in terms of I/O reduction in a larger number of cases.
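The "old tuple as history" approach is LZ-style (as in pglz, but with the
old tuple preloaded as the history buffer), so it can also exploit matches
that are not at the very start or end of the tuple. A naive illustration
of why it covers more cases (the function below is hypothetical, O(n*m),
purely for exposition): count how many bytes of the new tuple cannot be
covered by copy instructions referencing the old tuple.

```c
#define MINMATCH 4              /* shortest match worth a copy instruction */

/* Hypothetical sketch: count bytes of newdata that would have to be
 * stored as literals; everything else could be emitted as compact
 * COPY(offset, len) instructions referencing olddata. */
static int
count_literal_bytes(const char *olddata, int oldlen,
                    const char *newdata, int newlen)
{
    int literals = 0;
    int i = 0;

    while (i < newlen)
    {
        int bestlen = 0;

        /* Naive scan of the whole old tuple for the longest match. */
        for (int j = 0; j < oldlen; j++)
        {
            int k = 0;

            while (j + k < oldlen && i + k < newlen &&
                   olddata[j + k] == newdata[i + k])
                k++;
            if (k > bestlen)
                bestlen = k;
        }

        if (bestlen >= MINMATCH)
            i += bestlen;       /* covered by a copy instruction */
        else
        {
            literals++;         /* emit one literal byte */
            i++;
        }
    }
    return literals;
}
```

A prefix-suffix scheme would store everything between the affixes
verbatim, whereas this scheme also finds moved or interior matches; the
price is a more expensive match search, which is where the worst-case CPU
overhead comes from.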

In short, it is still undecided which algorithm to choose, and whether it
can be enabled by default or whether a table-level switch to enable/disable
it would be better.

So I think the decision to be taken here comes down to these points:
1. Are we okay with I/O reduction at the expense of CPU for *worst* cases,
and I/O reduction without impacting CPU (better overall tps) for
*favourable* cases?
2. If we are not okay with the worst-case behaviour, can we provide a
table-level switch, so that the user can decide?
3. If neither of the above, is there any other way to mitigate the
worst-case behaviour, or shall we just reject this patch and move on?

Given the choice, I would go with option 2, because I think for most
UPDATE statements the old and new tuples will share most of their data
(columns holding large text data are generally not modified), so we will
mostly end up in the favourable cases; at the same time, for the worst
cases we don't want users to suffer CPU overhead, so a table-level switch
is also required.

One might argue that for some users it is not feasible to predict whether
their UPDATEs will produce similar or completely different tuple data, and
that such users are not prepared to take any risk of CPU overhead yet
would be happy to see the I/O reduction, in which case it is difficult to
decide what the value of the table-level switch should be. Here I think
the only answer is that "nothing is free" in this world, so either verify
the application's UPDATE behaviour before going to production, or just
leave the switch disabled and keep the current behaviour.

On the other side, there will be users who are quite certain about their
usage of UPDATE statements, or who are at least ready to evaluate their
application given the prospect of such a large gain, and for them this
would be a quite useful feature.

> can we move forward with the full-page compression patch?

In my opinion, it is not certain that whatever compression algorithm is
chosen for this patch (if any) can be used directly for full-page
compression; some of its ideas could be reused, or perhaps the algorithm
could be tweaked a bit to make it usable for full-page compression.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
