Quick Links

Re: Performance Improvement by reducing WAL for Update Operation

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
Subject:	Re: Performance Improvement by reducing WAL for Update Operation
Date:	2014-02-05 11:59:47
Message-ID:	52F227B3.5080301@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 01/30/2014 08:53 AM, Amit Kapila wrote:
> On Wed, Jan 29, 2014 at 8:13 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> On 01/29/2014 02:21 PM, Amit Kapila wrote:
>>> The main reason to process in chunks as much as possible is to save
>>> cpu cycles. For example if we build hash table byte-by-byte, then even
>>> for best case where most of tuple has a match, it will have reasonable
>>> overhead due to formation of hash table.
>>
>> Hmm. One very simple optimization we could do is to just compare the two
>> strings byte by byte, before doing anything else, to find any common prefix
>> they might have. Then output a tag for the common prefix, and run the normal
>> algorithm on the rest of the strings. In many real-world tables, the 1-2
>> first columns are a key that never changes, so that might work pretty well
>> in practice. Maybe it would also be worthwhile to do the same for any common
>> suffix the tuples might have.
>
> Is it possible to do for both prefix and suffix together, basically
> the question I
> have in mind is what will be deciding factor for switching from hash table
> mechanism to string comparison mode for suffix. Do we switch when we find
> long enough match?

I think you got it backwards. You don't switch from hash table mechanism
to string comparison. You do the prefix/suffix comparison *first*, and
run the hash table algorithm only on the "middle" part, between the
common prefix and suffix.

> Can we do this optimization after the basic version is acceptable?

I would actually suggest doing that first. Perhaps even ditch the whole
history table approach and do *only* the scan for prefix and suffix.
That's very cheap, and already covers a large fraction of UPDATEs that
real applications do. In particular, it's optimal for the case that you
update only a single column, something like "UPDATE foo SET bar = bar + 1".

I'm pretty sure the overhead of that would be negligible, so we could
always enable it. There are certainly a lot of scenarios where
prefix/suffix detection alone wouldn't help, but so what.

Attached is a quick patch for that, if you want to test it.

- Heikki

Attachment	Content-Type	Size
wal-update-prefix-suffix-encode-1.patch	text/x-diff	29.1 KB

In response to

Re: Performance Improvement by reducing WAL for Update Operation at 2014-01-30 06:53:45 from Amit Kapila

Responses

Re: Performance Improvement by reducing WAL for Update Operation at 2014-02-05 14:48:52 from Amit Kapila
Re: Performance Improvement by reducing WAL for Update Operation at 2014-02-06 03:37:20 from Robert Haas
Re: Performance Improvement by reducing WAL for Update Operation at 2014-02-16 11:51:03 from Amit Kapila

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Amit Kapila	2014-02-05 14:48:52	Re: Performance Improvement by reducing WAL for Update Operation
Previous Message	Heikki Linnakangas	2014-02-05 11:43:28	Re: Performance Improvement by reducing WAL for Update Operation