Re: [WIP] Performance Improvement by reducing WAL for Update Operation

Lists: pgsql-hackers
From: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: "'Heikki Linnakangas'" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "noah(at)leadboat(dot)com" <noah(at)leadboat(dot)com>
Subject: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-09-24 10:57:02
Message-ID: 6C0B27F7206C9E4CA54AE035729E9C3828533ECC@szxeml509-mbs

From: Heikki Linnakangas [mailto:heikki(dot)linnakangas(at)enterprisedb(dot)com]
Sent: Monday, August 27, 2012 5:58 PM
To: Amit kapila
On 27.08.2012 15:18, Amit kapila wrote:
>>> I have implemented the WAL Reduction Patch for the case of HOT Update as
pointed out by Simon and Robert. In this patch it only goes for Optimized
WAL in case of HOT Update with other restrictions same as in previous patch.
>>
>>> The performance numbers for this patch are attached in this mail. It has
improved by 90% if the page has fillfactor 80.
>>
>>> Now going forward I have the following options:
>>> a. Upload the patch in Open CF for WAL Reduction which contains
reduction for HOT and non-HOT updates.
>>> b. Upload the patch in Open CF for WAL Reduction which contains
reduction for HOT updates.
>>> c. Upload both the patches as different versions.

>> Let's do it for HOT updates only. Simon & Robert made good arguments on
>> why this is a bad idea for non-HOT updates.

> Okay, I shall do it that way.
> So now I shall send information about all the testing I have done for this
> patch and then upload it in CF.

Rebased version of patch based on latest code.

With Regards,

Amit Kapila.

Attachment Content-Type Size
wal_update_changes_v2.patch application/octet-stream 29.3 KB

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "noah(at)leadboat(dot)com" <noah(at)leadboat(dot)com>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-09-25 14:00:01
Message-ID: 5061B8E1.9010209@vmware.com
Lists: pgsql-hackers

On 24.09.2012 13:57, Amit kapila wrote:
> Rebased version of patch based on latest code.

When HOT was designed, we decided that heap_update needs to compare the
old and new attributes directly, with memcmp(), to determine whether any
of the indexed columns have changed. It was deemed infeasible to pass
down that information from the executor. I don't remember the details of
why that was, but you seem to be trying the same thing in this patch,
passing the bitmap of modified cols from the executor to heap_update().
I'm pretty sure that won't work, for the same reasons we didn't do it
for HOT.

I still feel that it would probably be better to use a generic delta
encoding scheme, instead of inventing one. How about VCDIFF
(http://tools.ietf.org/html/rfc3284), for example? Or you could reuse
the LZ compressor that we already have in the source tree. You can use
LZ for delta compression by initializing the history buffer of the
algorithm with the old tuple, and then compressing the new tuple as
usual. Or you could still use the knowledge of where the attributes
begin and end and which attributes were updated, and do the encoding
similar to how you did in the patch, but use LZ as the output format.
That way the decoding would be the same as LZ decompression.

- Heikki


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <noah(at)leadboat(dot)com>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-09-25 15:27:06
Message-ID: 009001cd9b32$394a4090$abdec1b0$@kapila@huawei.com
Lists: pgsql-hackers

> On Tuesday, September 25, 2012 7:30 PM Heikki Linnakangas wrote:
> On 24.09.2012 13:57, Amit kapila wrote:
> > Rebased version of patch based on latest code.
>
> When HOT was designed, we decided that heap_update needs to compare
> the old and new attributes directly, with memcmp(), to determine
> whether any of the indexed columns have changed. It was deemed
> infeasible to pass down that information from the executor. I don't
> remember the details of why that was, but you seem to be trying the
> same thing in this patch, passing the bitmap of modified cols from the
> executor to heap_update(). I'm pretty sure that won't work, for the
> same reasons we didn't do it for HOT.

I think the reason for not relying on the modified columns is that there can
be cases where the modified-columns bitmap does not give correct information.
For example, BEFORE triggers can change columns beyond those in the bitmap,
which is why HOT updates need to do the comparison themselves. In our case we
have taken care of that by not applying the optimization when a trigger has
modified the tuple, instead of relying on the modified columns.

If you feel the comparison is a must, we can do it the same way we identify
it for HOT?

> I still feel that it would probably be better to use a generic delta
> encoding scheme, instead of inventing one. How about VCDIFF
> (http://tools.ietf.org/html/rfc3284), for example? Or you could reuse
> the LZ compressor that we already have in the source tree. You can use
> LZ for delta compression by initializing the history buffer of the
> algorithm with the old tuple, and then compressing the new tuple as
> usual.

> Or you could still use the knowledge of where the attributes
> begin and end and which attributes were updated, and do the encoding
> similar to how you did in the patch, but use LZ as the output format.
> That way the decoding would be the same as LZ decompression.

Can you please explain why you think that applying LZ compression after
the encoding is better, given that we have already reduced the amount of
update WAL by storing only the changed-column information?

a. is it to further reduce the size of the WAL
b. to store the diff WAL in some standard format
c. or does it give some other kind of benefit

With Regards,
Amit Kapila.


From: Noah Misch <noah(at)leadboat(dot)com>
To: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, 'Heikki Linnakangas' <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-09-27 03:41:36
Message-ID: 20120927034136.GA25353@tornado.leadboat.com
Lists: pgsql-hackers

On Mon, Sep 24, 2012 at 10:57:02AM +0000, Amit kapila wrote:
> Rebased version of patch based on latest code.

I like the direction you're taking with this patch; the gains are striking,
especially considering the isolation of the changes.

You cannot assume executor-unmodified columns are also unmodified from
heap_update()'s perspective. Expansion in one column may instigate TOAST
compression of a logically-unmodified column, and that counts as a change for
xlog delta purposes. You do currently skip the optimization for relations
having a TOAST table, but TOAST compression can still apply. Observe this
with text columns of storage mode PLAIN. I see two ways out: skip the new
behavior when need_toast=true, or compare all inline column data, not just
what the executor modified. One can probably construct a benchmark favoring
either choice. I'd lean toward the latter; wide tuples are the kind this
change can most help. If the marginal advantage of ignoring known-unmodified
columns proves important, we can always bring it back after designing a way to
track which columns changed in the toaster.

Given that, why not treat the tuple as an opaque series of bytes and not worry
about datum boundaries? When several narrow columns change together, say a
sequence of sixteen smallint columns, you will use fewer binary delta commands
by representing the change with a single 32-byte substitution. If an UPDATE
changes just part of a long datum, the delta encoding algorithm will still be
able to save considerable space. That case arises in many forms: changing
one word in a long string, changing one element in a long array, changing one
field of a composite-typed column. Granted, this makes the choice of delta
encoding algorithm more important.

Like Heikki, I'm left wondering why your custom delta encoding is preferable
to an encoding from the literature. Your encoding has much in common with
VCDIFF, even sharing two exact command names. If a custom encoding is the
right thing, code comments or a README section should at least discuss the
advantages over an established alternative. Idle thought: it might pay off to
use 1-byte sizes and offsets most of the time. Tuples shorter than 256 bytes
are common; for longer tuples, we can afford wider offsets.

The benchmarks you posted upthread were helpful. I think benchmarking with
fsync=off is best if you don't have a battery-backed write controller or SSD.
Otherwise, fsync time dominates a pgbench run. Please benchmark recovery. To
do so, set up WAL archiving and take a base backup from a fresh cluster. Run
pgbench for awhile. Finally, observe the elapsed time to recover your base
backup to the end of archived WAL.

> *** a/src/backend/access/common/heaptuple.c
> --- b/src/backend/access/common/heaptuple.c

> + /*
> + * encode_xlog_update
> + * Forms a diff tuple from old and new tuple with the modified columns.
> + *
> + * att - attribute list.
> + * oldtup - pointer to the old tuple.
> + * heaptup - pointer to the modified tuple.
> + * wal_tup - pointer to the wal record which needs to be formed from old
> + and new tuples by using the modified columns list.
> + * modifiedCols - modified columns list by the update command.
> + */
> + void
> + encode_xlog_update(Form_pg_attribute *att, HeapTuple oldtup,
> + HeapTuple heaptup, HeapTuple wal_tup,
> + Bitmapset *modifiedCols)

This name is too generic for an extern function. Maybe "heap_delta_encode"?

> + void
> + decode_xlog_update(HeapTupleHeader htup, uint32 old_tup_len, char *data,
> + uint32 *new_tup_len, char *waldata, uint32 wal_len)

Likewise, maybe "heap_delta_decode" here.

> *** a/src/backend/access/heap/heapam.c
> --- b/src/backend/access/heap/heapam.c
> ***************
> *** 71,77 ****
> #include "utils/syscache.h"
> #include "utils/tqual.h"
>
> -
> /* GUC variable */
> bool synchronize_seqscans = true;
>

Spurious whitespace change.

> ***************
> *** 3195,3204 **** l2:
> /* XLOG stuff */
> if (RelationNeedsWAL(relation))
> {
> ! XLogRecPtr recptr = log_heap_update(relation, buffer, oldtup.t_self,
> ! newbuf, heaptup,
> ! all_visible_cleared,
> ! all_visible_cleared_new);
>
> if (newbuf != buffer)
> {
> --- 3203,3233 ----
> /* XLOG stuff */
> if (RelationNeedsWAL(relation))
> {
> ! XLogRecPtr recptr;
> !
> ! /*
> ! * Apply the xlog diff update algorithm only for hot updates.
> ! */
> ! if (modifiedCols && use_hot_update)

Why HOT in particular? I understand the arguments upthread for skipping the
optimization when the update crosses pages, but the other condition for HOT
(no changes to indexed columns) seems irrelevant here. Why not retest "newbuf
== buffer", instead?

In any event, the comment should memorialize rationale behind any excluded
cases, not merely restate the obvious fact that the code excludes them.

For the record, I think that if this pays off for intra-page updates, we
should eventually extend it to cross-page updates under full_page_writes=on.
If we were already logging deltas for all updates, I doubt we would adopt a
proposal to add complete-tuple logging as a disaster recovery aid. When
something corrupts a block, all bets are off.

> ***************
> *** 5239,5252 **** heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
> } tbuf;
> xl_heap_header xlhdr;
> int hsize;
> ! uint32 newlen;
> Size freespace;
>
> /*
> * The visibility map may need to be fixed even if the heap page is
> * already up-to-date.
> */
> ! if (xlrec->all_visible_cleared)
> {
> Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
> BlockNumber block = ItemPointerGetBlockNumber(&xlrec->target.tid);
> --- 5276,5293 ----
> } tbuf;
> xl_heap_header xlhdr;
> int hsize;
> ! uint32 new_tup_len = 0;

This variable rename looks spurious.

> Size freespace;
>
> + /* Initialize the buffer, used to frame the new tuple */
> + MemSet((char *) &tbuf.hdr, 0, sizeof(HeapTupleHeaderData));
> + hsize = SizeOfHeapUpdate + SizeOfHeapHeader;
> +
> /*
> * The visibility map may need to be fixed even if the heap page is
> * already up-to-date.
> */
> ! if (xlrec->new_all_visible_cleared & 0x0F)
> {
> Relation reln = CreateFakeRelcacheEntry(xlrec->target.node);
> BlockNumber block = ItemPointerGetBlockNumber(&xlrec->target.tid);
> ***************
> *** 5266,5272 **** heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
> }
>
> /* Deal with old tuple version */
> -
> buffer = XLogReadBuffer(xlrec->target.node,
> ItemPointerGetBlockNumber(&(xlrec->target.tid)),
> false);

Spurious whitespace change.

> --- 5307,5312 ----
> ***************
> *** 5291,5296 **** heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
> --- 5331,5359 ----
>
> htup = (HeapTupleHeader) PageGetItem(page, lp);
>
> + if (xlrec->diff_update)
> + {
> + char *data = (char *) &tbuf.hdr + htup->t_hoff;
> + uint32 old_tup_len;
> + uint32 wal_len;
> + char *waldata = (char *) xlrec + hsize + htup->t_hoff
> + - offsetof(HeapTupleHeaderData, t_bits);
> +
> + wal_len = record->xl_len - hsize;
> + Assert(wal_len <= MaxHeapTupleSize);
> +
> + wal_len -= (htup->t_hoff - offsetof(HeapTupleHeaderData, t_bits));
> +
> + old_tup_len = ItemIdGetLength(lp) - htup->t_hoff;
> +
> + /* copy exactly the tuple header present in the WAL to new tuple */
> + memcpy((char *) &tbuf.hdr + offsetof(HeapTupleHeaderData, t_bits),
> + (char *) xlrec + hsize,
> + (htup->t_hoff - offsetof(HeapTupleHeaderData, t_bits)));
> +
> + decode_xlog_update(htup, old_tup_len, data, &new_tup_len, waldata, wal_len);

I think the above code should appear later, with treatment of the new tuple.

encode_xlog_update() and decode_xlog_update() should be essentially-inverse
APIs. Instead, you have encode_xlog_update() working with HeapTuple arguments
while decode_xlog_update() works with opaque pointers. encode_xlog_update()
clones the header, but decode_xlog_update() leaves that to its caller. Those
decisions are convenient enough for these heapam.c callers, but I think
heap_xlog_update() should work harder so the decode_xlog_update() argument
list need not appear ad hoc.

> ***************
> *** 5317,5322 **** heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
> --- 5380,5386 ----
> */
> if (samepage)
> goto newsame;
> +
> PageSetLSN(page, lsn);
> PageSetTLI(page, ThisTimeLineID);
> MarkBufferDirty(buffer);

Spurious whitespace change.

> *** a/src/backend/executor/nodeModifyTable.c
> --- b/src/backend/executor/nodeModifyTable.c

> ***************
> *** 554,559 **** lreplace:;
> --- 557,571 ----
> if (resultRelationDesc->rd_att->constr)
> ExecConstraints(resultRelInfo, slot, estate);
>
> + /* check whether the xlog diff update can be applied or not? */
> + if ((resultRelationDesc->rd_toastoid == InvalidOid)
> + && (tuple_bf_trigger == tuple)
> + && (tuple->t_len > MinHeapTupleSizeForDiffUpdate))

Having the executor apply these tests introduces a modularity violation.

If any of these restrictions are to remain, a comment at code enforcing them
should give rationale for each.

> *** a/src/include/access/heapam_xlog.h
> --- b/src/include/access/heapam_xlog.h
> ***************
> *** 142,149 **** typedef struct xl_heap_update
> {
> xl_heaptid target; /* deleted tuple id */
> ItemPointerData newtid; /* new inserted tuple id */
> ! bool all_visible_cleared; /* PD_ALL_VISIBLE was cleared */
> ! bool new_all_visible_cleared; /* same for the page of newtid */
> /* NEW TUPLE xl_heap_header AND TUPLE DATA FOLLOWS AT END OF STRUCT */
> } xl_heap_update;
>
> --- 142,155 ----
> {
> xl_heaptid target; /* deleted tuple id */
> ItemPointerData newtid; /* new inserted tuple id */
> ! bool diff_update; /* optimized update or not */
> ! /*
> ! * To keep the structure size same all_visible_cleared is merged with
> ! * new_all_visible_cleared.
> ! */
> ! bool new_all_visible_cleared; /* MSB 4 bits tells PD_ALL_VISIBLE was
> ! cleared of new page and rest 4 bits
> ! for the old page */

In place of these two fields, store three flags in a uint8 field.

> /* NEW TUPLE xl_heap_header AND TUPLE DATA FOLLOWS AT END OF STRUCT */
> } xl_heap_update;
>
> *** a/src/include/access/htup_details.h
> --- b/src/include/access/htup_details.h
> ***************
> *** 528,533 **** struct MinimalTupleData
> --- 528,546 ----
> HeapTupleHeaderSetOid((tuple)->t_data, (oid))
>
>
> + /* WAL Diff update options */
> + #define HEAP_UPDATE_WAL_OPT_COPY 0
> + #define HEAP_UPDATE_WAL_OPT_ADD 1
> + #define HEAP_UPDATE_WAL_OPT_IGN 2
> + #define HEAP_UPDATE_WAL_OPT_PAD 3

These defines can remain private to the file implementing the encoding.

> +
> + /*
> + * Minimum tuple length required by the tuple during update operation for doing
> + * WAL optimization of update operation.
> + */
> + #define MinHeapTupleSizeForDiffUpdate 128

It's not at all clear to me what threshold to use, and 128 feels high. If you
want to select a threshold, I suggest benchmarking through a binary search of
small tuple sizes. That being said, though I have no doubt the algorithm will
lose when updating a single one-byte column, it will also finish darn quickly.
Might it be enough to just run the delta algorithm every time but discard any
diff wider than the complete new tuple?

Thanks,
nm


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Amit kapila <amit(dot)kapila(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "'Heikki Linnakangas'" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-09-27 04:48:39
Message-ID: 15929.1348721319@sss.pgh.pa.us
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> You cannot assume executor-unmodified columns are also unmodified from
> heap_update()'s perspective. Expansion in one column may instigate TOAST
> compression of a logically-unmodified column, and that counts as a change for
> xlog delta purposes.

Um ... what about BEFORE triggers?

Frankly, I think that expecting the executor to tell you which columns
have been modified is a non-starter. We have a solution for HOT and
it's silly to do the same thing differently just a few lines away.

regards, tom lane


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Noah Misch'" <noah(at)leadboat(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>, "'Heikki Linnakangas'" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-09-27 05:50:56
Message-ID: 008b01cd9c74$10cad4c0$32607e40$@kapila@huawei.com
Lists: pgsql-hackers

On Thursday, September 27, 2012 10:19 AM Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > You cannot assume executor-unmodified columns are also unmodified
> > from heap_update()'s perspective. Expansion in one column may
> > instigate TOAST compression of a logically-unmodified column, and
> > that counts as a change for xlog delta purposes.
>
> Um ... what about BEFORE triggers?

This optimization will not apply when a BEFORE trigger updates the tuple.

>
> Frankly, I think that expecting the executor to tell you which columns
> have been modified is a non-starter. We have a solution for HOT and
> it's silly to do the same thing differently just a few lines away.
>
My apprehension is that comparing all attributes to find which have been
modified, and doing that under the buffer exclusive lock, could eat into the
performance advantage. In the HOT case only the index attributes get
compared.

I agree that doing things differently in two nearby places is not good.
So I will do it the same way as for HOT, take the performance data again,
and if there is no big impact then we can do it that way.

With Regards,
Amit Kapila.


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, noah(at)leadboat(dot)com
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-09-27 10:41:46
Message-ID: 50642D6A.5080608@vmware.com
Lists: pgsql-hackers

On 25.09.2012 18:27, Amit Kapila wrote:
> If you feel it is must to do the comparison, we can do it in same way as we
> identify for HOT?

Yeah. (But as discussed, I think it would be even better to just treat
the old and new tuple as an opaque chunk of bytes, and run them through
a generic delta algorithm).

> Can you please explain why you think that applying LZ compression
> after the encoding is better, given that we have already reduced the
> amount of update WAL by storing only the changed-column information?
>
> a. is it to further reduce the size of the WAL
> b. to store the diff WAL in some standard format
> c. or does it give some other kind of benefit

Potentially all of those. I don't know if it'd be better or worse, but
my gut feeling is that it would be simpler, and produce even more
compact WAL.

Attached is a simple patch to apply LZ compression to update WAL
records. I modified the LZ compressor so that it can optionally use a
separate "history" data, and the same history data must then be passed
to the decompression function. That makes it work as a pretty efficient
delta encoder, when you use the old tuple as the history data.

I ran some performance tests with the modified version of pgbench that
you posted earlier:

Current PostgreSQL master
-------------------------

tps = 941.601924 (excluding connections establishing)
pg_xlog_location_diff
-----------------------
721227944

pglz_wal_update_records.patch
-----------------------------

tps = 1039.792527 (excluding connections establishing)
pg_xlog_location_diff
-----------------------
419395208

pglz_wal_update_records.patch, COMPRESS_ONLY
--------------------------------------------

tps = 1009.682002 (excluding connections establishing)
pg_xlog_location_diff
-----------------------
422505104

Amit's wal_update_changes_hot_update.patch
------------------------------------------

tps = 1092.703883 (excluding connections establishing)
pg_xlog_location_diff
-----------------------
436031544

The COMPRESS_ONLY result is with the attached patch, but it just uses LZ
to compress the new tuple, without taking advantage of the old tuple.
The pg_xlog_location_diff value is the amount of WAL generated during
the pgbench run. Attached is also the shell script I used to run these
tests.

The conclusion is that there isn't very much difference among the
patches. They all squeeze the WAL to about the same size, and the
increase in TPS is roughly the same.

I think more performance testing is required. The modified pgbench test
isn't necessarily very representative of a real-life application. The
gain (or loss) of this patch is going to depend a lot on how many
columns are updated, and in what ways. Need to test more scenarios, with
many different database schemas.

The LZ approach has the advantage that it can take advantage of all
kinds of similarities between old and new tuple. For example, if you
swap the values of two columns, LZ will encode that efficiently. Or if
you insert a character in the middle of a long string. On the flipside,
it's probably more expensive. Then again, you have to do a memcmp() to
detect which columns have changed with your approach, and that's not
free either. That was not yet included in the patch version I tested.
Another consideration is that when you compress the record more, you
have less data to calculate CRC for. CRC calculation tends to be quite
expensive, so even quite aggressive compression might be a win. Yet
another consideration is that the compression/encoding is done while
holding a lock on the buffer. For the sake of concurrency, you want to
keep the duration the lock is held as short as possible.

- Heikki

Attachment Content-Type Size
pglz_wal_update_records.patch text/x-diff 12.5 KB
pgbench-xlogtest.sh application/x-sh 423 bytes

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <noah(at)leadboat(dot)com>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-09-27 13:08:39
Message-ID: 00aa01cd9cb1$36c29a40$a447cec0$@kapila@huawei.com
Lists: pgsql-hackers

> On Thursday, September 27, 2012 4:12 PM Heikki Linnakangas wrote:
> On 25.09.2012 18:27, Amit Kapila wrote:
> > If you feel it is must to do the comparison, we can do it in same way
> > as we identify for HOT?
>
> Yeah. (But as discussed, I think it would be even better to just treat
> the old and new tuple as an opaque chunk of bytes, and run them through
> a generic delta algorithm).
>

Thank you for the modified patch.

> The conclusion is that there isn't very much difference among the
> patches. They all squeeze the WAL to about the same size, and the
> increase in TPS is roughly the same.
>
> I think more performance testing is required. The modified pgbench test
> isn't necessarily very representative of a real-life application. The
> gain (or loss) of this patch is going to depend a lot on how many
> columns are updated, and in what ways. Need to test more scenarios,
> with many different database schemas.
>
> The LZ approach has the advantage that it can take advantage of all
> kinds of similarities between old and new tuple. For example, if you
> swap the values of two columns, LZ will encode that efficiently. Or if
> you insert a character in the middle of a long string. On the flipside,
> it's probably more expensive. Then again, you have to do a memcmp() to
> detect which columns have changed with your approach, and that's not
> free either. That was not yet included in the patch version I tested.
> Another consideration is that when you compress the record more, you
> have less data to calculate CRC for. CRC calculation tends to be quite
> expensive, so even quite aggressive compression might be a win. Yet
> another consideration is that the compression/encoding is done while
> holding a lock on the buffer. For the sake of concurrency, you want to
> keep the duration the lock is held as short as possible.

Now I shall run the various tests for the following and post the results here:
a. The attached patch in the mode where it takes advantage of the history tuple
b. By changing the logic for modified-column calculation to use memcmp()

With Regards,
Amit Kapila.


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Noah Misch'" <noah(at)leadboat(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>, "'Heikki Linnakangas'" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-09-27 13:27:54
Message-ID: 00ab01cd9cb3$e720add0$b5620970$@kapila@huawei.com
Lists: pgsql-hackers

> On Thursday, September 27, 2012 9:12 AM Noah Misch wrote:
> On Mon, Sep 24, 2012 at 10:57:02AM +0000, Amit kapila wrote:
> > Rebased version of patch based on latest code.
>
> I like the direction you're taking with this patch; the gains are
> striking, especially considering the isolation of the changes.

Thank you for a detailed review of the patch.

> You cannot assume executor-unmodified columns are also unmodified from
> heap_update()'s perspective. Expansion in one column may instigate
> TOAST compression of a logically-unmodified column, and that counts as
> a change for xlog delta purposes. You do currently skip the
> optimization for relations having a TOAST table, but TOAST compression
> can still apply. Observe this with text columns of storage mode PLAIN.
> I see two ways out: skip the new behavior when need_toast=true, or
> compare all inline column data, not just what the executor modified.
> One can probably construct a benchmark favoring either choice. I'd
> lean toward the latter; wide tuples are the kind this change can most
> help. If the marginal advantage of ignoring known-unmodified columns
> proves important, we can always bring it back after designing a way to
> track which columns changed in the toaster.

You are right that a benchmark can favor either choice, but we should also
consider which approach gives better results for most scenarios.
In most of the update cases I have observed, the change in values does not
increase the length of a value by much.
OTOH I am not sure; there may be many other scenarios that do change the
length of the updated value and lead to the case you explained above.

>
> Given that, why not treat the tuple as an opaque series of bytes and
> not worry about datum boundaries? When several narrow columns change
> together, say a sequence of sixteen smallint columns, you will use
> fewer binary delta commands by representing the change with a single
> 32-byte substitution. If an UPDATE changes just part of a long datum,
> the delta encoding algorithm will still be able to save considerable
> space. That case arises in many forms: changing one word in a long
> string, changing one element in a long array, changing one field of a
> composite-typed column. Granted, this makes the choice of delta
> encoding algorithm more important.
>
> Like Heikki, I'm left wondering why your custom delta encoding is
> preferable to an encoding from the literature. Your encoding has much
> in common with VCDIFF, even sharing two exact command names. If a
> custom encoding is the right thing, code comments or a README section
> should at least discuss the advantages over an established
> alternative. Idle thought: it might pay off to use 1-byte sizes and
> offsets most of the time. Tuples shorter than 256 bytes are common;
> for longer tuples, we can afford wider offsets.

My apprehension was that doing more work while holding the lock can affect
performance. Any standard technique like LZ or VCDIFF has the overhead of
comparison and the other work its algorithm entails.
However, using the updated patch from Heikki, I can run the various
performance tests both for the update operation and for recovery.

With Regards,
Amit Kapila.


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>, <noah(at)leadboat(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-09-28 13:33:27
Message-ID: 007e01cd9d7d$d7f34690$87d9d3b0$@kapila@huawei.com
Lists: pgsql-hackers

> On Thursday, September 27, 2012 6:39 PM Amit Kapila wrote:
> > On Thursday, September 27, 2012 4:12 PM Heikki Linnakangas wrote:
> > On 25.09.2012 18:27, Amit Kapila wrote:
> > > If you feel it is must to do the comparison, we can do it in same
> > > way as we identify for HOT?
> >
> > Yeah. (But as discussed, I think it would be even better to just
> > treat the old and new tuple as an opaque chunk of bytes, and run
> > them through a generic delta algorithm).
>
> Thank you for the modified patch.
>
> > The conclusion is that there isn't very much difference among the
> > patches. They all squeeze the WAL to about the same size, and the
> > increase in TPS is roughly the same.
> >
> > I think more performance testing is required. The modified pgbench
> > test isn't necessarily very representative of a real-life
> > application. The gain (or loss) of this patch is going to depend a
> > lot on how many columns are updated, and in what ways. Need to test
> > more scenarios, with many different database schemas.

I have done a few and am planning to do more.

> Now I shall do the various tests for following and post it here:
> a. Attached Patch in the mode where it takes advantage of history tuple
> b. By changing the logic for modified column calculation to use
> calculation
> for memcmp()

Attached documents contain data for the following scenarios for both the 'a'
(LZ compression patch) and 'b' (modified WAL patch) patches:

1. Using fixed string (last few bytes are random) to update the column
values.
Total record length = 1800
Updated columns length = 250
2. Using random string to update the column values
Total record length = 1800
Updated columns length = 250

Observations -
1. With both patches the performance increase is very good.
2. The performance increase is almost the same for both patches, slightly
more for the LZ compression patch.
3. TPS varies with the LZ patch, but on average it is equivalent to the
other patch.

Other Performance tests I am planning to conduct
1. Using bigger random string to update the column values
Total record length = 1800
Updated columns length = 250
2. Using fixed string (last few bytes are random) to update the column
values.
Total record length = 1800
Updated columns length = 50, 100, 500, 750, 1000, 1500, 1800
3. Recovery performance test as suggested by Noah
4. Complete testing for LZ compression patch using testcases defined for
original patch

Kindly suggest more performance test cases that could make the findings
concrete, or, in case you feel the above is sufficient, please confirm.

With Regards,
Amit Kapila.

Attachment Content-Type Size
pgbench_wal_modified_and_lz_fixed_test.htm text/html 27.2 KB
pgbench_fixed.c application/octet-stream 63.4 KB
pgbench_random.c application/octet-stream 63.3 KB
pgbench_wal_modified_and_lz_random_test.htm text/html 27.5 KB

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>, <noah(at)leadboat(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-10-03 16:03:25
Message-ID: 001d01cda180$9f1e47a0$dd5ad6e0$@kapila@huawei.com
Lists: pgsql-hackers

On Friday, September 28, 2012 7:03 PM Amit Kapila wrote:
> > On Thursday, September 27, 2012 6:39 PM Amit Kapila wrote:
> > > On Thursday, September 27, 2012 4:12 PM Heikki Linnakangas wrote:
> > > On 25.09.2012 18:27, Amit Kapila wrote:
> > > > If you feel it is must to do the comparison, we can do it in same
> > way
> > > > as we identify for HOT?
> > >
>
>
> > Now I shall do the various tests for following and post it here:
> > a. Attached Patch in the mode where it takes advantage of history
> > tuple b. By changing the logic for modified column calculation to use
> > calculation for memcmp()
>
>
> Attached documents contain data for following scenarios for both 'a' (LZ
> compression patch) and 'b' (modified wal patch) patches:
>
> 1. Using fixed string (last few bytes are random) to update the column
> values.
> Total record length = 1800
> Updated columns length = 250
> 2. Using random string to update the column values
> Total record length = 1800
> Updated columns length = 250
>
> Observations -
> 1. With both patches performance increase is very good .
> 2. Almost same performance increase with both patches with slightly
> more for LZ compression patch.
> 3. TPS is varying with LZ patch, but if we take average it is
> equivalent to other patch.

> Other Performance tests I am planning to conduct
> 1. Using bigger random string to update the column values
> Total record length = 1800
> Updated columns length = 250
> 2. Using fixed string (last few bytes are random) to update the column
> values.
> Total record length = 1800
> Updated columns length = 50, 100, 500, 750, 1000, 1500, 1800

1. Please find the results (pgbench_test.htm) for point 2, where one column
update uses a fixed string (last few bytes random) and the second column
update is a 32-byte random string. The runs for 50 and 100 are still going
on; the others are attached with this mail.
2. Attached is the pgbench test code for modifications of 25 and 250 bytes
with a total record length of 1800.
   For the other record-size modification tests, the schema is changed
accordingly.
3. Added a random-string generation function for updating some column data,
used from the 250-byte modification test onwards.
CREATE OR REPLACE FUNCTION random_text_md5_v2(INTEGER)
RETURNS TEXT
LANGUAGE SQL
AS $$
    SELECT upper(
        substring(
            (
                SELECT string_agg(md5(random()::TEXT), '')
                FROM generate_series(1, CEIL($1 / 32.)::integer)
            )
            FROM 1 FOR $1   -- truncate the concatenated digests to $1 chars
        )
    );
$$;
4. Observations
   a. When the size of the updated value is small, the performance is
almost the same for both patches.
   b. When the size of the updated value is large, the performance with the
LZ patch is better.

> 3. Recovery performance test as suggested by Noah
Still not started.

> 4. Complete testing for LZ compression patch using testcases defined for
> original patch
   a. During testing of the LZ patch, a few issues were found when the
updated record contains NULLs. Working on a fix.

Any comments/suggestions regarding performance/functionality test?

With Regards,
Amit Kapila.

Attachment Content-Type Size
pgbench_test.htm text/html 96.3 KB
pgbench_25.c application/octet-stream 63.3 KB
pgbench_250.c application/octet-stream 63.5 KB

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: noah(at)leadboat(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-10-04 07:24:19
Message-ID: 506D39A3.6090109@vmware.com
Lists: pgsql-hackers

On 03.10.2012 19:03, Amit Kapila wrote:
> Any comments/suggestions regarding performance/functionality test?

Hmm. Doing a lot of UPDATEs concurrently can be limited by the
WALInsertLock, which each inserter holds while copying the WAL record to
the buffer. Reducing the size of the WAL records, by compression or
delta encoding, alleviates that bottleneck: when WAL records are
smaller, the lock needs to be held for a shorter duration. That improves
throughput, even if individual backends need to do more CPU work to
compress the records, because that work can be done in parallel. I
suspect much of the benefit you're seeing in these tests might be
because of that effect.

As it happens, I've been working on making WAL insertion scale better in
general:
http://archives.postgresql.org/message-id/5064779A.3050407@vmware.com.
That should also help most when inserting large WAL records. The
question is: assuming we commit the xloginsert-scale patch, how much
benefit is there left from the compression? It will surely still help to
reduce the size of WAL, which can certainly help if you're limited by
the WAL I/O, but I suspect the results from the pgbench tests you run
might look quite different.

So, could you rerun these tests with the xloginsert-scale patch applied?
Reducing the WAL size might still be a good idea even if the patch
doesn't have much effect on TPS, but I'd like to make sure that the
compression doesn't hurt performance. Also, it would be a good idea to
repeat the tests with just a single client; we don't want to hurt the
performance in that scenario either.

- Heikki


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: <noah(at)leadboat(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-10-04 10:56:30
Message-ID: 000c01cda21e$e9cfa2f0$bd6ee8d0$@kapila@huawei.com
Lists: pgsql-hackers

> On Thursday, October 04, 2012 12:54 PM Heikki Linnakangas
> On 03.10.2012 19:03, Amit Kapila wrote:
> > Any comments/suggestions regarding performance/functionality test?
>
> Hmm. Doing a lot of UPDATEs concurrently can be limited by the
> WALInsertLock, which each inserter holds while copying the WAL record to
> the buffer. Reducing the size of the WAL records, by compression or
> delta encoding, alleviates that bottleneck: when WAL records are
> smaller, the lock needs to be held for a shorter duration. That improves
> throughput, even if individual backends need to do more CPU work to
> compress the records, because that work can be done in parallel. I
> suspect much of the benefit you're seeing in these tests might be
> because of that effect.
>
> As it happens, I've been working on making WAL insertion scale better in
> general:
> http://archives.postgresql.org/message-id/5064779A.3050407@vmware.com.
> That should also help most when inserting large WAL records. The
> question is: assuming we commit the xloginsert-scale patch, how much
> benefit is there left from the compression? It will surely still help to
> reduce the size of WAL, which can certainly help if you're limited by
> the WAL I/O, but I suspect the results from the pgbench tests you run
> might look quite different.
>
> So, could you rerun these tests with the xloginsert-scale patch applied?

I shall take care of running the performance tests with the xloginsert-scale
patch as well, both single-threaded and multi-threaded.

With Regards,
Amit Kapila.


From: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>, "noah(at)leadboat(dot)com" <noah(at)leadboat(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-10-04 14:33:53
Message-ID: 6C0B27F7206C9E4CA54AE035729E9C38285374E2@szxeml509-mbs
Lists: pgsql-hackers

On Wednesday, October 03, 2012 9:33 PM Amit Kapila wrote:
On Friday, September 28, 2012 7:03 PM Amit Kapila wrote:
> > On Thursday, September 27, 2012 6:39 PM Amit Kapila wrote:
> > > On Thursday, September 27, 2012 4:12 PM Heikki Linnakangas wrote:
> > > On 25.09.2012 18:27, Amit Kapila wrote:
> > > > If you feel it is must to do the comparison, we can do it in same
> > way
> > > > as we identify for HOT?
> > >
>
>
> > Now I shall do the various tests for following and post it here:
> > a. Attached Patch in the mode where it takes advantage of history
> > tuple b. By changing the logic for modified column calculation to use
> > calculation for memcmp()
>
>

> 1. Please find the results (pgbench_test.htm) for point -2 where there is
> one fixed column updation (last few bytes are random) and second column
> updation is 32 byte random string. The results for 50, 100 are still going
> on others are attached with this mail.

The results for updated record sizes (50, 100) are attached with this mail.

Observations
a. The performance is comparable for both approaches.

>> 4. Complete testing for LZ compression patch using testcases defined for
>> original patch
> a. During testing of LZ patch, few issues are found related to when the updated record contains NULLS. Working on it to fix.

The problems were:
a. During compression, the offset calculation was based on the input buffer [new tuple] and the old tuple [history], whereas the offset should be relative to the end of the history.
   In normal LZ compression the input buffer and history are always adjacent, so this is not a problem there.
b. The new tuple contents should not be added to the history buffer, because the new tuple and the history have different addresses, which would make the offset
calculation go wrong.

Patch containing fix of above problems is attached with this mail.

With Regards,
Amit Kapila.

Attachment Content-Type Size
pgbench_test_50_and_100.htm text/html 36.6 KB
pglz_wal_update_v1.patch application/octet-stream 20.0 KB

From: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>, "noah(at)leadboat(dot)com" <noah(at)leadboat(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-10-06 14:04:27
Message-ID: 6C0B27F7206C9E4CA54AE035729E9C3828537853@szxeml509-mbs
Lists: pgsql-hackers

On Thursday, October 04, 2012 8:03 PM Heikki Linnakangas wrote:
On Wednesday, October 03, 2012 9:33 PM Amit Kapila wrote:
On Friday, September 28, 2012 7:03 PM Amit Kapila wrote:
> > On Thursday, September 27, 2012 6:39 PM Amit Kapila wrote:
> > > On Thursday, September 27, 2012 4:12 PM Heikki Linnakangas wrote:
> > > On 25.09.2012 18:27, Amit Kapila wrote:
> > > > If you feel it is must to do the comparison, we can do it in same
> > way
> > > > as we identify for HOT?
> > >
>
>
> > Now I shall do the various tests for following and post it here:
> > a. Attached Patch in the mode where it takes advantage of history
> > tuple b. By changing the logic for modified column calculation to use
> > calculation for memcmp()
>
>

> 1. Please find the results (pgbench_test.htm) for point -2 where there is
> one fixed column updation (last few bytes are random) and second column
> updation is 32 byte random string. The results for 50, 100 are still going
> on others are attached with this mail.

Please find the readings of the LZ patch along with the Xlog-Scale patch.
The comparison for Update operations is between:
base code + Xlog Scale Patch
base code + Xlog Scale Patch + Update WAL Optimization (LZ compression)

The readings were taken with the following configurations:
pgbench_xlog_scale_50 -
a. Updated record size 50, total record size 1800
b. Threads: 8, 1, 2
c. synchronous_commit: off, on

pgbench_xlog_scale_250 -
a. Updated record size 250, total record size 1800
b. Threads: 8, 1, 2
c. synchronous_commit: off, on

pgbench_xlog_scale_500 -
a. Updated record size 500, total record size 1800
b. Threads: 8, 1, 2
c. synchronous_commit: off, on

Observations
--------------
a. There is still a good performance improvement even when the Update WAL optimization is applied on top of the Xlog Scaling patch.
b. There is a slight performance dip for 1 thread (only with synchronous_commit = off) with the Update WAL optimization (LZ compression),
but for 2 threads there is a performance increase.

With Regards,
Amit Kapila.


From: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>, "noah(at)leadboat(dot)com" <noah(at)leadboat(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-10-06 14:06:42
Message-ID: 6C0B27F7206C9E4CA54AE035729E9C3828537860@szxeml509-mbs
Lists: pgsql-hackers


Sorry, I forgot to attach the performance data. It is attached to this mail.

________________________________________
From: Amit kapila
Sent: Saturday, October 06, 2012 7:34 PM
To: 'Heikki Linnakangas'; noah(at)leadboat(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: RE: [HACKERS] Re: [WIP] Performance Improvement by reducing WAL for Update Operation

On Thursday, October 04, 2012 8:03 PM Heikki Linnakangas wrote:
On Wednesday, October 03, 2012 9:33 PM Amit Kapila wrote:
On Friday, September 28, 2012 7:03 PM Amit Kapila wrote:
> > On Thursday, September 27, 2012 6:39 PM Amit Kapila wrote:
> > > On Thursday, September 27, 2012 4:12 PM Heikki Linnakangas wrote:
> > > On 25.09.2012 18:27, Amit Kapila wrote:
> > > > If you feel it is must to do the comparison, we can do it in same
> > way
> > > > as we identify for HOT?
> > >
>
>
> > Now I shall do the various tests for following and post it here:
> > a. Attached Patch in the mode where it takes advantage of history
> > tuple b. By changing the logic for modified column calculation to use
> > calculation for memcmp()
>
>

> 1. Please find the results (pgbench_test.htm) for point -2 where there is
> one fixed column updation (last few bytes are random) and second column
> updation is 32 byte random string. The results for 50, 100 are still going
> on others are attached with this mail.

Please find the readings of the LZ patch along with the Xlog-Scale patch.
The comparison for Update operations is between:
base code + Xlog Scale Patch
base code + Xlog Scale Patch + Update WAL Optimization (LZ compression)

The readings were taken with the following configurations:
pgbench_xlog_scale_50 -
a. Updated record size 50, total record size 1800
b. Threads: 8, 1, 2
c. synchronous_commit: off, on

pgbench_xlog_scale_250 -
a. Updated record size 250, total record size 1800
b. Threads: 8, 1, 2
c. synchronous_commit: off, on

pgbench_xlog_scale_500 -
a. Updated record size 500, total record size 1800
b. Threads: 8, 1, 2
c. synchronous_commit: off, on

Observations
--------------
a. There is still a good performance improvement even when the Update WAL optimization is applied on top of the Xlog Scaling patch.
b. There is a slight performance dip for 1 thread (only with synchronous_commit = off) with the Update WAL optimization (LZ compression),
but for 2 threads there is a performance increase.

With Regards,
Amit Kapila.

Attachment Content-Type Size
pgbench_xlog_scale_50.htm text/html 51.1 KB
pgbench_xlog_scale_250.htm text/html 51.5 KB
pgbench_xlog_scale_500.htm text/html 51.1 KB

From: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>, "noah(at)leadboat(dot)com" <noah(at)leadboat(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-10-08 14:56:12
Message-ID: 6C0B27F7206C9E4CA54AE035729E9C3828538FB9@szxeml509-mbs
Lists: pgsql-hackers

On Saturday, October 06, 2012 7:34 PM Amit kapila wrote:
On Thursday, October 04, 2012 8:03 PM Heikki Linnakangas wrote:
On Wednesday, October 03, 2012 9:33 PM Amit Kapila wrote:
On Friday, September 28, 2012 7:03 PM Amit Kapila wrote:
> > On Thursday, September 27, 2012 6:39 PM Amit Kapila wrote:
> > > On Thursday, September 27, 2012 4:12 PM Heikki Linnakangas wrote:
> > > On 25.09.2012 18:27, Amit Kapila wrote:

>> 1. Please find the results (pgbench_test.htm) for point -2 where there is
>> one fixed column updation (last few bytes are random) and second column
>> updation is 32 byte random string. The results for 50, 100 are still going
>> on others are attached with this mail.

> Please find the readings of LZ patch along with Xlog-Scale patch.
> The comparison is between for Update operations
> base code + Xlog Scale Patch
> base code + Xlog Scale Patch + Update WAL Optimization (LZ compression)

Please find attached the recovery performance data and an updated patch that
handles the unaligned access of PGLZ_Header in decompress by copying the header to a local aligned address.

Recovery Performance
----------------------------
1. Recovery performance is also better with the LZ compression patch.

Please do let me know if any more data or test is required for this patch.

With Regards,
Amit Kapila.

Attachment Content-Type Size
pgbench_recovery_benchmark.htm text/html 29.1 KB
pglz_wal_update_v2.patch application/octet-stream 20.3 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, 'Heikki Linnakangas' <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "noah(at)leadboat(dot)com" <noah(at)leadboat(dot)com>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-10-23 18:45:09
Message-ID: 20121023184509.GB8724@alvh.no-ip.org
Lists: pgsql-hackers

Amit kapila wrote:

> Rebased version of patch based on latest code.

Uhm, how can this patch change a caller of PageAddItem() by adding one
more argument, yet not touch bufpage.c at all? Are you sure this
compiles?

The email subject has a WIP tag; is that still the patch status? If so,
I assume it's okay to mark this Returned with Feedback and expect a
later version to be posted.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-10-24 06:06:45
Message-ID: 6C0B27F7206C9E4CA54AE035729E9C3828542FC6@szxeml509-mbx
Lists: pgsql-hackers

On Wednesday, October 24, 2012 12:15 AM Alvaro Herrera wrote:

Amit kapila wrote:

> Rebased version of patch based on latest code.

> Uhm, how can this patch change a caller of PageAddItem() by adding one
> more argument, yet not touch bufpage.c at all? Are you sure this
> compiles?

It compiles; the same is confirmed even with the latest HEAD.
Can you please point out anything in the patch that you feel is done wrong?

> The email subject has a WIP tag; is that still the patch status? If so,
> I assume it's okay to mark this Returned with Feedback and expect a
> later version to be posted.

The WIP tag is from the original mail-chain discussion. The current status is as follows:
I have updated the patch with all bug fixes, and the performance results were posted. Noah has also taken performance data.
He believes there is a discrepancy in the performance data, but in my view the reason is just the way I posted the data.

Currently there is no clear feedback on which I can work, so I would be very thankful if you could wait for some conclusion of the discussion.

With Regards,
Amit Kapila.