Re: Performance Improvement by reducing WAL for Update Operation

From: Noah Misch <noah(at)leadboat(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: hlinnakangas(at)vmware(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2012-10-27 18:04:01
Message-ID: 20121027180401.GA1870@tornado.leadboat.com
Lists: pgsql-hackers

On Sat, Oct 27, 2012 at 04:57:46PM +0530, Amit Kapila wrote:
> On Saturday, October 27, 2012 4:03 AM Noah Misch wrote:
> > Could you elaborate on your reason for continuing to treat TOAST as a special
> > case? As best I recall, the only reason to do so before was the fact that
> > TOAST can change the physical representation of a column even when executor
> > did not change its logical content. Since you're no longer relying on the
> > executor's opinion of what changed, a TOASTed value is not special.
>
> I thought for initial version of patch, without this change, patch will have
> less impact and less test.

Not that I'm aware of. If you still think so, please explain.

> For this patch I am interested to go with delta encoding approach based on
> column boundaries.

Fair enough.

> > If you conclude that finding sub-column similarity is not worthwhile, at least
> > teach your algorithm to aggregate runs of changing or unchanging columns into
> > fewer delta instructions. If a table contains twenty unchanging bool columns,
> > you currently use at least 80 bytes to encode that fact. By treating the run
> > of columns as a unit for delta encoding purposes, you could encode it in 23
> > bytes.
>
> Do you mean to say handle for non-continuous unchanged columns?

My statement above was a mess.

> I believe for continuous unchanged columns its already handled until there
> are any alignment changes. Example
>
> create table tbl(f1 int, f2 bool, f3 bool, f4 bool, f5 bool, f6 bool, f7 bool,
> f8 bool, f9 bool, f10 bool, f11 bool, f12 bool, f13 bool,
> f14 bool, f15 bool, f16 bool, f17 bool, f18 bool, f19 bool,
> f20 bool, f21 bool);
>
> insert into tbl values(10,
> 't','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t',
> 't');
>
> update tbl set f1 = 20;
>
> The delta algorithm for the above operation reduced the size of the tuple
> from 24 bytes to 12 bytes.
>
> 4 bytes - IGN command and LEN
> 4 bytes - ADD command and LEN
> 4 bytes - Data block

I now see that this case is already handled. Sorry for the noise.
Incidentally, I tried this variant:

create table tbl(f1 int, f2 bool, f3 bool, f4 bool, f5 bool, f6 bool, f7 bool,
f8 bool, f9 bool, f10 bool, f11 bool, f12 bool, f13 bool,
f14 bool, f15 bool, f16 bool, f17 bool, f18 bool, f19 bool,
f20 bool, f21 bool, f22 int, f23 int);
insert into tbl values(1,
't','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t',
't', 2, 3);
update tbl set f1 = 2, f22 = 4, f23 = 6;

It yielded an erroneous delta: IGN 4, ADD 4, COPY 24, IGN 4, ADD 4, COPY 28,
IGN 4, ADD 4. (The delta happens to be longer than the data, so it goes unused.)
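
One way to see that the delta above cannot be right is to count how many bytes
of the old tuple it walks over. The sketch below is not the patch's code. It
assumes, for illustration only, that IGN n skips n bytes of the old tuple,
COPY n copies n bytes of the old tuple, and ADD n appends n new bytes. It also
assumes the tuple's data area is 32 bytes (one int, twenty bools, two ints,
with no padding needed before f22) and ignores the tuple header. The helper
names and the "hypothetical" delta are made up for the example.

```python
from typing import List, Tuple

# A delta instruction: (command, length in bytes). Semantics are assumed, not
# taken from the patch: IGN skips old bytes, COPY copies old bytes, ADD adds new.
Op = Tuple[str, int]


def old_bytes_consumed(delta: List[Op]) -> int:
    """Bytes of the old tuple the delta walks over (IGN and COPY)."""
    return sum(length for cmd, length in delta if cmd in ("IGN", "COPY"))


def new_bytes_produced(delta: List[Op]) -> int:
    """Bytes of the new tuple the delta emits (ADD and COPY)."""
    return sum(length for cmd, length in delta if cmd in ("ADD", "COPY"))


# Assumed data area: f1 int (4) + f2..f21 bool (20 * 1) + f22 int (4) + f23 int (4).
OLD_DATA_LEN = 4 + 20 * 1 + 4 + 4

# The delta reported above.
reported: List[Op] = [
    ("IGN", 4), ("ADD", 4), ("COPY", 24),
    ("IGN", 4), ("ADD", 4), ("COPY", 28),
    ("IGN", 4), ("ADD", 4),
]

# A hypothetical delta that would be consistent with the assumed layout:
# replace f1, keep the twenty bools, replace f22 and f23 together.
hypothetical: List[Op] = [
    ("IGN", 4), ("ADD", 4), ("COPY", 20), ("IGN", 8), ("ADD", 8),
]

for name, delta in (("reported", reported), ("hypothetical", hypothetical)):
    consumed = old_bytes_consumed(delta)
    produced = new_bytes_produced(delta)
    fits = consumed <= OLD_DATA_LEN
    print(f"{name}: consumes {consumed}, produces {produced}, fits old tuple: {fits}")
```

Under these assumptions the reported delta consumes 64 bytes of a 32-byte old
tuple, so it cannot be applied. The COPY lengths 24 and 28 coincide with the
byte offsets of f22 and f23, which might mean an offset was emitted where a
length was expected, but that is only a guess.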

nm