Re: HOT patch - version 11

Lists: pgsql-hackers pgsql-patches
From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: HOT patch - version 11
Date: 2007-08-01 09:06:06
Message-ID: 2e78013d0708010206v344fe552v47f6ad990877396@mail.gmail.com

Hi All,

Please see the attached version 11 of the HOT patch.

The concept of marking the pruned tuples with LP_DELETE and
reusing such tuples for subsequent UPDATEs has been removed
and replaced with a simpler mechanism of repairing the page
fragmentation as and when possible.

The logic of chain pruning has been simplified a lot. In addition, there
are fewer new WAL log records; instead, we reuse the existing WAL
record types to log the operations.

A few 4-hour DBT2 runs have confirmed that this simplification hasn't
taken away any performance gains; rather, we are seeing better performance
with the changes. The gain can be attributed to the fact that more
HOT updates are now possible even if the tuple length changes between
updates (since we do complete page defragmentation).

The changes are mostly isolated in the above area apart from some
stray bug fixes.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
PG83-HOT-combo-v11.0.patch.gz application/x-gzip 48.3 KB

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc: "PostgreSQL-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: HOT patch - version 11
Date: 2007-08-01 15:27:23
Message-ID: 1185982043.4174.47.camel@ebony.site
Lists: pgsql-hackers pgsql-patches

On Wed, 2007-08-01 at 14:36 +0530, Pavan Deolasee wrote:

> Please see the attached version 11 of HOT patch
>
> The concept of marking the pruned tuples with LP_DELETE and
> reusing such tuples for subsequent UPDATEs has been removed
> and replaced with a simpler mechanism of repairing the page
> fragmentation as and when possible.

OK, sounds good.

Some comments:

BufferIsLockedForCleanup() should be named BufferIsAvailableForCleanup().
There is no cleanup mode; what we mean is that there is only one pin.
The comments say "If we are lucky enough to have already acquired the
cleanup lock", which made me look for the point where we had executed
LockBufferForCleanup(), which we never do.

heap_page_prune() needs a comment explaining what prune_hard means. I think
you mean true=prune page, false=prune just this HOT chain. It might be
better to have a prune type so that the code reads as PRUNE_HOT_CHAIN or
PRUNE_WHOLE_PAGE. Does it mean the same thing as in
heap_prune_tuplechain()?

I'm concerned that HeapTupleSatisfiesAbort() may be very expensive and
yet almost never return true. Do we need this code at this point?

It would be good to put some explanatory notes somewhere.
Do all calls to PageGetFreeSpace() get replaced by
PageHeapGetFreeSpace()?

To answer some of your XXXs:

XXX Should we be a bit conservative here ?
At the place you ask this, scan->rs_pageatatime is true, so we're going
to scan the whole page. Any new updates would be setting hint bits
anyway, so we are going to end up with a dirty page whether you do this
or not; so yes, I think, prune the whole page.

XXX Should we set PageSetPrunable on this page ? If the transaction
aborts, there will be a prunable tuple in this page.
Think you need to explain a little more...but we probably shouldn't tune
for aborts.

XXX Should we set hint on the newbuf as well ? If the transaction
aborts, there would be a prunable tuple in the newbuf.
Think you need to explain a little more...but we probably shouldn't tune
for aborts.

XXX Should we update the FSM information of this page ?
(after pruning)
No. Updating FSM would only take place if updates are taking place on
this block. We would make this page a target for other INSERTs and
UPDATEs, so would effectively bring more contention onto those pages.
Bad Thing.

XXX We may want to tweak the percentage value below:
(1.2 * RelationGetAvgFSM(relation)
Not sure what difference the 1.2 makes...

So that means I agree with all of your XXXs apart from the last one.

> The logic of chain pruning has been simplified a lot. In addition,
> there
> are fewer new WAL log records. We rather reuse the existing WAL
> record types to log the operations.

I can't find any mention of XLogInserts for the new record types.

e.g. XLOG_HEAP2_PRUNEHOT is only mentioned on one line in this patch.

Is this a mistake, or could you explain?

> Few 4 hour DBT2 runs have confirmed that this simplification hasn't
> taken away any performance gains, rather we are seeing better
> performance
> with the changes. The gain can be attributed to the fact that now
> more
> HOT updates are possible even if the tuple length changes between
> updates (since we do the complete page defragmentation)
>
> The changes are mostly isolated in the above area apart from some
> stray bug fixes.

It would be good to see the results...

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: HOT patch - version 11
Date: 2007-08-01 15:57:47
Message-ID: 2e78013d0708010857q6a51b54al3aff45332453c47b@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 8/1/07, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Wed, 2007-08-01 at 14:36 +0530, Pavan Deolasee wrote:
>
>
> BufferIsLockedForCleanup() should be named BufferIsAvailableForCleanup().
> There is no cleanup mode, what we mean is that there is only one pin;
> the comments say "If we are lucky enough to have already acquired the
> cleanup lock", which made me look for the point where we had executed
> LockBufferForCleanup(), which we never do.

Ok. I am open to suggestions here, as well as for other parts of the
code, about naming.

> heap_page_prune() needs a comment explaining what prune_hard means. I think
> you mean true=prune page, false=prune just this hot chain. It might be
> better to have a prune type so that the code reads as PRUNE_HOT_CHAIN or
> PRUNE_WHOLE_PAGE. Does it mean the same thing as in
> heap_prune_tuplechain()?

No. prune_hard means remove all RECENTLY_DEAD tuples before
a DEFINITELY DEAD tuple. We use this only for VACUUM FULL.

The pruneoff is set to InvalidOffsetNumber to prune all tuple chains in
the page.

> I'm concerned that HeapTupleSatisfiesAbort() may be very expensive and
> yet almost never return true. Do we need this code at this point?

We actually run SatisfiesVacuum anyway on all the tuples. We
should expect hint bits to be set, and hence the call may not be
expensive. We need this to reclaim aborted heap-only tuples
which are otherwise not reachable.

> It would be good to put some explanatory notes somewhere.
> Do all calls to PageGetFreeSpace() get replaced by
> PageHeapGetFreeSpace()?

Ok. Will do that. Right now, PageHeapGetFreeSpace is used
for heap pages.

> To answer some of your XXXs:
>
> XXX Should we set PageSetPrunable on this page ? If the transaction
> aborts, there will be a prunable tuple in this page.
> Think you need to explain a little more...but we probably shouldn't tune
> for aborts.

If the INSERTing transaction aborts, we are left with a DEAD
tuple in the page which can be pruned and removed. So the
question is whether we should set the hint bit. If we don't set
it, the aborted DEAD tuple stays there until the next vacuum,
or until some other tuple in the page is UPDATEd/DELETEd
and the page is cleaned up.

> XXX Should we set hint on the newbuf as well ? If the transaction
> aborts, there would be a prunable tuple in the newbuf.
> Think you need to explain a little more...but we probably shouldn't tune
> for aborts.

Same as above.

> XXX Should we update the FSM information of this page ?
> (after pruning)
> No. Updating FSM would only take place if updates are taking place on
> this block. We would make this page a target for other INSERTs and
> UPDATEs, so would effectively bring more contention onto those pages.
> Bad Thing.

On the other hand, if we don't update the FSM, all COLD updates may
result in extending the relation even though there is some free space
available elsewhere in the relation. One option could be to leave
fillfactor worth of space and update the FSM accordingly.

> XXX We may want to tweak the percentage value below:
> (1.2 * RelationGetAvgFSM(relation)
> Not sure what difference the 1.2 makes...

That's a quick hack. I thought of leaving some margin because
the avg FSM request size is a moving average and there is a good
chance it errs.

> So that means I agree with all of your XXXs apart from the last one.
>
> > The logic of chain pruning has been simplified a lot. In addition,
> > there
> > are fewer new WAL log records. We rather reuse the existing WAL
> > record types to log the operations.
>
> I can't find any mention of XLogInserts for the new record types.
>
> e.g. XLOG_HEAP2_PRUNEHOT is only mentioned on one line in this patch.
>
> Is this a mistake, or could you explain?

Sorry, that's just a leftover from the previous version. I shall clean
that up.

> It would be good to see the results...

I shall post them soon.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: HOT patch - version 11
Date: 2007-08-01 20:09:06
Message-ID: 46B0E862.1030003@enterprisedb.com
Lists: pgsql-hackers pgsql-patches

Pavan Deolasee wrote:
> Please see the attached version 11 of HOT patch

Thanks!

One wrinkle in the patch is how the ResultRelInfo struct is passed to
heap_update, and on to heap_check_idxupdate, to check whether any indexed
columns have changed. I think that's a modularity violation: heap_update
really shouldn't have to deal with ResultRelInfo; that belongs in the
executor. When we add support for expression and partial indexes,
heap_check_idxupdate will need even more executor machinery to be able
to evaluate expressions.

In heap_page_prune_defrag, it would be better to do the test for
BufferIsLockedForCleanup right after acquiring the lock. The longer the
delay between those steps, the bigger the chances that someone pins the
page and starts to wait for the buffer lock, making us think that we
didn't get the cleanup lock, though we actually did. Maybe a nicer
solution would be to have another version of ConditionalLockBuffer with
three different return values: didn't get lock, got exclusive lock, or
got cleanup lock.

It's not necessary to WAL-log the unused-array that
PageRepairFragmentation returns. In replay, a call to
PageRepairFragmentation will come to the same conclusion about which
line pointers are not used. It would also be better if we didn't emit a
separate WAL record for defragging a page if we also prune it at the
same time. I'm not that worried about WAL usage in general, but that
seems simple enough to fix.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: HOT patch - version 11
Date: 2007-08-02 06:53:26
Message-ID: 2e78013d0708012353n7740f68fodd1ba369f520d008@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 8/2/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>
> Pavan Deolasee wrote:
> > Please see the attached version 11 of HOT patch
>
> Thanks!
>
> One wrinkle in the patch is how the ResultRelInfo-struct is passed to
> heap_update, and on to heap_check_idxupdate, to check any indexed
> columns have changed. I think that's a modularity violation, heap_update
> really shouldn't have to deal with ResultRelInfo, that belongs in the
> executor. When we add support for expression and partial indexes,
> heap_check_idxupdate will need even more executor machinery to be able
> to evaluate expressions.

The reason I put it there is that we wanted to do that check
as late as possible, once we have confirmed that the update is possible
and there is enough space in the block to perform a HOT update. But
I agree that's a modularity violation. Any suggestions for avoiding it?

> In heap_page_prune_defrag, it would be better to do the test for
> BufferIsLockedForCleanup right after acquiring the lock. The longer the
> delay between those steps, the bigger the chances that someone pins the
> page and starts to wait for the buffer lock, making us think that we
> didn't get the cleanup lock, though we actually did. Maybe a nicer
> solution would be to have another version of ConditionalLockBuffer with
> three different return values: didn't get lock, got exclusive lock, or
> got cleanup lock.

That's a good idea. I shall do that.

> It's not necessary to WAL-log the unused-array that
> PageRepairFragmentation returns. In replay, a call to
> PageRepairFragmentation will come to the same conclusion about which
> line pointers are not used. It would also be better if we didn't emit a
> separate WAL record for defragging a page if we also prune it at the
> same time. I'm not that worried about WAL usage in general, but that
> seems simple enough to fix.

Ah, I see. I shall fix that.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: HOT patch - version 11
Date: 2007-08-02 07:33:08
Message-ID: 2e78013d0708020033r64bb7522ge3221a2c33fd4eab@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 8/2/07, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
> On 8/2/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
> >
> > Maybe a nicer
> > solution would be to have another version of ConditionalLockBuffer with
> > three different return values: didn't get lock, got exclusive lock, or
> > got cleanup lock.
>
> That's a good idea. I shall do that.

On second thought, I feel it may not be such a good idea to change
the ConditionalLockBuffer return value. "boolean" is the most natural
way, and we don't save anything in terms of BufHdr locking.

So maybe we should just have two different functions and do the
BufferIsLockedForCleanup check immediately after acquiring the
exclusive lock.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "PostgreSQL-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: HOT patch - version 11
Date: 2007-08-02 14:37:32
Message-ID: 1186065452.4161.2.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 2007-08-01 at 21:09 +0100, Heikki Linnakangas wrote:

> In heap_page_prune_defrag, it would be better to do the test for
> BufferIsLockedForCleanup right after acquiring the lock. The longer the
> delay between those steps, the bigger the chances that someone pins the
> page and starts to wait for the buffer lock, making us think that we
> didn't get the cleanup lock, though we actually did. Maybe a nicer
> solution would be to have another version of ConditionalLockBuffer with
> three different return values: didn't get lock, got exclusive lock, or
> got cleanup lock.

Yeah, a 3-value return seems the neatest way.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HOT patch - version 11
Date: 2007-08-07 18:01:44
Message-ID: 2e78013d0708071101g4aa0eb19we23807442098bddb@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 8/2/07, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
> > It would also be better if we didn't emit a
> > separate WAL record for defragging a page, if we also prune it at the
> > same time. I'm not that worried about WAL usage in general, but that
> > seems simple enough to fix.
>
> Ah, I see. I shall fix that.
>

When I started making this change, I realized that we need the
second WAL record: if the block is backed up in the pruning
WAL record, we may never call PageRepairFragmentation during
the redo phase. Of course, we could fix that by making
heap_xlog_clean always repair page fragmentation irrespective
of whether the block was backed up, but that doesn't seem like
a good solution.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com