Quick Links

Re: Set visibility map bit after HOT prune

Lists:	pgsql-hackers

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Set visibility map bit after HOT prune
Date:	2012-12-15 12:42:13
Message-ID:	CABOikdN3_FFb4qbaSrNhn-SCV2kNS63aGVimRbUzvS9YUFVHNQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

We discussed this idea in the past [1] and Robert recently again
mentioned this in another thread [2]. Please see a rebased/revised
patch attached with the mail. This is mostly similar to what I's
submitted in [1] except some additions to also compute visibility
cut-off XID. I also removed a warning that I'd added to the previous
patch to report the case when the page's all-visible bit is already
set, but our HOT prune scan finds it otherwise. This is not to hide
the warning the earlier reviewer had reported, but I think its not
required because we do those consistency checks at other places
anyways.

I've run several rounds of pgbench (-s 10 -c 10 -T 300) and did not
find any issues. I don't see noticeable performance again/drop. But
again pgbench may not be the most suitable test benchmark to test
this. I think we will see positive differences in vacuum scans and
also queries that are benefited from index-only scans. An early
setting of the visibility map bit can help those two scenarios.

Thanks,
Pavan

1. http://archives.postgresql.org/pgsql-hackers/2010-02/msg02344.php
2. http://archives.postgresql.org/message-id/CA+TgmoZHWN1+N_CGD3hx=DJPHkd69c8x2r2EiQ5=c8yxNrc8wA@mail.gmail.com

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Attachment	Content-Type	Size
hot-prune-set-all-visible-v2.patch	application/octet-stream	7.4 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-15 16:45:36
Message-ID:	CA+U5nMLTWOUYmp_yr+-dTObLVy39+Z5aqWtc_8oFLhZREA8-zQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 15 December 2012 12:42, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> We discussed this idea in the past [1] and Robert recently again
> mentioned this in another thread [2]. Please see a rebased/revised
> patch attached with the mail. This is mostly similar to what I's
> submitted in [1] except some additions to also compute visibility
> cut-off XID. I also removed a warning that I'd added to the previous
> patch to report the case when the page's all-visible bit is already
> set, but our HOT prune scan finds it otherwise. This is not to hide
> the warning the earlier reviewer had reported, but I think its not
> required because we do those consistency checks at other places
> anyways.

Doing that only makes sense when we're running a SELECT. Setting the
all visible bit immediately prior to an UPDATE that clears it again is
pointless effort, generating extra work for no reason.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-15 21:48:08
Message-ID:	29275.1355608088@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Doing that only makes sense when we're running a SELECT. Setting the
> all visible bit immediately prior to an UPDATE that clears it again is
> pointless effort, generating extra work for no reason.

On the other hand, the HOT prune operation itself is worthless when
we're running a SELECT. The only reason we do it that way is that we
have to prune before the query starts to use the page, else pruning
might invalidate pointers-to-tuples that are being held within the
query plan tree.

Maybe it's time to look at what it'd take for the low-level scan
operations to know whether they're scanning the target relation of
an UPDATE query, so that we could skip pruning altogether except
when a HOT update could conceivably ensue. I think this was discussed
back when HOT went in, but nobody wanted to make the patch more invasive
than it had to be.

regards, tom lane

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-16 07:53:56
Message-ID:	CABOikdO0R8uVx5mS52M24PzPfp-RG3aa5SxCBe93g-QkkuKazQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Dec 16, 2012 at 3:18 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> Doing that only makes sense when we're running a SELECT. Setting the
>> all visible bit immediately prior to an UPDATE that clears it again is
>> pointless effort, generating extra work for no reason.
>
> On the other hand, the HOT prune operation itself is worthless when
> we're running a SELECT. The only reason we do it that way is that we
> have to prune before the query starts to use the page, else pruning
> might invalidate pointers-to-tuples that are being held within the
> query plan tree.
>

Right. HOT prune may and often would be called in the SELECT path or
heap/index scan leading to UPDATE/DELETE. But whenever its called, it
looks like a good idea to set the visibility map bit. There is some
additional overhad to check if a LIVE tuple is all-visible or not,
but that doesn't look too much. I did run some pgbench tests for fully
cached tables and did not see any difference in tps.

Another idea could have been to NOT clear the visibility bit when a
HOT update happens. Such tuple can get pruned by HOT prune, so we
don't need vacuum per se, and the index-only scans are not affected
because the update was a HOT update, so the index keys did not change
either. So index-only scans would continue to return the same result.
Don't know if this would work with hot standby, probably not.

> Maybe it's time to look at what it'd take for the low-level scan
> operations to know whether they're scanning the target relation of
> an UPDATE query, so that we could skip pruning altogether except
> when a HOT update could conceivably ensue.

Yeah, we could do that. We may not be able to predict whether the
coming update is HOT or not, but I don't think that matters. OTOH its
probably not too bad to prune in any scan (like we do today) because
there is fairly high chance that the page will be dirtied for hint bit
updates. So may be its better to just prune as well. We have already
put in several optimisations to do so only when required and without
any unnecessary contention. Of course, benchmarks can prove me wrong.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-16 09:40:42
Message-ID:	CA+U5nMKqqFcWx78issKgHXki83mTQVZYE=Lhdtj1n-1i5tgmRw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 16 December 2012 07:53, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> On Sun, Dec 16, 2012 at 3:18 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>>> Doing that only makes sense when we're running a SELECT. Setting the
>>> all visible bit immediately prior to an UPDATE that clears it again is
>>> pointless effort, generating extra work for no reason.
>>
>> On the other hand, the HOT prune operation itself is worthless when
>> we're running a SELECT. The only reason we do it that way is that we
>> have to prune before the query starts to use the page, else pruning
>> might invalidate pointers-to-tuples that are being held within the
>> query plan tree.
>>
>
> Right. HOT prune may and often would be called in the SELECT path or
> heap/index scan leading to UPDATE/DELETE. But whenever its called, it
> looks like a good idea to set the visibility map bit.

As explained above, I disagree that it looks like a good idea, and
you've shown no evidence it would be or is true.

I agree with Tom that cleaning on SELECT can be worthless and I'd
definitely like to be able to turn it off conditionally or
permanently.

> There is some
> additional overhad to check if a LIVE tuple is all-visible or not,
> but that doesn't look too much. I did run some pgbench tests for fully
> cached tables and did not see any difference in tps.
>
> Another idea could have been to NOT clear the visibility bit when a
> HOT update happens. Such tuple can get pruned by HOT prune, so we
> don't need vacuum per se, and the index-only scans are not affected
> because the update was a HOT update, so the index keys did not change
> either. So index-only scans would continue to return the same result.
> Don't know if this would work with hot standby, probably not.
>
>> Maybe it's time to look at what it'd take for the low-level scan
>> operations to know whether they're scanning the target relation of
>> an UPDATE query, so that we could skip pruning altogether except
>> when a HOT update could conceivably ensue.
>
> Yeah, we could do that. We may not be able to predict whether the
> coming update is HOT or not, but I don't think that matters.

We can predict that an update is HOT in advance, if none of the index
columns are touched in the UPDATE. Yes, there are some cases where it
might not be, but we could probably store that in the statement cache.

Making that checkat run time must cost some block contention, so it
would be good to remove it from every update.

> OTOH its
> probably not too bad to prune in any scan (like we do today) because
> there is fairly high chance that the page will be dirtied for hint bit
> updates. So may be its better to just prune as well. We have already
> put in several optimisations to do so only when required and without
> any unnecessary contention. Of course, benchmarks can prove me wrong.

I think we could use some measurement/stats there so we can check.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-16 14:41:20
Message-ID:	20121216144120.GD4683@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2012-12-16 13:23:56 +0530, Pavan Deolasee wrote:
> Another idea could have been to NOT clear the visibility bit when a
> HOT update happens. Such tuple can get pruned by HOT prune, so we
> don't need vacuum per se, and the index-only scans are not affected
> because the update was a HOT update, so the index keys did not change
> either. So index-only scans would continue to return the same result.
> Don't know if this would work with hot standby, probably not.

For IOSs that sounds like an interesting and itself easy to implement
idea, you basically only would need to add a single !use_hot_update in
the if blocks doing the PageClearAllVisible in heap_update.
This probably could make IOSs far more likely in some scenarios.

The complicated bit seems to be the heapgetpage() logic arround
all_visible, because HOT updates are obviously relevant in normal heap
scans. It seems to me that would require the vm bit continuing to be set
while the page level bit get unset.
I *think* thats actually ok because whenever we set/clear the
visibilitymap we will still log it properly, so the crash safety
guarantees seem to hold true. Obviously we would have to change the
escape hatch for exactly that condition in vacuumlazy, but thats not a
problem...

I don't immediately see a problem with HS, your logic seems to hold
equally true there. The replay logic would need to be refined slightly,
but it looks possible. That is without having checked the code...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-16 14:42:54
Message-ID:	20121216144254.GE4683@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2012-12-15 16:48:08 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > Doing that only makes sense when we're running a SELECT. Setting the
> > all visible bit immediately prior to an UPDATE that clears it again is
> > pointless effort, generating extra work for no reason.
>
> On the other hand, the HOT prune operation itself is worthless when
> we're running a SELECT. The only reason we do it that way is that we
> have to prune before the query starts to use the page, else pruning
> might invalidate pointers-to-tuples that are being held within the
> query plan tree.
>
> Maybe it's time to look at what it'd take for the low-level scan
> operations to know whether they're scanning the target relation of
> an UPDATE query, so that we could skip pruning altogether except
> when a HOT update could conceivably ensue. I think this was discussed
> back when HOT went in, but nobody wanted to make the patch more invasive
> than it had to be.

FWIW I think that would be a pretty worthwile optimization - I have seen
workloads where hot pruning lead to considerable contention.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-16 15:27:09
Message-ID:	CABOikdO-x5NCUZAesE93J2i60D0BuJCdLrP_1NPNMALdU=ab6A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Dec 16, 2012 at 3:10 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>
> As explained above, I disagree that it looks like a good idea, and
> you've shown no evidence it would be or is true.
>

Lets separate out these two issues. What you are suggesting as a
follow up to Tom's idea, I've no objection to that and that might be
worthwhile optimisation to try out. But this patch itself does not
attempt to deal with that and its a separate work item and will
require invasive changes and tests.

*Whenever* we HOT prune, either in SELECT path or UPDATE path, what
I'm suggesting is lets try to set the visibility map bit if the
conditions are favorable. The only extra work that we are doing (as in
the submitted patch) is to check few additional things for LIVE tuples
such as if xmin precedes the OldestXmin or not. That itself does not
seem too costly. What we gain is: 1. next vacuum may skip that page
because its marked all-visible and 2. index-only scan will not visit
the heap page. These two improvements will avoid an useless heap page
IO and may justify a little more work in HOT prune.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-16 16:25:03
Message-ID:	CA+U5nML1aXONK52X8TO5S6Z3NDJOEEGsexS1GKq96V7tPduFLQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 16 December 2012 14:41, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2012-12-16 13:23:56 +0530, Pavan Deolasee wrote:
>> Another idea could have been to NOT clear the visibility bit when a
>> HOT update happens. Such tuple can get pruned by HOT prune, so we
>> don't need vacuum per se, and the index-only scans are not affected
>> because the update was a HOT update, so the index keys did not change
>> either. So index-only scans would continue to return the same result.
>> Don't know if this would work with hot standby, probably not.
>
> For IOSs that sounds like an interesting and itself easy to implement
> idea, you basically only would need to add a single !use_hot_update in
> the if blocks doing the PageClearAllVisible in heap_update.
> This probably could make IOSs far more likely in some scenarios.

Doing that would completely change the meaning of the visibility map
from a heap visibility map into an index-only map.

IndexOnly scans would still work, but nothing else would ever and it
would be hard to confirm the validity of the vm.

-1

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-16 16:40:53
Message-ID:	20121216164053.GG4683@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2012-12-16 16:25:03 +0000, Simon Riggs wrote:
> On 16 December 2012 14:41, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2012-12-16 13:23:56 +0530, Pavan Deolasee wrote:
> >> Another idea could have been to NOT clear the visibility bit when a
> >> HOT update happens. Such tuple can get pruned by HOT prune, so we
> >> don't need vacuum per se, and the index-only scans are not affected
> >> because the update was a HOT update, so the index keys did not change
> >> either. So index-only scans would continue to return the same result.
> >> Don't know if this would work with hot standby, probably not.
> >
> > For IOSs that sounds like an interesting and itself easy to implement
> > idea, you basically only would need to add a single !use_hot_update in
> > the if blocks doing the PageClearAllVisible in heap_update.
> > This probably could make IOSs far more likely in some scenarios.
>
> Doing that would completely change the meaning of the visibility map
> from a heap visibility map into an index-only map.
>
> IndexOnly scans would still work, but nothing else would ever and it
> would be hard to confirm the validity of the vm.

I don't think it would change the meaning that much - the visibilitymap
would still work for vacuum as normal heap updates would still unset the
all-visible flag. Vacuum would skip pages that already were all-visible
and then only got hot updated, true, but that seems like its an
acceptable tradeoff as that dead space can be fully cleaned up by hot
pruning.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-16 17:44:07
Message-ID:	16812.1355679847@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> On Sun, Dec 16, 2012 at 3:10 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> As explained above, I disagree that it looks like a good idea, and
>> you've shown no evidence it would be or is true.

> Lets separate out these two issues. What you are suggesting as a
> follow up to Tom's idea, I've no objection to that and that might be
> worthwhile optimisation to try out. But this patch itself does not
> attempt to deal with that and its a separate work item and will
> require invasive changes and tests.

> *Whenever* we HOT prune, either in SELECT path or UPDATE path, what
> I'm suggesting is lets try to set the visibility map bit if the
> conditions are favorable.

I don't believe it's clear at all that this is a good idea. If we
restrict pruning to occur only when there's a fairly good chance of
an ensuing HOT update, then Simon's original objection (that we're
probably going to have to clear the bit again right away) has
considerable force. And I agree with him that your proposed
redefinition of the bit's meaning to avoid that is pretty horrid;
it's ugly, complicates the invariant quite a lot, and breaks some
existing usages of the bit.

If we decide that we don't want to restrict pruning like that, then
this patch probably has merit. But we can't evaluate the two issues
independently.

Another thing that would need to be considered, if we do want to
restrict when pruning happens, is whether it is worth introducing some
other path altogether for setting the all-visible bit. Or perhaps we
should modify autovacuum's rules so that it will fire on relations for
which there might be lots of unmarked all-visible pages.

regards, tom lane

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-17 12:26:47
Message-ID:	CABOikdOkBvDfOkGE00Hku+2k2TPOsT1QLr7BEzNLOVS=S_FugA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Dec 16, 2012 at 11:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> And I agree with him that your proposed
> redefinition of the bit's meaning to avoid that is pretty horrid;
> it's ugly, complicates the invariant quite a lot, and breaks some
> existing usages of the bit.

(slammed.. feels the pain) You definitely have better intuitions and
understanding of the system than me and I've to be really shameful to
defend my idea which was out of the hat anyways, but let me try
nevertheless at the risk of being slammed again :-)

So whats the current meaning of the VM bit ? If set, it tells us that
all tuples in the page are visible to all current and future
transactions. But thats not entirely true. Otherwise why wouldn't we
mark a page all-visible if it contains a DEAD line pointer ? Of
course, we don't do that to support index only scans and make sure
that vacuum picks up such page. So would I be too wrong if I take
liberty to read VM bit as: if set, every index pointer to the heap
page is for sure pointing to *only* all-visible tuple. I also consider
vacuum as an activity that is needed because we can't clean up dead
line pointers in the heap without first removing the index pointers.
So visibility maps go hand-in-hand with indexes. In fact, if a table
does not have any index, I am not sure if we even need a VM for that
table, especially if we can teach HOT prune to record free space with
FSM.

My half-cooked idea extends that and says: if VM bit is set, every
index pointer to the heap page is pointing to either a all-visible
tuple or a valid HOT-chain that has all-visible tuple at the end
(aborted HOT tuples at the end are not counted as part of a valid
chain). OTOH the page-level bit is set if every tuple in the heap page
is all-visible. DEAD line pointers are not counted while arriving at
page level bit. So in this new scheme of things, there is a loose
correspondence between these two. VM bit is useful for index-only
scans and vacuum optimisations while page-level bit has limited use of
optimising heap scans. Andres probably said the same thing. Jeff Davis
is already talking about removing the page-level bit on other threads,
so I don't see much problem breaking a 1-to-1 mapping with the VM bit
and the page-level bit.

Having said all of this, I am not insisting on this unless we see a
value in it. And there could be when you reach a steady state so that
almost every update is a HOT update. Today you will still need
periodic vacuums just so that index-only scans work. Of course, you
talked about other code paths to set all-visible bits.

>
> If we decide that we don't want to restrict pruning like that, then
> this patch probably has merit. But we can't evaluate the two issues
> independently.
>

Fair enough. If someone starts actively working on moving HOT prune
logic to the UPDATE path, I'll withdraw this patch. But any change to
the HOT prune logic will require extensive and long duration testing
to prove its value. After all, we spent hours testing it when we wrote
it and had seen how small changes can cause drop in performance.

If nobody is volunteering to do this change at the moment, I feel that
the patch has value, as you yourself noted, and should be considered
on its merits, as things stand today and not based on things that we
might do in future.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

From:	Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To:	"'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Pavan Deolasee'" <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	"'Simon Riggs'" <simon(at)2ndquadrant(dot)com>, "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-18 12:06:51
Message-ID:	006401cddd18$2aa9ff30$7ffdfd90$@kapila@huawei.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sunday, December 16, 2012 11:14 PM Tom Lane wrote:
> Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> > On Sun, Dec 16, 2012 at 3:10 PM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
> >> As explained above, I disagree that it looks like a good idea, and
> >> you've shown no evidence it would be or is true.
>
> > Lets separate out these two issues. What you are suggesting as a
> > follow up to Tom's idea, I've no objection to that and that might be
> > worthwhile optimisation to try out. But this patch itself does not
> > attempt to deal with that and its a separate work item and will
> > require invasive changes and tests.
>

> Another thing that would need to be considered, if we do want to
> restrict when pruning happens, is whether it is worth introducing some
> other path altogether for setting the all-visible bit. Or perhaps we
> should modify autovacuum's rules so that it will fire on relations for
> which there might be lots of unmarked all-visible pages.

Can something similar be also used for putting deleted index pages into FSM.

The reason is that currently deleted index pages are recorded in FSM in the
next Vacuum cycle.
So if after bulk index update (always increasing order), even if the auto
vacuum is done once, it still does not put
deleted index pages into FSM.
Now let's assume there are no operations which can lead to auto-vacuum on
same table, next cycle of bulk update will allocate
new index pages.
I had observed this in one of the tests that if bulk index update happens
such that new value is always increasing, then index bloat happens. As per
initial analysis, it seems one of the reasons is what I described above.
If required, I can create a self-containing test which can show that
bulk-index update can lead to index bloat.

With Regards,
Amit Kapila.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-19 15:02:10
Message-ID:	CA+TgmoZPEJn0BwDYFtnmpFFSBFqumnpp5BkVFhQcb7CHwqRu5g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> On the other hand, the HOT prune operation itself is worthless when
> we're running a SELECT. The only reason we do it that way is that we
> have to prune before the query starts to use the page, else pruning
> might invalidate pointers-to-tuples that are being held within the
> query plan tree.
>
> Maybe it's time to look at what it'd take for the low-level scan
> operations to know whether they're scanning the target relation of
> an UPDATE query, so that we could skip pruning altogether except
> when a HOT update could conceivably ensue. I think this was discussed
> back when HOT went in, but nobody wanted to make the patch more invasive
> than it had to be.

I think it's wrong to assume that HOT pruning has no value except in
this case. Truncating dead tuples to line pointers and collapsing HOT
chains speeds up future page scans, and if we were able to set the
all-visible bit, that would help even more. The problem is that this
is all somewhat prospective: HOT pruning the page doesn't help the
*current* scan - in fact, it can sometimes slow it down considerably -
but it can be a great help to the next scan that comes through. We
say, oh, don't worry, VACUUM will take care of it, but there are
plenty of cases where a page can be scanned a very large number of
times before VACUUM comes along; and you do can lose a lot of
performance in those cases.

That having been said, I agree with the concerns expressed elsewhere
in this thread that setting the visibility map bit too aggressively
will be a waste. If the page is about to get dirtied again we surely
don't want to go there. Aside from the obvious problem of doing work
that may not be necessary, it figures to create buffer-lock contention
on the visibility map page. One of the strengths of the current
design is that we avoid that pretty effectively.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-19 15:56:57
Message-ID:	CABOikdPf-NQ=hi2iKt-WL7HaP7T6G6vcT2xpL=6gbbDe3zn7Pg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Dec 19, 2012 at 8:32 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> On the other hand, the HOT prune operation itself is worthless when
>> we're running a SELECT. The only reason we do it that way is that we
>> have to prune before the query starts to use the page, else pruning
>> might invalidate pointers-to-tuples that are being held within the
>> query plan tree.
>>
>> Maybe it's time to look at what it'd take for the low-level scan
>> operations to know whether they're scanning the target relation of
>> an UPDATE query, so that we could skip pruning altogether except
>> when a HOT update could conceivably ensue. I think this was discussed
>> back when HOT went in, but nobody wanted to make the patch more invasive
>> than it had to be.
>
> I think it's wrong to assume that HOT pruning has no value except in
> this case. Truncating dead tuples to line pointers and collapsing HOT
> chains speeds up future page scans, and if we were able to set the
> all-visible bit, that would help even more.

Good point.

> The problem is that this
> is all somewhat prospective: HOT pruning the page doesn't help the
> *current* scan - in fact, it can sometimes slow it down considerably -
> but it can be a great help to the next scan that comes through. We
> say, oh, don't worry, VACUUM will take care of it, but there are
> plenty of cases where a page can be scanned a very large number of
> times before VACUUM comes along; and you do can lose a lot of
> performance in those cases.
>

Also, since we discount for number of tuples pruned by HOT pruning
while tracking number of dead tuples in a table, in a perfectly stable
system, autovacuum may not ever pick the table for vacuuming, slowly
stopping index-only scans from working. Soon we will have a situation
when all VM bits are clear, but autovacuum would fail to pick the
table. Tom had a good suggestion to periodically count vm bits to
choose tables for vacuuming even if there are no dead tuples or dead
line pointers to remove. I'm not sure though if the extra vacuum will
be better than setting the bit after HOT prune. Also, deciding when to
count the bits can be tricky. Do it every vacuum cycle ? Or after
every 5/10 cycles ? I don't have the answer.

> That having been said, I agree with the concerns expressed elsewhere
> in this thread that setting the visibility map bit too aggressively
> will be a waste. If the page is about to get dirtied again we surely
> don't want to go there.

Yeah, I agree. If we could figure out that we are soon going to UPDATE
a tuple in the page again, it will be worthless to set the bit. But
predicting that also could turn out to be tricky. Even if we could
somehow tell that the scan is happening on the result relation of an
UPDATE operation, not every page may receive updates because of where
quals etc. So we may get lots of false positives.

> Aside from the obvious problem of doing work
> that may not be necessary, it figures to create buffer-lock contention
> on the visibility map page. One of the strengths of the current
> design is that we avoid that pretty effectively.
>

Its a valid concern, though my limited pgbench tests did not show any
drop in the number. But thats hardly any proof. We can possibly
mitigate this by conditional update to the VM bit. Do it only if you
get a conditional exclusive lock on the buffer page.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-19 16:21:38
Message-ID:	17015.1355934098@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Maybe it's time to look at what it'd take for the low-level scan
>> operations to know whether they're scanning the target relation of
>> an UPDATE query, so that we could skip pruning altogether except
>> when a HOT update could conceivably ensue. I think this was discussed
>> back when HOT went in, but nobody wanted to make the patch more invasive
>> than it had to be.

> I think it's wrong to assume that HOT pruning has no value except in
> this case. Truncating dead tuples to line pointers and collapsing HOT
> chains speeds up future page scans, and if we were able to set the
> all-visible bit, that would help even more. The problem is that this
> is all somewhat prospective: HOT pruning the page doesn't help the
> *current* scan - in fact, it can sometimes slow it down considerably -
> but it can be a great help to the next scan that comes through.

Well, no. The problem with the way we do it now is that doing it every
time a query scan arrives at a page is too often, resulting in a lot of
wasted work. That wasted work is somewhat tolerable as long as it only
involves looking at the current page and ending up not actually changing
it. If we start generating a lot of useless WAL activity and I/O as
a result of thrashing the all-visible bit, it won't be so tolerable
anymore. But the problem is not so much the desire to set the bit as
that we're doing this whole activity at the wrong place and time.

Perhaps doing it every time an UPDATE arrives at the page is too far
in the other direction, and we need to look for some other mechanism
entirely.

I think my core point still stands: the way that HOT pruning is done now
is an artifact of having wanted to shoehorn it into the system with
minimum changes. Which was reasonable at the time given the
experimental status of the feature, but now it's time to reconsider.

regards, tom lane

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-19 16:49:25
Message-ID:	CABOikdOwfUm001nEw0183jh2hVRnEekZXA=_PPt1eu5UKw9YUw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Dec 19, 2012 at 9:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> If we start generating a lot of useless WAL activity and I/O as
> a result of thrashing the all-visible bit, it won't be so tolerable
> anymore.

What if we wrap that into the WAL generated by HOT prune itself ?
Would that address your concerns for extra WAL logging ? I also
suggested doing it conditionally to avoid contention on the VM buffer.

(I actually wonder why we WAL-log set operation at all except for HS
to be able to do IOS, but thats a topic for separate thread may be)

Also, if extra WAL-logging is really worrisome, may be we should again
seriously reconsider my idea of *not* clearing the bit at HOT update.
My apologies for repeating myself. But that seems very important in a
steady system when almost every update is a HOT update. So you don't
clear the bit at HOT update and so don't need to set it back either,
thus saving two WAL activity. You definitely don't need any vacuum in
this case if pruning keeps reclaiming dead space at appropriate time
and make it available for the next update. More so, IOS still works
great. Why is this so bad ? I haven't forgotten your complaints about
changed meaning of the bit, but I tried to explain that we can read it
in a slightly different way and still show it as an invariant.

>
> I think my core point still stands: the way that HOT pruning is done now
> is an artifact of having wanted to shoehorn it into the system with
> minimum changes. Which was reasonable at the time given the
> experimental status of the feature, but now it's time to reconsider.
>

ISTM that you already have concret ideas about what are those places
where HOT prune would be more effective. My worry is changing anything
there is going to be a lot trickier and will require heavy testing.
Our initial work has served us well so far. Of course, I've no problem
changing that if its going to benefit users.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-19 17:10:04
Message-ID:	CA+Tgmob5hj-FBU-oR30HKZ_RwcjKLMofNU81E4LFi8MsVmSo6w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Dec 19, 2012 at 11:49 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> Also, if extra WAL-logging is really worrisome, may be we should again
> seriously reconsider my idea of *not* clearing the bit at HOT update.
> My apologies for repeating myself. But that seems very important in a
> steady system when almost every update is a HOT update. So you don't
> clear the bit at HOT update and so don't need to set it back either,
> thus saving two WAL activity. You definitely don't need any vacuum in
> this case if pruning keeps reclaiming dead space at appropriate time
> and make it available for the next update. More so, IOS still works
> great. Why is this so bad ?

It's bad because then sequential scans will return wrong answers,
unless we also rip out the optimization that uses PD_ALL_VISIBLE as an
excuse to skip all visibility checks for the page. That optimization
is worth a significant amount of performance.

It's also bad because then vacuum won't visit the page, and it really
should. It's much better to have vacuum prune the page in the
background than to have some query do it in the foreground, although
the latter is still better than not doing it at all.

We could potentially have two or three bits per page to suit these
different needs: (1) page can benefit from a vacuum, (2) page is safe
for IOS purposes, and (3) page is safe for seqscan purposes. But I
think that might be overengineering.

IMHO, the goal here should be to have some method of setting the
visibility map, in some set of circumstances, outside of vacuum.
Figuring out which set of circumstances is appropriate is the hard
part.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-19 17:26:40
Message-ID:	CABOikdOxNRXf9DPtp=6T9hB5SM9Mqs7SYstG4ZPOKC2N+BE9nw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Dec 19, 2012 at 10:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Dec 19, 2012 at 11:49 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>> Also, if extra WAL-logging is really worrisome, may be we should again
>> seriously reconsider my idea of *not* clearing the bit at HOT update.
>> My apologies for repeating myself. But that seems very important in a
>> steady system when almost every update is a HOT update. So you don't
>> clear the bit at HOT update and so don't need to set it back either,
>> thus saving two WAL activity. You definitely don't need any vacuum in
>> this case if pruning keeps reclaiming dead space at appropriate time
>> and make it available for the next update. More so, IOS still works
>> great. Why is this so bad ?
>
> It's bad because then sequential scans will return wrong answers,
> unless we also rip out the optimization that uses PD_ALL_VISIBLE as an
> excuse to skip all visibility checks for the page. That optimization
> is worth a significant amount of performance.
>

This can be handled by breaking 1-to-1 mapping on VM bit and
PD_ALL_VISIBLE bit. So seq scans will only look at PD_ALL_VISIBLE. It
was proposed by Andres up thread, but shot down by Tom and Simon. But
I still feel that was over reaction and there is a lot of merit in the
idea. As I said elsewhere, it will also help the case when there are
DEAD line pointers in a page. Today we can't mark such pages
all-visible, but if we break this mapping, we can do that.

I would like to run some pgbench tests where we get the system in a
steady state such as all/most updates are HOT updates (not entirely
unlikely scenario for many real life cases). And then try running some
concurrent queries which can be executed via IOS. My gut feel is that,
today we will see slow and continuous drop in performance for these
queries because IOS will slowly stop working.

> It's also bad because then vacuum won't visit the page, and it really
> should. It's much better to have vacuum prune the page in the
> background than to have some query do it in the foreground, although
> the latter is still better than not doing it at all.
>

Hmm. This is a good point and I don't have an easy answer. I'm not
sure how this will pan out in real life cases though. We definitely
made great progress by having HOT, though the same concerns were
raised even then that we are moving work from background to
foreground. But I think generally HOT made great difference to the
system as a whole, may be at a cost of slowdown for some read-only,
select queries. And HOT prune is not the only operation that we do in
foreground. We also set hint bits and make buffers dirty in an
otherwise read-only queries.

>
> IMHO, the goal here should be to have some method of setting the
> visibility map, in some set of circumstances, outside of vacuum.
> Figuring out which set of circumstances is appropriate is the hard
> part.
>

Yeah, if we can figure that out conclusively, I'm sure we might be
able to auto-tune the system even further.

Thanks,
Pavan
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-19 17:39:58
Message-ID:	CA+U5nMKnuLe8n8+Qb4YT-WoH0+We22mWcUYVm5F995VskCaTsQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 19 December 2012 16:21, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Maybe it's time to look at what it'd take for the low-level scan
>>> operations to know whether they're scanning the target relation of
>>> an UPDATE query, so that we could skip pruning altogether except
>>> when a HOT update could conceivably ensue. I think this was discussed
>>> back when HOT went in, but nobody wanted to make the patch more invasive
>>> than it had to be.
>
>> I think it's wrong to assume that HOT pruning has no value except in
>> this case. Truncating dead tuples to line pointers and collapsing HOT
>> chains speeds up future page scans, and if we were able to set the
>> all-visible bit, that would help even more. The problem is that this
>> is all somewhat prospective: HOT pruning the page doesn't help the
>> *current* scan - in fact, it can sometimes slow it down considerably -
>> but it can be a great help to the next scan that comes through.
>
> Well, no. The problem with the way we do it now is that doing it every
> time a query scan arrives at a page is too often, resulting in a lot of
> wasted work. That wasted work is somewhat tolerable as long as it only
> involves looking at the current page and ending up not actually changing
> it. If we start generating a lot of useless WAL activity and I/O as
> a result of thrashing the all-visible bit, it won't be so tolerable
> anymore. But the problem is not so much the desire to set the bit as
> that we're doing this whole activity at the wrong place and time.
>
> Perhaps doing it every time an UPDATE arrives at the page is too far
> in the other direction, and we need to look for some other mechanism
> entirely.

The benefit of saying that only UPDATEs clean the block is that this
penalises only the workload making the mess, rather than everybody
cleaning up repeatedly over one messy guy.

Having a random SELECT clean the block causes both delay in
non-UPDATEing process, contention and additional writes.

We definitely know we write too often; this has been measured and
discussed over a period of years.

It would be useful to have a table-level option of hot_cleanup= SELECT
| UPDATE | NONE to allow people to minimise cleanup and test the
difference this makes.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-19 18:52:30
Message-ID:	19868.1355943150@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> On Wed, Dec 19, 2012 at 9:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> If we start generating a lot of useless WAL activity and I/O as
>> a result of thrashing the all-visible bit, it won't be so tolerable
>> anymore.

> What if we wrap that into the WAL generated by HOT prune itself ?

What WAL? The case we're worried about here is that there's nothing
else for HOT prune to do.

>> I think my core point still stands: the way that HOT pruning is done now
>> is an artifact of having wanted to shoehorn it into the system with
>> minimum changes. Which was reasonable at the time given the
>> experimental status of the feature, but now it's time to reconsider.

> ISTM that you already have concret ideas about what are those places
> where HOT prune would be more effective.

No, I don't; I'm just suggesting that we ought to think outside the box
of the way it's being done now.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-19 23:35:39
Message-ID:	CA+U5nMLvN0Esd2do=_HaLeOs=OS2H6AADxCrM3vKyQ78K1_s-Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 19 December 2012 17:26, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:

> We definitely
> made great progress by having HOT

Yes, definitely. Great work. That is not for debate.

> But I think generally HOT made great difference to the
> system as a whole, may be at a cost of slowdown for some read-only,
> select queries. And HOT prune is not the only operation that we do in
> foreground. We also set hint bits and make buffers dirty in an
> otherwise read-only queries.

And those last things are being debated hotly. We definitely need to
ask whether the way things are now can be tweaked to be better. The
major mechanics need not be reviewed, but the tradeoffs and balances?
Definitely.

Anything we do in foreground needs evaluation. Assuming eager actions
give a good payoff is not always a useful thought.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-19 23:53:08
Message-ID:	CABOikdOveuJcsbykUgUVGf6dzq7ShNqGeEQsHaSt=Xy64GYUhw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 20, 2012 at 12:22 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
>
>
>> What if we wrap that into the WAL generated by HOT prune itself ?
>
> What WAL? The case we're worried about here is that there's nothing
> else for HOT prune to do.
>

Does such a case exist ? Or at least, is it that common ? I mean, we
have enough checks in place to ensure that HOT prune is attempted only
when there is something interesting in the page to be done. Otherwise
we don't even attempt getting a cleanup lock on the page. Of course,
they are just hints, but they serve very well. Prune XID is what I've
in mind in particular.

So the fact that the visibility map bit is cleared, it could be
because either at least one tuple in the page was updated, deleted or
inserted. The first two would have set prune XID and will trigger an
HOT action and HOT prune will indeed do something useful. I think
aborted non-HOT update may create a scenario that you're talking about
i.e. HOT prune will have nothing to do, but the page again turned
all-visible. Being an abort path, I wonder if its really that common
though.

That leaves us with the inserts which will clear the VM bit, but may
not have anything for HOT prune to do. But we don't set prune XID for
inserts either. So we won't get into hot_page_prune() for such pages.

So my point is, for fairly large and common cases, often we will set
the bit only when HOT prune did something useful, though not every
useful HOT prune will necessarily set the bit. And even if we slip
through all the safety nets on HOT prune, we can choose to set the bit
only if HOT did something useful to avoid any extra WAL logging
assuming we are still worried about those corner cases.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 00:10:18
Message-ID:	CABOikdPx7uEwC4y6YtSvvdLTzypxGrC8rbXTJFeqQSPfLkeLZA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 20, 2012 at 5:05 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>
> And those last things are being debated hotly. We definitely need to
> ask whether the way things are now can be tweaked to be better. The
> major mechanics need not be reviewed, but the tradeoffs and balances?
> Definitely.
>

I have zero objection to do that, just that I don't have solid ideas
right now. And its not because I haven't thought hard enough.

> Anything we do in foreground needs evaluation. Assuming eager actions
> give a good payoff is not always a useful thought.
>

I don't disagree. Your field experience is much larger than mine, but
I have spent hours testing PostgreSQL's performance, so can talk with
some degree of conviction. I think when we do things that can reduce
read/write IO or bloat of a large table in general, the system as a
whole benefits, may be at a cost of some genuinely good guy doing a
simple SELECT in this case. Often the SELECTs are also benefited
because one of their good siblings helped us reduce bloat of the table
and hence seq scans had to scan order of magnitude less blocks.

I just thought that we can fairly easily limit the damage if we are
really worried about SELECTs being penalised. What if we set a
configurable limit on *extra* things that a query may do which is
otherwise not very useful for the query itself, but is useful to keep
the system healthy and steady. HOT prune definitely counts as one of
them and may be even setting of hint bits. (This is a topic for a
separate thread though)

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 00:42:25
Message-ID:	CA+TgmobP2CLPG+r0-cWgSmycuGb-yLku97bVh2S78nN-yn=NKg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Dec 19, 2012 at 12:26 PM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> This can be handled by breaking 1-to-1 mapping on VM bit and
> PD_ALL_VISIBLE bit. So seq scans will only look at PD_ALL_VISIBLE. It
> was proposed by Andres up thread, but shot down by Tom and Simon. But
> I still feel that was over reaction and there is a lot of merit in the
> idea. As I said elsewhere, it will also help the case when there are
> DEAD line pointers in a page. Today we can't mark such pages
> all-visible, but if we break this mapping, we can do that.

Sure, but you're zipping rather blithely past the disadvantages of
such an approach. Jeff Davis recently proposed getting rid of
PD_ALL_VISIBLE, and Tom and I both expressed considerable skepticism
about that; this proposal has the same problems. One of the major
benefits of PD_ALL_VISIBLE is that, when it isn't set, inserts,
updates, and deletes to the page can ignore the visibility map. That
means that a server under heavy concurrency is much less likely to
encounter contention on the visibility map blocks. Now, maybe that's
not really a problem, but I sure haven't seen enough evidence to make
me believe it. If it's really true that PD_ALL_VISIBLE needn't fill
this role, then Heikki wasted an awful lot of time implementing it,
and I wasted an awful lot of time keeping it working when I made the
visibility map crash-safe for IOS. That could be true, but I tend to
think it isn't.

> I would like to run some pgbench tests where we get the system in a
> steady state such as all/most updates are HOT updates (not entirely
> unlikely scenario for many real life cases). And then try running some
> concurrent queries which can be executed via IOS. My gut feel is that,
> today we will see slow and continuous drop in performance for these
> queries because IOS will slowly stop working.

If there are no vacuums, I agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 00:43:44
Message-ID:	CA+Tgmoa5XAbpOtbd1_wgaH8vC0T145VyXtspeLgxdCiWR0B60Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Dec 19, 2012 at 12:39 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> The benefit of saying that only UPDATEs clean the block is that this
> penalises only the workload making the mess, rather than everybody
> cleaning up repeatedly over one messy guy.

Right, but there are plenty of situations where having everybody clean
up after the messy guy is better than waiting around and hoping that
Mom (aka vacuum) will do it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 04:12:02
Message-ID:	CABOikdN0BsmhyKTHGjho7ufikA=CSkgOPMDHm8p9e6Hru=GiRw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 20, 2012 at 6:12 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Dec 19, 2012 at 12:26 PM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>> This can be handled by breaking 1-to-1 mapping on VM bit and
>> PD_ALL_VISIBLE bit. So seq scans will only look at PD_ALL_VISIBLE. It
>> was proposed by Andres up thread, but shot down by Tom and Simon. But
>> I still feel that was over reaction and there is a lot of merit in the
>> idea. As I said elsewhere, it will also help the case when there are
>> DEAD line pointers in a page. Today we can't mark such pages
>> all-visible, but if we break this mapping, we can do that.
>
> Sure, but you're zipping rather blithely past the disadvantages of
> such an approach.

Hmm. You're right. I did not think about the disadvantages and now
that you mention them, I feel they are important.

> Jeff Davis recently proposed getting rid of
> PD_ALL_VISIBLE, and Tom and I both expressed considerable skepticism
> about that; this proposal has the same problems. One of the major
> benefits of PD_ALL_VISIBLE is that, when it isn't set, inserts,
> updates, and deletes to the page can ignore the visibility map. That
> means that a server under heavy concurrency is much less likely to
> encounter contention on the visibility map blocks. Now, maybe that's
> not really a problem, but I sure haven't seen enough evidence to make
> me believe it. If it's really true that PD_ALL_VISIBLE needn't fill
> this role, then Heikki wasted an awful lot of time implementing it,
> and I wasted an awful lot of time keeping it working when I made the
> visibility map crash-safe for IOS. That could be true, but I tend to
> think it isn't.
>

Yeah, VM buffer contention can become prominent if we break the
invariant that page level bit status implies the vm bit status, at
least when its clear.OTOH IMHO we need some mechanism to address the
issue of aggressive clearing of the VM bits, but a very lame
corresponding set operation. Today we don't have much contention on
the VM page, but we must be sacrificing its usability in return. IOS
as well as vacuum optimizations using VMs will turn out not so useful
for many workloads. I'm very reluctant to suggest that we can solve
this my setting aside another page-level bit to track visibility of
tuples for heapscans. Or even have a bit in the tuple header itself to
track this information at that level to avoid repeated visibility
check for a tuple which is known to be visible to all current and
future transactions.

>> I would like to run some pgbench tests where we get the system in a
>> steady state such as all/most updates are HOT updates (not entirely
>> unlikely scenario for many real life cases). And then try running some
>> concurrent queries which can be executed via IOS. My gut feel is that,
>> today we will see slow and continuous drop in performance for these
>> queries because IOS will slowly stop working.
>
> If there are no vacuums, I agree.
>

And we expect vacuums to be very less or none. AFAIR in pgbench, it
now takes hours for accounts table to get chosen for vacuum and we
should be happy about it. But IOS are almost impossible for pgbench
kind of workloads today because of our aggressive strategy to clear
the VM bits.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

From:	Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To:	"'Robert Haas'" <robertmhaas(at)gmail(dot)com>, "'Simon Riggs'" <simon(at)2ndquadrant(dot)com>
Cc:	"'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Pavan Deolasee'" <pavan(dot)deolasee(at)gmail(dot)com>, "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 05:22:10
Message-ID:	004301cdde71$f7159570$e540c050$@kapila@huawei.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thursday, December 20, 2012 6:14 AM Robert Haas wrote:
> On Wed, Dec 19, 2012 at 12:39 PM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
> > The benefit of saying that only UPDATEs clean the block is that this
> > penalises only the workload making the mess, rather than everybody
> > cleaning up repeatedly over one messy guy.
>
> Right, but there are plenty of situations where having everybody clean
> up after the messy guy is better than waiting around and hoping that
> Mom (aka vacuum) will do it.

If we see for similar situation in index, during index scan, it just marks
the tuple as DEAD without taking X lock and then during split (when it
already has X lock) it free's the actual space.
So not sure if it's good idea to take X lock for cleanup during heap scan,
where write operation's happens more frequently and have better chance of
cleanup.

With Regards,
Amit Kapila.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 09:58:22
Message-ID:	CA+U5nMKOH2u=GuwJwEp97B6vLyK7-30g_1eYTU8E=j+FHAU10A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 20 December 2012 00:43, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Dec 19, 2012 at 12:39 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> The benefit of saying that only UPDATEs clean the block is that this
>> penalises only the workload making the mess, rather than everybody
>> cleaning up repeatedly over one messy guy.
>
> Right, but there are plenty of situations where having everybody clean
> up after the messy guy is better than waiting around and hoping that
> Mom (aka vacuum) will do it.

The problems I see are that cleaning on SELECT is too frequent,
interferes with foreground performance and re-dirties data blocks too
often.

Waiting for Mom is configurable, since we can set parameters for
autovacuum. But we can't turn off the cleaning by SELECTs, which makes
the configurability of autovacuum somewhat moot.

We could also contact the Cleaner instead.

ISTM that if someone spots a block that could use cleanup, they mark
the block as BM_PIN_COUNT_WAITER, but don't set pid. Then when they
unpin the block they send a signal/queue work for a special cleaning
process to come in and do the work now that nobody is waiting. Logic
would allow VACUUMs to go past that by setting the pid. If we
prioritised cleanup onto blocks that are already dirty we would
minimise I/O.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 10:03:25
Message-ID:	CA+U5nMLcWSC=UVbL=STL5eS0XqdiXuwHjFkCKESPitwa+nV+Zw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 20 December 2012 00:10, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:

> I just thought that we can fairly easily limit the damage if we are
> really worried about SELECTs being penalised. What if we set a
> configurable limit on *extra* things that a query may do which is
> otherwise not very useful for the query itself, but is useful to keep
> the system healthy and steady. HOT prune definitely counts as one of
> them and may be even setting of hint bits. (This is a topic for a
> separate thread though)

I like this idea

transaction_cleanup_limit = -1 (default), 0, 1+

-1 means no limit on number of cleanups in this transaction, which is
current behaviour.
Other numbers are the number of cleanups that will be tolerated in
this transaction; once we hit the limit we don't attempt cleanup
anymore we just get on with it. The limit is ignored for UPDATEs since
they need to clear space for their work.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 15:53:32
Message-ID:	CA+TgmoYAxiY5=FirwWfSq4+ezQ84hkTTGRKH4P3z97YB_q-p+Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Dec 19, 2012 at 11:12 PM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> Yeah, VM buffer contention can become prominent if we break the
> invariant that page level bit status implies the vm bit status, at
> least when its clear.OTOH IMHO we need some mechanism to address the
> issue of aggressive clearing of the VM bits, but a very lame
> corresponding set operation.

I agree.

> Today we don't have much contention on
> the VM page, but we must be sacrificing its usability in return. IOS
> as well as vacuum optimizations using VMs will turn out not so useful
> for many workloads.

I have that fear too, but the evidence isn't really in yet. The
testing that people have reported on this list has had mostly positive
outcomes. Of course that doesn't mean that it will work as well in
the field as it did in the lab, but it doesn't mean that it won't,
either.

> I'm very reluctant to suggest that we can solve
> this my setting aside another page-level bit to track visibility of
> tuples for heapscans. Or even have a bit in the tuple header itself to
> track this information at that level to avoid repeated visibility
> check for a tuple which is known to be visible to all current and
> future transactions.

This has been suggested before, as an alternative to freezing tuples.
It seems to have some potential although I think more thought and work
is needed to figure out exactly where to go with it.

> And we expect vacuums to be very less or none. AFAIR in pgbench, it
> now takes hours for accounts table to get chosen for vacuum and we
> should be happy about it. But IOS are almost impossible for pgbench
> kind of workloads today because of our aggressive strategy to clear
> the VM bits.

IMHO, it's probably fairly hopeless to make a pure pgbench workload
show a benefit from index-only scans. A large table under a very
heavy, completely random write workload is just about the worst
possible case for index-only scans. Index-only scans are a way of
avoiding unnecessary visibility checks when the target data hasn't
changed recently, not a magic bullet to escape all heap access. If
the target data has changed, you're going to have to touch the heap.
And while I agree that we aren't aggressive enough in setting the VM
bits right now, I also think it wouldn't be too hard to go too far in
the opposite direction: we could easily spend more effort trying to
make index-only scans effective than we could ever hope to recoup from
the scans themselves.

Now, if you set up N threads of which 10% are doing regular pgbench
and the other 90% are doing pgbench -S, or something like that, then
you might start to hope for some benefit from index-only scans. But I
think you might also GET some benefit in that case, even at steady
state.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 15:55:27
Message-ID:	CA+TgmoaU6UtjaqVb4+kqrUaZ_pcDXVjMtoOEhwb_T93r+qse+A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 20, 2012 at 4:58 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> ISTM that if someone spots a block that could use cleanup, they mark
> the block as BM_PIN_COUNT_WAITER, but don't set pid. Then when they
> unpin the block they send a signal/queue work for a special cleaning
> process to come in and do the work now that nobody is waiting. Logic
> would allow VACUUMs to go past that by setting the pid. If we
> prioritised cleanup onto blocks that are already dirty we would
> minimise I/O.

I don't favor that particular signaling mechanism, but I agree that
there is quite a bit of potential utility in having foreground
processes notice that work (like a HOT prune, or setting the VM bit)
needs to be done and pass those requests off to a background process.
I'm hoping the new background worker framework in 9.3 will make that
sort of thing easier for people to play around with.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 16:49:30
Message-ID:	CABOikdN_z0ojf_sX3X3txdt7Gj=QdOYdyP6UioFpDkAQ8+DGRQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 20, 2012 at 9:23 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Dec 19, 2012 at 11:12 PM, Pavan Deolasee
>
>
>> I'm very reluctant to suggest that we can solve
>> this my setting aside another page-level bit to track visibility of
>> tuples for heapscans. Or even have a bit in the tuple header itself to
>> track this information at that level to avoid repeated visibility
>> check for a tuple which is known to be visible to all current and
>> future transactions.
>
> This has been suggested before, as an alternative to freezing tuples.
> It seems to have some potential although I think more thought and work
> is needed to figure out exactly where to go with it.
>

Ok. Will try to read archives to see what was actually suggested and
why it was put on back burner. BTW at the risk of being shot down
again, I wonder if can we push down the freeze operation to HOT prune
also. A single WAL record can then record this action as well. Also,
it saves us from repeated checks for transaction status flags in
heap_freeze_tuple(). Of course, we do all these only if HOT prune has
work on its on and just try to piggyback.

I wonder if we should add a flag to heap_page_prune and try to do some
additional work if its being called from lazy vacuum such as setting
the VM bit and the tuple freeze. IIRC I had put something like that in
the early patches, but then ripped of for simplicity. May be its time
to play with that again.

In fact, I'd also suggested ripping off the line pointer scan in lazy
vacuum since its preceded by a HOT prune which does bulk of the work
anyways. I remember Tom taking objection to that, but can't remember
why. Will try to read up the old thread again.

>> And we expect vacuums to be very less or none. AFAIR in pgbench, it
>> now takes hours for accounts table to get chosen for vacuum and we
>> should be happy about it. But IOS are almost impossible for pgbench
>> kind of workloads today because of our aggressive strategy to clear
>> the VM bits.
>
> IMHO, it's probably fairly hopeless to make a pure pgbench workload
> show a benefit from index-only scans. A large table under a very
> heavy, completely random write workload is just about the worst
> possible case for index-only scans. Index-only scans are a way of
> avoiding unnecessary visibility checks when the target data hasn't
> changed recently, not a magic bullet to escape all heap access. If
> the target data has changed, you're going to have to touch the heap.

Not always. Not clearing the VM bit at HOT update is one such idea we
discussed. Of course, there are open issues with that, but they are
not unsolvable. The advantage of not touching heap is just too big to
ignore.

> And while I agree that we aren't aggressive enough in setting the VM
> bits right now, I also think it wouldn't be too hard to go too far in
> the opposite direction: we could easily spend more effort trying to
> make index-only scans effective than we could ever hope to recoup from
> the scans themselves.
>

I agree. I also started having that worry. We are at one extreme right
now and it might not help to get to the other extreme. Looks like I'm
coming along the idea of somehow detecting if the scan is happening on
the result relation of a ModifyTable and avoid setting VM bit in that
case.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 17:25:08
Message-ID:	14159.1356024308@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> Ok. Will try to read archives to see what was actually suggested and
> why it was put on back burner. BTW at the risk of being shot down
> again, I wonder if can we push down the freeze operation to HOT prune
> also.

Seems unlikely to be a win. We only care about freezing tuples in the
context of being able to advance a relation-wide relfrozenxid horizon.
Freezing pages retail accomplishes nothing whatsoever towards that goal,
unless you have some way to know that no new freeze work will be needed
on the page before the next vacuum freeze happens. Otherwise, you're
just moving portions of the work from background vacuuming into
foreground processes, with no benefit gained thereby. In fact, you
might well be *creating* work that would otherwise not have had to be
done at all --- the tuple might get deleted before the next freeze
happens.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 17:29:43
Message-ID:	CA+TgmoYQ1FsDrZ6mB95KDGM+9YKa3QRvY9kMyTHe19qMAmbJqA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 20, 2012 at 11:49 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> I wonder if we should add a flag to heap_page_prune and try to do some
> additional work if its being called from lazy vacuum such as setting
> the VM bit and the tuple freeze. IIRC I had put something like that in
> the early patches, but then ripped of for simplicity. May be its time
> to play with that again.

That seems unlikely to be a good trade-off. If VACUUM is going to do
extra stuff, it's better to have that in the vacuum-specific code,
rather than in code that is also traversed from other places.
Otherwise the conditional logic might impose a penalty on people who
aren't taking those branches.

>> IMHO, it's probably fairly hopeless to make a pure pgbench workload
>> show a benefit from index-only scans. A large table under a very
>> heavy, completely random write workload is just about the worst
>> possible case for index-only scans. Index-only scans are a way of
>> avoiding unnecessary visibility checks when the target data hasn't
>> changed recently, not a magic bullet to escape all heap access. If
>> the target data has changed, you're going to have to touch the heap.
>
> Not always. Not clearing the VM bit at HOT update is one such idea we
> discussed. Of course, there are open issues with that, but they are
> not unsolvable. The advantage of not touching heap is just too big to
> ignore.

I don't really agree. Sure, not touching the heap is nice, but mostly
because you avoid pulling pages into shared_buffers that aren't
otherwise needed. IIRC, an index-only scan isn't faster than an index
scan if all the necessary table and index pages are already cached.
Touching already-resident pages just isn't that expensive. And of
course, if a page has recently suffered an insert, update, or delete,
it is more likely to be resident. You can construct access patterns
where this isn't so - e.g. update the page, wait for it to get paged
out, and then SELECT from it with an index-only scan, wait for it to
get paged out again, etc. - but I'm not sure how much of a problem
that is in the real world.

>> And while I agree that we aren't aggressive enough in setting the VM
>> bits right now, I also think it wouldn't be too hard to go too far in
>> the opposite direction: we could easily spend more effort trying to
>> make index-only scans effective than we could ever hope to recoup from
>> the scans themselves.
>
> I agree. I also started having that worry. We are at one extreme right
> now and it might not help to get to the other extreme. Looks like I'm
> coming along the idea of somehow detecting if the scan is happening on
> the result relation of a ModifyTable and avoid setting VM bit in that
> case.

It's unclear to me that that's the right way to slice it. There are
several different sets of concerns here: (1) avoiding setting the
all-visible bit when it'll be cleared again just after, (2) avoiding
slowing down SELECT with hot-pruning, and (3) avoiding slowing down
repeated SELECTs by NOT having the first one do HOT-pruning. And
maybe others. The right thing to do depends on which problems you
think are relatively more important. That question might not even
have one right answer, but even if it does we don't have consensus on
what it is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 17:49:27
Message-ID:	CABOikdPGnsWO6W3Px9GKtVByHQY9QrRVX2kGVx00ZCObMXJhYQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 20, 2012 at 10:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
>> Ok. Will try to read archives to see what was actually suggested and
>> why it was put on back burner. BTW at the risk of being shot down
>> again, I wonder if can we push down the freeze operation to HOT prune
>> also.
>
> Seems unlikely to be a win. We only care about freezing tuples in the
> context of being able to advance a relation-wide relfrozenxid horizon.
> Freezing pages retail accomplishes nothing whatsoever towards that goal,
> unless you have some way to know that no new freeze work will be needed
> on the page before the next vacuum freeze happens. Otherwise, you're
> just moving portions of the work from background vacuuming into
> foreground processes, with no benefit gained thereby.

If we can establish an invariant that a all-visible page is always
fully freezed, then vacuum freeze does not need to look at those pages
again. Another advantage is that we are holding the right lock and
piggyback freeze with cleanup WAL-logging, thus avoiding re-dirtying
of the page and additional WAL logging.

> In fact, you
> might well be *creating* work that would otherwise not have had to be
> done at all --- the tuple might get deleted before the next freeze
> happens.
>

Yeah, there will be cases where it might not add any value or even add
little overhead. Don't know what will serve better on an average or
majority of the workloads though. Vacuum freeze has known to add
sudden and unexpected load on the system, so I thought this might
mitigate that to a certain extent.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 18:00:35
Message-ID:	CABOikdNw3V_inSyfKCty+yEtDdvb0ePg0i+hYbYC2w+x3PNRYA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 20, 2012 at 10:59 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Dec 20, 2012 at 11:49 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>> I wonder if we should add a flag to heap_page_prune and try to do some
>> additional work if its being called from lazy vacuum such as setting
>> the VM bit and the tuple freeze. IIRC I had put something like that in
>> the early patches, but then ripped of for simplicity. May be its time
>> to play with that again.
>
> That seems unlikely to be a good trade-off. If VACUUM is going to do
> extra stuff, it's better to have that in the vacuum-specific code,
> rather than in code that is also traversed from other places.
> Otherwise the conditional logic might impose a penalty on people who
> aren't taking those branches.
>

Thats a call we need to take between code duplication vs customising
execution. We do that all over the code. Not sure if it will be any
different here.

>
> It's unclear to me that that's the right way to slice it. There are
> several different sets of concerns here: (1) avoiding setting the
> all-visible bit when it'll be cleared again just after, (2) avoiding
> slowing down SELECT with hot-pruning, and (3) avoiding slowing down
> repeated SELECTs by NOT having the first one do HOT-pruning. And
> maybe others. The right thing to do depends on which problems you
> think are relatively more important. That question might not even
> have one right answer, but even if it does we don't have consensus on
> what it is.

Hmm. We tossed and discussed many interesting ideas in this thread. It
will be sad if none of them go anywhere. When I look at archives, I
see we might have discussed some of these even in the past but never
got an agreement because there always be a workload which may not be
served well by any specific idea. And many a times, they are so
interrelated that we either have to do all or none. Unfortunately,
trying to do all is too-much and too-invasive most often.

May be what we need an official experimental branch where such ideas
can be checked-in and encourage people to try out those branches in
their real world tests or set up dedicated benchmark machines to run
regular tests. Tested and proven ideas can then be merged into the
main trunk. That will be the only way to know efficacy of such ideas.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 18:30:48
Message-ID:	16989.1356028248@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> On Thu, Dec 20, 2012 at 10:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Seems unlikely to be a win. We only care about freezing tuples in the
>> context of being able to advance a relation-wide relfrozenxid horizon.
>> Freezing pages retail accomplishes nothing whatsoever towards that goal,
>> unless you have some way to know that no new freeze work will be needed
>> on the page before the next vacuum freeze happens. Otherwise, you're
>> just moving portions of the work from background vacuuming into
>> foreground processes, with no benefit gained thereby.

> If we can establish an invariant that a all-visible page is always
> fully freezed, then vacuum freeze does not need to look at those pages
> again.

We're not going to do that, because it would require freezing tuples
immediately after they fall below the RecentGlobalXmin horizon. This
would be a significant loss of capability from a forensic standpoint,
not to mention breaking existing applications that look at xmin to
determine whether a tuple has recently been updated. Besides which,
I think it would result in a large increase in the WAL volume emitted
by prune operations (hint bit setting doesn't require WAL, unlike
freezing). I don't believe for a minute your argument that it would
result in a net reduction in WAL.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-20 18:51:19
Message-ID:	CA+TgmoahhSPrGUnrOUfe=5crf4h=wtab0gcJs8fJ9G6THKU9Qw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 20, 2012 at 1:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
>> On Thu, Dec 20, 2012 at 10:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Seems unlikely to be a win. We only care about freezing tuples in the
>>> context of being able to advance a relation-wide relfrozenxid horizon.
>>> Freezing pages retail accomplishes nothing whatsoever towards that goal,
>>> unless you have some way to know that no new freeze work will be needed
>>> on the page before the next vacuum freeze happens. Otherwise, you're
>>> just moving portions of the work from background vacuuming into
>>> foreground processes, with no benefit gained thereby.
>
>> If we can establish an invariant that a all-visible page is always
>> fully freezed, then vacuum freeze does not need to look at those pages
>> again.
>
> We're not going to do that, because it would require freezing tuples
> immediately after they fall below the RecentGlobalXmin horizon. This
> would be a significant loss of capability from a forensic standpoint,
> not to mention breaking existing applications that look at xmin to
> determine whether a tuple has recently been updated. Besides which,
> I think it would result in a large increase in the WAL volume emitted
> by prune operations (hint bit setting doesn't require WAL, unlike
> freezing). I don't believe for a minute your argument that it would
> result in a net reduction in WAL.

I don't think the above makes sense, because making a page all-visible
already requires emitting a WAL record. Pavan didn't say "freeze the
page every time we set a hint bit"; he said "freeze the page every
time it gets marked all-visible". And that's already WAL-logged.

Now, there is a downside: right now, we play a tricky little game
where we emit a WAL record for setting the visibility map bit, but we
don't actually set the LSN of the heap page. It's OK because it's
harmless if the PD_ALL_VISIBLE bit makes it to disk and the
visibility-map doesn't, and also because the PD_ALL_VISIBLE bit can be
set without relying on the previous page contents. But doing anything
more complicated with the same WAL record, like freezing, is likely to
require setting the LSN on the heap page. And that will result in a
huge increase in WAL traffic when vacuuming an insert-only table.
Whee, crash recovery is fun.

With respect to the forensic problem, we've previously discussed
setting a HEAP_XMIN_FROZEN bit in the tuple header rather than
overwriting the xmin with FrozenXID.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2012-12-21 02:24:00
Message-ID:	CAMkU=1xYTKNxG7jY9rJCzq+JWO7VgbC55EJzXks_Q-Vf5xQxKg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wednesday, December 19, 2012, Robert Haas wrote:

> On Wed, Dec 19, 2012 at 12:26 PM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com <javascript:;>> wrote:
>
> > I would like to run some pgbench tests where we get the system in a
> > steady state such as all/most updates are HOT updates (not entirely
> > unlikely scenario for many real life cases). And then try running some
> > concurrent queries which can be executed via IOS. My gut feel is that,
> > today we will see slow and continuous drop in performance for these
> > queries because IOS will slowly stop working.
>
> If there are no vacuums, I agree.
>

If the table is randomly updated over its entire size, then pretty much
every block will be not-all-visible (and so disqualified from IOS) before
you hit the default 20% vacuum threshold. I wonder if there ought not be
another vac threshold, based on vm density rather than estimated obsolete
tuple density.

Cheers,

Jeff

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Set visibility map bit after HOT prune
Date:	2012-12-21 03:30:35
Message-ID:	CAMkU=1zY2SD30EivNVGAocMTE_9Pq0D_N_uKL_2fHj643U1=Mw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thursday, December 20, 2012, Robert Haas wrote:

>
> IMHO, it's probably fairly hopeless to make a pure pgbench workload
> show a benefit from index-only scans. A large table under a very
> heavy, completely random write workload is just about the worst
> possible case for index-only scans. Index-only scans are a way of
> avoiding unnecessary visibility checks when the target data hasn't
> changed recently, not a magic bullet to escape all heap access. If
> the target data has changed, you're going to have to touch the heap.
> And while I agree that we aren't aggressive enough in setting the VM
> bits right now, I also think it wouldn't be too hard to go too far in
> the opposite direction: we could easily spend more effort trying to
> make index-only scans effective than we could ever hope to recoup from
> the scans themselves.
>
> Now, if you set up N threads of which 10% are doing regular pgbench
> and the other 90% are doing pgbench -S, or something like that, then
> you might start to hope for some benefit from index-only scans.

I set this up before, by dropping the primary key and instead building an
index on (aid,abalance) and then just running pgbench with a mixture of -f
flags that corresponded to some -S-like and some default-like transactions.

On a freshly vacuumed table, I saw a hint of a performance boost at even a
50:50 ratio, and clear boost at 3 -S to 1 default

I ran this at a size where not even all the index fit in RAM, to maximize
the benefit of not having to visit the table.

However, the boost started going away due to vm clearance long before
autovacuum kicked in at default settings.

Cheers,

Jeff

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set visibility map bit after HOT prune
Date:	2013-01-08 06:17:17
Message-ID:	CABOikdPpMWT4tDyEJ4j9XM1_rM=sL-L+Ds6hWOG4WcCbSMxWBA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Sorry for a long pause on this thread. A new arrival at home kept me
occupied all the time.

This thread saw a lot of ideas and suggestions from different people. I
don't think we had an agreement one way or the other on any of them, but
let me summarize the discussion for archival and taking further action if
deemed necessary.

*Suggestion 1: Set the visibility map bit after HOT prune
*
The rational for this idea is to improve the chances of an index-only scan
happening after HOT prune. This is especially interesting when the a table
gets random updates or deletes, each of which will clear the VM bit. The
table may not still come up for vacuum, either because the number of
updates/deletes are not over the vac threshold or because subsequent HOT
prune did not leave any work for vacuum. The only place where we set the VM
bits again is during vacuum. So this idea would add another path where VM
bits are set. This would also help vacuums to avoid visiting those heap
pages that don't have any work to be done.

The main objection to this idea is that this may result in too much
flip-flopping of the bit, especially if the HOT prune is to be followed by
an UPDATE to the page. This is a valid concern. But the way HOT prune works
today, it has no linkage to the future UPDATE operations other than the
fact that it frees up space for future UPDATE operations. But the prune can
happen even in a select code path. Suggestion 2 below is about changing
this behavior, but my point is to consider 1 unless and until we do 2. Tom
and Simon opposed saying we need to take a holistic view. Another concern
with this idea is that VM bit set operation now generates WAL and repeated
setting/clearing of the bit may increase WAL activity. I suggested to piggy
back the VM bit set logging with the HOT prune WAL log. Robert raised some
doubts regarding increased full-page writes if VM set LSN is recorded in
the heap page LSN. I am not sure if that applies if we piggy back the
operation though because HOT prune WAL would anyway record LSN in the heap
page.

If we do this, we should also consider updating FSM after prune because
vacuum may not scan this page at all.

*Suggestion 2: Move HOT prune logic somewhere else
*
Tom/Simon suggested that we ought to consider moving HOT prune to some
other code path. When we implemented HOT a few years back, we wanted to
make it as less invasive as possible. But now the code has proven
stability, we can experiment a few more things. Especially, we would like
to prune only if the page is going to receive an UPDATE soon. Otherwise,
pruning may unnecessarily add overheads to a simple read-only query and the
space freed up by prune may not even be used soon/ever. Tom suggested that
we can teach planner/executor to distinguish between a scan on a normal
relation vs result relation. I'd some concerns that even if we have such
mechanism, it may not be enough because a scan does not guarantee that the
tuple will be finally updated because it may fail qualification etc.

Simon has strong views regarding burdening SELECTs with maintenance work,
but Robert and I are not convinced that its necessarily a bad idea to let
SELECTs do a little extra work which can help to keep the overall state
healthy. But in general, it might be a good idea to try such approaches
and see if we can extract more out of the system. Suggestion 5 and 6 also
popped up to handle this problem in a slightly different way.

*Suggestion 3: Don't clear visibility map bit after HOT update
*
I proposed this during the course of discussion and Andreas F
liked/supported the idea. This could be useful when most/all updates are
HOT updates. So the page does not need any work during vacuum (assuming HOT
prune will take care of it) and index-only scans still work because the
index pointers will be pointing to a valid HOT chain. Tom/Simon didn't
quite like it because they were worried that this will change the meaning
on the VM. I (and I think even Andreas) don't think that way. Of course,
there are some concerns because this will break the use of PD_ALL_VISIBLE
flag for avoiding MVCC checks during heap scans. There are couple of
suggestions to fix that like having another page level bit to differentiate
these two states. Doing that will help us skip MVCC checks even if there
are one or more DEAD line pointers in the page. We should also run some
performance tests to see how much benefit is really served by skipping MVCC
checks in heap scans. We can weigh that against the benefit of keeping the
VM bit set.

*Suggestion 4: Freeze tuples during HOT prune*
*
*
I suggested that we can also freeze tuples during HOT prune. The rational
for doing it this way is to remove unnecessary work from vacuum by
piggy-backing the freeze logging in the HOT prune WAL record. Today vacuum
will generate additional WAL and dirty the buffers again just to freeze the
tuples. There are couple of objections to this idea. One is pushes
background work into foreground operation. My explanation to that is the
additional work is not much since we are already doing a lot other things
in HOT prune and the extra work is justified because it will save us much
more extra work later on. Another objection to the idea is that we might
freeze a tuple which gets unfreezed again because say it gets updated or
deleted. Also, we may lose forensic information by overwriting the xmin too
soon. A possible solution (and Robert mentioned that it was suggested even
before) for the latter issue is to have a separate tuple header flag to
designate frozen tuples. This might be good in any case since we then don't
lose forensic information ever.
*
Suggestion 5: Add a cost model so that a transaction does only limited
maintenance activity such as hint bit setting and HOT prune*
*
*
Simon liked the idea and suggested having a GUC like
transaction_cleanup_cost. One can set this at a session level and the
amount of extra work that is done by any transaction will be governed by
its value. The default would be the current behavior and we might place
some lower limit so that every transaction at least contribute that much
toward maintenance activities. I did not see any objections to the idea.
*
Suggestion 6: Move maintenance activity to background workers
*
The idea here is to leave the maintenance activities to the background
threads. For example, when a backend finds that a heap page needs pruning,
it can send that information to the background thread instead of doing that
itself. Once 9.3 has background thread infrastructure, this might be worth
exploring further.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee