lazy_truncate_heap()

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: lazy_truncate_heap()
Date: 2008-12-31 18:21:34
Message-ID: 1230747694.4032.72.camel@ebony.2ndQuadrant
Lists: pgsql-hackers

While watching WAL records float by I noticed some AccessExclusiveLocks
occurring unnecessarily during VACUUMs.

This is caused by lines 186-189 in lazy_vacuum_rel(), vacuumlazy.c

possibly_freeable = vacrelstats->rel_pages -
	vacrelstats->nonempty_pages;
if (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
	possibly_freeable >= vacrelstats->rel_pages /
	REL_TRUNCATE_FRACTION)
	lazy_truncate_heap(onerel, vacrelstats);

If you look closely you'll see that if rel_pages is small then we will
attempt to truncate the heap even if possibly_freeable == 0.

While looking at this some more, I notice there is another bug. When
VACUUM has nothing at all to do, then it appears that
vacrelstats->nonempty_pages is zero, so that possibly_freeable is always
set to vacrelstats->rel_pages. vacrelstats->nonempty_pages doesn't
appear to be used anywhere else, so nobody notices it is wrongly set.

Both of these bugs are minor, but the effect of either/both of them is
to cause more AccessExclusiveLocks than we might expect.

For Hot Standby this means that many VACUUMs take AccessExclusiveLocks
on relations, which would potentially lead to having queries cancelled
for no reason at all.

Does anybody think any of the above is intentional? Can I fix?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2008-12-31 19:45:18
Message-ID: 3DF8A552-B9BD-4496-88FE-55D1309BB25A@enterprisedb.com
Lists: pgsql-hackers


On 31 Dec 2008, at 13:21, Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>
> Both of these bugs are minor, but the effect of either/both of them is
> to cause more AccessExclusiveLocks than we might expect.
>
> For Hot Standby this means that many VACUUMs take AccessExclusiveLocks
> on relations, which would potentially lead to having queries cancelled
> for no reason at all.

Well by default it would just cause wal to pause briefly until the
queries with those locks finish, no?


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2008-12-31 19:45:20
Message-ID: 495BCBD0.9000003@enterprisedb.com
Lists: pgsql-hackers

Simon Riggs wrote:
> While watching WAL records float by I noticed some AccessExclusiveLocks
> occurring unnecessarily during VACUUMs.
>
> This is caused by lines 186-189 in lazy_vacuum_rel(), vacuumlazy.c
>
> possibly_freeable = vacrelstats->rel_pages -
> vacrelstats->nonempty_pages;
> if (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
> possibly_freeable >= vacrelstats->rel_pages /
> REL_TRUNCATE_FRACTION)
> lazy_truncate_heap(onerel, vacrelstats);
>
> If you look closely you'll see that if rel_pages is small then we will
> attempt to truncate the heap even if possibly_freeable == 0.

Good catch! And it goes all the way back to 7.4.

> While looking at this some more, I notice there is another bug. When
> VACUUM has nothing at all to do, then it appears that
> vacrelstats->nonempty_pages is zero, so that possibly_freeable is always
> set to vacrelstats->rel_pages. vacrelstats->nonempty_pages doesn't
> appear to be used anywhere else, so nobody notices it is wrongly set.

Hmm, this is a new issue with the visibility map. For pages at the end
that are skipped, we don't know if they're empty or not. Currently we
assume that they are, but perhaps it would be better to assume they're
not? On the other hand, that makes it much less likely that we even try
to truncate a relation, and when we do, we wouldn't truncate it as far
as we could.

> Does anybody think any of the above is intentional? Can I fix?

No. Yes please.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2009-01-01 10:00:41
Message-ID: 495C9449.8030305@enterprisedb.com
Lists: pgsql-hackers

Greg Stark wrote:
>
> On 31 Dec 2008, at 13:21, Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>>
>> Both of these bugs are minor, but the effect of either/both of them is
>> to cause more AccessExclusiveLocks than we might expect.
>>
>> For Hot Standby this means that many VACUUMs take AccessExclusiveLocks
>> on relations, which would potentially lead to having queries cancelled
>> for no reason at all.
>
> Well by default it would just cause wal to pause briefly until the
> queries with those locks finish, no?

Wait a minute. Why does an AccessExclusiveLock lead to cancelled queries
or pausing WAL application? I thought it'd just block other queries
trying to acquire a conflicting lock in the standby, just like holding
an AccessExclusiveLock on the primary does. It's unrelated to the xmin
horizon issue.

There is a noteworthy point though. In the primary, vacuum trying to
truncate takes AccessExclusiveLock conditionally, so that it doesn't
disturb queries accessing the table, and only truncates the table if it
got the lock. But in standby, we have to truncate the table, and
therefore have to acquire the lock, waiting until we get it. I guess we
have to stop applying WAL while waiting.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2009-01-01 17:16:08
Message-ID: 1230830168.4032.92.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Wed, 2008-12-31 at 14:45 -0500, Greg Stark wrote:
> On 31 Dec 2008, at 13:21, Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> >
> > Both of these bugs are minor, but the effect of either/both of them is
> > to cause more AccessExclusiveLocks than we might expect.
> >
> > For Hot Standby this means that many VACUUMs take AccessExclusiveLocks
> > on relations, which would potentially lead to having queries cancelled
> > for no reason at all.
>
> Well by default it would just cause wal to pause briefly until the
> queries with those locks finish, no?

Yes, but why allow pointless actions in the first place?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2009-01-01 17:54:33
Message-ID: 1230832473.4032.105.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Wed, 2008-12-31 at 21:45 +0200, Heikki Linnakangas wrote:
> > Can I fix?
>
> Yes please.

Fix attached.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachment: lazy_truncate_heap_fix.v1.patch (text/x-patch, 1.2 KB)

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2009-01-01 17:57:32
Message-ID: 1230832652.4032.106.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Thu, 2009-01-01 at 12:00 +0200, Heikki Linnakangas wrote:
> Greg Stark wrote:
> >
> > On 31 Dec 2008, at 13:21, Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> >>
> >> Both of these bugs are minor, but the effect of either/both of them is
> >> to cause more AccessExclusiveLocks than we might expect.
> >>
> >> For Hot Standby this means that many VACUUMs take AccessExclusiveLocks
> >> on relations, which would potentially lead to having queries cancelled
> >> for no reason at all.
> >
> > Well by default it would just cause wal to pause briefly until the
> > queries with those locks finish, no?
>
> Wait a minute. Why does an AccessExclusiveLock lead to cancelled queries
> or pausing WAL application? I thought it'd just block other queries
> trying to acquire a conflicting lock in the standby, just like holding
> an AccessExclusiveLock on the primary does. It's unrelated to the xmin
> horizon issue.

Yes, it is unrelated to the xmin horizon issue. There are two reasons
for delaying WAL apply:
* locks
* xmin horizon

When a lock is acquired on the primary it almost always precedes an
action which cannot occur concurrently. For example, if VACUUM did
truncate a table then queries could get errors because parts of their
table disappear from under them. Others are DROP TABLE, etc.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2009-01-04 11:01:56
Message-ID: 49609724.6090006@enterprisedb.com
Lists: pgsql-hackers

Simon Riggs wrote:
> On Thu, 2009-01-01 at 12:00 +0200, Heikki Linnakangas wrote:
>> Greg Stark wrote:
>>> On 31 Dec 2008, at 13:21, Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>>>> Both of these bugs are minor, but the effect of either/both of them is
>>>> to cause more AccessExclusiveLocks than we might expect.
>>>>
>>>> For Hot Standby this means that many VACUUMs take AccessExclusiveLocks
>>>> on relations, which would potentially lead to having queries cancelled
>>>> for no reason at all.
>>> Well by default it would just cause wal to pause briefly until the
>>> queries with those locks finish, no?
>> Wait a minute. Why does an AccessExclusiveLock lead to cancelled queries
>> or pausing WAL application? I thought it'd just block other queries
>> trying to acquire a conflicting lock in the standby, just like holding
>> an AccessExclusiveLock on the primary does. It's unrelated to the xmin
>> horizon issue.
>
> Yes, it is unrelated to the xmin horizon issue. There are two reasons
> for delaying WAL apply:
> * locks
> * xmin horizon
>
> When a lock is acquired on the primary it almost always precedes an
> action which cannot occur concurrently. For example, if VACUUM did
> truncate a table then queries could get errors because parts of their
> table disappear from under them. Others are drop table etc..

Have you implemented the query cancellation mechanism for that scenario
too? (I'm cool either way, just curious..)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2009-01-04 12:48:03
Message-ID: 1231073283.4032.233.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Sun, 2009-01-04 at 13:01 +0200, Heikki Linnakangas wrote:
> >> Why does an AccessExclusiveLock lead to cancelled queries
> >> or pausing WAL application? I thought it'd just block other queries
> >> trying to acquire a conflicting lock in the standby, just like holding
> >> an AccessExclusiveLock on the primary does. It's unrelated to the xmin
> >> horizon issue.
> >
> > Yes, it is unrelated to the xmin horizon issue. There are two reasons
> > for delaying WAL apply:
> > * locks
> > * xmin horizon
> >
> > When a lock is acquired on the primary it almost always precedes an
> > action which cannot occur concurrently. For example, if VACUUM did
> > truncate a table then queries could get errors because parts of their
> > table disappear from under them. Others are drop table etc..
>
> Have you implemented the query cancellation mechanism for that scenario
> too? (I'm cool either way, just curious..)

Yes, they both lead to a conflict between WAL and standby queries, so
are treated the same, currently: if conflict occurs, wait until
max_standby_delay expires, then cancel.

Logically, "xmin horizon" conflicts could be flexible/soft. That is, if
we implemented the idea to store a lastCleanedLSN for each buffer then
"xmin horizon" conflicts would be able to continue executing until they
see a buffer with buffer.lastCleanedLSN > conflictLSN. Whereas the lock
would be a hard limit beyond which a query could not progress.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Zeugswetter Andreas OSB sIT <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2009-01-05 17:24:51
Message-ID: 6DAFE8F5425AB84DB3FCA4537D829A561CEA8AA752@M0164.s-mxs.net
Lists: pgsql-hackers


> Logically, "xmin horizon" conflicts could be flexible/soft.
> That is, if we implemented the idea to store a lastCleanedLSN for each buffer then
> "xmin horizon" conflicts would be able to continue executing until they
> see a buffer with buffer.lastCleanedLSN > conflictLSN.

I think the trouble is that HOT can put extremely recent lastCleanedLSNs on pages.
Avoiding that would need some knobs, which would most likely reduce the efficiency of HOT.

What about using the page LSN after max_standby_delay ?
Using the page LSN cancels queries earlier than the lastCleanedLSN,
but probably in many cases later than an immediate cancel after max_standby_delay.
Of course that only helps when reading static parts of tables :-(

Instead of a cancel message, the replay would need to send (set in shmem) the first
LSN applied after max_standby_delay to the relevant backend for its LSN checks
(if buffer.LSN >= received_max_delay_lsn, cancel).

Andreas


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Zeugswetter Andreas OSB sIT <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2009-01-05 18:21:42
Message-ID: 1231179702.4032.403.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Mon, 2009-01-05 at 18:24 +0100, Zeugswetter Andreas OSB sIT wrote:
> > Logically, "xmin horizon" conflicts could be flexible/soft.
> > That is, if we implemented the idea to store a lastCleanedLSN for each buffer then
> > "xmin horizon" conflicts would be able to continue executing until they
> > see a buffer with buffer.lastCleanedLSN > conflictLSN.
>
> I think the trouble is, that HOT can put extremely recent lastCleanedLSN's on pages.
> It would need some knobs to avoid this, that most likely reduce efficiency of HOT.
>
> What about using the page LSN after max_standby_delay ?
> Using the page LSN cancels queries earlier than the lastCleanedLSN,
> but probably in many cases later than an immediate cancel after max_standby_delay.
> Of course that only helps when reading static parts of tables :-(
>
> Instead of a cancel message, the replay would need to send (set in shmem) the first
> LSN applied after max_standby_delay to the relevant backend for it's LSN checks
> (if buffer.LSN >= received_max_delay_lsn cancel).

I like your train of thought there: If HOT is the problem then
lastCleanedLSN approx= LSN on HOT blocks, so having lastCleanedLSN
doesn't help much.

OK, so I'll skip that idea and go with what you suggest.

Design:

When a conflict occurs we set a RecoveryConflictLSN on the Proc of the
backend to be cancelled. Every time we read a block in recovery we check
MyProc.RecoveryConflictLSN against the LSN on the block, and the backend
will commit suicide (ERROR) if the block's LSN is equal or greater.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2009-01-06 13:48:15
Message-ID: 4963611F.5060901@enterprisedb.com
Lists: pgsql-hackers

Simon Riggs wrote:
> On Wed, 2008-12-31 at 21:45 +0200, Heikki Linnakangas wrote:
>>> Can I fix?
>> Yes please.
>
> Fix attached.

> --- 183,192 ----
>   	 * number of pages.  Otherwise, the time taken isn't worth it.
>   	 */
>   	possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
> ! 	if (vacrelstats->tuples_deleted > 0 &&
> ! 		(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
> ! 		 (possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION &&
> ! 		  possibly_freeable > 0)))
>   		lazy_truncate_heap(onerel, vacrelstats);
>

Where did that "tuples_deleted > 0" condition come from? It seems
counter-productive; if a previous vacuum failed to acquire the lock,
subsequent vacuums wouldn't even try if they don't remove any tuples.
How about simply:

***************
*** 183,190 ****
  	 * number of pages.  Otherwise, the time taken isn't worth it.
  	 */
  	possibly_freeable = vacrelstats->rel_pages -
  		vacrelstats->nonempty_pages;
! 	if (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
! 		possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION)
  		lazy_truncate_heap(onerel, vacrelstats);

  	/* Vacuum the Free Space Map */
--- 183,191 ----
  	 * number of pages.  Otherwise, the time taken isn't worth it.
  	 */
  	possibly_freeable = vacrelstats->rel_pages -
  		vacrelstats->nonempty_pages;
! 	if (possibly_freeable > 0 &&
! 		(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
! 		 possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION))
  		lazy_truncate_heap(onerel, vacrelstats);

  	/* Vacuum the Free Space Map */

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2009-01-06 14:18:35
Message-ID: 1231251515.15005.32.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Tue, 2009-01-06 at 15:48 +0200, Heikki Linnakangas wrote:
> How about simply:

OK

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_truncate_heap()
Date: 2009-01-06 15:16:10
Message-ID: 496375BA.3090603@enterprisedb.com
Lists: pgsql-hackers

Simon Riggs wrote:
> On Tue, 2009-01-06 at 15:48 +0200, Heikki Linnakangas wrote:
>> How about simply:
>
> OK

Committed and backpatched all the way back to 7.4 stable.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com