Re: autovacuum truncate exclusive lock round two

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-25 13:04:39
Message-ID: 508938E7.8060906@Yahoo.com
Lists: pgsql-hackers

Stephen,

On 10/24/2012 10:46 PM, Stephen Frost wrote:
> Jan,
>
> * Jan Wieck (JanWieck(at)Yahoo(dot)com) wrote:
>> This problem has been discussed before. Those familiar with the
>> subject please skip the next paragraph.
>
> Apologies if this was already thought-of and ruled out for some reason,
> but...
>
>> Because all the scanning had been done in parallel to normal DB
>> activity, it needs to verify that all those blocks are still empty.
>
> Would it be possible to use the FSM to figure out if things have changed
> since the last scan..? Does that scan update the FSM, which would then
> be updated by another backend in the event that it decided to write
> something there? Or do we consider the FSM to be completely
> untrustworthy wrt this (and if so, I don't suppose there's any hope to
> using the visibility map...)?

I honestly don't know if we can trust the FSM enough when it comes to
throwing away heap pages. Can we?

>
> The notion of having to double-scan and the AccessExclusiveLock on the
> relation are telling me this work-around, while completely possible,
> isn't exactly ideal...

Under normal circumstances, with just a few pages to trim off the end,
this is no problem. Those pages were the last pages just scanned by this
very autovacuum, so they are found in shared buffers anyway. All the
second scan does in that case is fetch each page once more from shared
buffers to be 100% sure we are not truncating off new tuples. We
definitely need the AccessExclusiveLock to prevent someone from
extending the relation at the end between our check of the relation size
and the truncate. Fetching 50 empty blocks from the buffer cache while
at it isn't that big of a deal, and that is what it normally looks like.
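
For readers who have not looked at that code recently, the existing flow
is roughly the following (a simplified sketch of lazy_truncate_heap() in
vacuumlazy.c; error handling, logging and statistics are omitted):

static void
lazy_truncate_heap_sketch(Relation onerel, LVRelStats *vacrelstats)
{
    BlockNumber old_rel_pages = vacrelstats->rel_pages;
    BlockNumber new_rel_pages;

    /* Try exactly once; if anyone holds a conflicting lock, give up. */
    if (!ConditionalLockRelation(onerel, AccessExclusiveLock))
        return;

    /*
     * Someone may have grown the relation between the first scan and
     * acquiring the lock; if so, the tail pages are no longer known
     * to be empty.
     */
    new_rel_pages = RelationGetNumberOfBlocks(onerel);
    if (new_rel_pages != old_rel_pages)
    {
        UnlockRelation(onerel, AccessExclusiveLock);
        return;
    }

    /*
     * Second scan: walk backwards from the end, re-reading each page
     * to verify it is still empty.  This is the scan that can exceed
     * deadlock_timeout on a badly bloated table.
     */
    new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);

    if (new_rel_pages < old_rel_pages)
        RelationTruncate(onerel, new_rel_pages);

    UnlockRelation(onerel, AccessExclusiveLock);
}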

The problem case this patch is dealing with is rolling-window tables
that have experienced some bloat. The typical example is a log table
that has new data constantly added and the oldest data constantly purged
out. This data normally rotates through some blocks like a rolling
window. If for some reason (purging turned off, for example) such a
table bloats by several GB and later shrinks back to its normal content,
soon all the used blocks are at the beginning of the heap and we find
tens of thousands of empty pages at the end. Only then does the second
scan take more than 1000 ms, and autovacuum is at risk of getting killed
while at it.

Since we have experienced this problem several times now on our
production systems, something clearly needs to be done. But IMHO it
doesn't happen often enough to take any risk here.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-25 13:45:59
Message-ID: 26446.1351172759@sss.pgh.pa.us
Lists: pgsql-hackers

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> On 10/24/2012 10:46 PM, Stephen Frost wrote:
>> Would it be possible to use the FSM to figure out if things have changed
>> since the last scan..? Does that scan update the FSM, which would then
>> be updated by another backend in the event that it decided to write
>> something there? Or do we consider the FSM to be completely
>> untrustworthy wrt this (and if so, I don't suppose there's any hope to
>> using the visibility map...)?

> I honestly don't know if we can trust the FSM enough when it comes to
> throwing away heap pages. Can we?

No. Backends are under no obligation to update FSM for each individual
tuple insertion, and typically don't do so.

More to the point, you have to take AccessExclusiveLock *anyway*,
because this is interlocking not only against new insertions but plain
read-only seqscans: if a seqscan falls off the end of the table it will
be very unhappy. So I don't see where we'd buy anything by consulting
the FSM.

regards, tom lane


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-25 13:57:38
Message-ID: 50894552.4060804@Yahoo.com
Lists: pgsql-hackers

On 10/25/2012 9:45 AM, Tom Lane wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
>> On 10/24/2012 10:46 PM, Stephen Frost wrote:
>>> Would it be possible to use the FSM to figure out if things have changed
>>> since the last scan..? Does that scan update the FSM, which would then
>>> be updated by another backend in the event that it decided to write
>>> something there? Or do we consider the FSM to be completely
>>> untrustworthy wrt this (and if so, I don't suppose there's any hope to
>>> using the visibility map...)?
>
>> I honestly don't know if we can trust the FSM enough when it comes to
>> throwing away heap pages. Can we?
>
> No. Backends are under no obligation to update FSM for each individual
> tuple insertion, and typically don't do so.
>
> More to the point, you have to take AccessExclusiveLock *anyway*,
> because this is interlocking not only against new insertions but plain
> read-only seqscans: if a seqscan falls off the end of the table it will
> be very unhappy. So I don't see where we'd buy anything by consulting
> the FSM.

Thank you.

One thing that I haven't mentioned yet is that with this patch, we could
actually insert a vacuum_delay_point() into the loop in
count_nondeletable_pages(). We no longer cling to the exclusive lock but
rather get out of the way as soon as somebody needs the table. Under
these conditions we no longer need to run the second scan at full bore.
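
Concretely, the loop would look something like this (a paraphrase of
count_nondeletable_pages(), not the verbatim source; page_is_nonempty()
stands in for the real per-page tuple check):

static BlockNumber
count_nondeletable_pages_sketch(Relation onerel, LVRelStats *vacrelstats)
{
    BlockNumber blkno = vacrelstats->rel_pages;

    while (blkno > vacrelstats->nonempty_pages)
    {
        Buffer  buf;
        bool    hastup;

        blkno--;

        /* proposed addition: honor the cost-based vacuum delay */
        vacuum_delay_point();

        buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                                 RBM_NORMAL, vac_strategy);
        LockBuffer(buf, BUFFER_LOCK_SHARE);
        hastup = page_is_nonempty(buf);     /* placeholder for the real check */
        UnlockReleaseBuffer(buf);

        if (hastup)
            return blkno + 1;               /* truncation must stop here */
    }

    return vacrelstats->nonempty_pages;
}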

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-25 14:12:01
Message-ID: 20121025141201.GB5162@tamriel.snowman.net
Lists: pgsql-hackers

Jan,

* Jan Wieck (JanWieck(at)Yahoo(dot)com) wrote:
> The problem case this patch is dealing with is rolling window tables
> that experienced some bloat. The typical example is a log table,
> that has new data constantly added and the oldest data constantly
> purged out. This data normally rotates through some blocks like a
> rolling window. If for some reason (purging turned off for example)
> this table bloats by several GB and later shrinks back to its normal
> content, soon all the used blocks are at the beginning of the heap
> and we find tens of thousands of empty pages at the end. Only now
> does the second scan take more than 1000ms and autovacuum is at risk
> to get killed while at it.

My concern is that this could certainly also happen to a heavily updated
table in an OLTP type of environment, where the requirement to take a
heavy lock to clean it up might prevent it from ever happening. I was
simply hoping we could find a mechanism to lock just those pages we're
getting ready to nuke rather than the entire relation. Perhaps we can
consider how to make those changes alongside changes to eliminate or
reduce the extent locking that has been painful (for me at least) when
doing massive parallel loads into a table.

> Since we have experienced this problem several times now on our
> production systems, something clearly needs to be done. But IMHO it
> doesn't happen often enough to take any risk here.

I'm not advocating a 'do-nothing' approach; I was just looking for
another option that might allow this work to happen on the heap in
parallel with regular access. Since we haven't got any way to do that
currently, +1 for moving forward with this, as it clearly improves the
current situation.

Thanks,

Stephen


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-25 16:16:22
Message-ID: 508965D6.9020009@Yahoo.com
Lists: pgsql-hackers

On 10/25/2012 10:12 AM, Stephen Frost wrote:
> Jan,
>
> * Jan Wieck (JanWieck(at)Yahoo(dot)com) wrote:
>> The problem case this patch is dealing with is rolling window tables
>> that experienced some bloat. The typical example is a log table,
>> that has new data constantly added and the oldest data constantly
>> purged out. This data normally rotates through some blocks like a
>> rolling window. If for some reason (purging turned off for example)
>> this table bloats by several GB and later shrinks back to its normal
>> content, soon all the used blocks are at the beginning of the heap
>> and we find tens of thousands of empty pages at the end. Only now
>> does the second scan take more than 1000ms and autovacuum is at risk
>> to get killed while at it.
>
> My concern is that this could certainly also happen to a heavily updated
> table in an OLTP type of environment where the requirement to take a
> heavy lock to clean it up might prevent it from ever happening.. I was
> simply hoping we could find a mechanism to lock just those pages we're
> getting ready to nuke rather than the entire relation. Perhaps we can
> consider how to make those changes alongside of changes to eliminate or
> reduce the extent locking that has been painful (for me at least) when
> doing massive parallel loads into a table.

I've been testing this with loads of 20 writes/s to that bloated table.
Preventing not only the cleanup, but the following ANALYZE as well, is
precisely what happens. There may be multiple ways to get into this
situation, but once you're there the symptoms are the same: vacuum fails
to truncate the table and causes a one-second hiccup every minute while
it holds the exclusive lock, until the deadlock detection code of
another transaction kills it.

My patch doesn't change the logic by which we ensure that we don't zap
any data by accident with the truncate, and Tom's comments suggest we
should stick to it. It only makes autovacuum check frequently whether
the AccessExclusiveLock is actually blocking anyone, and if so, get out
of the way.

I would rather like to discuss ideas for how to do all this without
three new GUCs.

In the original code, the maximum delay that autovacuum can cause by
holding the exclusive lock is one deadlock_timeout (default 1s). It
would appear reasonable to me to use max(deadlock_timeout/10,10ms) as
the interval to check for a conflicting lock request. For another
transaction that needs to access the table this is 10 times faster than
it is now and still guarantees that autovacuum will make some progress
with the truncate.
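
A minimal sketch of that idea, reusing the existing DeadlockTimeout
variable and timestamp helpers (the waiter test itself is only a
placeholder here, and check_start would hold the time of the previous
check):

/*
 * Sketch only: derive the lock-check interval from deadlock_timeout
 * instead of adding a dedicated GUC.  DeadlockTimeout holds
 * deadlock_timeout in milliseconds.
 */
int         check_interval = Max(DeadlockTimeout / 10, 10);     /* in ms */
TimestampTz last_check = GetCurrentTimestamp();

/* ... inside the per-block loop of the second scan ... */
if (TimestampDifferenceExceeds(last_check, GetCurrentTimestamp(),
                               check_interval))
{
    if (lock_has_waiters(onerel))   /* placeholder: is anyone waiting? */
    {
        /* release the AccessExclusiveLock and suspend the truncate */
    }
    last_check = GetCurrentTimestamp();
}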

The other two GUCs control how often and how fast autovacuum tries to
acquire the exclusive lock in the first place. Since we actively release
the lock *because someone needs it*, it is pretty much guaranteed that
the immediately following lock attempt fails. We use
ConditionalLockRelation() on purpose because an unconditional lock
attempt could deadlock. The current code tries one lock attempt and
gives up immediately. I don't know what to derive a good retry duration
from, but the nap time between tries could be a hardcoded 20 ms, or the
cost-based vacuum nap time (which defaults to 20 ms).
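
For the retry side, a sketch along these lines would do; the retry
budget and the 20 ms nap are the illustrative values from the paragraph
above, not settled parameters:

/*
 * Sketch only: retry the conditional lock a bounded number of times,
 * napping in between, instead of giving up after the first attempt.
 */
bool    got_lock = false;
int     tries;

for (tries = 0; tries < 50; tries++)        /* illustrative retry budget */
{
    if (ConditionalLockRelation(onerel, AccessExclusiveLock))
    {
        got_lock = true;
        break;
    }
    pg_usleep(20000L);                      /* 20 ms nap between attempts */
}

if (!got_lock)
    return;                                 /* skip the truncate this round */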

Any other ideas are welcome.

Thanks,
Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-25 16:24:15
Message-ID: 20121025162415.GF6442@alvh.no-ip.org
Lists: pgsql-hackers

Jan Wieck wrote:

> In the original code, the maximum delay that autovacuum can cause by
> holding the exclusive lock is one deadlock_timeout (default 1s). It
> would appear reasonable to me to use max(deadlock_timeout/10,10ms)
> as the interval to check for a conflicting lock request. For another
> transaction that needs to access the table this is 10 times faster
> than it is now and still guarantees that autovacuum will make some
> progress with the truncate.

So you would be calling GetCurrentTimestamp() continuously? Since you
mentioned adding a vacuum delay point I wonder if it would make sense to
test for lockers each time it would consider going to sleep, instead.
(One hazard to keep in mind is the case where no vacuum delay is
configured.)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-25 17:48:17
Message-ID: 50897B61.2040902@Yahoo.com
Lists: pgsql-hackers

On 10/25/2012 12:24 PM, Alvaro Herrera wrote:
> Jan Wieck wrote:
>
>> In the original code, the maximum delay that autovacuum can cause by
>> holding the exclusive lock is one deadlock_timeout (default 1s). It
>> would appear reasonable to me to use max(deadlock_timeout/10,10ms)
>> as the interval to check for a conflicting lock request. For another
>> transaction that needs to access the table this is 10 times faster
>> than it is now and still guarantees that autovacuum will make some
>> progress with the truncate.
>
> So you would be calling GetCurrentTimestamp() continuously? Since you
> mentioned adding a vacuum delay point I wonder if it would make sense to
> test for lockers each time it would consider going to sleep, instead.
> (One hazard to keep in mind is the case where no vacuum delay is
> configured.)

Depends on your definition of "continuously". If one
INSTR_TIME_SET_CURRENT() call (which on Unix boils down to a
gettimeofday()) every 32 ReadBufferExtended() calls counts as
continuous, then yes.

Adding a vacuum_delay_point() is something we should consider. However,
the vacuum_delay_point() call simply naps when enough cost has been
racked up. You don't know whether the next call will nap or not. We
would have to extend that functionality with some vacuum_delay_would_nap()
call to do what you suggest.
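
To illustrate, such a helper might look like this; it is only a sketch
that mirrors the condition vacuum_delay_point() tests today, using the
existing cost-accounting globals, and is not part of any posted patch:

/*
 * Hypothetical helper: report whether the next vacuum_delay_point()
 * call would actually sleep, without sleeping.  Note Alvaro's hazard:
 * with no cost-based delay configured, VacuumCostActive is false and
 * this never returns true, so it could not be the only trigger for
 * the lock check.
 */
static bool
vacuum_delay_would_nap(void)
{
    return VacuumCostActive &&
           !InterruptPending &&
           VacuumCostBalance >= VacuumCostLimit;
}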

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Jan Wieck'" <JanWieck(at)Yahoo(dot)com>, "'Stephen Frost'" <sfrost(at)snowman(dot)net>
Cc: "'PostgreSQL Development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-26 05:29:06
Message-ID: 001001cdb33a$d23c3a90$76b4afb0$@kapila@huawei.com
Lists: pgsql-hackers

On Thursday, October 25, 2012 9:46 PM Jan Wieck wrote:
> On 10/25/2012 10:12 AM, Stephen Frost wrote:
> > Jan,
> >
> > * Jan Wieck (JanWieck(at)Yahoo(dot)com) wrote:
> >> The problem case this patch is dealing with is rolling window tables
> >> that experienced some bloat. The typical example is a log table,
> >> that has new data constantly added and the oldest data constantly
> >> purged out. This data normally rotates through some blocks like a
> >> rolling window. If for some reason (purging turned off for example)
> >> this table bloats by several GB and later shrinks back to its normal
> >> content, soon all the used blocks are at the beginning of the heap
> >> and we find tens of thousands of empty pages at the end. Only now
> >> does the second scan take more than 1000ms and autovacuum is at risk
> >> to get killed while at it.
> >
> > My concern is that this could certainly also happen to a heavily updated
> > table in an OLTP type of environment where the requirement to take a
> > heavy lock to clean it up might prevent it from ever happening.. I was
> > simply hoping we could find a mechanism to lock just those pages we're
> > getting ready to nuke rather than the entire relation. Perhaps we can
> > consider how to make those changes alongside of changes to eliminate or
> > reduce the extent locking that has been painful (for me at least) when
> > doing massive parallel loads into a table.
>
> I've been testing this with loads of 20 writes/s to that bloated table.
> Preventing not only the clean up, but the following ANALYZE as well is
> precisely what happens. There may be multiple ways how to get into this
> situation, but once you're there the symptoms are the same. Vacuum fails
> to truncate it and causing a 1 second hiccup every minute, while vacuum
> is holding the exclusive lock until the deadlock detection code of
> another transaction kills it.
>
> My patch doesn't change the logic how we ensure that we don't zap any
> data by accident with the truncate and Tom's comments suggest we should
> stick to it. It only makes autovacuum check frequently if the
> AccessExclusiveLock is actually blocking anyone and then get out of the
> way.
>
> I would rather like to discuss any ideas how to do all this without 3
> new GUCs.
>
> In the original code, the maximum delay that autovacuum can cause by
> holding the exclusive lock is one deadlock_timeout (default 1s). It
> would appear reasonable to me to use max(deadlock_timeout/10,10ms) as
> the interval to check for a conflicting lock request. For another
> transaction that needs to access the table this is 10 times faster than
> it is now and still guarantees that autovacuum will make some progress
> with the truncate.

One other way could be to check after every few pages for a conflicting
lock request.

> The other two GUCs control how often and how fast autovacuum tries to
> acquire the exclusive lock in the first place. Since we actively release
> the lock *because someone needs it* it is pretty much guaranteed that
> the immediate next lock attempt fails. We on purpose do a
> ConditionalLockRelation() because there is a chance to deadlock. The
> current code only tries one lock attempt and gives up immediately. I
> don't know from what to derive a good value for how long to retry,

Can't we do something like: after the nap, try the conditional lock
again, and if we don't get it, then take the lock unconditionally?
The reason it might be okay to take the lock unconditionally after one
try with your implementation is that, after every few pages or a small
amount of time, it will release the lock anyway if there is any waiter.

> but
> the nap time in between tries could be a hardcoded 20ms or using the
> cost based vacuum nap time (which defaults to 20ms).

I think using the cost-based vacuum nap time or a default value is good.

Adding new parameters carries user/administrator overhead; it is always
better if the database can decide this intelligently itself. However, if
you feel these are parameters that can vary based on different kinds of
usage, then I think it is better to expose them to users as
configuration parameters.

With Regards,
Amit Kapila.


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: 'Stephen Frost' <sfrost(at)snowman(dot)net>, 'PostgreSQL Development' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-26 06:19:30
Message-ID: 508A2B72.3010205@Yahoo.com
Lists: pgsql-hackers

On 10/26/2012 1:29 AM, Amit Kapila wrote:
> On Thursday, October 25, 2012 9:46 PM Jan Wieck wrote:
>> On 10/25/2012 10:12 AM, Stephen Frost wrote:
>> > Jan,
>> >
>> > * Jan Wieck (JanWieck(at)Yahoo(dot)com) wrote:
>> >> The problem case this patch is dealing with is rolling window tables
>> >> that experienced some bloat. The typical example is a log table,
>> >> that has new data constantly added and the oldest data constantly
>> >> purged out. This data normally rotates through some blocks like a
>> >> rolling window. If for some reason (purging turned off for example)
>> >> this table bloats by several GB and later shrinks back to its normal
>> >> content, soon all the used blocks are at the beginning of the heap
>> >> and we find tens of thousands of empty pages at the end. Only now
>> >> does the second scan take more than 1000ms and autovacuum is at risk
>> >> to get killed while at it.
>> >
>> > My concern is that this could certainly also happen to a heavily updated
>> > table in an OLTP type of environment where the requirement to take a
>> > heavy lock to clean it up might prevent it from ever happening.. I was
>> > simply hoping we could find a mechanism to lock just those pages we're
>> > getting ready to nuke rather than the entire relation. Perhaps we can
>> > consider how to make those changes alongside of changes to eliminate or
>> > reduce the extent locking that has been painful (for me at least) when
>> > doing massive parallel loads into a table.
>>
>> I've been testing this with loads of 20 writes/s to that bloated table.
>> Preventing not only the clean up, but the following ANALYZE as well is
>> precisely what happens. There may be multiple ways how to get into this
>> situation, but once you're there the symptoms are the same. Vacuum fails
>> to truncate it and causing a 1 second hiccup every minute, while vacuum
>> is holding the exclusive lock until the deadlock detection code of
>> another transaction kills it.
>>
>> My patch doesn't change the logic how we ensure that we don't zap any
>> data by accident with the truncate and Tom's comments suggest we should
>> stick to it. It only makes autovacuum check frequently if the
>> AccessExclusiveLock is actually blocking anyone and then get out of the
>> way.
>>
>> I would rather like to discuss any ideas how to do all this without 3
>> new GUCs.
>>
>> In the original code, the maximum delay that autovacuum can cause by
>> holding the exclusive lock is one deadlock_timeout (default 1s). It
>> would appear reasonable to me to use max(deadlock_timeout/10,10ms) as
>> the interval to check for a conflicting lock request. For another
>> transaction that needs to access the table this is 10 times faster than
>> it is now and still guarantees that autovacuum will make some progress
>> with the truncate.
>
> One other way could be to check after every few pages for a conflicting
> lock request.

How is this any different from what my patch does? Did you even look at
the code?

>
>> The other two GUCs control how often and how fast autovacuum tries to
>> acquire the exclusive lock in the first place. Since we actively release
>> the lock *because someone needs it* it is pretty much guaranteed that
>> the immediate next lock attempt fails. We on purpose do a
>> ConditionalLockRelation() because there is a chance to deadlock. The
>> current code only tries one lock attempt and gives up immediately. I
>> don't know from what to derive a good value for how long to retry,
>
> Can't we do something like, after nap check for conditional lock and if it
> didn't get
> then get lock unconditionally.

No, we cannot. This is also well documented in the code.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Amit Kapila'" <amit(dot)kapila(at)huawei(dot)com>, "'Jan Wieck'" <JanWieck(at)Yahoo(dot)com>, "'Stephen Frost'" <sfrost(at)snowman(dot)net>
Cc: "'PostgreSQL Development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-26 07:04:47
Message-ID: 001401cdb348$2fcc2e60$8f648b20$@kapila@huawei.com
Lists: pgsql-hackers

On Friday, October 26, 2012 10:59 AM Amit Kapila wrote:
> On Thursday, October 25, 2012 9:46 PM Jan Wieck wrote:
> > On 10/25/2012 10:12 AM, Stephen Frost wrote:
> > > Jan,
> > >
> > > * Jan Wieck (JanWieck(at)Yahoo(dot)com) wrote:
> > >> The problem case this patch is dealing with is rolling window tables
> > >> that experienced some bloat. The typical example is a log table,
> > >> that has new data constantly added and the oldest data constantly
> > >> purged out. This data normally rotates through some blocks like a
> > >> rolling window. If for some reason (purging turned off for example)
> > >> this table bloats by several GB and later shrinks back to its normal
> > >> content, soon all the used blocks are at the beginning of the heap
> > >> and we find tens of thousands of empty pages at the end. Only now
> > >> does the second scan take more than 1000ms and autovacuum is at risk
> > >> to get killed while at it.
> > >
> > > My concern is that this could certainly also happen to a heavily updated
> > > table in an OLTP type of environment where the requirement to take a
> > > heavy lock to clean it up might prevent it from ever happening.. I was
> > > simply hoping we could find a mechanism to lock just those pages we're
> > > getting ready to nuke rather than the entire relation. Perhaps we can
> > > consider how to make those changes alongside of changes to eliminate or
> > > reduce the extent locking that has been painful (for me at least) when
> > > doing massive parallel loads into a table.
> >
> > I've been testing this with loads of 20 writes/s to that bloated table.
> > Preventing not only the clean up, but the following ANALYZE as well is
> > precisely what happens. There may be multiple ways how to get into this
> > situation, but once you're there the symptoms are the same. Vacuum fails
> > to truncate it and causing a 1 second hiccup every minute, while vacuum
> > is holding the exclusive lock until the deadlock detection code of
> > another transaction kills it.
> >
> > My patch doesn't change the logic how we ensure that we don't zap any
> > data by accident with the truncate and Tom's comments suggest we should
> > stick to it. It only makes autovacuum check frequently if the
> > AccessExclusiveLock is actually blocking anyone and then get out of the
> > way.
> >
> > I would rather like to discuss any ideas how to do all this without 3
> > new GUCs.
> >
> > In the original code, the maximum delay that autovacuum can cause by
> > holding the exclusive lock is one deadlock_timeout (default 1s). It
> > would appear reasonable to me to use max(deadlock_timeout/10,10ms) as
> > the interval to check for a conflicting lock request. For another
> > transaction that needs to access the table this is 10 times faster than
> > it is now and still guarantees that autovacuum will make some progress
> > with the truncate.
>
> One other way could be to check after every few pages for a conflicting
> lock request.
>
> > The other two GUCs control how often and how fast autovacuum tries to
> > acquire the exclusive lock in the first place. Since we actively release
> > the lock *because someone needs it* it is pretty much guaranteed that
> > the immediate next lock attempt fails. We on purpose do a
> > ConditionalLockRelation() because there is a chance to deadlock. The
> > current code only tries one lock attempt and gives up immediately. I
> > don't know from what to derive a good value for how long to retry,
>
> Can't we do something like, after nap check for conditional lock and if it
> didn't get then get lock unconditionally.
> The reason why after your implementation it might be okay to have lock
> unconditionally after one try is that anyway after every few pages or
> after small time, it will release the lock if there is any waiter.

I am sorry; at this point trying to take an unconditional X lock can
lead to deadlock, so the above is not possible.

> > but
> > the nap time in between tries could be a hardcoded 20ms or using the
> > cost based vacuum nap time (which defaults to 20ms).
>
> I think using cost based vacuum nap time or default value is good.
>
> Adding new parameters might have user/administrator overhead, it is always
> better if it can be intelligently decided by database itself.
> However if you feel these are parameters which can vary based on different
> kind of usage, then I think it is better to expose it through configuration
> parameters to users.
>
> With Regards,
> Amit Kapila.


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Jan Wieck'" <JanWieck(at)Yahoo(dot)com>
Cc: "'Stephen Frost'" <sfrost(at)snowman(dot)net>, "'PostgreSQL Development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-26 10:35:41
Message-ID: 001901cdb365$a6677800$f3366800$@kapila@huawei.com
Lists: pgsql-hackers

On Friday, October 26, 2012 11:50 AM Jan Wieck wrote:
> On 10/26/2012 1:29 AM, Amit Kapila wrote:
> > On Thursday, October 25, 2012 9:46 PM Jan Wieck wrote:
> >> On 10/25/2012 10:12 AM, Stephen Frost wrote:
> >> > Jan,
> >> >
> >> > * Jan Wieck (JanWieck(at)Yahoo(dot)com) wrote:
> >> >> The problem case this patch is dealing with is rolling window tables
> >> >> that experienced some bloat. The typical example is a log table,
> >> >> that has new data constantly added and the oldest data constantly
> >> >> purged out. This data normally rotates through some blocks like a
> >> >> rolling window. If for some reason (purging turned off for example)
> >> >> this table bloats by several GB and later shrinks back to its normal
> >> >> content, soon all the used blocks are at the beginning of the heap
> >> >> and we find tens of thousands of empty pages at the end. Only now
> >> >> does the second scan take more than 1000ms and autovacuum is at risk
> >> >> to get killed while at it.
> >> >
> >> > My concern is that this could certainly also happen to a heavily updated
> >> > table in an OLTP type of environment where the requirement to take a
> >> > heavy lock to clean it up might prevent it from ever happening.. I was
> >> > simply hoping we could find a mechanism to lock just those pages we're
> >> > getting ready to nuke rather than the entire relation. Perhaps we can
> >> > consider how to make those changes alongside of changes to eliminate or
> >> > reduce the extent locking that has been painful (for me at least) when
> >> > doing massive parallel loads into a table.
> >>
> >> I've been testing this with loads of 20 writes/s to that bloated table.
> >> Preventing not only the clean up, but the following ANALYZE as well is
> >> precisely what happens. There may be multiple ways how to get into this
> >> situation, but once you're there the symptoms are the same. Vacuum fails
> >> to truncate it and causing a 1 second hiccup every minute, while vacuum
> >> is holding the exclusive lock until the deadlock detection code of
> >> another transaction kills it.
> >>
> >> My patch doesn't change the logic how we ensure that we don't zap any
> >> data by accident with the truncate and Tom's comments suggest we should
> >> stick to it. It only makes autovacuum check frequently if the
> >> AccessExclusiveLock is actually blocking anyone and then get out of the
> >> way.
> >>
> >> I would rather like to discuss any ideas how to do all this without 3
> >> new GUCs.
> >>
> >> In the original code, the maximum delay that autovacuum can cause by
> >> holding the exclusive lock is one deadlock_timeout (default 1s). It
> >> would appear reasonable to me to use max(deadlock_timeout/10,10ms) as
> >> the interval to check for a conflicting lock request. For another
> >> transaction that needs to access the table this is 10 times faster than
> >> it is now and still guarantees that autovacuum will make some progress
> >> with the truncate.
> >
> > One other way could be to check after every few pages for a conflicting
> > lock request.
>
> How is this any different from what my patch does?
The difference is that the patch checks for waiters by using two
parameters, autovacuum_truncate_lock_check and blkno % 32, and what I
had mentioned was to check based on blkno alone.
Will it affect things too much if we directly check for waiters after
every 32 (or any feasible number of) blocks?

> Did you even look at the code?
I hadn't looked at the code when I replied to your previous mail, but
now I have checked it.

With Regards,
Amit Kapila.


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: 'Stephen Frost' <sfrost(at)snowman(dot)net>, 'PostgreSQL Development' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-26 13:30:56
Message-ID: 508A9090.6050909@Yahoo.com
Lists: pgsql-hackers

On 10/26/2012 6:35 AM, Amit Kapila wrote:
> On Friday, October 26, 2012 11:50 AM Jan Wieck wrote:
>> On 10/26/2012 1:29 AM, Amit Kapila wrote:
>> > One other way could be to check after every few pages for a conflicting
>> > lock request.
>>
>> How is this any different from what my patch does?
> The difference is that in the patch it checks for waiters by using 2
> parameters autovacuum_truncate_lock_check and blkno%32 and what I
> had mentioned was to check only based on blkno.
> Will it effect too much if we directly check for waiters after every 32
> (any feasible number) blocks?

The blkno % 32 is there so that we don't do the gettimeofday() call too
often. But relying on the block number alone is IMHO not a good idea. It
would have to be a number small enough that, even on a busy system where
the pages have to be read from disk, vacuum checks for waiters and
releases the lock quickly, yet large enough that it doesn't create a
significant amount of spinlock traffic in the lmgr. We would end up with
another parameter, a number of blocks, for which a good value is a lot
harder to estimate.
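
Roughly, the throttled check inside the scan loop can be structured like
this (a sketch, not the patch text; check_start is the time of the
previous check, autovacuum_truncate_lock_check is the GUC named above,
and lock_has_waiters() stands in for the actual "does anyone want this
lock" test):

if ((blkno % 32) == 0)              /* look at the clock only every 32 blocks */
{
    instr_time  now;
    instr_time  elapsed;

    INSTR_TIME_SET_CURRENT(now);    /* boils down to gettimeofday() on Unix */
    elapsed = now;
    INSTR_TIME_SUBTRACT(elapsed, check_start);

    if (INSTR_TIME_GET_MILLISEC(elapsed) >= autovacuum_truncate_lock_check)
    {
        if (lock_has_waiters(onerel))   /* placeholder waiter test */
            return blkno + 1;           /* keep what we verified so far; the
                                         * caller releases the lock and quits */
        check_start = now;
    }
}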

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, 'Stephen Frost' <sfrost(at)snowman(dot)net>, 'PostgreSQL Development' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-11-23 15:44:14
Message-ID: 20121123154414.GA4558@alvh.no-ip.org
Lists: pgsql-hackers

Jan,

Are you posting an updated patch?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services