autovacuum truncate exclusive lock round two

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: autovacuum truncate exclusive lock round two
Date: 2012-10-24 20:20:38
Message-ID: 50884D96.7010601@Yahoo.com
Lists: pgsql-hackers

This problem has been discussed before. Those familiar with the subject
please skip the next paragraph.

When autovacuum finds a substantial number of empty pages at the end of
a relation, it attempts to truncate the relation in lazy_truncate_heap().
Because all the scanning was done in parallel with normal DB activity, it
needs to verify that all those blocks are still empty. To do that,
autovacuum grabs an AccessExclusiveLock on the relation, then scans
backwards to the last non-empty page. If any other backend needs to
access that table during this time, the deadlock detection code in that
backend will kill the autovacuum; by default this happens after a 1000ms
timeout. The autovacuum launcher will start another vacuum after (by
default) 60 seconds, which will most likely be killed again, and again,
and again. The net result is that the table is never truncated, and every
60 seconds there is a 1 second hiccup before the autovacuum is killed.
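
For reference, a simplified sketch of the existing cancel logic (my
paraphrase of the code in ProcSleep() in src/backend/storage/lmgr/proc.c,
not the literal source):

    /*
     * After waiting deadlock_timeout (default 1000ms) on a lock, the
     * blocked backend checks whether the blocker is an autovacuum
     * worker and, unless it is doing anti-wraparound work, cancels it.
     */
    PGPROC     *autovac = GetBlockingAutoVacuumPgproc();

    if (autovac != NULL &&
        (autovac->vacuumFlags & PROC_IS_AUTOVACUUM) &&
        !(autovac->vacuumFlags & PROC_VACUUM_FOR_WRAPAROUND))
    {
        /* send SIGINT to cancel the autovacuum worker */
        kill(autovac->pid, SIGINT);
    }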

Proposal:

Add functions to lmgr that are derived from the lock release code, but
instead of releasing the lock and waking up waiters, they just return a
boolean telling whether there are any waiters that would be woken up if
this lock were released.
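
A minimal sketch of what such a function could look like, modeled on the
release path in src/backend/storage/lmgr/lock.c (the name, signature and
details here are mine, not necessarily what the patch does):

    /*
     * Sketch only.  Returns true if any pending lock request conflicts
     * with the mode we hold, i.e. a waiter would be woken up if we
     * released the lock now.  Unlike LockRelease(), nothing is modified
     * and nobody is woken.
     */
    bool
    LockHasWaiters(const LOCKTAG *locktag, LOCKMODE lockmode)
    {
        LOCKMETHODID lockmethodid = locktag->locktag_lockmethodid;
        LockMethod  lockMethodTable = LockMethods[lockmethodid];
        uint32      hashcode = LockTagHashCode(locktag);
        LWLockId    partitionLock = LockHashPartitionLock(hashcode);
        LOCK       *lock;
        bool        hasWaiters = false;

        LWLockAcquire(partitionLock, LW_SHARED);
        lock = (LOCK *) hash_search_with_hash_value(LockMethodLockHash,
                                                    (const void *) locktag,
                                                    hashcode,
                                                    HASH_FIND,
                                                    NULL);
        if (lock != NULL)
        {
            /* does anyone wait for a mode conflicting with ours? */
            if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
                hasWaiters = true;
        }
        LWLockRelease(partitionLock);

        return hasWaiters;
    }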

Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c
to periodically check whether there is a conflicting lock request
waiting. If not, keep going. If there is a waiter, truncate the relation
to the point checked thus far, release the AccessExclusiveLock, then
loop back to where we acquired this lock in the first place and continue
checking/truncating.
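
In outline, lazy_truncate_heap() would then loop roughly like this (an
illustrative sketch of the control flow as I read the proposal, not the
patch itself; the loop-exit test is simplified):

    for (;;)
    {
        BlockNumber new_rel_pages;

        /* (re)acquire the exclusive lock; retry/give-up policy elided */
        if (!ConditionalLockRelation(onerel, AccessExclusiveLock))
            return;

        /*
         * Re-verify the tail pages.  With the lmgr feature above,
         * count_nondeletable_pages() can return early as soon as it
         * notices a waiter on our lock.
         */
        new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);

        /* truncate what has been verified so far, then let waiters in */
        RelationTruncate(onerel, new_rel_pages);
        UnlockRelation(onerel, AccessExclusiveLock);

        if (new_rel_pages <= vacrelstats->nonempty_pages)
            break;              /* nothing further to reclaim */
    }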

I have a working patch here:

https://github.com/wieck/postgres/tree/autovacuum-truncate-lock

This patch does introduce three new postgresql.conf parameters, which I
would be happy to get rid of if we could derive them from something
else. Something based on the deadlock timeout may be possible.

autovacuum_truncate_lock_check = 100ms # how frequently to check
# for conflicting locks
autovacuum_truncate_lock_retry = 50 # how many times to try acquiring
# the exclusive lock
autovacuum_truncate_lock_wait = 20ms # nap in between attempts

With these settings, I see the truncate of a bloated table progressing
at a rate of 3 minutes per GB, while that table is accessed 20 times per
second.

The original "kill autovacuum" mechanism in the deadlock code is still
there. All this code really does is 10 lmgr lookups per second and
releasing the AccessExclusiveLock if there are any waiters. I don't
think it can get any cheaper than this.

I am attaching a script that uses pgbench to demonstrate the actual
problem of a bloated table with a significant number of empty pages at
the end.

Comments?

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

Attachment Content-Type Size
t1.autovac-lock-issue.tgz application/x-compressed 2.0 KB

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-24 22:20:40
Message-ID: 508869B8.3080000@Yahoo.com
Lists: pgsql-hackers

Here is the patch for it.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

Attachment Content-Type Size
autovacuum-truncate-lock.diff text/x-patch 19.1 KB

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-25 02:22:14
Message-ID: 20121025022214.GA5162@tamriel.snowman.net
Lists: pgsql-hackers

Jan,

* Jan Wieck (JanWieck(at)Yahoo(dot)com) wrote:
> This problem has been discussed before. Those familiar with the
> subject please skip the next paragraph.

Apologies if this was already thought-of and ruled out for some reason,
but...

> Because all the scanning was done in parallel with normal DB
> activity, it needs to verify that all those blocks are still empty.

Would it be possible to use the FSM to figure out if things have changed
since the last scan? Does that scan update the FSM, which would then
be updated by another backend in the event that it decided to write
something there? Or do we consider the FSM to be completely
untrustworthy for this purpose (and if so, I don't suppose there's any
hope of using the visibility map...)?

The notion of having to double-scan, together with the
AccessExclusiveLock on the relation, tells me this work-around, while
entirely possible, isn't exactly ideal...

Perhaps another option would be a page-level lock, or something else
that is larger than per-row (per-row strikes me as a lot of overhead for
this, and it's not clear how we'd do it) but less than the entire
relation, though there are certainly pain points there too.

Thanks,

Stephen


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jan Wieck <JanWieck(at)yahoo(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-29 11:20:40
Message-ID: CA+TgmoZA-MR+aPisCAB6byGDixAFd9etyq18v0rv=k0BpYm9EQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 24, 2012 at 4:20 PM, Jan Wieck <JanWieck(at)yahoo(dot)com> wrote:
> This patch does introduce three new postgresql.conf parameters, which I
> would be happy to get rid of if we could derive them from something else.
> Something based on the deadlock timeout may be possible.
>
> autovacuum_truncate_lock_check = 100ms # how frequently to check
> # for conflicting locks
> autovacuum_truncate_lock_retry = 50 # how many times to try acquiring
> # the exclusive lock
> autovacuum_truncate_lock_wait = 20ms # nap in between attempts

+1 for this general approach.

As you suggested downthread, I think that hard-coding
autovacuum_truncate_lock_check to one-tenth of the deadlock timeout
should be just fine. For the other two parameters, I doubt we need to
make them configurable at all. It's not exactly clear what to set
them to, but it does seem clear that the downside of setting them
incorrectly isn't very large, as long as the defaults are roughly sane.
Personally, I'd be inclined to retry less frequently but over a
slightly longer time period - say twenty retries, one every 100ms.
But I wouldn't be upset if we settled on what you've got here, either.
We just don't want to let the total time we spend waiting for the lock
get too long, because that means pinning down an autovacuum worker
that might be critically needed elsewhere. So the product of
autovacuum_truncate_lock_retry and autovacuum_truncate_lock_wait
probably should not be more than a couple of seconds.
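
If the two knobs were hard-coded along those lines, the whole thing
could reduce to something like this (constants illustrative;
DeadlockTimeout is the existing GUC variable, in milliseconds):

    /* check for waiters at one tenth of the deadlock timeout */
    #define AUTOVAC_TRUNCATE_LOCK_CHECK_INTERVAL  (DeadlockTimeout / 10)

    /* worst case spent waiting for the lock: 20 * 100ms = 2 seconds */
    #define AUTOVAC_TRUNCATE_LOCK_RETRIES         20
    #define AUTOVAC_TRUNCATE_LOCK_WAIT_MS         100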

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-11-15 22:31:47
Message-ID: m24nkq8pm4.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c
> to periodically check whether there is a conflicting lock request
> waiting. If not, keep going. If there is a waiter, truncate the relation
> to the point checked thus far, release the AccessExclusiveLock, then
> loop back to where we acquired this lock in the first place and continue
> checking/truncating.

I think that maybe we could just bail out after releasing the
AccessExclusiveLock and trust autovacuum to get back to truncating that
relation later. That would allow removing 2 of the 3 GUCs below:

> autovacuum_truncate_lock_check = 100ms # how frequently to check
> # for conflicting locks

This is the one remaining. Could we maybe check for a lock conflict
after every page moved backward, or some multiple thereof? The goal
would be to ensure that progress is made while also staying aware of
concurrent activity, à la CHECK_FOR_INTERRUPTS().
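
Something like this inside the backward scan, perhaps (sketch only;
the page-count constant and the waiter test are placeholders for
whatever the patch actually provides):

    #define PAGES_PER_LOCK_CHECK 32     /* hypothetical multiple */

    while (blkno > vacrelstats->nonempty_pages)
    {
        blkno--;

        /* every N pages, suspend if someone waits on our lock */
        if ((blkno % PAGES_PER_LOCK_CHECK) == 0 &&
            LockHasWaitersRelation(onerel, AccessExclusiveLock))
            return blkno + 1;   /* truncate only what is verified */

        /* ... read the buffer and verify the page is still empty ... */
    }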

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Jan Wieck <JanWieck(at)Yahoo(dot)com>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-11-15 22:39:26
Message-ID: 20121115223926.GD14789@alvh.no-ip.org
Lists: pgsql-hackers

Dimitri Fontaine wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> > Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c
> > to periodically check whether there is a conflicting lock request
> > waiting. If not, keep going. If there is a waiter, truncate the relation
> > to the point checked thus far, release the AccessExclusiveLock, then
> > loop back to where we acquired this lock in the first place and continue
> > checking/truncating.
>
> I think that maybe we could just bail out after releasing the
> AccessExclusiveLock and trust autovacuum to get back to truncating that
> relation later.

That doesn't work, because the truncating code is not reached unless
vacuuming actually took place. So if you interrupt it, it will just not
get called again later. Maybe we could have autovacuum invoke that step
separately, but that would require keeping track somewhere of the fact
that truncation was aborted.
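
For context, the relevant flow in lazy_vacuum_rel() (vacuumlazy.c) is
roughly the following (simplified; the real test also applies a
threshold on the number of freeable pages):

    lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, scan_all);

    /*
     * Truncation is only considered here, at the end of a vacuum pass.
     * If the truncate is interrupted, nothing records that fact, so it
     * is not retried until something triggers another vacuum.
     */
    possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
    if (possibly_freeable > 0)
        lazy_truncate_heap(onerel, vacrelstats);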

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, "'Dimitri Fontaine'" <dimitri(at)2ndQuadrant(dot)fr>
Cc: "'Jan Wieck'" <JanWieck(at)Yahoo(dot)com>, "'PostgreSQL Development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-11-16 05:43:58
Message-ID: 007e01cdc3bd$62402a60$26c07f20$@kapila@huawei.com
Lists: pgsql-hackers

On Friday, November 16, 2012 4:09 AM Alvaro Herrera wrote:
> Dimitri Fontaine wrote:
> > Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> > > Use this lmgr feature inside count_nondeletable_pages() of
> > > vacuumlazy.c to periodically check whether there is a conflicting
> > > lock request waiting. If not, keep going. If there is a waiter,
> > > truncate the relation to the point checked thus far, release the
> > > AccessExclusiveLock, then loop back to where we acquired this lock
> > > in the first place and continue checking/truncating.
> >
> > I think that maybe we could just bail out after releasing the
> > AccessExclusiveLock and trust autovacuum to get back to truncating
> > that relation later.
>
> That doesn't work, because the truncating code is not reached unless
> vacuuming actually took place. So if you interrupt it, it will just not
> get called again later. Maybe we could have autovacuum invoke that step
> separately, but that would require keeping track somewhere of the fact
> that truncation was aborted.

Won't it get a chance to be handled the next time vacuum is triggered by
updates/deletes on some other pages?

OTOH, maybe the same thing happens again next time and the truncate
still cannot complete. So I think it's better to complete it the first
time, though perhaps using some heuristic for the wait-and-retry timing
rather than configuration variables.

With Regards,
Amit Kapila