Re: autovacuum truncate exclusive lock round two

Lists: pgsql-hackers
From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
To: "Jan Wieck" <JanWieck(at)Yahoo(dot)com>
Cc: "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>,"Amit Kapila" <amit(dot)kapila(at)huawei(dot)com>, "Stephen Frost" <sfrost(at)snowman(dot)net>, "PostgreSQL Development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-12-04 13:06:31
Message-ID: 20121204130631.69290@gmx.com

Jan Wieck wrote:

> Thinking about it, I'm not really happy with removing the
> autovacuum_truncate_lock_check GUC at all.
>
> Fact is that the deadlock detection code and the configuration
> parameter for it should IMHO have nothing to do with all this in
> the first place. A properly implemented application does not
> deadlock.

I don't agree. I believe that in some cases it is possible and
practicable to set access rules which would prevent deadlocks in
application access to a database. In other cases the convolutions
required in the code, the effort in educating dozens or hundreds of
programmers maintaining the code (and keeping the training current
during staff turnover), and the staff time required for compliance
far outweigh the benefit of an occasional transaction retry.
However, it is enough for your argument that there are cases where
it can be done.

> Someone running such a properly implemented application should be
> able to safely set deadlock_timeout to minutes without the
> slightest ill side effect, but with the benefit that the deadlock
> detection code itself does not add to the lock contention. The
> only reason one cannot do so today is because autovacuum's
> truncate phase could then freeze the application with an
> exclusive lock for that long.
>
> I believe the check interval needs to be decoupled from the
> deadlock_timeout again.

OK

> This will leave us with 2 GUCs at least.

Hmm. What problems do you see with hard-coding reasonable values?
Adding two or three GUC settings for a patch with so little
user-visible impact seems weird. And it seems to me (and also
seemed to Robert) as though the specific values of the other two
settings really aren't that critical as long as they are anywhere
within a reasonable range. Configuring PostgreSQL can be
intimidating enough without adding knobs that really don't do
anything useful. Can you show a case where special values would be
helpful?

-Kevin


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-12-04 16:55:48
Message-ID: 50BE2B14.400@Yahoo.com
Lists: pgsql-hackers

On 12/4/2012 8:06 AM, Kevin Grittner wrote:
> Jan Wieck wrote:
>> I believe the check interval needs to be decoupled from the
>> deadlock_timeout again.
>
> OK
>
>> This will leave us with 2 GUCs at least.
>
> Hmm. What problems do you see with hard-coding reasonable values?

The question is what is reasonable?

Let's talk about the time to (re)acquire the lock first. In the cases
where truncating a table can hurt, we are dealing with many gigabytes.
The original vacuumlazy scan of them can take hours, if not days. During
that scan the vacuum worker has probably spent many hours napping in the
vacuum delay points. For me, retrying at a 50ms interval for up to 5
seconds would be reasonable for (re)acquiring that lock.

The reasoning behind it is that we need some sort of retry mechanism,
because if autovacuum just gave up the exclusive lock because someone
needed access, it is more or less guaranteed that an immediate attempt
to reacquire it will fail until that waiter has committed. But if it
can't get the lock after 5 seconds, the system seems busy enough that
autovacuum should come back much later, when the launcher kicks it off
again.
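
A minimal sketch of that retry loop, in plain C, might look like the
following. It is an illustration only, not code from the patch: the
names reacquire_with_retry, try_lock_fn and sleep_ms_fn are made up,
the lock attempt and the nap are passed in as callbacks rather than
calling any real PostgreSQL function, and the 50ms / 5s values are just
the ones suggested above. The two stub functions exist only to exercise
the loop.

```c
#include <stdbool.h>

/* Values suggested in the discussion; assumptions, not from the patch. */
#define RETRY_INTERVAL_MS 50
#define RETRY_TIMEOUT_MS  5000

/* Hypothetical stand-ins for a conditional lock attempt and a nap. */
typedef bool (*try_lock_fn) (void *arg);
typedef void (*sleep_ms_fn) (int ms);

/*
 * Try to reacquire the lock, napping RETRY_INTERVAL_MS between attempts.
 * Returns true on success.  Returns false once RETRY_TIMEOUT_MS have been
 * spent napping; the caller would then give up on the truncate and leave
 * it to a later autovacuum run.
 */
static bool
reacquire_with_retry(try_lock_fn try_lock, void *arg, sleep_ms_fn sleep_ms)
{
    int waited_ms = 0;

    for (;;)
    {
        if (try_lock(arg))
            return true;
        if (waited_ms >= RETRY_TIMEOUT_MS)
            return false;
        sleep_ms(RETRY_INTERVAL_MS);
        waited_ms += RETRY_INTERVAL_MS;
    }
}

/* Stub lock attempt: fails until the counter behind arg reaches zero. */
static bool
stub_try_lock(void *arg)
{
    int *remaining_failures = (int *) arg;

    if (*remaining_failures > 0)
    {
        (*remaining_failures)--;
        return false;
    }
    return true;
}

/* Stub nap: records the requested time instead of actually sleeping. */
static int stub_slept_ms = 0;

static void
stub_sleep_ms(int ms)
{
    stub_slept_ms += ms;
}
```

With the stubs, a lock that becomes free after three failed attempts is
picked up after three naps, while a lock that never becomes free makes
the loop give up once the full 5 seconds have been spent napping.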

I don't care much about occupying that autovacuum worker for a few
seconds. It just spent hours vacuuming that very table. How much harm
will a couple more seconds do?

The check interval for the LockHasWaiters() call, however, depends very
much on the response time constraints of the application. A 200ms
interval, for example, would cause the truncate phase to hold onto the
exclusive lock for 200ms at least. That means that a steady stream of
short-running transactions would see a 100ms "blocking" on average,
200ms max. For many applications that is probably OK. If your response
time constraint is <=50ms on 98% of transactions, you might want to have
that knob though.
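
The 100ms average / 200ms max figures can be checked with a small
back-of-the-envelope sketch in C. It is a simplified model, not code
from the patch: it assumes lock requests arrive evenly spread over one
check interval, that a waiter is only noticed at the next check, and
that the lock is released the moment it is noticed. BlockStats and
simulate_blocking are names made up for this illustration.

```c
/* Average and worst-case wait seen by requests during one check interval. */
typedef struct
{
    double avg_ms;
    double max_ms;
} BlockStats;

/*
 * Model: n_arrivals lock requests arrive evenly spread over one check
 * interval.  Each one waits until the end of that interval, when the
 * next waiter check happens and the exclusive lock is given up.
 */
static BlockStats
simulate_blocking(double check_interval_ms, int n_arrivals)
{
    BlockStats stats = {0.0, 0.0};
    double     total_wait_ms = 0.0;

    for (int i = 0; i < n_arrivals; i++)
    {
        /* Arrival time measured from the previous waiter check. */
        double offset_ms = (i + 0.5) * check_interval_ms / n_arrivals;
        double wait_ms = check_interval_ms - offset_ms;

        total_wait_ms += wait_ms;
        if (wait_ms > stats.max_ms)
            stats.max_ms = wait_ms;
    }

    if (n_arrivals > 0)
        stats.avg_ms = total_wait_ms / n_arrivals;

    return stats;
}
```

Under this model, simulate_blocking(200.0, 1000) gives an average wait
of about 100ms and a maximum just under 200ms; a 50ms interval gives
about 25ms on average and just under 50ms at worst.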

I admit I really have no idea what the most reasonable default for that
value would be. Something between 50ms and deadlock_timeout/2 I guess.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin