Re: autovacuum truncate exclusive lock round two

Lists: pgsql-hackers
From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
To: "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>,"Jan Wieck" <JanWieck(at)Yahoo(dot)com>
Cc: "Amit Kapila" <amit(dot)kapila(at)huawei(dot)com>,"Stephen Frost" <sfrost(at)snowman(dot)net>, "PostgreSQL Development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-11-28 20:33:34
Message-ID: 20121128203334.69320@gmx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin Grittner wrote:

> I still need to review the timing calls, since I'm not familiar
> with them so it wasn't immediately obvious to me whether they
> were being used correctly. I have no reason to believe that they
> aren't, but feel I should check.

It seems odd to me that assignment of one instr_time to another is
done with INSTR_TIME_SET_ZERO() of the target followed by
INSTR_TIME_ADD() with the target and the source. It seems to me
that simple assignment would be more readable, and I can't see any
down side.

Why shouldn't:

   INSTR_TIME_SET_ZERO(elapsed);
   INSTR_TIME_ADD(elapsed, currenttime);
   INSTR_TIME_SUBTRACT(elapsed, starttime);

instead be?:

   elapsed = currenttime;
   INSTR_TIME_SUBTRACT(elapsed, starttime);

And why shouldn't:

   INSTR_TIME_SET_ZERO(starttime);
   INSTR_TIME_ADD(starttime, currenttime);

instead be?:

   starttime = currenttime;

Resetting starttime this way seems especially odd.

> Also, I want to do another pass looking just for off-by-one
> errors on blkno. Again, I have no reason to believe that there is
> a problem; it just seems like it would be a particularly easy
> type of mistake to make and miss when a key structure has this
> field:
>
>   BlockNumber nonempty_pages;
>      /* actually, last nonempty page + 1 */

No problems found with that.

> And I want to test more.

The patch worked as advertised in all my tests, but I became
uncomforatable with the games being played with the last autovacuum
timestamp and the count of dead tuples. Sure, that causes
autovacuum to kick back in until it has dealt with the truncation,
but it makes it hard for a human looking at the stat views to see
what's going on, and I'm not sure it wouldn't lead to bad plans due
to stale statistics.

Personally, I would rather see us add a boolean to indicate that
autovacuum was needed (regardless of the math on the other columns)
and use that to control the retries -- leaving the other columns
free to reflect reality.

-Kevin


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-11-29 14:41:54
Message-ID: 50B77432.10007@Yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/28/2012 3:33 PM, Kevin Grittner wrote:
> Kevin Grittner wrote:
>
>> I still need to review the timing calls, since I'm not familiar
>> with them so it wasn't immediately obvious to me whether they
>> were being used correctly. I have no reason to believe that they
>> aren't, but feel I should check.
>
> It seems odd to me that assignment of one instr_time to another is
> done with INSTR_TIME_SET_ZERO() of the target followed by
> INSTR_TIME_ADD() with the target and the source. It seems to me
> that simple assignment would be more readable, and I can't see any
> down side.
>
> Why shouldn't:
>
> INSTR_TIME_SET_ZERO(elapsed);
> INSTR_TIME_ADD(elapsed, currenttime);
> INSTR_TIME_SUBTRACT(elapsed, starttime);
>
> instead be?:
>
> elapsed = currenttime;
> INSTR_TIME_SUBTRACT(elapsed, starttime);
>
> And why shouldn't:
>
> INSTR_TIME_SET_ZERO(starttime);
> INSTR_TIME_ADD(starttime, currenttime);
>
> instead be?:
>
> starttime = currenttime;
>
> Resetting starttime this way seems especially odd.

instr_time is LARGE_INTEGER on Win32 but struct timeval on Unix. Is

starttime = currenttime;

portable if those are structs?

>
>> Also, I want to do another pass looking just for off-by-one
>> errors on blkno. Again, I have no reason to believe that there is
>> a problem; it just seems like it would be a particularly easy
>> type of mistake to make and miss when a key structure has this
>> field:
>>
>> BlockNumber nonempty_pages;
>> /* actually, last nonempty page + 1 */
>
> No problems found with that.
>
>> And I want to test more.
>
> The patch worked as advertised in all my tests, but I became
> uncomforatable with the games being played with the last autovacuum
> timestamp and the count of dead tuples. Sure, that causes
> autovacuum to kick back in until it has dealt with the truncation,
> but it makes it hard for a human looking at the stat views to see
> what's going on, and I'm not sure it wouldn't lead to bad plans due
> to stale statistics.
>
> Personally, I would rather see us add a boolean to indicate that
> autovacuum was needed (regardless of the math on the other columns)
> and use that to control the retries -- leaving the other columns
> free to reflect reality.

You mean to extend the stats by yet another column? The thing is that
this whole case happens rarely. We see this every other month or so and
only on a rolling window table after it got severely bloated due to some
abnormal use (purging disabled for some operational reason). The whole
situation resolves itself after a few minutes to maybe an hour or two.

Personally I don't think living with a few wrong stats on one table for
that time is so bad that it warrants changing that much more code.

Lower casing TRUE/FALSE will be done.

If the LW_SHARED is enough in LockHasWaiters(), that's fine too.

I think we have a consensus that the check interval should be derived
from deadlock_timeout/10 clamped to 10ms. I'm going to remove that GUC.

About the other two, I'm just not sure. Maybe using half the value as
for the lock waiter interval as the lock retry interval, again clamped
to 10ms, and simply leaving one GUC that controls how long autovacuum
should retry this. I'm not too worried about the retry interval.
However, the problem with that overall retry duration is that this value
very much depends on the usage pattern of the database. If long running
transactions (like >5 seconds) are the norm, then a hard coded value of
2 seconds before autovacuum gives up isn't going to help much.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Kevin Grittner <kgrittn(at)mail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-11-29 14:46:47
Message-ID: 7643.1354200407@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> On 11/28/2012 3:33 PM, Kevin Grittner wrote:
>> Resetting starttime this way seems especially odd.

> instr_time is LARGE_INTEGER on Win32 but struct timeval on Unix. Is
> starttime = currenttime;
> portable if those are structs?

Sure. We rely on struct assignment in lots of places.

regards, tom lane


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <kgrittn(at)mail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-11-29 14:57:16
Message-ID: 50B777CC.8070609@Yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/29/2012 9:46 AM, Tom Lane wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
>> On 11/28/2012 3:33 PM, Kevin Grittner wrote:
>>> Resetting starttime this way seems especially odd.
>
>> instr_time is LARGE_INTEGER on Win32 but struct timeval on Unix. Is
>> starttime = currenttime;
>> portable if those are structs?
>
> Sure. We rely on struct assignment in lots of places.

Will be done then.

Thanks,
Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-12-02 15:13:18
Message-ID: 50BB700E.8060404@Yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Attached is a new patch that addresses most of the points raised in
discussion before.

1) Most of the configuration variables are derived from deadlock_timeout
now. The "check for conflicting lock request" interval is
deadlock_timeout/10, clamped to 10ms. The "try to acquire exclusive
lock" interval is deadlock_timeout/20, also clamped to 10ms. The only
GUC variable remaining is autovacuum_truncate_lock_try=2000ms with a
range from 0 (just try once) to 20000ms.

I'd like to point out that this is a significant change in functionality
as without the config option for the check interval, there is no longer
any possibility to disable the call to LockHasWaiters() and return to
the original (deadlock code kills autovacuum) behavior.

2) The partition lock in LockHasWaiters() was lowered to LW_SHARED. The
LW_EXCLUSIVE was indeed a copy/paste result.

3) The instr_time handling was simplified as suggested.

4) Lower case TRUE/FALSE.

I did not touch the part about suppressing the stats and the ANALYZE
step of "auto vacuum+analyze". The situation is no different today. When
the deadlock code kills autovacuum, stats aren't updated either. And
this patch is meant to cause autovacuum to finish the truncate in a few
minutes or hours, so that the situation fixes itself.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

Attachment Content-Type Size
autovacuum-truncate-lock-2.diff text/x-patch 19.3 KB