Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Date: 2015-01-18 21:09:29
Message-ID: 9765.1421615369@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Sat, Dec 27, 2014 at 07:55:05PM -0500, Tom Lane wrote:
>> To get back to that original complaint about buildfarm runs failing,
>> I notice that essentially all of those failures are coming from "wait
>> timeout" warnings reported by manual VACUUM commands. Now, VACUUM itself
>> has no need to read the stats files. What's actually causing these
>> messages is failure to get a timely response in pgstat_vacuum_stat().
>> So let me propose a drastic solution: let's dike out this bit in vacuum.c:
>>
>> /*
>> * Send info about dead objects to the statistics collector, unless we are
>> * in autovacuum --- autovacuum.c does this for itself.
>> */
>> if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
>> pgstat_vacuum_stat();
>>
>> This would have the effect of transferring all responsibility for
>> dead-stats-entry cleanup to autovacuum. For ordinary users, I think
>> that'd be just fine. It might be less fine though for people who
>> disable autovacuum, if there still are any.

> Like Robert, I don't care for that.

I obviously failed to make it clear enough that that was meant as a straw
man, lets-think-outside-the-box proposal.

>> Or, much more simply, we could conclude that it's not that important
>> if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger
>> the code so that we don't bleat when the file-reading request comes from
>> that source. This should be a back-patchable fix, whereas #2 above might
>> be a bit too complicated for that.

> This, however, feels like a clear improvement. When every VACUUM does
> pgstat_vacuum_stat(), it doesn't matter if any given VACUUM misses removing
> entries that became obsolete in the preceding second or so. In fact, rather
> than merely not bleating when the wait times out, let's not wait at all. I
> don't favor VACUUM of a small table taking 12s because it spent 2s on the
> table and 10s waiting to garbage collect a piping-hot stats file.

I think that would be going too far: if an autovac wakes up in the dead of
night, it should not use an ancient stats file merely because no one else
has asked for stats recently. But if it never waits at all, it'll be
at the mercy of whatever is already on disk.

After looking at the code, the minimum-change alternative would be more or
less as attached: first, get rid of the long-obsolete proposition that
autovacuum workers need fresher-than-usual stats; second, allow
pgstat_vacuum_stat to accept stats that are moderately stale (the number
given below allows them to be up to 50 seconds old); and third, suppress
wait-timeout warnings when the call is from pgstat_vacuum_stat. The third
point is what we need to avoid unnecessary buildfarm failures. The second
point addresses the idea that we don't need to stress the stats collector
too much for this.

regards, tom lane

Attachment Content-Type Size
suppress-vacuum-wait-timeout-warnings-1.patch text/x-diff 7.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-01-18 22:48:11 Reducing buildfarm disk usage: remove temp installs when done
Previous Message Pavel Stehule 2015-01-18 19:30:46 Re: proposal: row_to_array function