Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Date: 2014-12-28 01:45:22
Message-ID: CA+TgmoZHEg1aHAewYpV9yCbFFj8sDOFufMnPgyT_2jkj2nU89A@mail.gmail.com
Lists: pgsql-hackers

On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
>> On 12/27/2014 12:16 AM, Alvaro Herrera wrote:
>>> Tom Lane wrote:
>>>> The argument that autovac workers need fresher stats than anything else
>>>> seems pretty dubious to start with. Why shouldn't we simplify that down
>>>> to "they use PGSTAT_STAT_INTERVAL like everybody else"?
>
>>> The point of wanting fresher stats than that, eons ago, was to avoid a
>>> worker vacuuming a table that some other worker vacuumed more recently
>>> than PGSTAT_STAT_INTERVAL. ...
>>> Nowadays we can probably disregard the whole issue, since starting a new
>>> vacuum just after the prior one finished should not cause much stress to
>>> the system thanks to the visibility map.
>
>> Vacuuming is far from free, even if the visibility map says that most
>> pages are visible to all: you still scan all indexes, if you remove any
>> dead tuples at all.
>
> With typical autovacuum settings, I kinda doubt that there's much value in
> reducing the window for this problem from 500ms to 10ms. As Alvaro says,
> this was just a partial, kluge solution from the start --- if we're
> worried about such duplicate vacuuming, we should undertake a real
> solution that closes the window altogether. In any case, timeouts
> occurring inside autovacuum are not directly causing the buildfarm
> failures, since autovacuum's log entries don't reflect into regression
> outputs. (It's possible that autovacuum's tight tolerance is contributing
> to the failures by increasing the load on the stats collector, but I'm
> not sure I believe that.)
>
> To get back to that original complaint about buildfarm runs failing,
> I notice that essentially all of those failures are coming from "wait
> timeout" warnings reported by manual VACUUM commands. Now, VACUUM itself
> has no need to read the stats files. What's actually causing these
> messages is failure to get a timely response in pgstat_vacuum_stat().
> So let me propose a drastic solution: let's dike out this bit in vacuum.c:
>
>     /*
>      * Send info about dead objects to the statistics collector, unless we are
>      * in autovacuum --- autovacuum.c does this for itself.
>      */
>     if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
>         pgstat_vacuum_stat();
>
> This would have the effect of transferring all responsibility for
> dead-stats-entry cleanup to autovacuum. For ordinary users, I think
> that'd be just fine. It might be less fine though for people who
> disable autovacuum, if there still are any.

-1. I don't think it's a good idea to inflict pain on people who want
to schedule their vacuums manually (and yes, there are some) just so
that we can get clean buildfarm runs.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
