Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 20:07:08
Message-ID: CAK-MWwTmJV4VprhAfCyTKRLndYbwbToF-Z0JYv6_o=qq3VYxyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

> On one of my servers stat collector went crasy on idle system.
>>> It constantly write around 50MB/s on idle database (no activity at all).
>>>
>>
> OK there are results of some entry-level gdb kung-fu:
>
> problem point is:
> /*
> * Write the stats file if a new request has arrived that is
> not
> * satisfied by existing file.
> */
> if (last_statwrite < last_statrequest)
> pgstat_write_statsfile(false);
>
> on some loop iteration difference between last_statrequest and
> last_statwrite is around 23000-29000 (e.g. 20-30ms).
>
> Hm may be there some clock drift between CPU cores?
>

What's made this situation interesting that the fact the code never got
inside
/*
* If there is clock skew between backends and the collector, we
could
* receive a stats request time that's in the future. If so,
complain
* and reset last_statrequest. Resetting ensures that no inquiry
* message can cause more than one stats file write to occur.
*/
if (last_statrequest > last_statwrite)
...
block.

Watchpoint on the (last_statwrite < last_statrequest) condition repetable
fire ONLY inside:
pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
{
if (msg->inquiry_time > last_statrequest)
last_statrequest = msg->inquiry_time;
}
function.

Now just some musings/theory (I might be dead wrong of course).
Lets suppose there are no clock skew actually and check execution flow.

Main loop executed:
if (last_statwrite < last_statrequest)
pgstat_write_statsfile(false);
and updated last_statwrite value.
However, it took some time because of file size.

During that period (while pgstat busy with writes) some backend issued
PGSTAT_MTYPE_INQUIRY request which will be succesfuly accepted on the next
code lines at
len = recv(pgStatSock, (char *) &msg,...
And as a result the last_statrequest will be updated to the
msg->inquiry_time value.
And this value will be definitely > than last_statwrite (because this
request issued after pgstat worker stared writing new copy of the stat
file).

So these could be case when this situation could happen without time
travel/clock drift.

Kind Regards,
Maksym

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Jeff Janes 2014-11-26 20:29:59 Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Previous Message Tomas Vondra 2014-11-26 20:06:37 Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2014-11-26 20:29:59 Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Previous Message Tomas Vondra 2014-11-26 20:06:37 Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)