From: | Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> |
---|---|
To: | pgsql-bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #12071: Stat collector went crasy (50MB/s constant writes) |
Date: | 2014-11-26 20:07:08 |
Message-ID: | CAK-MWwTmJV4VprhAfCyTKRLndYbwbToF-Z0JYv6_o=qq3VYxyQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
> On one of my servers stat collector went crasy on idle system.
>>> It constantly write around 50MB/s on idle database (no activity at all).
>>>
>>
> OK there are results of some entry-level gdb kung-fu:
>
> problem point is:
> /*
> * Write the stats file if a new request has arrived that is
> not
> * satisfied by existing file.
> */
> if (last_statwrite < last_statrequest)
> pgstat_write_statsfile(false);
>
> on some loop iteration difference between last_statrequest and
> last_statwrite is around 23000-29000 (e.g. 20-30ms).
>
> Hm may be there some clock drift between CPU cores?
>
What's made this situation interesting that the fact the code never got
inside
/*
* If there is clock skew between backends and the collector, we
could
* receive a stats request time that's in the future. If so,
complain
* and reset last_statrequest. Resetting ensures that no inquiry
* message can cause more than one stats file write to occur.
*/
if (last_statrequest > last_statwrite)
...
block.
Watchpoint on the (last_statwrite < last_statrequest) condition repetable
fire ONLY inside:
pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
{
if (msg->inquiry_time > last_statrequest)
last_statrequest = msg->inquiry_time;
}
function.
Now just some musings/theory (I might be dead wrong of course).
Lets suppose there are no clock skew actually and check execution flow.
Main loop executed:
if (last_statwrite < last_statrequest)
pgstat_write_statsfile(false);
and updated last_statwrite value.
However, it took some time because of file size.
During that period (while pgstat busy with writes) some backend issued
PGSTAT_MTYPE_INQUIRY request which will be succesfuly accepted on the next
code lines at
len = recv(pgStatSock, (char *) &msg,...
And as a result the last_statrequest will be updated to the
msg->inquiry_time value.
And this value will be definitely > than last_statwrite (because this
request issued after pgstat worker stared writing new copy of the stat
file).
So these could be case when this situation could happen without time
travel/clock drift.
Kind Regards,
Maksym
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2014-11-26 20:29:59 | Re: BUG #12071: Stat collector went crasy (50MB/s constant writes) |
Previous Message | Tomas Vondra | 2014-11-26 20:06:37 | Re: BUG #12071: Stat collector went crasy (50MB/s constant writes) |
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2014-11-26 20:29:59 | Re: BUG #12071: Stat collector went crasy (50MB/s constant writes) |
Previous Message | Tomas Vondra | 2014-11-26 20:06:37 | Re: BUG #12071: Stat collector went crasy (50MB/s constant writes) |