Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)

From: maxim(dot)boguk(at)gmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 13:08:10
Message-ID: 20141126130810.15507.39189@wrigleys.postgresql.org
Lists: pgsql-bugs pgsql-hackers

The following bug has been logged on the website:

Bug reference: 12071
Logged by: Maxim Boguk
Email address: maxim(dot)boguk(at)gmail(dot)com
PostgreSQL version: 9.2.9
Operating system: Linux
Description:

Hi,

On one of my servers the stats collector went crazy on an idle system.
It constantly writes around 50MB/s on an idle database (no activity at all).

What I tried:
1) stop/start the database - no effect
2) update to 9.2.9 from 9.2.7 - no effect

Size of the pgstat.stat file: 5270114 bytes (reasonably small).
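(For scale: 5270114 bytes is roughly 5 MB, so the reported ~50MB/s corresponds
to about ten full rewrites of this file per second.)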

The strace output looks like this:

open("pg_stat_tmp/pgstat.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 7
fstat(7, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90455b5000
write(7, "\232\274\245\1fCWw\301\253\1\0\370\27\1\0\0\0\0\0\242\7\0\0\0\0\0\0\211\2569\270"..., 4096) = 4096
write(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0T@\340N"..., 4096) = 4096
write(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(7, "\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096

I see nothing really weird in the strace output.

It's a test server with full access, so I can try a lot of things, but at this
moment I'm out of ideas.

Kind Regards,
Maksym


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: maxim(dot)boguk(at)gmail(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 13:15:26
Message-ID: 20141126131526.GL1639@alvin.alvh.no-ip.org
Lists: pgsql-bugs pgsql-hackers

maxim(dot)boguk(at)gmail(dot)com wrote:

> Hi,
>
> On one of my servers stat collector went crasy on idle system.
> It constantly write around 50MB/s on idle database (no activity at all).
>
> What I tried:
> 1)stop/start the database - no effect
> 2)update to 9.2.9 from 9.2.7 - no effect

Is the server time correct? What is autovacuum doing?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 13:25:33
Message-ID: CAK-MWwTRDpwNAepESosHi5J805jy6NmODQARpyVGR0PeFKwNeA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Wed, Nov 26, 2014 at 4:15 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:

> maxim(dot)boguk(at)gmail(dot)com wrote:
>
> > Hi,
> >
> > On one of my servers stat collector went crasy on idle system.
> > It constantly write around 50MB/s on idle database (no activity at all).
> >
> > What I tried:
> > 1)stop/start the database - no effect
> > 2)update to 9.2.9 from 9.2.7 - no effect
>
> Is the server time correct? What is autovacuum doing

Server time looks correct to me.
Autovacuum is doing nothing, since there is no activity on the server at all (I
checked it twice).

After careful examination of the strace output I found one strange thing:
a constant flow of:
16:22:33.202523 read(3, 0x7fff4148a5b0, 16) = -1 EAGAIN (Resource temporarily unavailable)
16:22:33.297432 read(3, 0x7fff4148a5b0, 16) = -1 EAGAIN (Resource temporarily unavailable)
16:22:33.322471 read(3, 0x7fff4148a5b0, 16) = -1 EAGAIN (Resource temporarily unavailable)
16:22:33.417873 read(3, 0x7fff4148a5b0, 16) = -1 EAGAIN (Resource temporarily unavailable)
16:22:33.456801 read(3, 0x7fff4148a5b0, 16) = -1 EAGAIN (Resource temporarily unavailable)
But I'm not sure whether that is an actual error signal or normal behaviour.

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ <http://www.postgresql-consulting.com/>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 13:35:36
Message-ID: 20141126133536.GM1639@alvin.alvh.no-ip.org
Lists: pgsql-bugs pgsql-hackers

Maxim Boguk wrote:

> After careful examination of strace output I found one strange thing:
> constant flow of:
> 16:22:33.202523 read(3, 0x7fff4148a5b0, 16) = -1 EAGAIN (Resource
> temporarily unavailable)

Uh. What is on fd 3? (check /proc). If it's the stat socket, maybe
you're having network issues with localhost UDP connections or
something.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 14:02:05
Message-ID: CAK-MWwSnRjk-q_SNu+oOYwjU9fik5uXOez9v-v4Ua40aFjUNRA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Wed, Nov 26, 2014 at 4:35 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:

> Maxim Boguk wrote:
>
> > After careful examination of strace output I found one strange thing:
> > constant flow of:
> > 16:22:33.202523 read(3, 0x7fff4148a5b0, 16) = -1 EAGAIN (Resource
> > temporarily unavailable)
>
> Uh. What is on fd 3? (check /proc). If it's the stat socket, maybe
> you're having network issues with localhost UDP connections or
> something.
>

It's a pipe to itself:
root(at)db13:~# ls -la /proc/26933/fd/ | grep 1343367616
lr-x------ 1 postgres postgres 64 Nov 26 16:54 3 -> pipe:[1343367616]
l-wx------ 1 postgres postgres 64 Nov 26 16:54 6 -> pipe:[1343367616]

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ <http://www.postgresql-consulting.com/>


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 18:08:43
Message-ID: CAK-MWwSDmXTWCc3F2Wj3Zw6WZ_3JSKHaXFqZ2TDuq1jfKZOu=A@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

> On one of my servers stat collector went crasy on idle system.
> It constantly write around 50MB/s on idle database (no activity at all).
>

A detailed strace looks like an endless tight loop of:

... 5MB writes ...
21:03:01.131550 write(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2188) = 2188
21:03:01.131613 close(7) = 0
21:03:01.131674 munmap(0x7fde72dc1000, 4096) = 0
21:03:01.131743 rename("/mnt/tmpfs/pgstat.tmp", "/mnt/tmpfs/pgstat.stat") = 0
21:03:01.131826 recvfrom(11, 0x7fff4148a620, 1000, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
21:03:01.131889 read(3, 0x7fff4148a5b0, 16) = -1 EAGAIN (Resource temporarily unavailable)
21:03:01.131942 poll([{fd=11, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN}], 3, -1) = 1 ([{fd=11, revents=POLLIN}])
21:03:01.144257 recvfrom(11, "\10\0\0\0\30\0\0\0V\277\363\23\0\0\0\0(\262k\233\305\253\1\0", 1000, 0, NULL, NULL) = 24
21:03:01.144354 recvfrom(11, 0x7fff4148a620, 1000, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
21:03:01.144409 read(3, 0x7fff4148a5b0, 16) = -1 EAGAIN (Resource temporarily unavailable)
21:03:01.144479 poll([{fd=11, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN}], 3, -1) = 1 ([{fd=11, revents=POLLIN}])
21:03:01.146321 recvfrom(11, "\1\0\0\0\20\0\0\0X\223k\233\305\253\1\0", 1000, 0, NULL, NULL) = 16
21:03:01.146382 open("/mnt/tmpfs/pgstat.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 7
21:03:01.146458 fstat(7, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
21:03:01.146519 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fde72dc1000
21:03:01.146592 write(7, "\232\274\245\1N\273k\233\305\253\1\0\374\27\1\0\0\0\0\0\250\7\0\0\0\0\0\0\311\331\243\270"..., 4096) = 4096
21:03:01.146664 write(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
... 5MB writes...
and so on, with no visible changes...

Kind Regards,
Maksym


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 18:48:30
Message-ID: CAK-MWwQJF+EM5f_47r5wLpr6_2FBYNFeKKAD-DPP3JtzuAM15Q@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Wed, Nov 26, 2014 at 9:08 PM, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:

>
> On one of my servers stat collector went crasy on idle system.
>> It constantly write around 50MB/s on idle database (no activity at all).
>>
>
OK, here are the results of some low-level gdb kung-fu.

The problem point is:
    /*
     * Write the stats file if a new request has arrived that is not
     * satisfied by existing file.
     */
    if (last_statwrite < last_statrequest)
        pgstat_write_statsfile(false);

On some loop iterations the difference between last_statrequest and
last_statwrite is around 23000-29000 (i.e. 20-30 ms, since the timestamps are in microseconds).

Hmm, maybe there is some clock drift between CPU cores?

Kind Regards,
Maksym

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ <http://www.postgresql-consulting.com/>


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 20:06:37
Message-ID: 547632CD.9080609@fuzzy.cz
Lists: pgsql-bugs pgsql-hackers

On 26.11.2014 19:48, Maxim Boguk wrote:
>
>
> On Wed, Nov 26, 2014 at 9:08 PM, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com
> <mailto:maxim(dot)boguk(at)gmail(dot)com>> wrote:
>
>
> On one of my servers stat collector went crasy on idle system. It
> constantly write around 50MB/s on idle database (no activity at
> all).
>
> OK there are results of some low-level gdb kung-fu:
>
> problem point is:
> /*
> * Write the stats file if a new request has arrived that is not
> * satisfied by existing file.
> */
> if (last_statwrite < last_statrequest)
> pgstat_write_statsfile(false);
>
> on some loop iteration difference between last_statrequest and
> last_statwrite is around 23000-29000 (e.g. 20-30ms).
>
> Hm may be there some clock drift between CPU cores?

Maybe, although I'd expect a much smaller difference in such a case; 20-30ms
is a bit too much, IMHO. Which clock source are you using?

cat /sys/devices/system/clocksource/clocksource0/available_clocksource

Do you have NTP on the machine? Is it working correctly?

There's code in pgstat.c that should handle such clock skew cases,
around line 3560:

https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3564

Do you get those messages in the log?

Can you perform more gdb-fu? Do the last_statwrite / last_statrequest
timestamps advance "equally"?

Do you know what processes are sending the requests to write the file?

The only source of 'write statfile' requests I can think of at the
moment is the autovacuum launcher (which needs stats for all databases).
Let's assume there really is ~30ms clock skew between the postmaster and
the launcher. If I read the code correctly, then:

1) the launcher accepts only stats not older than 10ms

2) the timestamp is read from the file, and represents the time at which
writing the file started

3) if the clock skew is >10ms, or if writing the file takes >10ms (or a
combination of the two), a request is sent

4) this is done repeatedly for 10 seconds (requests 10ms apart)

Maybe the stats file grew a bit, and now the difference got slightly
over 10ms?

The other thing that might happen is an inability to write the new file.
But that's inconsistent with the small file size (I've seen that e.g. for
files exceeding the tmpfs size limit, but if you only have a ~5MB
pgstat.stat file that's unlikely).

Also, it would cause "could not write temporary statistics" in the log,
and the difference between the timestamps would grow (while you're
saying it's ~30ms).

regards
Tomas


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 20:07:08
Message-ID: CAK-MWwTmJV4VprhAfCyTKRLndYbwbToF-Z0JYv6_o=qq3VYxyQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

> On one of my servers stat collector went crasy on idle system.
>>> It constantly write around 50MB/s on idle database (no activity at all).
>>>
>>
> OK there are results of some entry-level gdb kung-fu:
>
> problem point is:
> /*
> * Write the stats file if a new request has arrived that is
> not
> * satisfied by existing file.
> */
> if (last_statwrite < last_statrequest)
> pgstat_write_statsfile(false);
>
> on some loop iteration difference between last_statrequest and
> last_statwrite is around 23000-29000 (e.g. 20-30ms).
>
> Hm may be there some clock drift between CPU cores?
>

What makes this situation interesting is the fact that the code never gets
inside the

    /*
     * If there is clock skew between backends and the collector, we could
     * receive a stats request time that's in the future. If so, complain
     * and reset last_statrequest. Resetting ensures that no inquiry
     * message can cause more than one stats file write to occur.
     */
    if (last_statrequest > last_statwrite)
    ...

block.

A watchpoint on the (last_statwrite < last_statrequest) condition repeatably
fires ONLY inside the

    pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
    {
        if (msg->inquiry_time > last_statrequest)
            last_statrequest = msg->inquiry_time;
    }

function.

Now just some musings/theory (I might be dead wrong, of course).
Let's suppose there is actually no clock skew and look at the execution flow.

The main loop executed

    if (last_statwrite < last_statrequest)
        pgstat_write_statsfile(false);

and updated the last_statwrite value.
However, that took some time because of the file size.

During that period (while pgstat was busy with the writes) some backend issued a
PGSTAT_MTYPE_INQUIRY request, which will be successfully accepted on the next
code lines at
    len = recv(pgStatSock, (char *) &msg,...
and as a result last_statrequest will be updated to the msg->inquiry_time
value. And this value will definitely be > last_statwrite (because this
request was issued after the pgstat worker started writing the new copy of the
stat file).

So this could be a case where the situation happens without any time
travel / clock drift.

Kind Regards,
Maksym


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 20:29:59
Message-ID: CAMkU=1yrsnKZsw3UyN7UAVjvOkra1TPtS0NU-e=VVM1zFgO5dA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Wed, Nov 26, 2014 at 5:25 AM, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:

>
>
> On Wed, Nov 26, 2014 at 4:15 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
> wrote:
>
>> maxim(dot)boguk(at)gmail(dot)com wrote:
>>
>> > Hi,
>> >
>> > On one of my servers stat collector went crasy on idle system.
>> > It constantly write around 50MB/s on idle database (no activity at all).
>> >
>> > What I tried:
>> > 1)stop/start the database - no effect
>> > 2)update to 9.2.9 from 9.2.7 - no effect
>>
>> Is the server time correct? What is autovacuum doing
>
>
> Server time look correct for me.
> Аutovacuum doing nothing since no activity on server at all (I checked it
> twice).
>

What method are you using to check that the server has no activity?

What do you see if you 'strace' the autovac launcher? The postmaster?

Cheers,

Jeff


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 20:45:12
Message-ID: 54763BD8.8020906@fuzzy.cz
Lists: pgsql-bugs pgsql-hackers

On 26.11.2014 21:07, Maxim Boguk wrote:
>
> On one of my servers stat collector went crasy on idle system.
> It constantly write around 50MB/s on idle database (no
> activity at all).
>
>
> OK there are results of some entry-level gdb kung-fu:
>
> problem point is:
> /*
> * Write the stats file if a new request has arrived
> that is not
> * satisfied by existing file.
> */
> if (last_statwrite < last_statrequest)
> pgstat_write_statsfile(false);
>
> on some loop iteration difference between last_statrequest and
> last_statwrite is around 23000-29000 (e.g. 20-30ms).
>
> Hm may be there some clock drift between CPU cores?
>
>
> What's made this situation interesting that the fact the code never got
> inside
> /*
> * If there is clock skew between backends and the collector, we
> could
> * receive a stats request time that's in the future. If so,
> complain
> * and reset last_statrequest. Resetting ensures that no inquiry
> * message can cause more than one stats file write to occur.
> */
> if (last_statrequest > last_statwrite)
> ...
> block.

That suggests there's no (significant) clock skew between the postmaster and
the process requesting the file.

> Watchpoint on the (last_statwrite < last_statrequest) condition
> repetable fire ONLY inside:
> pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
> {
> if (msg->inquiry_time > last_statrequest)
> last_statrequest = msg->inquiry_time;
> }
> function.

OK, that's expected, because the request is sent only if the file is
older than X milliseconds (10ms for autovacuum, 640ms otherwise). So it
would be strange if the condition didn't fire here.

> Now just some musings/theory (I might be dead wrong of course).
> Lets suppose there are no clock skew actually and check execution flow.
>
> Main loop executed:
> if (last_statwrite < last_statrequest)
> pgstat_write_statsfile(false);
> and updated last_statwrite value.
> However, it took some time because of file size.

OK. FWIW, it's updated to GetCurrentTimestamp() before starting to write
the file.

> During that period (while pgstat busy with writes) some backend issued
> PGSTAT_MTYPE_INQUIRY request which will be succesfuly accepted on the
> next code lines at
> len = recv(pgStatSock, (char *) &msg,...
> And as a result the last_statrequest will be updated to the
> msg->inquiry_time value.
> And this value will be definitely > than last_statwrite (because this
> request issued after pgstat worker stared writing new copy of the stat
> file).

How many backends are there, requesting a fresh stats file? I thought
you mentioned the system is rather idle?

A regular backend won't request a fresh copy unless the current file is
older than 640 ms. So that can't cause ~50MB/s of writes (it'd require 10
writes per second). Maybe there are multiple backends sending such
requests, but I don't see how that could cause a continuous write load
(I'd understand a short spike at the beginning, and then getting
synchronized).

The autovacuum launcher / workers might cause that, because they use a 10 ms
threshold. But that'd require a lot of autovacuum processes running in
parallel, I guess.

I wonder if this could be caused by a long "queue" of such inquiries,
but that should be filtered out by the (last_statwrite <
last_statrequest) condition.

regards
Tomas


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 21:19:32
Message-ID: CAK-MWwTFTRVuBjLz_rJVNEompGj2btot88G5uwXW+d2DierSNQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

>
> How many backends are there, requesting a fresh stats file? I thought
> you mentioned the system is rather idle?
>

0-1 backend
no active connections
no active autovacuums

>
> A regular backend won't request a fresh copy unless the current file is
> older than 640 ms. So that can't cause ~50MB of writes (it'd require 10
> writes per second). Maybe there are multiple backends sending such
> requests, but I don't see how that could cause a continuous write load
> (I'd understand a short spike at the beginning, and then getting
> synchronized).
>

It's definitely doing full file writes 6-12 times per second.

> Autovacuum launcher / workers might cause that, because it's using 10 ms
> threshold. But that'd require a lot of autovacuum processes running in
> parallel, I guess.
>

No active autovacuum processes visible.
Now just an interesting calculation:
autovacuum_naptime = 1 second (customized value from production)
active databases - 9...
Suspiciously close to 1/10 of a second.
Testing: increasing autovacuum_naptime to
10 seconds - two writes per second;
100 seconds - only two writes every 10 seconds.

So yes, pgstat_send_inquiry is definitely issued by the autovacuum launcher.
Now the question is why EVERY pgstat_send_inquiry request from autovacuum leads
to 1-2 full pgstat.stat rewrites on an idle server?
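(Making the arithmetic explicit - as Tomas explains further down, the launcher
starts a worker roughly every autovacuum_naptime / number of databases, i.e.
1 s / 9 ≈ 110 ms here, or about 9 inquiries per second - in line with the 6-12
full-file rewrites per second observed above.)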

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ <http://www.postgresql-consulting.com/>


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 21:36:23
Message-ID: CAK-MWwSK5AUneiF5zyDvc_9pO4ZNTT_G45ZQG1AD+n55s2zzTg@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

>> So yes, pgstat_send_inquiry definitely issued by autovacuum launcher.
>> Now question why EVERY pgstat_send_inquiry request from autovacuum leads
>> to the 1-2 full pgstat.stat rewrite on idle server?

Just another theory:
it happens exactly because the server is completely idle, so there is no other
activity which could make any changes to the pgstat.tmp file timestamp.

Maybe it would be a good idea to update the timestamp of the pgstat.tmp file on
every PgstatCollectorMain iteration if nothing has been received from the stats
socket?
Or, alternatively, internally track the timestamp of the last update received
from the stats socket, and skip the file rewrite step if nothing has been
received since the last full rewrite?
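
To make the second idea concrete, here is a toy, self-contained C program
(NOT pgstat.c code; the names and numbers are invented or taken from this
thread) comparing the current behaviour with a hypothetical "skip the rewrite
if nothing arrived on the stats socket since the last write" check, on a
fully idle cluster:

/*
 * Toy illustration only -- NOT pgstat.c code.  Numbers come from this thread:
 * 1 s naptime, 10 databases, ~5 MB statfile, 10 ms autovacuum freshness window.
 */
#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    const double statfile_mb = 5.0;     /* ~5 MB pgstat.stat, as reported      */
    const int inquiry_ms = 100;         /* naptime 1 s / 10 databases          */
    const int sim_ms = 10 * 1000;       /* simulate 10 seconds                 */

    bool stats_changed = false;         /* fully idle cluster: no stats traffic */
    double written_now = 0.0;           /* current 9.2 behaviour               */
    double written_with_flag = 0.0;     /* hypothetical "skip if clean"        */

    for (int t = 0; t < sim_ms; t += inquiry_ms)
    {
        /* today: the on-disk file is always older than the 10 ms window,
         * so every autovacuum inquiry forces a full rewrite */
        written_now += statfile_mb;

        /* with the proposed check, the rewrite is skipped while nothing has
         * arrived on the stats socket since the previous write */
        if (stats_changed)
        {
            written_with_flag += statfile_mb;
            stats_changed = false;
        }
    }

    printf("written over %d s: current = %.0f MB, with skip-if-clean = %.0f MB\n",
           sim_ms / 1000, written_now, written_with_flag);
    return 0;
}

Compiled and run, it reports roughly 500 MB written over 10 seconds for the
current behaviour versus 0 MB with the skip, i.e. the ~50MB/s observed here
would disappear on an idle cluster.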

Kind Regards,
Maksym


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 21:50:55
Message-ID: 54764B3F.5080702@fuzzy.cz
Lists: pgsql-bugs pgsql-hackers

On 26.11.2014 22:19, Maxim Boguk wrote:
>
> No active autovacuum processes visible.
> Now just interesting calculation:
> autovacuum_naptime = 1 second (customized value from production)
> active databases - 9...
> Suspiciously close to 1/10 second.
> Testing: increase autovacuum_naptime to
> 10second - two writes per second.
> 100second - only two writes every 10 second.
>
> So yes, pgstat_send_inquiry definitely issued by autovacuum launcher.
> Now question why EVERY pgstat_send_inquiry request from autovacuum leads
> to the 1-2 full pgstat.stat rewrite on idle server?

Well, I think this explains it.

The autovacuum launcher starts a worker for a particular database every
(naptime / ndatabases), so if you have a naptime of 1 second and 10 databases,
it will start a worker every 100 ms. And that's over the 10ms limit, so
every worker will force a write of the statfile. And that explains the 50
MB/s writes.
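
For reference, the arithmetic with the numbers from this thread (1 s naptime,
10 databases, ~5 MB stats file):

    worker start interval ≈ autovacuum_naptime / n_databases = 1 s / 10 = 100 ms
    forced rewrites       ≈ 10 per second (100 ms is well past the 10 ms freshness window)
    write volume          ≈ 10 rewrites/s x ~5 MB ≈ 50 MB/s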

Why have you set it to 1 second? The default is 1 minute, which is a much
saner value.

I'm not sure why this was not happening before, though ...

Tomas


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 21:53:58
Message-ID: 54764BF6.20605@fuzzy.cz
Lists: pgsql-bugs pgsql-hackers

On 26.11.2014 22:36, Maxim Boguk wrote:
>> So yes, pgstat_send_inquiry definitely issued by autovacuum launcher.
>> Now question why EVERY pgstat_send_inquiry request from autovacuum
>> leads to the 1-2 full pgstat.stat rewrite on idle server?
>
> Just another theory
> It happens exactly because the full idle server so no other activity
> which could provide some changes to pgstat.tmp file timestamp.
>
> May be it could be good idea to update timestamp of pgstat.tmp file on
> every PgstatCollectorMain iteration if nothing had been received from
> stat socket?
> Or alternatively internally track the timestamp of the last update
> received from stat socket, and skip file rewrite step if nothing had
> been received since last full rewrite?

I'm not sure I understand what you mean.

The timestamp *IS* updated on every write - see this:

https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3469

Otherwise the difference between last_statwrite and last_statrequest
would grow indefinitely (and you're observing it's ~30ms every time).

Tomas


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 22:06:19
Message-ID: CAK-MWwRzyzX=BfXkMzPGGWCpxgHmc_zBXBYkp74PkYqQLrjtbQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

>
> I'm not sure I understand what you mean.
>
> The timestamp *IS* updated on every write - see this:
>
>
> https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3469
>

I mean that usually 99.9% of the writes go through mmap, which updates the
pgstat.tmp file timestamp (but does not update the last_statwrite value).

But that is enough to make backend_read_statsfile happy around
https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3972
without kicking the poor statistics collector (and forcing a full file rewrite) at
https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3978

PS: A 1-second autovacuum_naptime can be useful if you have small but very,
very actively updated tables; over 1 minute they could grow 100-500x in size
without autovacuum.
A 1-minute naptime will have the same issue with 500 small idle databases :).

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ <http://www.postgresql-consulting.com/>


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 22:26:34
Message-ID: 5476539A.6090506@fuzzy.cz
Lists: pgsql-bugs pgsql-hackers

On 26.11.2014 23:06, Maxim Boguk wrote:
>
>
> I'm not sure I understand what you mean.
>
> The timestamp *IS* updated on every write - see this:
>
> https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3469
>
>
> I mean that usually 99.9% writes goes through mmap, which update the
> pgstat.tmp file timestamp (but not update last_statwrite value).

I don't follow. Can you explain what you mean by the mmap? I don't see
any mmap stuff in pgstat.c, nor do I remember dealing with mmap in my
patches in this area ...

You mentioned the snippet:

    if (last_statwrite < last_statrequest)
        pgstat_write_statsfile(false);

and essentially what pgstat_write_statsfile() does is this:

1) open pgstat.tmp file
2) update globalStats.stats_timestamp to GetCurrentTimestamp();
3) write globalStats to the new file
4) ...

so it *does* update the timestamp on each write.

> But that enough to make backend_read_statsfile happy around
> https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3972
> without kicking poor statistic collector (and forcing full file rewrite) at
> https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3978

Well, the thing is that with 10 databases and 1s naptime, the launcher
starts a new autovacuum process every 100 ms. The first thing every
worker does is pgstat_read_statsfile(), and this checks whether the stats
are older than 10 ms (which they are, because the last worker started
~100ms ago), and sends a new inquiry.

Regular backends use a 640ms threshold, which is well above the 100ms interval,
so they are not the ones forcing the rewrites.

> PS: 1second autovacuum_naptime could be useful if you have small but
> very very actively updated tables, over 1 minute they could growth
> 100-500x size without autovacuum.
> 1minute naptime will have the same issue with 500 small idle databases :).

True. That's why I have reworked this in 9.3.

Anyway, this is not a bug - it works as designed/expected. It may be
imperfect, but that's how it is. The only thing you can do in 9.2 is to move
pgstat.tmp to a tmpfs filesystem.
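
(Worth noting: the strace output earlier in the thread already shows the stats
files under /mnt/tmpfs - i.e. something like stats_temp_directory = '/mnt/tmpfs'
in postgresql.conf appears to be in place here already. That keeps the writes
off disk, but it does not reduce their volume.)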

Tomas


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 23:01:32
Message-ID: CAK-MWwQQedwJVk2LJ0g7LXJAr82NSSvOSv6BBAdSGFB8cLkObg@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Nov 27, 2014 at 1:26 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> On 26.11.2014 23:06, Maxim Boguk wrote:
> >
> >
> > I'm not sure I understand what you mean.
> >
> > The timestamp *IS* updated on every write - see this:
> >
> >
> https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3469
> >
> >
> > I mean that usually 99.9% writes goes through mmap, which update the
> > pgstat.tmp file timestamp (but not update last_statwrite value).
>
> I don't follow. Can you explain what you mean by the mmap? I don't see
> any mmap stuff in pgstat.c, nor I remember dealing with mmap in my
> patches in this area ...
>

Hm, I was wrong. I saw an mmap call in strace after every statistics
collector file rewrite, and I thought that all writes to that file go
through a memory-mapped structure instead of full file writes every time
someone needs fresh data.

However, the point is still open - there is no reason to write a new file from
scratch if no activity has been received from the stats socket since the last
write (the same point is valid for the per-database stat files in 9.3+).

And the check at
https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3972
looks redundant in that case?

Kind Regards,
Maksym


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 23:13:52
Message-ID: CAK-MWwQ2ZCW3+8UiMAdtLc6+YHph_CS3R8fVeLAFkfWdC08VFQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

>
> I'm not sure why this was not happening before, though ...
>
> Tomas
>
>
It did not happen before because just a few weeks ago the development process
was switched from a single shared developer db to private db's for each
developer (so the number of databases increased from a few to 10).

PS: The 10ms acceptable lag for autovacuum reading pgstat.tmp looks like too
strong a limitation to me.
I think it should be something like min(autovacuum_naptime/2,
PGSTAT_RETRY_DELAY).

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ <http://www.postgresql-consulting.com/>


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 23:21:42
Message-ID: 54766086.6090205@fuzzy.cz
Lists: pgsql-bugs pgsql-hackers

On 27.11.2014 00:01, Maxim Boguk wrote:
>
>
> On Thu, Nov 27, 2014 at 1:26 AM, Tomas Vondra <tv(at)fuzzy(dot)cz
> <mailto:tv(at)fuzzy(dot)cz>> wrote:
>
> On 26.11.2014 23:06, Maxim Boguk wrote:
> >
> >
> > I'm not sure I understand what you mean.
> >
> > The timestamp *IS* updated on every write - see this:
> >
> > https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3469
> >
> >
> > I mean that usually 99.9% writes goes through mmap, which update the
> > pgstat.tmp file timestamp (but not update last_statwrite value).
>
> I don't follow. Can you explain what you mean by the mmap? I don't see
> any mmap stuff in pgstat.c, nor I remember dealing with mmap in my
> patches in this area ...
>
>
> Hm I had been wrong. I seen mmap call on strace after every statistic
> collector file rewrite and I thought that all writes to that file goes
> through memory mapped structure instead of full file writes every time
> when someone need fresh data.
>
> However, point still open - there are no reason to write a new file
> from scratch if no activity had been received from stat socket since
> the last write (the same point valid for per-database stat files in
> 9.3+).

Well, nothing is perfect and patches are always welcome ;-)

For us, the per-db split was enough - it lowered the CPU and I/O load to
the point that adding such an 'is dirty' flag would make absolutely no
difference. I do have it on my TODO list, but it's not very high.

> And check
> https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3972
> looks like redundant in that case?

Redundant with what?

That's how the backend checks whether the new file has been written (by
the stats collector, i.e. a different process). Keep in mind that this happens
in a different process than

    if (last_statwrite < last_statrequest)
        pgstat_write_statsfile(false);

which runs in the stats collector (and not in the backend).

Tomas


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-27 01:38:44
Message-ID: 20141127013843.GS1639@alvin.alvh.no-ip.org
Lists: pgsql-bugs pgsql-hackers

Maxim Boguk wrote:

> It not happened before because just few weeks ago development process had
> been switched from single shared developer db to the private db's for each
> developer (so amount of databases increased from few to 10).
>
> PS: 10ms acceptable lag for autovacuum reading pgstat.tmp looks too strong
> limitation for me.
> I think it should be something like min(autovacuum_naptime/2,
> PGSTAT_RETRY_DELAY)

The issue is avoiding repeated vacuuming of the same table by two
different workers. I have thought sometimes that this concern is
overblown, and maybe we could use a different approach.

For example, one idea is to have the autovac worker sleep for some time
before vacuuming each table, to give pgstat time to catch up in case
someone else vacuumed the table concurrently. One issue with this is
that sleeping for too long would cause each worker run to take longer.
(But then workers already sleep due to cost_delay, so perhaps this is
not a serious problem).

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-27 03:27:15
Message-ID: 54769A13.50408@fuzzy.cz
Lists: pgsql-bugs pgsql-hackers

On 27.11.2014 00:21, Tomas Vondra wrote:
> On 27.11.2014 00:01, Maxim Boguk wrote:
>>
>>
>> On Thu, Nov 27, 2014 at 1:26 AM, Tomas Vondra <tv(at)fuzzy(dot)cz
>> <mailto:tv(at)fuzzy(dot)cz>> wrote:
>>
>> On 26.11.2014 23:06, Maxim Boguk wrote:
>> >
>> >
>> > I'm not sure I understand what you mean.
>> >
>> > The timestamp *IS* updated on every write - see this:
>> >
>> > https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3469
>> >
>> >
>> > I mean that usually 99.9% writes goes through mmap, which update the
>> > pgstat.tmp file timestamp (but not update last_statwrite value).
>>
>> I don't follow. Can you explain what you mean by the mmap? I don't see
>> any mmap stuff in pgstat.c, nor I remember dealing with mmap in my
>> patches in this area ...
>>
>>
>> Hm I had been wrong. I seen mmap call on strace after every statistic
>> collector file rewrite and I thought that all writes to that file goes
>> through memory mapped structure instead of full file writes every time
>> when someone need fresh data.
>>
>> However, point still open - there are no reason to write a new file
>> from scratch if no activity had been received from stat socket since
>> the last write (the same point valid for per-database stat files in
>> 9.3+).
>
> Well, nothing is perfect and patches are always welcome ;-)
>
> For us, the per-db split was enough - it lowered the CPU and I/O load to
> the point that adding such 'is dirty' flag would make absolutely no
> difference. I do have it on my TODO list, but it's not very high.

FWIW, I got curious and checked why we decided not to implement this
while reworking the stats in 9.3, as keeping an is_dirty flag seems like a
rather straightforward and simple optimization.

It turns out it's actually considerably more complex, because one of the
sources of statistics updates is (surprise, surprise) the autovacuum
workers themselves. So the whole flow may look like this (in >= 9.3):

1) launcher requests a fresh stats file (dbs list only)
2) worker is started for a particular DB (by launcher)
3) the worker requests a stats file (with details for the DB)
4) the worker vacuums the DB - scans pg_class etc., updates the
table stats (autovac timestamps, blocks/tuples read ...)
5) at worker shutdown, the stats are sent back to postmaster

So this dirties the stats file (written in 3) for the database, making
the file obsolete.

Now, there are two choices - you may immediately flush the changes to
disk (but maybe you'll receive more changes before the next autovacuum
round happens), or you wait and only write the file if requested (which
is what we do now).

So the file *IS* dirty, and I don't see any obvious solution to this :-(

Tomas


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-27 10:26:32
Message-ID: CAK-MWwSXXWFA-Rgpo9EE-cO+wCs=xDDjoJP9+P6RR1SY1HiWpQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

> FWIW, I got curious and checked why we decided not to implement this
> while reworking the stats in 9.3, as keeping an is_dirty flag seems as a
> rather straightforward and simple optimization.
>
> Turns out it's actually considerably more complex, because one of the
> sources of statistics updates are (surprise surprise) autovacuum
> workers. So the whole flow may look like this (in >= 9.3):
>
> 1) launcher requests a fresh stats file (dbs list only)
> 2) worker is started for a particular DB (by launcher)
> 3) the worker requests a stats file (with details for the DB)
>

Now, for (nearly) idle databases the worker does nothing and simply exits in
99.9% of cases.
And if there are a lot of idle/low-activity db's in the cluster, is_dirty
tracking would be beneficial on both pre-9.3 and post-9.3 versions (and in
9.3+ it would be especially effective because is_dirty would be tracked on a
per-db basis).

Kind Regards,
Maksym


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-27 18:24:11
Message-ID: 54776C4B.1040007@fuzzy.cz
Lists: pgsql-bugs pgsql-hackers

Moving to pgsql-hackers, as that's a more appropriate place for this
discussion.

On 27.11.2014 11:26, Maxim Boguk wrote:
>
>
> FWIW, I got curious and checked why we decided not to implement this
> while reworking the stats in 9.3, as keeping an is_dirty flag seems as a
> rather straightforward and simple optimization.
>
> Turns out it's actually considerably more complex, because one of the
> sources of statistics updates are (surprise surprise) autovacuum
> workers. So the whole flow may look like this (in >= 9.3):
>
> 1) launcher requests a fresh stats file (dbs list only)
> 2) worker is started for a particular DB (by launcher)
> 3) the worker requests a stats file (with details for the DB)
>
>
>
> Now for (nearly) idle databases worker do nothing and simple exit in 99.9%.

While it may sound trivial, I believe it's actually more complicated
than just adding an 'is_dirty' flag.

The first question is what 'nearly idle' means. Is that a number of
modified rows? A number of tuples read from tables? Because this would
affect autovacuum processes as well as regular backends (i.e. reading
from pg_stat_user_tables might return stale data). I don't see a
definition that's obviously correct :-(

Secondly, even a 'nearly idle' database needs a vacuum/analyze from time
to time, so there needs to be some logic to detect that. So the logic for
updating the is_dirty flag (or maybe reading the value) needs to take that into
account.

The only way I can think of is duplicating the 'needs vacuum / analyze'
detection into the collector, and setting the flag to 'true' when
there's at least one object in need of autovacuum. This is however much
more complex than a simple is_dirty flag, and I'm not sure it's really
worth it ...

> And if there a lot of idle/low-active db's in cluster - is_dirty
> tracking would be beneficial on both pre 9.3 and after 9.3 versions (and
> in 9.3+ it will be especially effective because id_dirty tracked per-db
> basis).

I don't think so. First, there are practical issues making it way more
complex than what you assume (see my explanation in the preceding text).

Second, let's do a bit of math. With 5MB of stats, 10 databases and 1s
naptime, the old (pre-9.3) implementation writes ~50MB/s. With the new
implementation, this drops to ~5MB/s, because only stats for that
particular database need to be written.

That's a huge improvement. And if even such an I/O load is an issue (which
I'd say is unlikely), then move the pg_stat_tmp directory to a tmpfs
filesystem. Problem solved.
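
(Rough arithmetic, assuming the ~5 MB of stats is spread evenly across the 10
databases: each per-database file is then ~0.5 MB, so ~10 rewrites per second
amount to ~5 MB/s instead of ~50 MB/s.)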

Also, you're the first person with such a low naptime value (which is
really the culprit here), so most people really don't have such issues.
And I'm still not convinced this is the best way to deal with the
workload you described (tables that grow / get deleted quickly).

I'm not going to spend more time hacking on this, as I doubt it's worth
the effort. Feel free to propose a patch, of course.

regards
Tomas