Postgres stats collector showing high disk I/O

Lists: pgsql-generalpgsql-hackers
From: Justin Pasher <justinp(at)newmediagateway(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Postgres stats collector showing high disk I/O
Date: 2010-04-23 20:27:55
Message-ID: 4BD202CB.2050705@newmediagateway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Hello,

Redhat EL4 update 8, 2.6.9-89.0.23.ELsmp
Quad Proc, Dual Core Xeon, 16GB RAM
Postgres 8.1.18

I'm having some trouble pinning down exactly what is causing our
Postgres cluster to run slowly. After some initial investigation, I
noticed that the disk write activity is consistently high, and (if I'm
reading the output of dstat correctly) the majority of it is being
caused by the stats collector process. Here's a snippet of what I
typically see in dstat.

./dstat -cd --top-bio
----total-cpu-usage---- -dsk/total- --------most-expensive---------
usr sys idl wai hiq siq| read writ| block i/o process
2 2 87 10 0 0|3964k 19M|postgres: stats coll 0 35M
2 1 85 12 0 0|4612k 20M|postgres: stats coll 0 18M
2 2 85 11 0 0|2360k 36M|postgres: stats coll 0 24M
1 2 83 14 0 0|1564k 36M|postgres: stats coll 0 29M
1 1 84 13 0 0|5556k 21M|postgres: stats coll 0 20M
2 2 82 14 0 0| 10M 19M|postgres: stats coll 0 33M
2 1 87 10 0 0|9864k 35M|postgres: stats coll 0 24M
2 2 87 10 0 0| 10M 19M|postgres: stats coll 0 29M
2 1 86 11 0 0| 10M 19M|postgres: stats coll 0 24M
3 2 84 12 0 0|8096k 19M|postgres: stats coll 0 29M
2 1 86 10 0 0|5432k 33M|postgres: stats coll 0 32M
2 2 86 10 0 0|9200k 19M|postgres: stats coll 0 21M
2 1 82 14 0 0|3344k 34M|postgres: stats coll 0 21M
2 2 86 11 0 0|8600k 19M|postgres: stats coll 0 31M
2 1 82 15 0 0|5392k 19M|postgres: stats coll 0 29M

If there are no queries going on, then the disk usage is virtually
nothing, but it only takes a query or two to make it shoot up to this
level. I have the following stats related options enabled in postgresql.conf

stats_command_string = on
stats_row_level = on

When I disabled stats_row_level (and even stats_command_string, I
believe) and restarted, I was still seeing some high disk I/O. If I
disable stats_start_collector, I'm pretty sure the I/O dropped
completely off (I can't verify right now since I'd need a maintenance
window). However, this make Postgres unable to keep track of database
connections/queries in pg_stat_activity, which is very important for us.
The odd thing is that when I was playing around with these options, I
restarted multiple times to apply them, eventually ending back where I
started, but after the final restart, the disk I/O actually dropped to
reasonable levels. This lasted for about a day, then went back up to
it's current levels (and once again showing the stats collector at the top).

I saw some previous posts with similar conditions (but different
Postgres version, high CPU load, not disk I/O, etc).
http://archives.postgresql.org/pgsql-performance/2010-04/msg00163.php
http://archives.postgresql.org/pgsql-general/2010-01/msg01076.php
http://archives.postgresql.org/pgsql-performance/2009-06/msg00088.php

I don't think there are a lot of CREATE/DROP table statements, but I do
know there are some larger update queries that run inside transactions
(large in the sense of data they have to read, not the number of
queries). Autovacuum is enabled on the server, and I haven't tweaked any
settings from the defaults. My $PGDATA/global/pgstat.stat file is about
18MB, if that helps. Does it really rewrite this entire file every
500ms? Alvaro suggested resetting the stats, but I'm having trouble
figuring out how to do that. Seems like pg_stat_reset() is post- 8.1 ...?

I have a strong suspicion it's ultimately due to some usage pattern of
the database, but I'm not sure what it could be. What type of operations
would typically cause the stats collector to be doing this much writing
to the filesystem? Is there any way to "see" what it's writing? Are
there other config options that can/should be tweaked to help this?

Thanks.


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Justin Pasher <justinp(at)newmediagateway(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres stats collector showing high disk I/O
Date: 2010-04-23 21:40:35
Message-ID: 20100423214035.GB1818@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Fri, Apr 23, 2010 at 03:27:55PM -0500, Justin Pasher wrote:
> haven't tweaked any settings from the defaults. My
> $PGDATA/global/pgstat.stat file is about 18MB, if that helps. Does
> it really rewrite this entire file every 500ms? Alvaro suggested
> resetting the stats, but I'm having trouble figuring out how to do
> that. Seems like pg_stat_reset() is post- 8.1 ...?

I don't have 8.1 handy, but according to this:
http://www.postgresql.org/docs/8.1/interactive/monitoring-stats.html
pg_stat_reset() should be available in 8.1.

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz(at)depesz(dot)com / aim:depeszhdl / skype:depesz_hdl / gg:6749007


From: Justin Pasher <justinp(at)newmediagateway(dot)com>
To: depesz(at)depesz(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres stats collector showing high disk I/O
Date: 2010-04-23 21:49:38
Message-ID: 4BD215F2.307@newmediagateway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

----- Original Message -----
From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
Date: Fri, 23 Apr 2010 23:40:35 +0200
Subject: Re: [GENERAL] Postgres stats collector showing high disk I/O
To: Justin Pasher <justinp(at)newmediagateway(dot)com>
CC: pgsql-general(at)postgresql(dot)org
> On Fri, Apr 23, 2010 at 03:27:55PM -0500, Justin Pasher wrote:
>
>> haven't tweaked any settings from the defaults. My
>> $PGDATA/global/pgstat.stat file is about 18MB, if that helps. Does
>> it really rewrite this entire file every 500ms? Alvaro suggested
>> resetting the stats, but I'm having trouble figuring out how to do
>> that. Seems like pg_stat_reset() is post- 8.1 ...?
>>
>
> I don't have 8.1 handy, but according to this:
> http://www.postgresql.org/docs/8.1/interactive/monitoring-stats.html
> pg_stat_reset() should be available in 8.1.
>
> Best regards,
>
> depesz
>

Agh... I used pg_stats_reset (with an s) when searching for it. I ran
the function and it returned true, but the stats file only shrunk by
~100k (still over 18MB total). Is there something else I need to do?
Does this mean the file is mostly bloated with bogus data that it can't
"reset"? I'm guessing I should just try to delete the file outright?

--
Justin Pasher


From: Justin Pasher <justinp(at)newmediagateway(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres stats collector showing high disk I/O
Date: 2010-04-23 22:06:04
Message-ID: 4BD219CC.5050808@newmediagateway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

> I'm guessing I should just try to delete the file outright?
>

Err... I meant "should NOT" delete.

--
Justin Pasher


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Justin Pasher <justinp(at)newmediagateway(dot)com>
Cc: depesz(at)depesz(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres stats collector showing high disk I/O
Date: 2010-04-23 22:28:03
Message-ID: 20100423222803.GC2859@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Justin Pasher wrote:

> Agh... I used pg_stats_reset (with an s) when searching for it. I
> ran the function and it returned true, but the stats file only
> shrunk by ~100k (still over 18MB total). Is there something else I
> need to do? Does this mean the file is mostly bloated with bogus
> data that it can't "reset"? I'm guessing I should just try to delete
> the file outright?

Err, yeah, pg_stat_reset only resets the stats for the current database.
You need to reset for all databases, or alternatively, shut down the
server, remove the file, and restart

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Justin Pasher <justinp(at)newmediagateway(dot)com>
Cc: depesz(at)depesz(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres stats collector showing high disk I/O
Date: 2010-04-23 22:38:03
Message-ID: 18564.1272062283@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Justin Pasher <justinp(at)newmediagateway(dot)com> writes:
> Agh... I used pg_stats_reset (with an s) when searching for it. I ran
> the function and it returned true, but the stats file only shrunk by
> ~100k (still over 18MB total). Is there something else I need to do?

pg_stat_reset only resets the data for the current database (the one
you issue it in). Apparently most of your bloat is for some other
database(s).

If you've got a whole lot of databases, a possibly less painful
alternative to zapping them one at a time is to stop the server,
manually remove the stats file, start the server.

regards, tom lane


From: Justin Pasher <justinp(at)newmediagateway(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres stats collector showing high disk I/O
Date: 2010-04-23 22:46:16
Message-ID: 4BD22338.2000307@newmediagateway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

----- Original Message -----
From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Date: Fri, 23 Apr 2010 18:28:03 -0400
Subject: Re: [GENERAL] Postgres stats collector showing high disk I/O
To: Justin Pasher <justinp(at)newmediagateway(dot)com>
CC: depesz(at)depesz(dot)com, pgsql-general(at)postgresql(dot)org
> Justin Pasher wrote:
>
>
>> Agh... I used pg_stats_reset (with an s) when searching for it. I
>> ran the function and it returned true, but the stats file only
>> shrunk by ~100k (still over 18MB total). Is there something else I
>> need to do? Does this mean the file is mostly bloated with bogus
>> data that it can't "reset"? I'm guessing I should just try to delete
>> the file outright?
>>
>
> Err, yeah, pg_stat_reset only resets the stats for the current database.
> You need to reset for all databases, or alternatively, shut down the
> server, remove the file, and restart
>

Ahh, yes. I probably should have realized that. I ran the function on
all of the databases (138 total), and now the stats file is down to
~400k. The disk I/O is also practically nothing now.

So now as a continuation of my original message, what would cause the
stats file to get so big for what seems like (IMO) a small number of
databases? I have a Postgres 7.4 cluster that has about 250 databases,
but it's stats file is only 3.5MB. Do I need to look into avoiding a
bunch of CREATE/DROP statements (a suggestion in the archives)? I don't
know the actual usage patterns of the code base (I'll have to get with
the developers), but I think at worst, they might create temp tables
from time to time. As a matter of fact, I just checked the stats file
again (about 10 minutes later) and it's doubled to 800K. Is Postgres
just trying to store too much information in the statistics file?
Ultimately, the main statistics I care about are current connections and
queries being run.

A previous post in the archives from Tom said that vacuum (even
autovacuum) should clean up potential stat file bloat. Do I need to
tweak my autovacuum settings? Everything is currently set to the
defaults because I've never had any performance issues that warranted
tweaking the settings.

--
Justin Pasher


From: Justin Pasher <justinp(at)newmediagateway(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres stats collector showing high disk I/O
Date: 2010-05-20 20:10:53
Message-ID: 4BF5974D.8030505@newmediagateway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

----- Original Message -----
> From: Justin Pasher <justinp(at)newmediagateway(dot)com>
> Date: Fri, 23 Apr 2010 17:46:16 -0500
> Subject: Re: [GENERAL] Postgres stats collector showing high disk I/O
> To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
> CC: pgsql-general(at)postgresql(dot)org
>
> ----- Original Message -----
> From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
> Date: Fri, 23 Apr 2010 18:28:03 -0400
> Subject: Re: [GENERAL] Postgres stats collector showing high disk I/O
> To: Justin Pasher <justinp(at)newmediagateway(dot)com>
> CC: depesz(at)depesz(dot)com, pgsql-general(at)postgresql(dot)org
>> Justin Pasher wrote:
>>
>>
>>> Agh... I used pg_stats_reset (with an s) when searching for it. I
>>> ran the function and it returned true, but the stats file only
>>> shrunk by ~100k (still over 18MB total). Is there something else I
>>> need to do? Does this mean the file is mostly bloated with bogus
>>> data that it can't "reset"? I'm guessing I should just try to delete
>>> the file outright?
>>>
>>
>> Err, yeah, pg_stat_reset only resets the stats for the current database.
>> You need to reset for all databases, or alternatively, shut down the
>> server, remove the file, and restart
>>
>
> Ahh, yes. I probably should have realized that. I ran the function on
> all of the databases (138 total), and now the stats file is down to
> ~400k. The disk I/O is also practically nothing now.
>
> So now as a continuation of my original message, what would cause the
> stats file to get so big for what seems like (IMO) a small number of
> databases? I have a Postgres 7.4 cluster that has about 250 databases,
> but it's stats file is only 3.5MB. Do I need to look into avoiding a
> bunch of CREATE/DROP statements (a suggestion in the archives)? I
> don't know the actual usage patterns of the code base (I'll have to
> get with the developers), but I think at worst, they might create temp
> tables from time to time. As a matter of fact, I just checked the
> stats file again (about 10 minutes later) and it's doubled to 800K. Is
> Postgres just trying to store too much information in the statistics
> file? Ultimately, the main statistics I care about are current
> connections and queries being run.
>
> A previous post in the archives from Tom said that vacuum (even
> autovacuum) should clean up potential stat file bloat. Do I need to
> tweak my autovacuum settings? Everything is currently set to the
> defaults because I've never had any performance issues that warranted
> tweaking the settings.

Can anyone provide any more direction on this one?

Whenever I clear out the stats for all of the databases, the file
shrinks down to <1MB. However, it only takes about a day for it to get
back up to ~18MB and then the stats collector process start the heavy
disk writing again. I do know there are some tables in the database that
are filled and emptied quite a bit (they are used as temporary "queue"
tables). The code will VACUUM FULL ANALYZE after the table is emptied to
get the physical size back down and update the (empty) stats. A plain
ANALYZE is also run right after the table is filled but before it starts
processing, so the planner will have good stats on the contents of the
table. Would this lead to pg_stat file bloat like I'm seeing? Would a
CLUSTER then ANALYZE instead of a VACUUM FULL ANALYZE make any
difference? The VACUUM FULL code was setup quite a while back before the
coders knew about CLUSTER.

Thanks.

--
Justin Pasher


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Justin Pasher <justinp(at)newmediagateway(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Postgres stats collector showing high disk I/O
Date: 2010-05-20 21:24:59
Message-ID: 1274390635-sup-4816@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Excerpts from Justin Pasher's message of jue may 20 16:10:53 -0400 2010:

> Whenever I clear out the stats for all of the databases, the file
> shrinks down to <1MB. However, it only takes about a day for it to get
> back up to ~18MB and then the stats collector process start the heavy
> disk writing again. I do know there are some tables in the database that
> are filled and emptied quite a bit (they are used as temporary "queue"
> tables). The code will VACUUM FULL ANALYZE after the table is emptied to
> get the physical size back down and update the (empty) stats. A plain
> ANALYZE is also run right after the table is filled but before it starts
> processing, so the planner will have good stats on the contents of the
> table. Would this lead to pg_stat file bloat like I'm seeing? Would a
> CLUSTER then ANALYZE instead of a VACUUM FULL ANALYZE make any
> difference? The VACUUM FULL code was setup quite a while back before the
> coders knew about CLUSTER.

I wonder if we should make pgstats write one file per database (plus one
for shared objects), instead of keeping everything in a single file.
That would reduce the need for reading and writing so much.

--


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Justin Pasher <justinp(at)newmediagateway(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Postgres stats collector showing high disk I/O
Date: 2010-05-20 22:11:44
Message-ID: 9605.1274393504@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> Excerpts from Justin Pasher's message of jue may 20 16:10:53 -0400 2010:
>> Whenever I clear out the stats for all of the databases, the file
>> shrinks down to <1MB. However, it only takes about a day for it to get
>> back up to ~18MB and then the stats collector process start the heavy
>> disk writing again.

> I wonder if we should make pgstats write one file per database (plus one
> for shared objects), instead of keeping everything in a single file.
> That would reduce the need for reading and writing so much.

Well, the real problem here is the OP is running 8.1, so he isn't
getting the benefit of any of the subsequent changes to reduce the
stats file I/O. I suspect the bloat also comes from a
subsequently-fixed bug, but not totally sure about that yet.

regards, tom lane