Re: stats collector process high CPU utilization

Lists: pgsql-performance
From: Benjamin Minshall <minshall(at)intellicon(dot)biz>
To: pgsql-performance(at)postgresql(dot)org
Subject: stats collector process high CPU utilization
Date: 2007-02-08 16:48:01
Message-ID: 45CB5441.1040707@intellicon.biz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Greetings,

Since upgrading to 8.2.3 yesterday, the stats collector process has had
very high CPU utilization; it is consuming roughly 80-90% of one CPU.
The server seems a lot more sluggish than it was before. Is this normal
operation for 8.2 or something I should look into correcting?

stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true
stats_reset_on_server_start = false

--
Benjamin Minshall <minshall(at)intellicon(dot)biz>
Senior Developer -- Intellicon, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Benjamin Minshall <minshall(at)intellicon(dot)biz>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-08 17:17:56
Message-ID: 21013.1170955076@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
> Since upgrading to 8.2.3 yesterday, the stats collector process has had
> very high CPU utilization; it is consuming roughly 80-90% of one CPU.
> The server seems a lot more sluggish than it was before. Is this normal
> operation for 8.2 or something I should look into correcting?

What version did you update from, and what platform is this?

regards, tom lane


From: Benjamin Minshall <minshall(at)intellicon(dot)biz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-08 17:25:26
Message-ID: 45CB5D06.7090004@intellicon.biz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:
> Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
>> Since upgrading to 8.2.3 yesterday, the stats collector process has had
>> very high CPU utilization; it is consuming roughly 80-90% of one CPU.
>> The server seems a lot more sluggish than it was before. Is this normal
>> operation for 8.2 or something I should look into correcting?
>
> What version did you update from, and what platform is this?
>
> regards, tom lane

I upgraded from 8.1.5. The system is a dual Xeon 2.4Ghz, 4Gb RAM
running linux kernel 2.6 series.

--
Benjamin Minshall <minshall(at)intellicon(dot)biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Benjamin Minshall <minshall(at)intellicon(dot)biz>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-08 18:46:31
Message-ID: 7994.1170960391@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
> Tom Lane wrote:
>> Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
>>> Since upgrading to 8.2.3 yesterday, the stats collector process has had
>>> very high CPU utilization; it is consuming roughly 80-90% of one CPU.
>>> The server seems a lot more sluggish than it was before. Is this normal
>>> operation for 8.2 or something I should look into correcting?

>> What version did you update from, and what platform is this?

> I upgraded from 8.1.5. The system is a dual Xeon 2.4Ghz, 4Gb RAM
> running linux kernel 2.6 series.

OK, I was trying to correlate it with post-8.2.0 patches but evidently
that's the wrong tree to bark up. No, this isn't an expected behavior.
Is there anything unusual about your database (huge numbers of tables,
or some such)? Can you gather some info about what it's doing?
strace'ing the stats collector might prove interesting, also if you have
built it with --enable-debug then oprofile results would be helpful.

regards, tom lane


From: Benjamin Minshall <minshall(at)intellicon(dot)biz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-08 21:16:32
Message-ID: 45CB9330.3000603@intellicon.biz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:
>
> OK, I was trying to correlate it with post-8.2.0 patches but evidently
> that's the wrong tree to bark up. No, this isn't an expected behavior.

I talked with a co-worker and discovered that we went from 8.1.5 to
8.2.2, ran a few hours then went to 8.2.3 after the patch was released.
I do not know if the high utilization was a problem during the few
hours on 8.2.2.

> Is there anything unusual about your database (huge numbers of tables,
> or some such)?

Nothing unusual. I have a few databases of about 10GB each; the
workload is mostly inserts using COPY or parameterized INSERTS inside
transaction blocks.

> Can you gather some info about what it's doing?
> strace'ing the stats collector might prove interesting, also if you have
> built it with --enable-debug then oprofile results would be helpful.

I will gather some strace info later today when I have a chance to
shutdown the server.

Thanks.

--
Benjamin Minshall <minshall(at)intellicon(dot)biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Benjamin Minshall <minshall(at)intellicon(dot)biz>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-08 21:24:22
Message-ID: 19935.1170969862@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
> Tom Lane wrote:
>> Can you gather some info about what it's doing?
>> strace'ing the stats collector might prove interesting, also if you have
>> built it with --enable-debug then oprofile results would be helpful.

> I will gather some strace info later today when I have a chance to
> shutdown the server.

I don't see why you'd need to shut anything down. Just run
strace -p stats-process-ID
for a few seconds or minutes (enough to gather maybe a few thousand
lines of output).

regards, tom lane


From: Benjamin Minshall <minshall(at)intellicon(dot)biz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-08 21:55:25
Message-ID: 45CB9C4D.30902@intellicon.biz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:
> Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
>> Tom Lane wrote:
>>> Can you gather some info about what it's doing?
>>> strace'ing the stats collector might prove interesting, also if you have
>>> built it with --enable-debug then oprofile results would be helpful.
>
>> I will gather some strace info later today when I have a chance to
>> shutdown the server.
>
> I don't see why you'd need to shut anything down. Just run
> strace -p stats-process-ID
> for a few seconds or minutes (enough to gather maybe a few thousand
> lines of output).
>

Seems the problem may be related to a huge global/pgstat.stat file.
Under 8.1.5 it was about 1 MB; now it's 90 MB in 8.2.3.

I ran strace for 60 seconds:

% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
95.71 1.119004 48652 23 rename
4.29 0.050128 0 508599 write
0.00 0.000019 0 249 22 poll
0.00 0.000000 0 23 open
0.00 0.000000 0 23 close
0.00 0.000000 0 34 getppid
0.00 0.000000 0 23 munmap
0.00 0.000000 0 23 setitimer
0.00 0.000000 0 23 22 sigreturn
0.00 0.000000 0 23 mmap2
0.00 0.000000 0 23 fstat64
0.00 0.000000 0 216 recv
------ ----------- ----------- --------- --------- ----------------
100.00 1.169151 509282 44 total

I attached an excerpt of the full strace with the many thousands of
write calls filtered.

--
Benjamin Minshall <minshall(at)intellicon(dot)biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz

Attachment Content-Type Size
strace.excerpt text/plain 2.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Benjamin Minshall <minshall(at)intellicon(dot)biz>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-08 22:07:56
Message-ID: 20601.1170972476@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
> Seems the problem may be related to a huge global/pgstat.stat file.
> Under 8.1.5 it was about 1 MB; now it's 90 MB in 8.2.3.

Yoi. We didn't do anything that would bloat that file if it were
storing the same information as before. What I'm betting is that it's
storing info on a whole lot more tables than before. Did you decide
to start running autovacuum when you updated to 8.2? How many tables
are visible in the pg_stats views?

regards, tom lane


From: Benjamin Minshall <minshall(at)intellicon(dot)biz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-08 22:29:23
Message-ID: 45CBA443.9090101@intellicon.biz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:
> Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
>> Seems the problem may be related to a huge global/pgstat.stat file.
>> Under 8.1.5 it was about 1 MB; now it's 90 MB in 8.2.3.
>
> Yoi. We didn't do anything that would bloat that file if it were
> storing the same information as before. What I'm betting is that it's
> storing info on a whole lot more tables than before.

The server is running on the same actual production data, schema and
workload as before.

> Did you decide to start running autovacuum when you updated to 8.2?

Autovacuum was on and functioning before the update.

> How many tables are visible in the pg_stats views?

There are about 15 databases in the cluster each with around 90 tables.
A count of pg_stats yields between 500 and 800 rows in each database.

select count(*) from (select distinct tablename from pg_stats) as i;
count
-------
92
(1 row)

select count(*) from pg_stats;
count
-------
628
(1 row)

--
Benjamin Minshall <minshall(at)intellicon(dot)biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Benjamin Minshall <minshall(at)intellicon(dot)biz>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-09 02:04:32
Message-ID: 7959.1170986672@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
> Tom Lane wrote:
>> How many tables are visible in the pg_stats views?

> There are about 15 databases in the cluster each with around 90 tables.
> A count of pg_stats yields between 500 and 800 rows in each database.

Sorry, I was imprecise. The view "pg_stats" doesn't have anything to do
with the stats collector; what I was interested in was the contents of
the "pg_stat_xxx" and "pg_statio_xxx" views. It'd be enough to check
pg_stat_all_indexes and pg_stat_all_tables, probably. Also, do you have
the 8.1 installation still available to get the comparable counts there?

regards, tom lane


From: minshall(at)intellicon(dot)biz
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-09 04:13:38
Message-ID: 62089.75.14.32.120.1170994418.squirrel@intellicon.biz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
>> Tom Lane wrote:
>>> How many tables are visible in the pg_stats views?
>
>> There are about 15 databases in the cluster each with around 90 tables.
>> A count of pg_stats yields between 500 and 800 rows in each database.
>
> Sorry, I was imprecise. The view "pg_stats" doesn't have anything to do
> with the stats collector; what I was interested in was the contents of
> the "pg_stat_xxx" and "pg_statio_xxx" views. It'd be enough to check
> pg_stat_all_indexes and pg_stat_all_tables, probably. Also, do you have
> the 8.1 installation still available to get the comparable counts there?
>

I checked all 15 databases on both 8.1 and 8.2; they were all quite
consistent:

pg_stat_all_indexes has about 315 rows per database
pg_stat_all_tables has about 260 rows per database

The pg_statio_* views match in count to the pg_stat_* views as well.

While exploring this problem, I've noticed that one of the frequent insert
processes creates a few temporary tables to do post-processing. Is it
possible that the stats collector is getting bloated with stats from these
short-lived temporary tables? During periods of high activity it could be
creating temporary tables as often as two per second.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: minshall(at)intellicon(dot)biz
Cc: pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-09 04:26:58
Message-ID: 14668.1170995218@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

minshall(at)intellicon(dot)biz writes:
> While exploring this problem, I've noticed that one of the frequent insert
> processes creates a few temporary tables to do post-processing. Is it
> possible that the stats collector is getting bloated with stats from these
> short-lived temporary tables? During periods of high activity it could be
> creating temporary tables as often as two per second.

Hmmm ... that's an interesting point, but offhand I don't see why it'd
cause more of a problem in 8.2 than 8.1. Alvaro, any thoughts?

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: minshall(at)intellicon(dot)biz, pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-09 14:24:59
Message-ID: 20070209142459.GB4253@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:
> minshall(at)intellicon(dot)biz writes:
> > While exploring this problem, I've noticed that one of the frequent insert
> > processes creates a few temporary tables to do post-processing. Is it
> > possible that the stats collector is getting bloated with stats from these
> > short-lived temporary tables? During periods of high activity it could be
> > creating temporary tables as often as two per second.
>
> Hmmm ... that's an interesting point, but offhand I don't see why it'd
> cause more of a problem in 8.2 than 8.1. Alvaro, any thoughts?

No idea. I do have a very crude piece of code to read a pgstat.stat
file and output some info about what it finds (table OIDs basically
IIRC). Maybe it can be helpful to examine what's in the bloated stat
file.

Regarding temp tables, I'd think that the pgstat entries should be
getting dropped at some point in both releases. Maybe there's a bug
preventing that in 8.2?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: minshall(at)intellicon(dot)biz, pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-09 15:08:19
Message-ID: 24717.1171033699@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Regarding temp tables, I'd think that the pgstat entries should be
> getting dropped at some point in both releases. Maybe there's a bug
> preventing that in 8.2?

Hmmm ... I did rewrite the backend-side code for that just recently for
performance reasons ... could I have broken it? Anyone want to take a
second look at
http://archives.postgresql.org/pgsql-committers/2007-01/msg00171.php

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: minshall(at)intellicon(dot)biz, pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-09 15:55:57
Message-ID: 29832.1171036557@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

I wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Regarding temp tables, I'd think that the pgstat entries should be
>> getting dropped at some point in both releases. Maybe there's a bug
>> preventing that in 8.2?

> Hmmm ... I did rewrite the backend-side code for that just recently for
> performance reasons ... could I have broken it?

I did some testing with HEAD and verified that pgstat_vacuum_tabstat()
still seems to do what it's supposed to, so that theory falls down.

Alvaro, could you send Benjamin your stat-file-dumper tool so we can
get some more info? Alternatively, if Benjamin wants to send me a copy
of his stats file (off-list), I'd be happy to take a look.

regards, tom lane


From: Benjamin Minshall <minshall(at)intellicon(dot)biz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-09 16:27:07
Message-ID: 45CCA0DB.2010903@intellicon.biz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:
> I wrote:
>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>>> Regarding temp tables, I'd think that the pgstat entries should be
>>> getting dropped at some point in both releases. Maybe there's a bug
>>> preventing that in 8.2?
>
>> Hmmm ... I did rewrite the backend-side code for that just recently for
>> performance reasons ... could I have broken it?
>
> I did some testing with HEAD and verified that pgstat_vacuum_tabstat()
> still seems to do what it's supposed to, so that theory falls down.
>
> Alvaro, could you send Benjamin your stat-file-dumper tool so we can
> get some more info?

> Alternatively, if Benjamin wants to send me a copy
> of his stats file (off-list), I'd be happy to take a look.
>
> regards, tom lane

When I checked on the server this morning, the huge stats file has
returned to a normal size. I set up a script to track CPU usage and
stats file size, and it appears to have decreased from 90MB down to
about 2MB over roughly 6 hours last night. The CPU usage of the stats
collector also decreased accordingly.

The application logs indicate that there was no variation in the
workload over this time period, however the file size started to
decrease soon after the nightly pg_dump backups completed. Coincidence
perhaps?

Nonetheless, I would appreciate a copy of Alvaro's stat file tool just
to see if anything stands out in the collected stats.

Thanks for your help, Tom.

--
Benjamin Minshall <minshall(at)intellicon(dot)biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Benjamin Minshall <minshall(at)intellicon(dot)biz>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-09 16:33:06
Message-ID: 1116.1171038786@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
> When I checked on the server this morning, the huge stats file has
> returned to a normal size. I set up a script to track CPU usage and
> stats file size, and it appears to have decreased from 90MB down to
> about 2MB over roughly 6 hours last night. The CPU usage of the stats
> collector also decreased accordingly.

> The application logs indicate that there was no variation in the
> workload over this time period, however the file size started to
> decrease soon after the nightly pg_dump backups completed. Coincidence
> perhaps?

Well, that's pretty interesting. What are your vacuuming arrangements
for this installation? Could the drop in file size have coincided with
VACUUM operations? Because the ultimate backstop against bloated stats
files is pgstat_vacuum_tabstat(), which is run by VACUUM and arranges to
clean out any entries that shouldn't be there anymore.

It's sounding like what you had was just transient bloat, in which case
it might be useful to inquire whether anything out-of-the-ordinary had
been done to the database right before the excessive-CPU-usage problem
started.

regards, tom lane


From: Benjamin Minshall <minshall(at)intellicon(dot)biz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-02-09 18:36:03
Message-ID: 45CCBF13.2000707@intellicon.biz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:
> Well, that's pretty interesting. What are your vacuuming arrangements
> for this installation? Could the drop in file size have coincided with
> VACUUM operations? Because the ultimate backstop against bloated stats
> files is pgstat_vacuum_tabstat(), which is run by VACUUM and arranges to
> clean out any entries that shouldn't be there anymore.

VACUUM and ANALYZE are done by autovacuum only, no cron jobs.
autovacuum_naptime is 30 seconds so it should make it to each database
every 10 minutes or so. Do you think that more aggressive vacuuming
would prevent future swelling of the stats file?

> It's sounding like what you had was just transient bloat, in which case
> it might be useful to inquire whether anything out-of-the-ordinary had
> been done to the database right before the excessive-CPU-usage problem
> started.

I don't believe that there was any unusual activity on the server, but I
have set up some more detailed logging to hopefully identify a pattern
if the problem resurfaces.

Thanks.

--
Benjamin Minshall <minshall(at)intellicon(dot)biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Benjamin Minshall <minshall(at)intellicon(dot)biz>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-03-01 20:12:56
Message-ID: 16282.1172779976@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
> Tom Lane wrote:
>> It's sounding like what you had was just transient bloat, in which case
>> it might be useful to inquire whether anything out-of-the-ordinary had
>> been done to the database right before the excessive-CPU-usage problem
>> started.

> I don't believe that there was any unusual activity on the server, but I
> have set up some more detailed logging to hopefully identify a pattern
> if the problem resurfaces.

A further report led us to realize that 8.2.x in fact has a nasty bug
here: the stats collector is supposed to dump its stats to a file at
most every 500 milliseconds, but the code was actually waiting only
500 microseconds :-(. The larger the stats file, the more obvious
this problem gets.

If you want to patch this before 8.2.4, try this...

Index: pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.140.2.2
diff -c -r1.140.2.2 pgstat.c
*** pgstat.c 26 Jan 2007 20:07:01 -0000 1.140.2.2
--- pgstat.c 1 Mar 2007 20:04:50 -0000
***************
*** 1689,1695 ****
/* Preset the delay between status file writes */
MemSet(&write_timeout, 0, sizeof(struct itimerval));
write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
! write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;

/*
* Read in an existing statistics stats file or initialize the stats to
--- 1689,1695 ----
/* Preset the delay between status file writes */
MemSet(&write_timeout, 0, sizeof(struct itimerval));
write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
! write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;

/*
* Read in an existing statistics stats file or initialize the stats to

regards, tom lane


From: Benjamin Minshall <minshall(at)intellicon(dot)biz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-03-01 20:41:19
Message-ID: 45E73A6F.50800@intellicon.biz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(. The larger the stats file, the more obvious
> this problem gets.
>
> If you want to patch this before 8.2.4, try this...
>

Thanks for the follow-up on this issue, Tom. I was able to link the
original huge stats file problem to some long(ish) running transactions
which blocked VACUUM, but this patch will really help. Thanks.

-Ben


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Benjamin Minshall" <minshall(at)intellicon(dot)biz>, pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-03-01 21:58:20
Message-ID: b42b73150703011358p7dfe77dft85ba7f3cdd3ce160@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 3/1/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
> > Tom Lane wrote:
> >> It's sounding like what you had was just transient bloat, in which case
> >> it might be useful to inquire whether anything out-of-the-ordinary had
> >> been done to the database right before the excessive-CPU-usage problem
> >> started.
>
> > I don't believe that there was any unusual activity on the server, but I
> > have set up some more detailed logging to hopefully identify a pattern
> > if the problem resurfaces.
>
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(. The larger the stats file, the more obvious
> this problem gets.

I think this explains the trigger that was blowing up my FC4 box.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
Cc: "Benjamin Minshall" <minshall(at)intellicon(dot)biz>, pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-03-01 22:44:29
Message-ID: 17817.1172789069@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

"Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
> On 3/1/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> A further report led us to realize that 8.2.x in fact has a nasty bug
>> here: the stats collector is supposed to dump its stats to a file at
>> most every 500 milliseconds, but the code was actually waiting only
>> 500 microseconds :-(. The larger the stats file, the more obvious
>> this problem gets.

> I think this explains the trigger that was blowing up my FC4 box.

I dug in the archives a bit and couldn't find the report you're
referring to?

regards, tom lane


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Benjamin Minshall" <minshall(at)intellicon(dot)biz>, pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-03-02 01:30:00
Message-ID: b42b73150703011730s7c6d4aaeo8d9206eab2594329@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 3/2/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
> > On 3/1/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> A further report led us to realize that 8.2.x in fact has a nasty bug
> >> here: the stats collector is supposed to dump its stats to a file at
> >> most every 500 milliseconds, but the code was actually waiting only
> >> 500 microseconds :-(. The larger the stats file, the more obvious
> >> this problem gets.
>
> > I think this explains the trigger that was blowing up my FC4 box.
>
> I dug in the archives a bit and couldn't find the report you're
> referring to?

I was referring to this:
http://archives.postgresql.org/pgsql-hackers/2007-02/msg01418.php

Even though the fundamental reason was obvious (and btw, I inherited
this server less than two months ago), I was still curious what was
making 8.2 blow up a box that was handling a million tps/hour for over
a year. :-)

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
Cc: "Benjamin Minshall" <minshall(at)intellicon(dot)biz>, pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-03-02 01:35:53
Message-ID: 28377.1172799353@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

"Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
> On 3/2/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> "Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
>>> I think this explains the trigger that was blowing up my FC4 box.
>>
>> I dug in the archives a bit and couldn't find the report you're
>> referring to?

> I was referring to this:
> http://archives.postgresql.org/pgsql-hackers/2007-02/msg01418.php

Oh, the kernel-panic thing. Hm, I wouldn't have thought that replacing
a file at a huge rate would induce a kernel panic ... but who knows?
Do you want to try installing the one-liner patch and see if the panic
goes away?

Actually I was wondering a bit if that strange Windows error discussed
earlier today could be triggered by this behavior:
http://archives.postgresql.org/pgsql-general/2007-03/msg00000.php

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Benjamin Minshall <minshall(at)intellicon(dot)biz>, pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-03-02 10:32:46
Message-ID: 45E7FD4E.6070101@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:
> "Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
>> On 3/2/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> "Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
>>>> I think this explains the trigger that was blowing up my FC4 box.
>>> I dug in the archives a bit and couldn't find the report you're
>>> referring to?
>
>> I was referring to this:
>> http://archives.postgresql.org/pgsql-hackers/2007-02/msg01418.php
>
> Oh, the kernel-panic thing. Hm, I wouldn't have thought that replacing
> a file at a huge rate would induce a kernel panic ... but who knows?
> Do you want to try installing the one-liner patch and see if the panic
> goes away?
>
> Actually I was wondering a bit if that strange Windows error discussed
> earlier today could be triggered by this behavior:
> http://archives.postgresql.org/pgsql-general/2007-03/msg00000.php

I think that's very likely. If we're updaitng the file *that* often,
we're certainly doing something that's very unusual for the windows
filesystem, and possibly for the hardware as well :-)

//Magnus


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Benjamin Minshall <minshall(at)intellicon(dot)biz>, pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-03-02 23:06:39
Message-ID: 200703022306.l22N6dc21565@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


Sorry, I introduced this bug.

---------------------------------------------------------------------------

Tom Lane wrote:
> Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
> > Tom Lane wrote:
> >> It's sounding like what you had was just transient bloat, in which case
> >> it might be useful to inquire whether anything out-of-the-ordinary had
> >> been done to the database right before the excessive-CPU-usage problem
> >> started.
>
> > I don't believe that there was any unusual activity on the server, but I
> > have set up some more detailed logging to hopefully identify a pattern
> > if the problem resurfaces.
>
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(. The larger the stats file, the more obvious
> this problem gets.
>
> If you want to patch this before 8.2.4, try this...
>
> Index: pgstat.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
> retrieving revision 1.140.2.2
> diff -c -r1.140.2.2 pgstat.c
> *** pgstat.c 26 Jan 2007 20:07:01 -0000 1.140.2.2
> --- pgstat.c 1 Mar 2007 20:04:50 -0000
> ***************
> *** 1689,1695 ****
> /* Preset the delay between status file writes */
> MemSet(&write_timeout, 0, sizeof(struct itimerval));
> write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
> ! write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;
>
> /*
> * Read in an existing statistics stats file or initialize the stats to
> --- 1689,1695 ----
> /* Preset the delay between status file writes */
> MemSet(&write_timeout, 0, sizeof(struct itimerval));
> write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
> ! write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;
>
> /*
> * Read in an existing statistics stats file or initialize the stats to
>
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Benjamin Minshall <minshall(at)intellicon(dot)biz>, pgsql-performance(at)postgresql(dot)org
Subject: Re: stats collector process high CPU utilization
Date: 2007-03-02 23:23:53
Message-ID: 45E8B209.1040209@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Bruce Momjian wrote:
> Sorry, I introduced this bug.

To the gallows with you! :) Don't feel bad, there were several hackers
that missed the math on that one.

Joshua D. Drake

>
> ---------------------------------------------------------------------------
>
> Tom Lane wrote:
>> Benjamin Minshall <minshall(at)intellicon(dot)biz> writes:
>>> Tom Lane wrote:
>>>> It's sounding like what you had was just transient bloat, in which case
>>>> it might be useful to inquire whether anything out-of-the-ordinary had
>>>> been done to the database right before the excessive-CPU-usage problem
>>>> started.
>>> I don't believe that there was any unusual activity on the server, but I
>>> have set up some more detailed logging to hopefully identify a pattern
>>> if the problem resurfaces.
>> A further report led us to realize that 8.2.x in fact has a nasty bug
>> here: the stats collector is supposed to dump its stats to a file at
>> most every 500 milliseconds, but the code was actually waiting only
>> 500 microseconds :-(. The larger the stats file, the more obvious
>> this problem gets.
>>
>> If you want to patch this before 8.2.4, try this...
>>
>> Index: pgstat.c
>> ===================================================================
>> RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
>> retrieving revision 1.140.2.2
>> diff -c -r1.140.2.2 pgstat.c
>> *** pgstat.c 26 Jan 2007 20:07:01 -0000 1.140.2.2
>> --- pgstat.c 1 Mar 2007 20:04:50 -0000
>> ***************
>> *** 1689,1695 ****
>> /* Preset the delay between status file writes */
>> MemSet(&write_timeout, 0, sizeof(struct itimerval));
>> write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
>> ! write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;
>>
>> /*
>> * Read in an existing statistics stats file or initialize the stats to
>> --- 1689,1695 ----
>> /* Preset the delay between status file writes */
>> MemSet(&write_timeout, 0, sizeof(struct itimerval));
>> write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
>> ! write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;
>>
>> /*
>> * Read in an existing statistics stats file or initialize the stats to
>>
>>
>> regards, tom lane
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: if posting/reading through Usenet, please send an appropriate
>> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>> message can get through to the mailing list cleanly
>

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/