Re: Possible Bug: high CPU usage for stats collector in 8.2

Lists: pgsql-hackers
From: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: high CPU usage for stats collector in 8.2
Date: 2007-02-27 21:14:20
Message-ID: 200702271314.21002.darcyb@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats collector
on an 8.2.3 box investigation has lead me to belive that the stats file is
written a lot more often that once every 500ms the following shows this
behavior.

PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6
20060404 (Red Hat 3.4.6-3)

I ran a
time for i in `psql -qtc "select generate_series(1,10000)"`; do psql -qtc
"select 1 from test where msg_id=$i" ; done

which took
real 1m23.288s
user 0m24.142s
sys 0m21.536s

to execute, during which time I ran a strace on the stats collector which
produces the following output. From this it looks like the stats file is
getting rewritten for each connection teardown, not just every 500ms.

Process 10061 attached - interrupt to quit
Process 10061 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
68.14 28.811963 17 1663827 write
18.22 7.701885 123 62808 12793 poll
11.31 4.783082 365 13101 rename
0.58 0.246169 5 50006 recvfrom
0.57 0.241073 18 13101 open
0.43 0.182816 14 13101 munmap
0.18 0.076176 6 13101 mmap
0.17 0.072746 6 13101 close
0.14 0.060483 5 13101 setitimer
0.10 0.041344 3 13101 12793 rt_sigreturn
0.09 0.039240 3 13101 fstat
0.06 0.024041 2 13110 getppid
------ ----------- ----------- --------- --------- ----------------
100.00 42.281018 1894559 25586 total

As you can see rename was called more than the theroitcal 167 times for 500ms
slices that elapsed during the test

Compared to PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC
gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
[All be it this is slower hardware..]

time for i in `psql -qtc "select generate_series(1,10000)"`; do psql -qtc
"select 1 from test where msg_id=$i" ; done

which took
real 9m25.380s
user 6m51.254s
sys 1m47.687s
(and therefor should be about 1130 stat write cycles)

and yielded the following strace

% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
93.64 20.422006 334 61212 select
3.49 0.760963 7 110192 read
1.82 0.396654 19 21128 write
0.64 0.139679 126 1112 rename
0.27 0.057970 52 1112 open
0.06 0.012177 11 1112 munmap
0.04 0.008901 8 1112 mmap
0.03 0.006402 6 1112 close
0.02 0.004282 4 1112 fstat
------ ----------- ----------- --------- --------- ----------------
100.00 21.809034 199204 total

During this run the stats collector does not even show and CPU usage according
to top.

both 8.1 and 8.2 have the following postgresql.conf parameters

stats_command_string = off
stats_start_collector = on
stats_block_level = on
stats_row_level = on

--
Darcy Buskermolen
Command Prompt, Inc.
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997
http://www.commandprompt.com/


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: high CPU usage for stats collector in 8.2
Date: 2007-02-28 07:23:57
Message-ID: 45E52E0D.6090608@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Darcy Buskermolen wrote:
> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats collector
> on an 8.2.3 box investigation has lead me to belive that the stats file is
> written a lot more often that once every 500ms the following shows this
> behavior.

Any thoughts on the below?

>
> PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6
> 20060404 (Red Hat 3.4.6-3)
>
> I ran a
> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql -qtc
> "select 1 from test where msg_id=$i" ; done
>
> which took
> real 1m23.288s
> user 0m24.142s
> sys 0m21.536s
>
>
> to execute, during which time I ran a strace on the stats collector which
> produces the following output. From this it looks like the stats file is
> getting rewritten for each connection teardown, not just every 500ms.
>
> Process 10061 attached - interrupt to quit
> Process 10061 detached
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 68.14 28.811963 17 1663827 write
> 18.22 7.701885 123 62808 12793 poll
> 11.31 4.783082 365 13101 rename
> 0.58 0.246169 5 50006 recvfrom
> 0.57 0.241073 18 13101 open
> 0.43 0.182816 14 13101 munmap
> 0.18 0.076176 6 13101 mmap
> 0.17 0.072746 6 13101 close
> 0.14 0.060483 5 13101 setitimer
> 0.10 0.041344 3 13101 12793 rt_sigreturn
> 0.09 0.039240 3 13101 fstat
> 0.06 0.024041 2 13110 getppid
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 42.281018 1894559 25586 total
>
> As you can see rename was called more than the theroitcal 167 times for 500ms
> slices that elapsed during the test
>
> Compared to PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC
> gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
> [All be it this is slower hardware..]
>
> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql -qtc
> "select 1 from test where msg_id=$i" ; done
>
> which took
> real 9m25.380s
> user 6m51.254s
> sys 1m47.687s
> (and therefor should be about 1130 stat write cycles)
>
> and yielded the following strace
>
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 93.64 20.422006 334 61212 select
> 3.49 0.760963 7 110192 read
> 1.82 0.396654 19 21128 write
> 0.64 0.139679 126 1112 rename
> 0.27 0.057970 52 1112 open
> 0.06 0.012177 11 1112 munmap
> 0.04 0.008901 8 1112 mmap
> 0.03 0.006402 6 1112 close
> 0.02 0.004282 4 1112 fstat
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 21.809034 199204 total
>
>
>
> During this run the stats collector does not even show and CPU usage according
> to top.
>
>
> both 8.1 and 8.2 have the following postgresql.conf parameters
>
> stats_command_string = off
> stats_start_collector = on
> stats_block_level = on
> stats_row_level = on
>
>
>

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Possible Bug: high CPU usage for stats collector in 8.2
Date: 2007-02-28 18:26:37
Message-ID: 45E5C95D.1080900@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Darcy Buskermolen wrote:
> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats collector
> on an 8.2.3 box investigation has lead me to belive that the stats file is
> written a lot more often that once every 500ms the following shows this
> behavior.
>
> PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6
> 20060404 (Red Hat 3.4.6-3)
>
> I ran a
> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql -qtc
> "select 1 from test where msg_id=$i" ; done
>
> which took
> real 1m23.288s
> user 0m24.142s
> sys 0m21.536s
>
>
> to execute, during which time I ran a strace on the stats collector which
> produces the following output. From this it looks like the stats file is
> getting rewritten for each connection teardown, not just every 500ms.
>
> Process 10061 attached - interrupt to quit
> Process 10061 detached
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 68.14 28.811963 17 1663827 write
> 18.22 7.701885 123 62808 12793 poll
> 11.31 4.783082 365 13101 rename
> 0.58 0.246169 5 50006 recvfrom
> 0.57 0.241073 18 13101 open
> 0.43 0.182816 14 13101 munmap
> 0.18 0.076176 6 13101 mmap
> 0.17 0.072746 6 13101 close
> 0.14 0.060483 5 13101 setitimer
> 0.10 0.041344 3 13101 12793 rt_sigreturn
> 0.09 0.039240 3 13101 fstat
> 0.06 0.024041 2 13110 getppid
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 42.281018 1894559 25586 total
>
> As you can see rename was called more than the theroitcal 167 times for 500ms
> slices that elapsed during the test
>
> Compared to PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC
> gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
> [All be it this is slower hardware..]
>
> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql -qtc
> "select 1 from test where msg_id=$i" ; done
>
> which took
> real 9m25.380s
> user 6m51.254s
> sys 1m47.687s
> (and therefor should be about 1130 stat write cycles)
>
> and yielded the following strace
>
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 93.64 20.422006 334 61212 select
> 3.49 0.760963 7 110192 read
> 1.82 0.396654 19 21128 write
> 0.64 0.139679 126 1112 rename
> 0.27 0.057970 52 1112 open
> 0.06 0.012177 11 1112 munmap
> 0.04 0.008901 8 1112 mmap
> 0.03 0.006402 6 1112 close
> 0.02 0.004282 4 1112 fstat
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 21.809034 199204 total
>
>
>
> During this run the stats collector does not even show and CPU usage according
> to top.
>
>
> both 8.1 and 8.2 have the following postgresql.conf parameters
>
> stats_command_string = off
> stats_start_collector = on
> stats_block_level = on
> stats_row_level = on
>
>
>

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible Bug: high CPU usage for stats collector in 8.2
Date: 2007-02-28 22:31:48
Message-ID: 45E602D4.50803@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua D. Drake wrote:
> Darcy Buskermolen wrote:
>> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats collector
>> on an 8.2.3 box investigation has lead me to belive that the stats file is
>> written a lot more often that once every 500ms the following shows this
>> behavior.

I have just done a test separate from darcy on my workstation:

jd(at)scratch:~/82$ strace -c -p 16130
Process 16130 attached - interrupt to quit
Process 16130 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
50.00 0.167103 3 60264 10225 poll
28.31 0.094600 9 10398 rename
16.89 0.056457 5 10398 open
2.48 0.008278 0 50023 recvfrom
1.14 0.003804 0 10398 setitimer
0.53 0.001781 0 20796 write
0.43 0.001432 0 10398 close
0.21 0.000690 0 10398 munmap
0.02 0.000057 0 10398 mmap
0.00 0.000000 0 10398 fstat
0.00 0.000000 0 10398 10225 rt_sigreturn
0.00 0.000000 0 10414 getppid
------ ----------- ----------- --------- --------- ----------------
100.00 0.334202 224681 20450 total

Query:

time for i in `bin/psql -p8000 -d postgres -c "select
generate_series(1,10000)"`; do bin/psql -p8000 -d postgres -qc "select 1
from pg_database where datname = 'postgres'"; done;

Time:

real 2m5.077s
user 0m28.414s
sys 0m39.762s

PostgreSQL 8.2.0, Ubuntu Edgy 64bit.

Seems like something is extremely wonky here.

Joshua D. Drake

>>
>> PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6
>> 20060404 (Red Hat 3.4.6-3)
>>
>> I ran a
>> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql -qtc
>> "select 1 from test where msg_id=$i" ; done
>>
>> which took
>> real 1m23.288s
>> user 0m24.142s
>> sys 0m21.536s
>>
>>
>> to execute, during which time I ran a strace on the stats collector which
>> produces the following output. From this it looks like the stats file is
>> getting rewritten for each connection teardown, not just every 500ms.
>>
>> Process 10061 attached - interrupt to quit
>> Process 10061 detached
>> % time seconds usecs/call calls errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>> 68.14 28.811963 17 1663827 write
>> 18.22 7.701885 123 62808 12793 poll
>> 11.31 4.783082 365 13101 rename
>> 0.58 0.246169 5 50006 recvfrom
>> 0.57 0.241073 18 13101 open
>> 0.43 0.182816 14 13101 munmap
>> 0.18 0.076176 6 13101 mmap
>> 0.17 0.072746 6 13101 close
>> 0.14 0.060483 5 13101 setitimer
>> 0.10 0.041344 3 13101 12793 rt_sigreturn
>> 0.09 0.039240 3 13101 fstat
>> 0.06 0.024041 2 13110 getppid
>> ------ ----------- ----------- --------- --------- ----------------
>> 100.00 42.281018 1894559 25586 total
>>
>> As you can see rename was called more than the theroitcal 167 times for 500ms
>> slices that elapsed during the test
>>
>> Compared to PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC
>> gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
>> [All be it this is slower hardware..]
>>
>> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql -qtc
>> "select 1 from test where msg_id=$i" ; done
>>
>> which took
>> real 9m25.380s
>> user 6m51.254s
>> sys 1m47.687s
>> (and therefor should be about 1130 stat write cycles)
>>
>> and yielded the following strace
>>
>> % time seconds usecs/call calls errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>> 93.64 20.422006 334 61212 select
>> 3.49 0.760963 7 110192 read
>> 1.82 0.396654 19 21128 write
>> 0.64 0.139679 126 1112 rename
>> 0.27 0.057970 52 1112 open
>> 0.06 0.012177 11 1112 munmap
>> 0.04 0.008901 8 1112 mmap
>> 0.03 0.006402 6 1112 close
>> 0.02 0.004282 4 1112 fstat
>> ------ ----------- ----------- --------- --------- ----------------
>> 100.00 21.809034 199204 total
>>
>>
>>
>> During this run the stats collector does not even show and CPU usage according
>> to top.
>>
>>
>> both 8.1 and 8.2 have the following postgresql.conf parameters
>>
>> stats_command_string = off
>> stats_start_collector = on
>> stats_block_level = on
>> stats_row_level = on
>>
>>
>>
>
>

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible Bug: high CPU usage for stats collector in 8.2
Date: 2007-03-01 18:07:27
Message-ID: 200703011007.27717.darcyb@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wednesday 28 February 2007 14:31, Joshua D. Drake wrote:
> Joshua D. Drake wrote:
> > Darcy Buskermolen wrote:
> >> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
> >> collector on an 8.2.3 box investigation has lead me to belive that the
> >> stats file is written a lot more often that once every 500ms the
> >> following shows this behavior.

And just so that noone thinks it's linux specific, I tried the same test on a
FreeBSD box with 8.2 on it and got the following. Ignore the time portion of
the results, but what is important is the number of calls occuring in the
timeframe (given this test took about 38:30, it should have resulted in about
4620 calls to rename)

% time seconds usecs/call calls err syscall
------ ----------- ----------- --------- --------- ----------------
149.23 1170797349.233781 464000015 2519188 2 write
57.71 452768432.575458 2036010019 71515 20238 poll
17.59 138017435.049569 -1944030463 20766 getppid
-136.71 -1072628999.112811 -1254412922 20317 rename
-136.71 -1072629000.340448 -1254412960 20317 close
-136.71 -1072629001.605166 -1254412996 20317 setitimer
-146.81 -1151850347.801977 -861500230 20316 open
-146.81 -1151850349.967150 -861500320 20316 fstat
-226.75 -1779034289.1666093 -639263959 50831 recvfrom
------ ----------- ----------- --------- --------- ----------------

real 38m29.897s
user 0m40.641s
sys 0m47.487s

....

>
> I have just done a test separate from darcy on my workstation:
>
>
> jd(at)scratch:~/82$ strace -c -p 16130
> Process 16130 attached - interrupt to quit
> Process 16130 detached
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 50.00 0.167103 3 60264 10225 poll
> 28.31 0.094600 9 10398 rename
> 16.89 0.056457 5 10398 open
> 2.48 0.008278 0 50023 recvfrom
> 1.14 0.003804 0 10398 setitimer
> 0.53 0.001781 0 20796 write
> 0.43 0.001432 0 10398 close
> 0.21 0.000690 0 10398 munmap
> 0.02 0.000057 0 10398 mmap
> 0.00 0.000000 0 10398 fstat
> 0.00 0.000000 0 10398 10225 rt_sigreturn
> 0.00 0.000000 0 10414 getppid
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 0.334202 224681 20450 total
>
>
> Query:
>
> time for i in `bin/psql -p8000 -d postgres -c "select
> generate_series(1,10000)"`; do bin/psql -p8000 -d postgres -qc "select 1
> from pg_database where datname = 'postgres'"; done;
>
> Time:
>
> real 2m5.077s
> user 0m28.414s
> sys 0m39.762s
>
>
> PostgreSQL 8.2.0, Ubuntu Edgy 64bit.
>
> Seems like something is extremely wonky here.
>
> Joshua D. Drake
>
> >> PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
> >> 3.4.6 20060404 (Red Hat 3.4.6-3)
> >>
> >> I ran a
> >> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql
> >> -qtc "select 1 from test where msg_id=$i" ; done
> >>
> >> which took
> >> real 1m23.288s
> >> user 0m24.142s
> >> sys 0m21.536s
> >>
> >>
> >> to execute, during which time I ran a strace on the stats collector
> >> which produces the following output. From this it looks like the stats
> >> file is getting rewritten for each connection teardown, not just every
> >> 500ms.
> >>
> >> Process 10061 attached - interrupt to quit
> >> Process 10061 detached
> >> % time seconds usecs/call calls errors syscall
> >> ------ ----------- ----------- --------- --------- ----------------
> >> 68.14 28.811963 17 1663827 write
> >> 18.22 7.701885 123 62808 12793 poll
> >> 11.31 4.783082 365 13101 rename
> >> 0.58 0.246169 5 50006 recvfrom
> >> 0.57 0.241073 18 13101 open
> >> 0.43 0.182816 14 13101 munmap
> >> 0.18 0.076176 6 13101 mmap
> >> 0.17 0.072746 6 13101 close
> >> 0.14 0.060483 5 13101 setitimer
> >> 0.10 0.041344 3 13101 12793 rt_sigreturn
> >> 0.09 0.039240 3 13101 fstat
> >> 0.06 0.024041 2 13110 getppid
> >> ------ ----------- ----------- --------- --------- ----------------
> >> 100.00 42.281018 1894559 25586 total
> >>
> >> As you can see rename was called more than the theroitcal 167 times for
> >> 500ms slices that elapsed during the test
> >>
> >> Compared to PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC
> >> gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
> >> [All be it this is slower hardware..]
> >>
> >> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql
> >> -qtc "select 1 from test where msg_id=$i" ; done
> >>
> >> which took
> >> real 9m25.380s
> >> user 6m51.254s
> >> sys 1m47.687s
> >> (and therefor should be about 1130 stat write cycles)
> >>
> >> and yielded the following strace
> >>
> >> % time seconds usecs/call calls errors syscall
> >> ------ ----------- ----------- --------- --------- ----------------
> >> 93.64 20.422006 334 61212 select
> >> 3.49 0.760963 7 110192 read
> >> 1.82 0.396654 19 21128 write
> >> 0.64 0.139679 126 1112 rename
> >> 0.27 0.057970 52 1112 open
> >> 0.06 0.012177 11 1112 munmap
> >> 0.04 0.008901 8 1112 mmap
> >> 0.03 0.006402 6 1112 close
> >> 0.02 0.004282 4 1112 fstat
> >> ------ ----------- ----------- --------- --------- ----------------
> >> 100.00 21.809034 199204 total
> >>
> >>
> >>
> >> During this run the stats collector does not even show and CPU usage
> >> according to top.
> >>
> >>
> >> both 8.1 and 8.2 have the following postgresql.conf parameters
> >>
> >> stats_command_string = off
> >> stats_start_collector = on
> >> stats_block_level = on
> >> stats_row_level = on

--

Darcy Buskermolen
The PostgreSQL company, Command Prompt Inc.
http://www.commandprompt.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible Bug: high CPU usage for stats collector in 8.2
Date: 2007-03-01 19:57:13
Message-ID: 15236.1172779033@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Darcy Buskermolen <darcyb(at)commandprompt(dot)com> writes:
> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
> collector on an 8.2.3 box investigation has lead me to belive that the
> stats file is written a lot more often that once every 500ms the
> following shows this behavior.

Looks like someone broke the usage of PGSTAT_STAT_INTERVAL between 8.1
and 8.2 ... it's waiting 500 microseconds, not the intended 500
milliseconds.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible Bug: high CPU usage for stats collector in 8.2
Date: 2007-03-01 20:20:05
Message-ID: 45E73575.90706@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Darcy Buskermolen <darcyb(at)commandprompt(dot)com> writes:
>
>> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
>> collector on an 8.2.3 box investigation has lead me to belive that the
>> stats file is written a lot more often that once every 500ms the
>> following shows this behavior.
>>
>
> Looks like someone broke the usage of PGSTAT_STAT_INTERVAL between 8.1
> and 8.2 ... it's waiting 500 microseconds, not the intended 500
> milliseconds.
>

Good catch. I am also a bit dubious about this code:

input_fd.fd = pgStatSock;
input_fd.events = POLLIN | POLLERR;
input_fd.revents = 0;

if (poll(&input_fd, 1, PGSTAT_SELECT_TIMEOUT * 1000) < 0)
{
if (errno == EINTR)
continue;
ereport(ERROR,
(errcode_for_socket_access(),
errmsg("poll() failed in statistics collector: %m")));
}

got_data = (input_fd.revents != 0);

AIUI you are not supposed to put POLLERR in the events field. We should
probably be setting POLLIN | POLLPRI, and we should also probably check
exactly what event was returned in revents.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible Bug: high CPU usage for stats collector in 8.2
Date: 2007-03-01 20:42:29
Message-ID: 16546.1172781749@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Good catch. I am also a bit dubious about this code:

> input_fd.fd = pgStatSock;
> input_fd.events = POLLIN | POLLERR;
> input_fd.revents = 0;

Hm. The Single Unix Spec saith that POLLERR is ignored in the events
field, but it is not clear to me that older systems might not treat it
as a condition bit. For instance on HPUX the poll man page says only

The condition flags POLLERR, POLLHUP, and POLLNVAL are always set in
revents if the conditions they indicate are true for the specified
file descriptor, whether or not these flags are set in events.

> AIUI you are not supposed to put POLLERR in the events field. We should
> probably be setting POLLIN | POLLPRI, and we should also probably check
> exactly what event was returned in revents.

We don't need to check what was returned because the action is the same
either way: do a recv(). I'm not seeing the point of setting POLLPRI.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible Bug: high CPU usage for stats collector in 8.2
Date: 2007-03-01 20:53:42
Message-ID: 45E73D56.5060809@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Good catch. I am also a bit dubious about this code:
>>
>
>
>> input_fd.fd = pgStatSock;
>> input_fd.events = POLLIN | POLLERR;
>> input_fd.revents = 0;
>>
>
> Hm. The Single Unix Spec saith that POLLERR is ignored in the events
> field, but it is not clear to me that older systems might not treat it
> as a condition bit. For instance on HPUX the poll man page says only
>
> The condition flags POLLERR, POLLHUP, and POLLNVAL are always set in
> revents if the conditions they indicate are true for the specified
> file descriptor, whether or not these flags are set in events.
>

yeah.

http://www.opengroup.org/onlinepubs/009695399/functions/poll.html says:

"An error has occurred on the device or stream. This flag is only valid
in the /revents/ bitmask; it shall be ignored in the /events/ member."

>
>> AIUI you are not supposed to put POLLERR in the events field. We should
>> probably be setting POLLIN | POLLPRI, and we should also probably check
>> exactly what event was returned in revents.
>>
>
> We don't need to check what was returned because the action is the same
> either way: do a recv(). I'm not seeing the point of setting POLLPRI.
>
>
>

Maybe none in this case - I guess we're not sending priority data.

This might all be OK - I just noticed it as I was looking at the
problem, so I though I'd mention it. I have no doubt your fix is the
correct one, though.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible Bug: high CPU usage for stats collector in 8.2
Date: 2007-03-01 21:00:53
Message-ID: 16755.1172782853@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> This might all be OK - I just noticed it as I was looking at the
> problem, so I though I'd mention it.

I'm inclined to leave that code alone unless someone can point to a
platform where setting POLLERR in events actually causes a problem.
The pgstat code was modeled on libpq, which has been passing POLLERR
ever since its poll() support was written (for 7.4), so I think we
have lots of evidence that this way works everywhere, and none that
the other one does.

regards, tom lane


From: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Possible Bug: high CPU usage for stats collector in 8.2
Date: 2007-03-02 21:41:18
Message-ID: 200703021341.19299.darcyb@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 01 March 2007 11:57, Tom Lane wrote:
> Darcy Buskermolen <darcyb(at)commandprompt(dot)com> writes:
> > I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
> > collector on an 8.2.3 box investigation has lead me to belive that the
> > stats file is written a lot more often that once every 500ms the
> > following shows this behavior.
>
> Looks like someone broke the usage of PGSTAT_STAT_INTERVAL between 8.1
> and 8.2 ... it's waiting 500 microseconds, not the intended 500
> milliseconds.

I can confirm that rev 1.140.2.3 of pgstat.c does fix this issue.

The stats collector CPU usage has dropped from inexcess of 95% to 5%

Thanks.

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--

Darcy Buskermolen
The PostgreSQL company, Command Prompt Inc.
http://www.commandprompt.com/


From: Larry Rosenman <ler(at)lerctr(dot)org>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Possible Bug: high CPU usage for stats collector in 8.2
Date: 2007-03-02 22:14:51
Message-ID: 20070302161425.W38110@thebighonker.lerctr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2 Mar 2007, Darcy Buskermolen wrote:

> On Thursday 01 March 2007 11:57, Tom Lane wrote:
>> Darcy Buskermolen <darcyb(at)commandprompt(dot)com> writes:
>>> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
>>> collector on an 8.2.3 box investigation has lead me to belive that the
>>> stats file is written a lot more often that once every 500ms the
>>> following shows this behavior.
>>
>> Looks like someone broke the usage of PGSTAT_STAT_INTERVAL between 8.1
>> and 8.2 ... it's waiting 500 microseconds, not the intended 500
>> milliseconds.
>
> I can confirm that rev 1.140.2.3 of pgstat.c does fix this issue.
>
> The stats collector CPU usage has dropped from inexcess of 95% to 5%
>
> Thanks.
>
Any guess on when we'd see an 8.2.4? I have a business reason for asking.

Thanks!

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: ler(at)lerctr(dot)org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893


From: Jim Nasby <decibel(at)decibel(dot)org>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Possible Bug: high CPU usage for stats collector in 8.2
Date: 2007-03-05 23:42:26
Message-ID: 02C601F8-16DE-4CDC-A1B4-4B949DB3C619@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mar 2, 2007, at 4:14 PM, Larry Rosenman wrote:
> Any guess on when we'd see an 8.2.4? I have a business reason for
> asking.

If history is any guide, probably when a 'reasonably large fix' to
something comes around, though reviewing early 8.1 release history
might prove enlightening...
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <decibel(at)decibel(dot)org>
Cc: Larry Rosenman <ler(at)lerctr(dot)org>, Darcy Buskermolen <darcyb(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible Bug: high CPU usage for stats collector in 8.2
Date: 2007-03-06 07:15:14
Message-ID: 28077.1173165314@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim Nasby <decibel(at)decibel(dot)org> writes:
> On Mar 2, 2007, at 4:14 PM, Larry Rosenman wrote:
>> Any guess on when we'd see an 8.2.4? I have a business reason for
>> asking.

> If history is any guide, probably when a 'reasonably large fix' to
> something comes around,

Well, what with Bruce on vacation for the next ten days, and tremendous
pressure to review various people's random 8.3 patches, you should not
expect to see any new back-branch releases in March unless the sky
starts falling on us. I'm not real happy about that, because we've
accumulated several moderately significant bug fixes in the 8.2 branch
since 8.2.3, but I don't currently see a sufficient case for expending
a day on a new release...

regards, tom lane