freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?

Lists: pgsql-general
From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
Date: 2008-06-27 11:16:22
Message-ID: 20080627111622.GA17406@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

hi
first of all - i know next to nothing about bsd.

friend asked me to setup replication on their machines, and i noticed
that one of cpus (2 quad xeons) is used to 100%:
pgsql 58241 99.0 0.2 22456 7432 ?? Rs Thu10AM 1530:35.93 postgres: stats collector process (postgres)

what might be the reason?

here are settings:
name | setting
---------------------------------+---------------------------------------
add_missing_from | off
allow_system_table_mods | off
archive_command | (disabled)
archive_mode | off
archive_timeout | 0
array_nulls | on
authentication_timeout | 60
autovacuum | on
autovacuum_analyze_scale_factor | 0.1
autovacuum_analyze_threshold | 50
autovacuum_freeze_max_age | 200000000
autovacuum_max_workers | 3
autovacuum_naptime | 60
autovacuum_vacuum_cost_delay | 20
autovacuum_vacuum_cost_limit | -1
autovacuum_vacuum_scale_factor | 0.2
autovacuum_vacuum_threshold | 50
backslash_quote | safe_encoding
bgwriter_delay | 200
bgwriter_lru_maxpages | 100
bgwriter_lru_multiplier | 2
block_size | 8192
bonjour_name |
check_function_bodies | on
checkpoint_completion_target | 0.5
checkpoint_segments | 3
checkpoint_timeout | 300
checkpoint_warning | 30
client_encoding | UTF8
client_min_messages | notice
commit_delay | 0
commit_siblings | 5
config_file | /usr/local/pgsql/data/postgresql.conf
constraint_exclusion | off
cpu_index_tuple_cost | 0.005
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
custom_variable_classes |
data_directory | /usr/local/pgsql/data
DateStyle | ISO, YMD
db_user_namespace | off
deadlock_timeout | 1000
debug_assertions | off
debug_pretty_print | off
debug_print_parse | off
debug_print_plan | off
debug_print_rewritten | off
default_statistics_target | 10
default_tablespace |
default_text_search_config | pg_catalog.simple
default_transaction_isolation | read committed
default_transaction_read_only | off
default_with_oids | off

dynamic_library_path | $libdir
effective_cache_size | 16384
enable_bitmapscan | on
enable_hashagg | on
enable_hashjoin | on
enable_indexscan | on
enable_mergejoin | on
enable_nestloop | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
escape_string_warning | on
explain_pretty_print | on
external_pid_file |
extra_float_digits | 0
from_collapse_limit | 8
fsync | on
full_page_writes | on
geqo | on
geqo_effort | 5
geqo_generations | 0
geqo_pool_size | 0
geqo_selection_bias | 2
geqo_threshold | 12
gin_fuzzy_search_limit | 0
hba_file | /usr/local/pgsql/data/pg_hba.conf
ident_file | /usr/local/pgsql/data/pg_ident.conf
ignore_system_indexes | off
integer_datetimes | off
join_collapse_limit | 8
krb_caseins_users | off
krb_realm |
krb_server_hostname |
krb_server_keyfile |
krb_srvname | postgres
lc_collate | C
lc_ctype | pl_PL.UTF-8
lc_messages | pl_PL.UTF-8
lc_monetary | pl_PL.UTF-8
lc_numeric | pl_PL.UTF-8
lc_time | pl_PL.UTF-8
listen_addresses | 192.168.0.160
local_preload_libraries |
log_autovacuum_min_duration | -1
log_checkpoints | off
log_connections | off
log_destination | syslog
log_directory | pg_log
log_disconnections | off
log_duration | off
log_error_verbosity | default
log_executor_stats | off
log_filename | postgresql-%Y-%m-%d_%H%M%S.log
log_hostname | off
log_line_prefix |
log_lock_waits | off
log_min_duration_statement | -1
log_min_error_statement | error
log_min_messages | notice
log_parser_stats | off
log_planner_stats | off
log_rotation_age | 1440
log_rotation_size | 10240
log_statement | none
log_statement_stats | off
log_temp_files | -1
log_timezone | Poland
log_truncate_on_rotation | off
logging_collector | off
maintenance_work_mem | 16384
max_connections | 180
max_files_per_process | 1000
max_fsm_pages | 204800
max_fsm_relations | 1000
max_function_args | 100
max_identifier_length | 63
max_index_keys | 32
max_locks_per_transaction | 64
max_prepared_transactions | 5
max_stack_depth | 2048
password_encryption | on
port | 5432
post_auth_delay | 0
pre_auth_delay | 0
random_page_cost | 4
regex_flavor | advanced
search_path | "$user",public
seq_page_cost | 1
server_encoding | UTF8
server_version | 8.3.3
server_version_num | 80303
session_replication_role | origin
shared_buffers | 4096
shared_preload_libraries |
silent_mode | on
sql_inheritance | on
ssl | off
ssl_ciphers | ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
standard_conforming_strings | off
statement_timeout | 0
superuser_reserved_connections | 3
synchronize_seqscans | on
synchronous_commit | on
syslog_facility | LOCAL0
syslog_ident | postgres
tcp_keepalives_count | 0
tcp_keepalives_idle | 0
tcp_keepalives_interval | 0
temp_buffers | 1024
temp_tablespaces |
TimeZone | Poland
timezone_abbreviations | Default
trace_notify | off
trace_sort | off
track_activities | on
track_counts | on
transaction_isolation | read committed
transaction_read_only | off
transform_null_equals | off
unix_socket_directory |
unix_socket_group |
unix_socket_permissions | 511
update_process_title | off
vacuum_cost_delay | 0
vacuum_cost_limit | 200
vacuum_cost_page_dirty | 20
vacuum_cost_page_hit | 1
vacuum_cost_page_miss | 10
vacuum_freeze_min_age | 100000000
wal_buffers | 8
wal_sync_method | fsync
wal_writer_delay | 200
work_mem | 1024
xmlbinary | base64
xmloption | content
zero_damaged_pages | off
(189 rows)

any ideas?

Best regards,

depesz


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: depesz(at)depesz(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
Date: 2008-06-27 22:04:19
Message-ID: 12211.1214604259@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

hubert depesz lubaczewski <depesz(at)depesz(dot)com> writes:
> friend asked me to setup replication on their machines, and i noticed
> that one of cpus (2 quad xeons) is used to 100%:
> pgsql 58241 99.0 0.2 22456 7432 ?? Rs Thu10AM 1530:35.93 postgres: stats collector process (postgres)

Hmm, we had some problems with the stats collector going nuts a couple
of versions ago; maybe you've found another way to trigger that.
Anything unusual about this DB (lots of tables, for instance)?
Can you try strace and/or gdb to figure out what the collector is doing?

regards, tom lane


From: Bill Moran <wmoran(at)collaborativefusion(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: depesz(at)depesz(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
Date: 2008-06-27 22:37:45
Message-ID: 20080627183745.510e78d7.wmoran@collaborativefusion.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, 27 Jun 2008 18:04:19 -0400
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> hubert depesz lubaczewski <depesz(at)depesz(dot)com> writes:
> > friend asked me to setup replication on their machines, and i noticed
> > that one of cpus (2 quad xeons) is used to 100%:
> > pgsql 58241 99.0 0.2 22456 7432 ?? Rs Thu10AM 1530:35.93 postgres: stats collector process (postgres)
>
> Hmm, we had some problems with the stats collector going nuts a couple
> of versions ago; maybe you've found another way to trigger that.
> Anything unusual about this DB (lots of tables, for instance)?
> Can you try strace and/or gdb to figure out what the collector is doing?

Just in case you're not a FreeBSD expert, it's ktrace on FreeBSD. strace
is the Linux equivalent.

--
Bill Moran <wmoran(at)collaborativefusion(dot)com>


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bill Moran <wmoran(at)collaborativefusion(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
Date: 2008-06-28 04:41:25
Message-ID: 20080628044125.GD11312@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Jun 27, 2008 at 06:37:45PM -0400, Bill Moran wrote:
> Just in case you're not a FreeBSD expert, it's ktrace on FreeBSD. strace
> is the Linux equivalent.

i'm not an freebsd expert.

i ran ktrace -p <pid> - it exited immediately.
then i ran ktrace -p <pid> -f stats.ktrace.log -t\+
and it also exited immediately.
in cwd i have 2 binary files - not text ones as i would expect from
strace.

ktrace.out and stats.ktrace.log.

are these files "ok"? i mean - i can make them available, but i'm not
sure if they're ok given the fact that they don't look like text output
of strace.

Best regards,

depesz


From: Rodrigo Gonzalez <rjgonzale(at)gmail(dot)com>
To: depesz(at)depesz(dot)com
Cc: Bill Moran <wmoran(at)collaborativefusion(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
Date: 2008-06-28 05:22:26
Message-ID: 4865CA92.4040201@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

hubert depesz lubaczewski wrote:
> On Fri, Jun 27, 2008 at 06:37:45PM -0400, Bill Moran wrote:
>> Just in case you're not a FreeBSD expert, it's ktrace on FreeBSD. strace
>> is the Linux equivalent.
>
> i'm not an freebsd expert.
>
> i ran ktrace -p <pid> - it exited immediately.
> then i ran ktrace -p <pid> -f stats.ktrace.log -t\+
> and it also exited immediately.
> in cwd i have 2 binary files - not text ones as i would expect from
> strace.
>
> ktrace.out and stats.ktrace.log.
>
> are these files "ok"? i mean - i can make them available, but i'm not
> sure if they're ok given the fact that they don't look like text output
> of strace.
>
> Best regards,
>
> depesz
>
>
>From ktrace man page....

DESCRIPTION
The ktrace utility enables kernel trace logging for the specified pro-
cesses. Kernel trace data is logged to the file ktrace.out. The kernel
operations that are traced include system calls, namei translations,sig-
nal processing, and I/O.

Once tracing is enabled on a process, trace data will be logged until
either the process exits or the trace point is cleared. A traced
process can generate enormous amounts of log data quickly; It is
strongly suggested that users memorize how to disable tracing before
attempting to trace a process. The following command is sufficient to
disable tracing on all user owned processes, and, if executed by root,
all processes:

$ ktrace -C

So, read man page, and at least ktrace -C and read the output at
ktrace.out file.

Best regards

Rodrigo Gonzalez


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
Date: 2008-06-28 09:19:09
Message-ID: 20080628091909.GA25484@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Jun 27, 2008 at 06:04:19PM -0400, Tom Lane wrote:
> Anything unusual about this DB (lots of tables, for instance)?

sorry, missed that question - nothing unusual. couple of dbs (10-15),
around 20 tables per databases, only 1 database really used - all others
tend to be used only for testing purposes.

Best regards,

depesz


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Rodrigo Gonzalez <rjgonzale(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
Date: 2008-06-28 09:35:24
Message-ID: 20080628093524.GB25484@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sat, Jun 28, 2008 at 02:22:26AM -0300, Rodrigo Gonzalez wrote:
> So, read man page, and at least ktrace -C and read the output at
> ktrace.out file.

ok, i've read it and didn't understand. it says how to disable tracing
but it doesn't say anything about enabling tracing.

yesterday ktrace -p <pid> produced output, and today it generated empty
file.

so - apparently it's much more complicated than linux strace, and since
i generally never use bsd, i i'm not really interested in learning what
to do in this system to get trace. the only problem is that there
definitelly is some kind of problem in stats collector on bsd, but i'm
not able to help diagnose it at this moment.

Best regards,

depesz


From: Bill Moran <wmoran(at)collaborativefusion(dot)com>
To: depesz(at)depesz(dot)com
Cc: Rodrigo Gonzalez <rjgonzale(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
Date: 2008-06-29 21:13:59
Message-ID: 20080629171359.737afc57.wmoran@collaborativefusion.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sat, 28 Jun 2008 11:35:24 +0200
hubert depesz lubaczewski <depesz(at)depesz(dot)com> wrote:

> On Sat, Jun 28, 2008 at 02:22:26AM -0300, Rodrigo Gonzalez wrote:
> > So, read man page, and at least ktrace -C and read the output at
> > ktrace.out file.
>
> ok, i've read it and didn't understand. it says how to disable tracing
> but it doesn't say anything about enabling tracing.

Use "ktrace -p [pid]" to start tracing. Then use "ktrace -C" to stop
tracing. Trace data is dumped in binary format to the file ktrace.out
(unless you use the -f option to specify another file)

Use the kdump utility to convert the ktrace.out file to something usable.
Something like "kdump > ktrace.txt" will probably get you what you want,
assuming your ktrace file is ktrace.out.

--
Bill Moran <wmoran(at)collaborativefusion(dot)com>


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bill Moran <wmoran(at)collaborativefusion(dot)com>
Cc: Rodrigo Gonzalez <rjgonzale(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
Date: 2008-06-30 08:58:32
Message-ID: 20080630085832.GA10876@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sun, Jun 29, 2008 at 05:13:59PM -0400, Bill Moran wrote:
> Use the kdump utility to convert the ktrace.out file to something usable.
> Something like "kdump > ktrace.txt" will probably get you what you want,
> assuming your ktrace file is ktrace.out.

ok. did it. in about 7 seconds, i got 250mb of ktrace.out.
ktrace.txt is 176 megabytes.
content of this is pretty simple:
58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
58241 postgres RET poll -1 errno 4 Interrupted system call
58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
58241 postgres RET poll -1 errno 4 Interrupted system call
58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
58241 postgres RET poll -1 errno 4 Interrupted system call
58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
58241 postgres RET poll -1 errno 4 Interrupted system call
58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
58241 postgres RET poll -1 errno 4 Interrupted system call
58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
58241 postgres RET poll -1 errno 4 Interrupted system call
58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)

should i make the ktrace.txt available, or should i do something else to
help you ( i mean, the pg-hackers community) diagnose the issue?

Best regards,

depesz


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: depesz(at)depesz(dot)com
Cc: Bill Moran <wmoran(at)collaborativefusion(dot)com>, Rodrigo Gonzalez <rjgonzale(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
Date: 2008-06-30 15:15:42
Message-ID: 28740.1214838942@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

hubert depesz lubaczewski <depesz(at)depesz(dot)com> writes:
> 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
> 58241 postgres RET poll -1 errno 4 Interrupted system call
> 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
> 58241 postgres RET poll -1 errno 4 Interrupted system call
> 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
> 58241 postgres RET poll -1 errno 4 Interrupted system call
> 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
> 58241 postgres RET poll -1 errno 4 Interrupted system call
> 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
> 58241 postgres RET poll -1 errno 4 Interrupted system call

Hmm. The wait-for-input loop in the stats collector assumes that
if it gets EINTR, it should just retry the poll() call immediately.
Which AFAIK is correct. I think you're looking at a kernel or libc
bug here --- somehow the EINTR failure is recurring continuously.
It's probably time to ask about this on a FreeBSD-specific list.

regards, tom lane