Re: WARNING: pgstat wait timeout

Lists: pgsql-hackers
From: "Sergey E(dot) Koposov" <math(at)sai(dot)msu(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: WARNING: pgstat wait timeout
Date: 2010-01-20 23:20:08
Message-ID: alpine.LRH.2.00.1001210139500.15783@lnfm1.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello hackers,

I've recently hit the message "WARNING: pgstat wait timeout" with PG
8.4.2.
I saw some reports about that message in the -bugs mailing list
http://archives.postgresql.org/pgsql-bugs/2009-12/msg00175.php
http://archives.postgresql.org/pgsql-bugs/2009-07/msg00081.php
where the backtrace from the statisctic collector was requested.
Although I don't have any other bad sympthoms in the system, I still
obtained a backtrace from the statistics collector process. Since I'm not
100% sure that the message is really a bug, feel free to ignore. But
if needed I have PG still running, so I can check something else if
needed.

Here is the (rather innocent IMHO) backtrace of the statistic collector
process:
(gdb) bt
#0 0x00007f31ddfc4b1f in poll () from /lib/libc.so.6
#1 0x00000000005bf7da in PgstatCollectorMain (argc=<value optimized out>,
# argv=<value optimized out>)
at pgstat.c:2718
#2 0x00000000005c0131 in pgstat_start () at pgstat.c:631
#3 0x00000000005c474d in reaper (postgres_signal_arg=<value optimized
out>)
#at postmaster.c:2322
#4 <signal handler called>
#5 0x00007f31ddfc6c83 in select () from /lib/libc.so.6
#6 0x00000000005c20fc in ServerLoop () at postmaster.c:1347
#7 0x00000000005c34a7 in PostmasterMain (argc=3, argv=0x144fd20) at
#postmaster.c:1040
#8 0x000000000056cdc8 in main (argc=3, argv=0x144fd20) at main.c:188
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /opt/pgsql/bin/postgres, process 24677
-------------

Bt full:

(gdb) bt full
#0 0x00007f31ddfc4b1f in poll () from /lib/libc.so.6
No symbol table info available.
#1 0x00000000005bf7da in PgstatCollectorMain (argc=<value optimized out>, argv=<value optimized out>)
at pgstat.c:2718
len = 64
msg = {msg_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, msg_dummy = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}}, msg_inquiry = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER,
m_size = 64}, inquiry_time = 0}, msg_tabstat = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64},
m_databaseid = 0, m_nentries = 0, m_xact_commit = 0, m_xact_rollback = 0, m_entry = {{t_id = 0, t_counts = {
t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 138, t_tuples_inserted = 138,
t_tuples_updated = 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0,
t_new_dead_tuples = 0, t_blocks_fetched = 2, t_blocks_hit = 2}}, {t_id = 2672, t_counts = {t_numscans = 1,
t_tuples_returned = 1, t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched = 2, t_blocks_hit = 2}}, {t_id = 1259, t_counts = {t_numscans = 4,
t_tuples_returned = 553, t_tuples_fetched = 0, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched = 24, t_blocks_hit = 24}}, {t_id = 2615, t_counts = {t_numscans = 0,
t_tuples_returned = 0, t_tuples_fetched = 0, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched = 1, t_blocks_hit = 1}}, {t_id = 2685, t_counts = {t_numscans = 1, t_tuples_returned = 1,
t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 2,
t_blocks_hit = 2}}, {t_id = 1172815, t_counts = {t_numscans = 0, t_tuples_returned = 0,
t_tuples_fetched = 0, t_tuples_inserted = 500000, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 500000, t_new_dead_tuples = 0, t_blocks_fetched = 23077,
t_blocks_hit = 15381}}, {t_id = 1172684, t_counts = {t_numscans = 0, t_tuples_returned = 0,
msg = {msg_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, msg_dummy = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}}, msg_inquiry = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER,
m_size = 64}, inquiry_time = 0}, msg_tabstat = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64},
m_databaseid = 0, m_nentries = 0, m_xact_commit = 0, m_xact_rollback = 0, m_entry = {{t_id = 0, t_counts = {
t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 138, t_tuples_inserted = 138,
t_tuples_updated = 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0,
t_new_dead_tuples = 0, t_blocks_fetched = 2, t_blocks_hit = 2}}, {t_id = 2672, t_counts = {t_numscans = 1,
t_tuples_returned = 1, t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched = 2, t_blocks_hit = 2}}, {t_id = 1259, t_counts = {t_numscans = 4,
t_tuples_returned = 553, t_tuples_fetched = 0, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched = 24, t_blocks_hit = 24}}, {t_id = 2615, t_counts = {t_numscans = 0,
t_tuples_returned = 0, t_tuples_fetched = 0, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched = 1, t_blocks_hit = 1}}, {t_id = 2685, t_counts = {t_numscans = 1, t_tuples_returned = 1,
t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 2,
t_blocks_hit = 2}}, {t_id = 1172815, t_counts = {t_numscans = 0, t_tuples_returned = 0,
t_tuples_fetched = 0, t_tuples_inserted = 500000, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 500000, t_new_dead_tuples = 0, t_blocks_fetched = 23077,
t_blocks_hit = 15381}}, {t_id = 1172684, t_counts = {t_numscans = 0, t_tuples_returned = 0,
t_tuples_fetched = 0, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 1018,
t_blocks_hit = 1017}}, {t_id = 2678, t_counts = {t_numscans = 5, t_tuples_returned = 1,
t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 5,
t_blocks_hit = 5}}, {t_id = 2619, t_counts = {t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 0,
t_tuples_inserted = 0, t_tuples_updated = 24, t_tuples_deleted = 0, t_tuples_hot_updated = 24,
t_new_live_tuples = 0, t_new_dead_tuples = 9, t_blocks_fetched = 72, t_blocks_hit = 72}}, {t_id = 2696,
t_counts = {t_numscans = 48, t_tuples_returned = 48, t_tuples_fetched = 48, t_tuples_inserted = 0,
t_tuples_updated = 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0,
t_new_dead_tuples = 0, t_blocks_fetched = 96, t_blocks_hit = 96}}}}, msg_tabpurge = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_nentries = 0, m_tableid = {0, 0, 0, 0, 0, 0,
0, 0, 138, 0, 138, 0 <repeats 11 times>, 2, 0, 2, 0, 2672, 32767, 1, 0, 1, 0, 1, 0 <repeats 13 times>, 2, 0,
2, 0, 1259, 0, 4, 0, 553, 0 <repeats 15 times>, 24, 0, 24, 0, 2615, 0 <repeats 19 times>, 1, 0, 1, 0, 2685, 0,
1, 0, 1, 0, 1, 0 <repeats 13 times>, 2, 0, 2, 0, 1172815, 32561, 0, 0, 0, 0, 0, 0, 500000, 0, 0, 0, 0, 0, 0,
0, 500000, 0, 0, 0, 23077, 0, 15381, 0, 1172684, 0 <repeats 19 times>, 1018, 0, 1017, 0, 2678, 0, 5, 0, 1, 0,
1, 0 <repeats 13 times>, 5, 0, 5, 0, 2619, 3, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 24, 0, 0, 0, 9, 0, 72, 0,
---Type <return> to continue, or q <return> to quit---
72, 0, 2696, 32767, 48, 0, 48, 0, 48, 0 <repeats 13 times>...}}, msg_dropdb = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0}, msg_resetcounter = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0}, msg_autovacuum = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_start_time = 0}, msg_vacuum = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_tableoid = 0, m_analyze = 0 '\0',
m_autovacuum = 0 '\0', m_scanned_all = 0 '\0', m_vacuumtime = 0, m_tuples = 0}, msg_analyze = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_tableoid = 0, m_autovacuum = 0 '\0',
m_analyzetime = 0, m_live_tuples = 0, m_dead_tuples = 0}, msg_bgwriter = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_timed_checkpoints = 0, m_requested_checkpoints = 0,
m_buf_written_checkpoints = 0, m_buf_written_clean = 0, m_maxwritten_clean = 0, m_buf_written_backend = 138,
m_buf_alloc = 138}, msg_funcstat = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0,
m_nentries = 0, m_entry = {{f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id = 138,
f_numcalls = 138, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 2}, {
f_id = 2, f_numcalls = 140733193390704, f_time = 1, f_time_self = 1}, {f_id = 1, f_numcalls = 0, f_time = 0,
f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 2}, {f_id = 2, f_numcalls = 1259,
f_time = 4, f_time_self = 553}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id = 0,
f_numcalls = 0, f_time = 0, f_time_self = 24}, {f_id = 24, f_numcalls = 2615, f_time = 0, f_time_self = 0}, {
f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0,
f_time_self = 1}, {f_id = 1, f_numcalls = 2685, f_time = 1, f_time_self = 1}, {f_id = 1, f_numcalls = 0,
f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 2}, {f_id = 2,
f_numcalls = 139848431297871, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 500000, f_time = 0,
f_time_self = 0}, {f_id = 0, f_numcalls = 500000, f_time = 0, f_time_self = 23077}, {f_id = 15381,
f_numcalls = 1172684, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0},
{f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 1018}, {f_id = 1017, f_numcalls = 2678, f_time = 5,
f_time_self = 1}, {f_id = 1, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0,
f_time = 0, f_time_self = 5}, {f_id = 5, f_numcalls = 12884904507, f_time = 0, f_time_self = 0}, {f_id = 0,
f_numcalls = 0, f_time = 24, f_time_self = 0}, {f_id = 24, f_numcalls = 0, f_time = 9, f_time_self = 72}, {
f_id = 72, f_numcalls = 140733193390728, f_time = 48, f_time_self = 48}, {f_id = 48, f_numcalls = 0,
f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 96}}}, msg_funcpurge = {
m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_nentries = 0, m_functionid = {0, 0,
0, 0, 0, 0, 0, 0, 138, 0, 138, 0 <repeats 11 times>, 2, 0, 2, 0, 2672, 32767, 1, 0, 1, 0, 1,
0 <repeats 13 times>, 2, 0, 2, 0, 1259, 0, 4, 0, 553, 0 <repeats 15 times>, 24, 0, 24, 0, 2615,
0 <repeats 19 times>, 1, 0, 1, 0, 2685, 0, 1, 0, 1, 0, 1, 0 <repeats 13 times>, 2, 0, 2, 0, 1172815, 32561, 0,
0, 0, 0, 0, 0, 500000, 0, 0, 0, 0, 0, 0, 0, 500000, 0, 0, 0, 23077, 0, 15381, 0, 1172684,
0 <repeats 19 times>, 1018, 0, 1017, 0, 2678, 0, 5, 0, 1, 0, 1, 0 <repeats 13 times>, 5, 0, 5, 0, 2619, 3, 0,
0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 24, 0, 0, 0, 9, 0, 72, 0, 72, 0, 2696, 32767, 48, 0, 48, 0, 48,
0 <repeats 13 times>...}}}
input_fd = {fd = 8, events = 9, revents = 0}
__func__ = "PgstatCollectorMain"
#2 0x00000000005c0131 in pgstat_start () at pgstat.c:631
curtime = <value optimized out>
pgStatPid = 0
__func__ = "pgstat_start"
#3 0x00000000005c474d in reaper (postgres_signal_arg=<value optimized out>) at postmaster.c:2322
---Type <return> to continue, or q <return> to quit---
save_errno = 17
pid = 24673
exitstatus = 0
status = 0
__func__ = "reaper"
#4 <signal handler called>
#4 <signal handler called>
No symbol table info available.
#5 0x00007f31ddfc6c83 in select () from /lib/libc.so.6
No symbol table info available.
#6 0x00000000005c20fc in ServerLoop () at postmaster.c:1347
timeout = {tv_sec = 59, tv_usec = 916269}
rmask = {fds_bits = {120, 0 <repeats 15 times>}}
selres = <value optimized out>
readmask = {fds_bits = {120, 0 <repeats 15 times>}}
nSockets = 7
now = 64
last_touch_time = 1264024914
__func__ = "ServerLoop"
#7 0x00000000005c34a7 in PostmasterMain (argc=3, argv=0x144fd20) at postmaster.c:1040
opt = <value optimized out>
status = <value optimized out>
userDoption = <value optimized out>
__func__ = "PostmasterMain"
#8 0x000000000056cdc8 in main (argc=3, argv=0x144fd20) at main.c:188

----------

The message occurs with PG 8.4.2 on a powerful 16-CPU, 12Gb RAM machine
with RAID50 during heavy COPY/INSERTs/UPDATEs from one Java client (some
of the INSERTS actually take up to 100 sec). The OS is Linux Debian 5.0 with
2.6.30 kernel.

I see these WARNINGs appearing for the second time and as before I
saw them staring after I did a manual vacuum of one of the tables in which
the data is inserted. But I have no idea whether that's really a cause or
just a coincidence.

Regards,
S.

*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge, UK
Tel: +44-1223-337-551
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math(at)sai(dot)msu(dot)ru


From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: "Sergey E(dot) Koposov" <math(at)sai(dot)msu(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WARNING: pgstat wait timeout
Date: 2010-01-21 02:32:08
Message-ID: 3073cc9b1001201832v2da6b791g56dbc03fc56ee242@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 20, 2010 at 6:20 PM, Sergey E. Koposov <math(at)sai(dot)msu(dot)ru> wrote:
> Hello hackers,
>
> I've recently hit the message "WARNING:  pgstat wait timeout" with PG 8.4.2.

i see the same yesterday when initdb a freshly compiled 8.5dev +
lock_timeout patch
i thought maybe it was related to that patch and was thinking in
recompile without the patch but hadn't time, obviously i was wrong

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: "Sergey E(dot) Koposov" <math(at)sai(dot)msu(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WARNING: pgstat wait timeout
Date: 2010-01-21 02:33:08
Message-ID: 3073cc9b1001201833oa53ec97kd347b2689d394a34@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 20, 2010 at 9:32 PM, Jaime Casanova
<jcasanov(at)systemguards(dot)com(dot)ec> wrote:
> On Wed, Jan 20, 2010 at 6:20 PM, Sergey E. Koposov <math(at)sai(dot)msu(dot)ru> wrote:
>> Hello hackers,
>>
>> I've recently hit the message "WARNING:  pgstat wait timeout" with PG 8.4.2.
>
> i see the same yesterday when initdb a freshly compiled 8.5dev +
> lock_timeout patch
> i thought maybe it was related to that patch and was thinking in
> recompile without the patch but hadn't time, obviously i was wrong
>

ah! i forgot to say that it was on win32 + mingw, to confirme that
patch works fin in that os

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Matteo Beccati <php(at)beccati(dot)com>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: "Sergey E(dot) Koposov" <math(at)sai(dot)msu(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WARNING: pgstat wait timeout
Date: 2010-01-21 08:51:40
Message-ID: 4B58159C.1040003@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Il 21/01/2010 03:33, Jaime Casanova ha scritto:
> On Wed, Jan 20, 2010 at 9:32 PM, Jaime Casanova
> <jcasanov(at)systemguards(dot)com(dot)ec> wrote:
>> On Wed, Jan 20, 2010 at 6:20 PM, Sergey E. Koposov<math(at)sai(dot)msu(dot)ru> wrote:
>>> Hello hackers,
>>>
>>> I've recently hit the message "WARNING: pgstat wait timeout" with PG 8.4.2.
>>
>> i see the same yesterday when initdb a freshly compiled 8.5dev +
>> lock_timeout patch
>> i thought maybe it was related to that patch and was thinking in
>> recompile without the patch but hadn't time, obviously i was wrong
>>
>
> ah! i forgot to say that it was on win32 + mingw, to confirme that
> patch works fin in that os

I've seen it a few days ago with 8.5alpha3 on NetBSD when I left the
backend running for a few days. Backend was completely inactive but the
massage was repeated 3-4 times. Googling didn't help and I didnt' know
how to replicate it.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, "Sergey E(dot) Koposov" <math(at)sai(dot)msu(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WARNING: pgstat wait timeout
Date: 2010-01-29 07:45:05
Message-ID: 4B629201.8040507@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I just found a few of these errors in a log file during some pgbench
testing tonight. Linux, recent CVS HEAD; given the range of systems and
versions this has been reported against now, this bug doesn't look like
a platform or version/build specific issue.

Unfortunately the instance I had up wasn't setup very well for logging,
but I did notice that all of the ones I saw had nearby messages about
autovacuum issues, which seems to match Tom's earlier suggestion at
http://archives.postgresql.org/pgsql-bugs/2009-07/msg00083.php that the
source of the warning (but not necessarily the underlying problem) for
these is the autovacuum launcher complaining; here's two different sets:

ERROR: canceling autovacuum task
CONTEXT: automatic analyze of table "pgbench.public.pgbench_accounts"
WARNING: pgstat wait timeout
WARNING: pgstat wait timeout
ERROR: canceling autovacuum task
CONTEXT: automatic analyze of table "pgbench.public.pgbench_accounts"

WARNING: pgstat wait timeout
ERROR: canceling autovacuum task
CONTEXT: automatic analyze of table "pgbench.public.pgbench_accounts"
ERROR: canceling autovacuum task
CONTEXT: automatic analyze of table "pgbench.public.pgbench_accounts"

Because of what I'm (not) seeing in pg_stat_bgwriter, I'm suspicious
that its underlying work or messages may be involved here. I'm not
seeing the sort of totals I expect in that view after these large bouts
of activity. Now, my use tonight has included the new
pg_stat_reset_shared('bgwriter') to clear out those stats between runs,
so perhaps that's involved too. Guessing not only because of the
reports going back to 8.4 that also have this error message.

Will keep an eye out for this one now that I know I might run into it,
have already cranked up the logging and will look for something
reproducible to gather more info.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, "Sergey E(dot) Koposov" <math(at)sai(dot)msu(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WARNING: pgstat wait timeout
Date: 2010-01-29 13:16:04
Message-ID: 9837222c1001290516j461e0e5eu19aa92a7da583f56@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/1/29 Greg Smith <greg(at)2ndquadrant(dot)com>:
> I just found a few of these errors in a log file during some pgbench testing tonight.  Linux, recent CVS HEAD; given the range of systems and versions this has been reported against now, this bug doesn't look like a platform or version/build specific issue.
>
> Unfortunately the instance I had up wasn't setup very well for logging, but I did notice that all of the ones I saw had nearby messages about autovacuum issues, which seems to match Tom's earlier suggestion at http://archives.postgresql.org/pgsql-bugs/2009-07/msg00083.php that the source of the warning (but not necessarily the underlying problem) for these is the autovacuum launcher complaining; here's two different sets:
>
> ERROR:  canceling autovacuum task
> CONTEXT:  automatic analyze of table "pgbench.public.pgbench_accounts"
> WARNING:  pgstat wait timeout
> WARNING:  pgstat wait timeout
> ERROR:  canceling autovacuum task
> CONTEXT:  automatic analyze of table "pgbench.public.pgbench_accounts"
>
> WARNING:  pgstat wait timeout
> ERROR:  canceling autovacuum task
> CONTEXT:  automatic analyze of table "pgbench.public.pgbench_accounts"
> ERROR:  canceling autovacuum task
> CONTEXT:  automatic analyze of table "pgbench.public.pgbench_accounts"
>
> Because of what I'm (not) seeing in pg_stat_bgwriter, I'm suspicious that its underlying work or messages may be involved here.  I'm not seeing the sort of totals I expect in that view after these large bouts of activity.  Now, my use tonight has included the new pg_stat_reset_shared('bgwriter') to clear out those stats between runs, so perhaps that's involved too.  Guessing not only because of the reports going back to 8.4 that also have this error message.
>
> Will keep an eye out for this one now that I know I might run into it, have already cranked up the logging and will look for something reproducible to gather more info.

I've seen this happen semi-frequently during initdb on win32 on an
Amazon EC2 image. I attributed it to the combination of windows and
overloaded virtualization, but it may just be that it shows up more
often there.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/