Re: pgstat wait timeout

Lists: pgsql-hackers
From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pgstat wait timeout
Date: 2011-12-28 01:51:06
Message-ID: 4EFA760A.6040204@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
currently in test/dev mode. I'm currently seeing the following messages
occurring every few seconds:

...
Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout
Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait
timeout
Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout
Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait
timeout
Dec 27 17:43:59 foo postgres[23697]: [6-1] : WARNING: pgstat wait timeout
Dec 27 17:44:04 foo postgres[27324]: [71402-1] : WARNING: pgstat wait
timeout
Dec 27 17:44:09 foo postgres[23715]: [6-1] : WARNING: pgstat wait timeout
Dec 27 17:44:17 foo postgres[27324]: [71403-1] : WARNING: pgstat wait
timeout
Dec 27 17:44:22 foo postgres[23716]: [6-1] : WARNING: pgstat wait timeout
Dec 27 17:44:27 foo postgres[27324]: [71404-1] : WARNING: pgstat wait
timeout
Dec 27 17:44:33 foo postgres[23718]: [6-1] : WARNING: pgstat wait timeout
Dec 27 17:44:54 foo postgres[27324]: [71405-1] : WARNING: pgstat wait
timeout
Dec 27 17:44:59 foo postgres[23824]: [6-1] : WARNING: pgstat wait timeout
Dec 27 17:45:04 foo postgres[27324]: [71406-1] : WARNING: pgstat wait
timeout

I can't correlate events exactly, but the messages seem to have started
shortly after I dropped a pgbench user and database. My Googling turned
up various requests for debugging info on "hackers". Since the system
isn't live, I haven't touched it in case anyone wants me to collect
debugging info.

Otherwise, I plan on just blowing the install away and replacing it with 9.1

Cheers,
Steve


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgstat wait timeout
Date: 2011-12-28 13:05:41
Message-ID: 1325077369-sup-2321@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011:
> I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
> currently in test/dev mode. I'm currently seeing the following messages
> occurring every few seconds:
>
> ...
> Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout
> Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait
> timeout
> Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout
> Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait
> timeout

Hm, so can you strace the stats collector to see what it's doing? Maybe
grab a backtrace with GDB from it before anything else.

My guess is 27324 is the autovac launcher and the others are autovac
workers just as they die.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgstat wait timeout
Date: 2011-12-28 16:24:37
Message-ID: 4EFB42C5.2060103@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/28/2011 05:05 AM, Alvaro Herrera wrote:
> Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011:
>> I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
>> currently in test/dev mode. I'm currently seeing the following messages
>> occurring every few seconds:
>>
>> ...
>> Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout
>> Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait
>> timeout
>> Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout
>> Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait
>> timeout
> Hm, so can you strace the stats collector to see what it's doing? Maybe
> grab a backtrace with GDB from it before anything else.
>
> My guess is 27324 is the autovac launcher and the others are autovac
> workers just as they die.
>
You are correct. 27324 is the launcher and the others are autovac
workers. Here's the strace of the stats collector process:

getppid() = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
getppid() = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
getppid() = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
....rinse...lather...repeat...ad nauseum...

And the backtrace:

#0 0x00007ff4d2e80f58 in poll () from /lib/libc.so.6
#1 0x00007ff4d4e6f465 in ?? ()
#2 0x00007ff4d4e6fd83 in pgstat_start ()
#3 0x00007ff4d4e73475 in ?? ()
#4 <signal handler called>
#5 0x00007ff4d2e85fd3 in select () from /lib/libc.so.6
#6 0x00007ff4d4e71b93 in ?? ()
#7 0x00007ff4d4e74b01 in PostmasterMain ()
#8 0x00007ff4d4e193b3 in main ()

Cheers,
Steve


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgstat wait timeout
Date: 2011-12-28 17:34:56
Message-ID: 1325093390-sup-4050@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Excerpts from Steve Crawford's message of mié dic 28 13:24:37 -0300 2011:
> On 12/28/2011 05:05 AM, Alvaro Herrera wrote:
> > Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011:
> >> I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
> >> currently in test/dev mode. I'm currently seeing the following messages
> >> occurring every few seconds:
> >>
> >> ...
> >> Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout
> >> Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait
> >> timeout
> >> Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout
> >> Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait
> >> timeout
> > Hm, so can you strace the stats collector to see what it's doing? Maybe
> > grab a backtrace with GDB from it before anything else.
> >
> > My guess is 27324 is the autovac launcher and the others are autovac
> > workers just as they die.
> >
> You are correct. 27324 is the launcher and the others are autovac
> workers. Here's the strace of the stats collector process:
>
> getppid() = 27320
> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
> getppid() = 27320
> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
> getppid() = 27320
> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
> ....rinse...lather...repeat...ad nauseum...

Weird ... even across more "pgstat wait timeout" messages? It's like
it's not getting the "inquiry" messages that would tell it to write the
file ... something wrong with the UDP socket perhaps?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgstat wait timeout
Date: 2011-12-28 18:05:49
Message-ID: 4EFB5A7D.70904@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/28/2011 09:34 AM, Alvaro Herrera wrote:
> Excerpts from Steve Crawford's message of mié dic 28 13:24:37 -0300 2011:
>> On 12/28/2011 05:05 AM, Alvaro Herrera wrote:
>>> Excerpts from Steve Crawford's message of mar dic 27 22:51:06 -0300 2011:
>>>> I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
>>>> currently in test/dev mode. I'm currently seeing the following messages
>>>> occurring every few seconds:
>>>>
>>>> ...
>>>> Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout
>>>> Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait
>>>> timeout
>>>> Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout
>>>> Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait
>>>> timeout
>>> Hm, so can you strace the stats collector to see what it's doing? Maybe
>>> grab a backtrace with GDB from it before anything else.
>>>
>>> My guess is 27324 is the autovac launcher and the others are autovac
>>> workers just as they die.
>>>
>> You are correct. 27324 is the launcher and the others are autovac
>> workers. Here's the strace of the stats collector process:
>>
>> getppid() = 27320
>> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
>> getppid() = 27320
>> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
>> getppid() = 27320
>> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
>> ....rinse...lather...repeat...ad nauseum...
> Weird ... even across more "pgstat wait timeout" messages? It's like
> it's not getting the "inquiry" messages that would tell it to write the
> file ... something wrong with the UDP socket perhaps?
>
Bingo!

postgres 27325 postgres 8u *IPv6* 5379428
0t0 UDP localhost:47204->localhost:47204

In working on diagnosing a network timeout issue over an IPv4 to IPv4
VPN I disabled IPv6 via sysctl on this machine and pretty much forgot
about it since we are still IPv4 internally. But PostgreSQL had already
established a (now non-functional) IPv6 local connection. Re-enabling
IPv6, as it was not related to the VPN timeouts, corrected the "pgstat
wait timeout" issue.

Cheers,
Steve