Lists: | pgsql-general |
---|
From: | Marc Munro <marc(at)bloodnok(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Something blocking in libpq_gettext? |
Date: | 2006-08-25 18:55:49 |
Message-ID: | 1156532149.28313.26.camel@bloodnok.com |
Often, when we get short-lived network problems (like when our network
admins break the firewall), postgres client apps will lock up. They do
not recover once the network is back to normal, do not time out, and do
not fail with any sort of error.
This has been happening since at least postgres 7.4.7. We are currently
running 8.0.3 (now upgrading to 8.0.8).
As near as we can tell, it looks like libpq blocked inside poll inside
libpq_gettext.
#0 0xffffe405 in __kernel_vsyscall ()
#1 0x005a31d4 in poll () from /lib/tls/libc.so.6
#2 0xf7fd71ff in libpq_gettext () from /usr/lib/libpq.so.3
#3 0xf7fd7331 in pqWaitTimed () from /usr/lib/libpq.so.3
#4 0xf7fd73a1 in pqWait () from /usr/lib/libpq.so.3
#5 0xf7fd53fb in PQgetResult () from /usr/lib/libpq.so.3
#6 0xf7fd5524 in PQgetResult () from /usr/lib/libpq.so.3
#7 0x081b43b3 in SQLInterface::execute (this=0xf7ce3080,
cmd=0xf7ce0074 "execute lock_games ( '100' )") at SQLInterface.cpp:138
Can anyone offer any solutions, suggestions, fixes? We cannot reproduce
this at will, but are willing to provide more information when next it
occurs.
__
Marc
From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Marc Munro <marc(at)bloodnok(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Something blocking in libpq_gettext? |
Date: | 2006-08-25 19:10:08 |
Message-ID: | 20060825191008.GO14622@alvh.no-ip.org |
Marc Munro wrote:
> Often, when we get short-lived network problems (like when our network
> admins break the firewall), postgres client apps will lock up. They do
> not recover once the network is back to normal, do not time out, and do
> not fail with any sort of error.
>
> This has been happening since at least postgres 7.4.7. We are currently
> running 8.0.3 (now upgrading to 8.0.8).
>
> As near as we can tell, it looks like libpq blocked inside poll inside
> libpq_gettext.
>
> #0 0xffffe405 in __kernel_vsyscall ()
> #1 0x005a31d4 in poll () from /lib/tls/libc.so.6
> #2 0xf7fd71ff in libpq_gettext () from /usr/lib/libpq.so.3
> #3 0xf7fd7331 in pqWaitTimed () from /usr/lib/libpq.so.3
Wow, that's strange. Maybe it's trying to fetch something in the
message catalogs. What are your locale settings?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
From: | Marc Munro <marc(at)bloodnok(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Something blocking in libpq_gettext? |
Date: | 2006-08-25 19:37:36 |
Message-ID: | 1156534657.28313.30.camel@bloodnok.com |
On Fri, 2006-08-25 at 15:10 -0400, Alvaro Herrera wrote:
> Wow, that's strange. Maybe it's trying to fetch something in the
> message catalogs. What are your locale settings?
>
I'm not sure exactly what you need to know. I do have this, though:
LANG=en_US.UTF-8
What other information would be helpful?
__
Marc
From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Marc Munro <marc(at)bloodnok(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Something blocking in libpq_gettext? |
Date: | 2006-08-25 19:42:32 |
Message-ID: | 20060825194232.GP14622@alvh.no-ip.org |
Marc Munro wrote:
> On Fri, 2006-08-25 at 15:10 -0400, Alvaro Herrera wrote:
>
> > Wow, that's strange. Maybe it's trying to fetch something in the
> > message catalogs. What are your locale settings?
>
> I'm not sure exactly what you need to know. I do have this, though:
>
> LANG=en_US.UTF-8
>
> What other information would be helpful?
SHOW lc_messages, I think. Is it something other than C? If so, it will
probably try to read the corresponding postgres.mo file. Do the hung
processes have that file open? (I'm not sure if you can find that out
from only the core file.)
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Marc Munro <marc(at)bloodnok(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Something blocking in libpq_gettext? |
Date: | 2006-08-25 19:43:50 |
Message-ID: | 4945.1156535030@sss.pgh.pa.us |
Marc Munro <marc(at)bloodnok(dot)com> writes:
> As near as we can tell, it looks like libpq blocked inside poll inside
> libpq_gettext.
> #0 0xffffe405 in __kernel_vsyscall ()
> #1 0x005a31d4 in poll () from /lib/tls/libc.so.6
> #2 0xf7fd71ff in libpq_gettext () from /usr/lib/libpq.so.3
> #3 0xf7fd7331 in pqWaitTimed () from /usr/lib/libpq.so.3
> #4 0xf7fd73a1 in pqWait () from /usr/lib/libpq.so.3
> #5 0xf7fd53fb in PQgetResult () from /usr/lib/libpq.so.3
> #6 0xf7fd5524 in PQgetResult () from /usr/lib/libpq.so.3
> #7 0x081b43b3 in SQLInterface::execute (this=0xf7ce3080,
> cmd=0xf7ce0074 "execute lock_games ( '100' )") at SQLInterface.cpp:138
That backtrace is silly on its face --- apparently you are using a
stripped executable and gdb is providing the nearest global symbol
rather than the actual function name. I think you can safely assume
however that you are looking at libpq waiting for input from the
backend. Does the kernel at each end still think the connection is
live? (Try netstat) If this is a common result from short-lived
network problems then you have a beef with the TCP stack at one end
or the other ... TCP is supposed to be more robust than that.
regards, tom lane
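If the client really is parked in poll() waiting for backend input with no timeout, one application-side mitigation is to bound the wait. The sketch below is not libpq code; wait_readable is a hypothetical helper that only shows the technique of polling a descriptor with a finite timeout. It assumes the application obtains the descriptor itself, for example from PQsocket(conn) after sending the query with PQsendQuery.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: wait until fd is readable or timeout_ms elapses.
 * Returns 1 if poll() reported an event on fd (data, hangup or error),
 *         0 if the timeout expired with no event,
 *        -1 if poll() itself failed.
 * Note: a wait interrupted by a signal is restarted with the full timeout.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0 && timeout_ms >= 0);

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

A caller would treat a 0 return as "no reply within the deadline" and decide whether to cancel, reconnect, or keep waiting; on a 1 return it would go on to PQconsumeInput and PQgetResult as usual.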
From: | Marc Munro <marc(at)bloodnok(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Something blocking in libpq_gettext? |
Date: | 2006-08-25 19:59:07 |
Message-ID: | 1156535947.28313.34.camel@bloodnok.com |
On Fri, 2006-08-25 at 15:42 -0400, Alvaro Herrera wrote:
> SHOW lc_messages, I think. Is it something other than C? If so, it will
> probably try to read the corresponding postgres.mo file. Do the hung
> processes have that file open? (I'm not sure if you can find that out
> from only the core file.)
>
Database=# SHOW lc_messages;
lc_messages
-------------
en_US.UTF-8
(1 row)
Database=#
We don't think we can find this from the core file either. Next time it
happens we will check whether postgres.mo is in use. Unfortunately we
are no longer seeing the problem. We'll let you know when we have more.
Thanks for the quick response.
__
Marc
From: | Marc Munro <marc(at)bloodnok(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Something blocking in libpq_gettext? |
Date: | 2006-08-25 20:12:22 |
Message-ID: | 1156536742.28313.39.camel@bloodnok.com |
On Fri, 2006-08-25 at 15:43 -0400, Tom Lane wrote:
> Marc Munro <marc(at)bloodnok(dot)com> writes:
> > As near as we can tell, it looks like libpq blocked inside poll inside
> > libpq_gettext.
>
> > #0 0xffffe405 in __kernel_vsyscall ()
> > #1 0x005a31d4 in poll () from /lib/tls/libc.so.6
> > #2 0xf7fd71ff in libpq_gettext () from /usr/lib/libpq.so.3
> > #3 0xf7fd7331 in pqWaitTimed () from /usr/lib/libpq.so.3
> > #4 0xf7fd73a1 in pqWait () from /usr/lib/libpq.so.3
> > #5 0xf7fd53fb in PQgetResult () from /usr/lib/libpq.so.3
> > #6 0xf7fd5524 in PQgetResult () from /usr/lib/libpq.so.3
> > #7 0x081b43b3 in SQLInterface::execute (this=0xf7ce3080,
> > cmd=0xf7ce0074 "execute lock_games ( '100' )") at SQLInterface.cpp:138
>
> That backtrace is silly on its face --- apparently you are using a
> stripped executable and gdb is providing the nearest global symbol
> rather than the actual function name. I think you can safely assume
> however that you are looking at libpq waiting for input from the
> backend. Does the kernel at each end still think the connection is
> live? (Try netstat) If this is a common result from short-lived
> network problems then you have a beef with the TCP stack at one end
> or the other ... TCP is supposed to be more robust than that.
Yes indeed, it is a stripped executable. We are having issues with rpms
and debug symbols and this is the best we have been able to do so far.
We will try netstat next time this happens, and we are (still) trying to
get proper debug information. More and better information will follow
when we have it.
__
Marc
From: | Gregory Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Marc Munro <marc(at)bloodnok(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Something blocking in libpq_gettext? |
Date: | 2006-08-26 20:47:33 |
Message-ID: | 87odu71axm.fsf@stark.xeocode.com |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> If this is a common result from short-lived
> network problems then you have a beef with the TCP stack at one end
> or the other ... TCP is supposed to be more robust than that.
Or a beef with some firewall or router along the way. NAT routers are
particularly prone to breaking TCP's robustness guarantees.
--
greg
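When a firewall or NAT router silently drops its state for a connection, neither end receives a FIN or RST, so an idle client can keep believing the connection is live. TCP keepalive probes are one way to let the kernel notice. The following is a minimal sketch, assuming the application has access to the socket descriptor (in a libpq client that would presumably be the value returned by PQsocket(conn)). enable_keepalive and its parameters are hypothetical names, and the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are Linux-specific, so they are guarded.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: turn on TCP keepalive probes for a socket.
 *   idle_s     seconds of idleness before the first probe
 *   interval_s seconds between unanswered probes
 *   count      unanswered probes before the kernel drops the connection
 * Returns 0 on success, -1 if any setsockopt() call fails (errno is set).
 */
int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    assert(fd >= 0);

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;

#ifdef TCP_KEEPIDLE
    /* Per-socket tuning; without it the system-wide defaults apply. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) != 0)
        return -1;
#else
    (void) idle_s;
    (void) interval_s;
    (void) count;
#endif
    return 0;
}
```

With values such as enable_keepalive(fd, 60, 10, 5), a dead peer would be detected after roughly 60 + 5 * 10 seconds of silence, at which point a blocked poll() or read on the socket returns with an error instead of waiting indefinitely.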