Re: stopping processes, preventing connections

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Herouth Maoz <herouth(at)unicell(dot)co(dot)il>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, pgsql-general(at)postgresql(dot)org
Subject: Re: stopping processes, preventing connections
Date: 2010-03-17 18:16:22
Message-ID: 4BA11C76.8070605@2ndquadrant.com
Lists: pgsql-general

Herouth Maoz wrote:
> Aren't socket writes supposed to have time outs of some sort? Stupid policies notwithstanding, processes on the client side can disappear for any number of reasons - bugs, power failures, whatever - and this is not something that is supposed to cause a backend to hang, I would assume.
>

Note that at the point where this is stuck, you're not in the PostgreSQL
code; you're deep in the libc socket code. Making sockets behave well in
every case at the OS level isn't always possible, because of TCP/IP's
emphasis on robust delivery. See section 2.8, "Why does it take so long
to detect that the peer died?", at http://www.faqs.org/faqs/unix-faq/socket/
for some background here. Also note that the point you're stuck in is
inside keepalive handling, where the database is trying to do the right
thing.
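To illustrate what "keepalive" means at the socket level, here is a minimal sketch, assuming a Linux-like platform, that enables TCP keepalives on a socket and shortens the probe timing. With these options the kernel declares a silent peer dead after roughly idle + interval * count seconds instead of the much longer system defaults. The function name enable_keepalive and the 60/10/5 values are hypothetical choices for illustration. On the server side, PostgreSQL exposes the same OS knobs as the tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count settings. One caveat, as I understand the Linux behavior: keepalive probes are only sent on an otherwise idle connection, so a write blocked on unacknowledged data falls under retransmission timeouts instead.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 5) -> int:
    """Hypothetical helper: turn on TCP keepalives and tighten probe timing.

    Returns the approximate seconds of silence after which the kernel
    would give up on the peer: idle + interval * count.
    """
    # Ask the kernel to send keepalive probes on this connection at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants are platform-specific (present on Linux); skip any
    # the running platform does not expose instead of failing.
    for name, value in (("TCP_KEEPIDLE", idle),       # idle time before first probe
                        ("TCP_KEEPINTVL", interval),  # gap between probes
                        ("TCP_KEEPCNT", count)):      # unanswered probes before giving up
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return idle + interval * count


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        budget = enable_keepalive(s)
        print("SO_KEEPALIVE enabled:",
              s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
        print("dead idle peer detected after about", budget, "seconds")
```

Python's setsockopt is a thin wrapper over the libc call of the same name, so the same option names apply to C code such as a PostgreSQL backend.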

As a general comment on this area: in most cases where I've seen an
unkillable backend (usually noticed because the server won't shut down),
the cause was bad socket behavior. It's a tricky area to get right, and
it's hard to guarantee that database backends will stay robust in the
face of every possible odd OS behavior.

However, if you can repeatably get the server into this bad state at
will, it may be worth spending some more time digging into this in hopes
there is something valuable to learn about your situation that can
improve the keepalive handling on the server side. Did you mention your
PostgreSQL server version and platform? I didn't see the exact code
path you're stuck in during a quick look at the code involved (using a
snapshot of recent development), which makes me wonder if this isn't
already a resolved problem in a newer version.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us
