Re: Escaping from blocked send() reprised.

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Escaping from blocked send() reprised.
Date: 2015-01-12 00:37:53
Message-ID: 20150112003753.GB2722746@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 12, 2015 at 12:40:50AM +0100, Andres Freund wrote:
> On 2015-01-11 16:36:07 -0500, Noah Misch wrote:
> > On Sat, Jan 10, 2015 at 03:25:42AM +0100, Andres Freund wrote:
> > > 0001-Allow-latches-to-wait-for-socket-writability-without.patch
> > > Imo pretty close to commit and can be committed independently.
> >
> > The key open question is whether all platforms of interest can reliably detect
> > end-of-file when poll()ing or select()ing for write only. Older GNU/Linux
> > select() cannot; see attached test program.
>
> Yuck. By my reading that's a violation of posix.

Agreed.

> I did test it a bit, and I didn't see problems, but that obviously
> doesn't say much about old versions.
>
> Afaics we interestingly don't have any poll-less buildfarm animals that
> use unix_latch.c...

More likely is that some system will have a poll() exhibiting the same bug,
possibly via poll() being a wrapper around select(). Systems without poll()
are hard to find; the gnulib manual lists only Windows, BeOS and HP NonStop
OS. HP NonStop OS is the one possibly of interest here. I have never
personally seen a machine running it.

I recommend either (a) taking no action or (b) adding a regression test
verifying WaitLatchOrSocket() conformance in this scenario. Then we can
decide what more to do if failure evidence emerges.

> > We use poll() there anyway, so the bug in that configuration does not
> > affect PostgreSQL. Is it a bellwether of similar bugs in other
> > implementations, bugs that will affect PostgreSQL?
>
> Hm. I can think of two stopgap measures we could add:
>
> 1) If we're using select() and WL_SOCKET_WRITEABLE is set without
> _READABLE, add a timeout of Min(1s, Max(passed_timeout, 1s)). As the
> time spent waiting only for writable normally shouldn't be very long,
> that shouldn't be noticeably bad for power usage.
> 2) Add a SIGPIPE handler that just does a SetLatch(MyLach).

I'm having trouble visualizing those proposed measures in detail, but I trust
that a decent workaround would emerge.

> > > + if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
> > > + {
> > > + /* EOF/error condition */
> > > + if (wakeEvents & WL_SOCKET_READABLE)
> > > + result |= WL_SOCKET_READABLE;
> > > + if (wakeEvents & WL_SOCKET_WRITEABLE)
> > > + result |= WL_SOCKET_WRITEABLE;
> > > + }
> >
> > With some poll() implementations (e.g. OS X), this can wrongly report
> > WL_SOCKET_WRITEABLE if the peer used shutdown(SHUT_WR). I tentatively think
> > that's acceptable. libpq does not use shutdown(), and other client interfaces
> > would do so at their own risk. Should we worry about hostile clients creating
> > a denial-of-service by causing a server send() to block unexpectedly?
> > Probably not; a user able to send arbitrary TCP traffic to the postmaster port
> > can already achieve that.
>
> Yea, this doesn't seem particularly concerning.
>
> a) They can just stop consuming writes and use noticeable amounts of
> memory by doing output intensive queries. That uses significant os
> resources and is much harder to detect - today.

If there's anything to worry about here (unlikely), it would be with respect
to not-yet-authenticated connections only.

> b) does accepting WL_SOCKET_WRITEABLE without _READABLE change anything
> here? We already allow _WRITABLE...

Today's code translates POLLHUP to WL_SOCKET_READABLE; it must see POLLOUT to
set WL_SOCKET_WRITEABLE. Your patch changes that.

> What happens if you write/send() in
> that state, btw?

write() reports EAGAIN.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-01-12 00:45:41 Re: Escaping from blocked send() reprised.
Previous Message Andres Freund 2015-01-12 00:37:28 Re: Using 128-bit integers for sum, avg and statistics aggregates