Re: libpq SSL with non-blocking sockets

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Steve Singer <ssinger_pg(at)sympatico(dot)ca>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq SSL with non-blocking sockets
Date: 2011-07-15 21:46:49
Message-ID: 10265.1310766409@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Jul 8, 2011 at 9:36 AM, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com> wrote:
>> On 07/03/2011 05:08 AM, Steve Singer wrote:
>>> Since the original patch was submitted as a WIP patch and this version
>>> wasn't sent until well into the commit fest I am not sure if it
>>> qualifies for a committer during this commitfest or if it needs to wait
>>> until the next one.

>> If possible I would like the fix to be backported as well. This is
>> quite a nasty bug and difficult to track down. Especially as the
>> actual SSL error messages are masked by the "server closed the
>> connection unexpectedly" message.

> I would not be inclined to back-patch this straight away. I agree
> it's an important fix, but I am a little worried about the chances of
> breaking something else... then again, I don't have the only vote
> here.

I reviewed this patch a bit. I agree that we have a bug here that
should be fixed and backported, but I don't think this patch is ready.
Some problems:

1. The behavior under low-memory conditions is pretty intolerable.
As coded, a malloc failure here:

+	conn->outBufSize = Max(16 * 1024, remaining);
+	conn->outBuffer = malloc(conn->outBufSize);
+	if (conn->outBuffer == NULL)
+	{
+		printfPQExpBuffer(&conn->errorMessage,
+						  "cannot allocate memory for output buffer\n");
+		return -1;
+	}

leaves libpq's internal state completely corrupt --- outBuffer is null,
which will result in instant coredump on any future access, but that's
just the tip of the iceberg because we've also lost buffered data and
messed up the none-too-clearly-defined-anyway state of which pending
data is in which buffer.

In general, I don't think it's a smart idea to proceed by duplicating
the buffer contents in this situation, particularly not if the most
obvious way to cause the problem is to have a very large buffer :-(.
I think the direction to move in ought to be to use the existing buffer
as-is, and have pqCheckOutBufferSpace refuse to enlarge the buffer while
we are in "write frozen" state. It should be OK to append data to the
buffer, though, so long as we remember how much we're allowed to pass to
SSL_write when we next try to write.
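To sketch what I have in mind (untested, and "sslWriteRetryBytes" is just a
hypothetical name for the bookkeeping field we'd need in PGconn), the guard
in pqCheckOutBufferSpace might look roughly like this:

/*
 * Untested sketch, not a patch: sslWriteRetryBytes is a hypothetical new
 * PGconn field recording how many bytes were handed to SSL_write when it
 * last returned SSL_ERROR_WANT_READ/WANT_WRITE; zero means no write is
 * "frozen".
 */
int
pqCheckOutBufferSpace(size_t bytes_needed, PGconn *conn)
{
	int			newsize = conn->outBufSize;
	char	   *newspace;

	if (bytes_needed <= (size_t) newsize)
		return 0;				/* enough room already */

	/*
	 * While a nonblocking SSL_write is pending retry, realloc could move
	 * outBuffer, but SSL_write must be re-called with the identical pointer
	 * and length.  So refuse to enlarge the buffer; appending within the
	 * existing space is still allowed.
	 */
	if (conn->sslWriteRetryBytes > 0)
	{
		printfPQExpBuffer(&conn->errorMessage,
						  libpq_gettext("cannot enlarge output buffer while an SSL write is pending\n"));
		return EOF;
	}

	/* otherwise enlarge by doubling, much as we do now */
	do
	{
		newsize *= 2;
	} while (bytes_needed > (size_t) newsize);

	newspace = realloc(conn->outBuffer, newsize);
	if (newspace == NULL)
	{
		printfPQExpBuffer(&conn->errorMessage,
						  libpq_gettext("cannot allocate memory for output buffer\n"));
		return EOF;
	}
	conn->outBuffer = newspace;
	conn->outBufSize = newsize;
	return 0;
}

pqSendSome would clear sslWriteRetryBytes once SSL_write finally succeeds,
taking care to pass exactly that many bytes on each retry in the meantime.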

2. According to the OpenSSL man pages, *both* SSL_read and SSL_write are
defined to need to be re-called with the identical arguments after a
WANT_READ or WANT_WRITE return. This means that pqReadData() is also at
risk for this type of bug. I believe it could only happen in code paths
where we call pqReadData when there is already some data in the buffer:
if the caller then consumes some of that data, the next call to
pqReadData would try to compact the buffer contents before making the
pqsecure_read call. I think we probably need a "read frozen" state in
which we won't enlarge or compact the inBuffer until SSL_read succeeds.
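Along the same lines (equally untested; both the helper and the
"sslReadFrozen" flag are hypothetical names), the buffer preparation at the
top of pqReadData could be made conditional on that state:

/*
 * Untested sketch, not a patch: sslReadFrozen is a hypothetical PGconn flag
 * set when SSL_read returned SSL_ERROR_WANT_READ/WANT_WRITE, meaning the
 * next SSL_read must see the identical buffer pointer and length.  Returns
 * 0 if inBuffer is ready for pqsecure_read, EOF on failure.
 */
static int
pqPrepareInBuffer(PGconn *conn)
{
	if (conn->sslReadFrozen)
		return 0;				/* leave inBuffer exactly as SSL_read saw it */

	/* Left-justify any unconsumed data to make room at the end */
	if (conn->inStart < conn->inEnd)
	{
		if (conn->inStart > 0)
		{
			memmove(conn->inBuffer, conn->inBuffer + conn->inStart,
					conn->inEnd - conn->inStart);
			conn->inEnd -= conn->inStart;
			conn->inCursor -= conn->inStart;
			conn->inStart = 0;
		}
	}
	else
		conn->inStart = conn->inCursor = conn->inEnd = 0;

	/* Make sure there's room for a reasonable-size read */
	if (conn->inBufSize - conn->inEnd < 8192)
	{
		if (pqCheckInBufferSpace(conn->inEnd + (size_t) 8192, conn))
			return EOF;			/* errorMessage already set */
	}
	return 0;
}

pqReadData would call that in place of its current inline memmove/enlarge
logic, and clear the flag as soon as SSL_read returns anything other than
WANT_READ/WANT_WRITE.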

3. As of 9.0, somebody's decided that the backend needs nonblock read
semantics in some cases. I probably should have complained harder about
that before it went in, but since it's in, the backend is at risk for
this same type of issue in secure_read().

I will mark the patch Returned With Feedback. I think that we need to
address all these issues before we consider applying it. If we weren't
hard up against the end of the CommitFest I might have a go at it
myself, but I can't justify the time right now.

regards, tom lane
