Quick Links

Re: Escaping from blocked send() reprised.

Lists:	pgsql-hackers

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Escaping from blocked send() reprised.
Date:	2014-06-30 08:13:47
Message-ID:	20140630.171347.220032486.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello, I have received inquiries related to blocked communication
several times for these weeks with different symptoms. Then I
found this message from archive,

http://postgresql.1045698.n5.nabble.com/Escaping-a-blocked-sendto-syscall-without-causing-a-restart-td5740855.html

> Subject: Escaping a blocked sendto() syscall without causing a restart

Mr. Tom Lane gave a comment replying it,

> Offhand it looks to me like most signals would kick the backend off the
> send() call ... but it would loop right back and try again. See
> internal_flush() in pqcomm.c. (If you're using SSL, this diagnosis
> may or may not apply.)
>
> We can't do anything except repeat the send attempt if the client
> connection is to be kept in a sane state.
(snipped)
> And I'm not at all sure if we could get it to work in SSL mode...

That's true for timeouts that should continue the connection,
say, statement_timeout, but focusing on intentional backend
termination, I think it does no harm to break it up abruptly,
even if it was on SSL. On the other hand it seems still
preferable to keep a connection when not blocked. The following
expression would detects such a blocking state at just before
next send(2) after the previous try exited by signals.

(ProcDiePending && select(1, NULL, fd, NULL, '1 sec') == 0)

Finally, pg_terminate_backend() works even when send is blocked
for both SSL and non-SSL connections after 1 second delay with
this patch (break_socket_blocking_on_termination_v1.patch).

Nevetheless, of course statement_timeout cannot become effective
by this method since it breaks the consistency in the client
protocol. It needs change in client protocol to have "out of
band" mechanism or something, maybe.

Any suggestions?

Attached patches are:

- break_socket_blocking_on_termination_v1.patch : The patch to
break blocked state of send(2) for pg_terminate_backend().

- socket_block_test.patch : debug printing and changing send
buffer of libpq for reproducing the blocked situation.

Some point of discussion follows,

==== Discussion about the appropriateness of looking into
ProcDiePending there and calling CHECK_FOR_INTERRUPTS()
seeing it.

I have somewhat uneasiness of these things, but what we can at
most seems to be replacing ProcDiePending here with some another
variable, say ImmediatelyExitFromBlockedState, and somehow go
upstairs through normal return path. Additional Try-Catch seems
can do that but it looks no benefit for the added complexity..

==== Discussion on breaking up connetion especially for SSL

Breaking an SSL connection up in my_sock_write() cause the
following message on client side if it still lives and resumes to
receive from the connection, this seems to show that the client
handles the event properly.

| SSL SYSCALL error: EOF detected
| The connection to the server was lost. Attempting reset: Succeeded.

==== Discussion on reliability of select(2)

This method is not a perfect solution, since the select(2)
sometimes gives a wrong oracle about wheather the follwing
send(2) will be blocked.

Even so, as far as I see, select(2) just after exiting from
blocked send(2) by signal seems always says 'write will be
blocked', so what this patch does seems to save most cases except
when the any amount of socket buffer is vacated just before the
following select. The second chance to exit from blocked send(2)
won't come after this(, before one more pg_terminate_backend() ?).

Removing the select(2) from the condition (that is,
CHECK_FOR_INTERRUPTS() is called always ProcDiePending is true)
prevents such a possibility, in exchange for killing 'really
live' connection but IMHO it's no problem on intentional server
termination.

More reliable measure for this would be non-blocking IO but it
changes more of the code.

==== Reproducing the situation.

Another possible question would be about the possibility of such
blocking, or how to reproduce the situation. I found that send(2)
on CentOS6.5 somehow sends successfully, for most cases, the
remaining data at the retry after exiting by signal during being
blocked with buffer full, in spite of no change in environment.

So reproducing the stucked situation is rather difficult on the
server as is. But such situation would be reproduced quite easily
with some cheat, that is, enlarging PQ send buffer, say by ten
times.

Applying the attached testing patch (socket_block_test.patch),
the following steps will make the stucked situation.

1. Do a select which returns big result and enter Ctrl-Z just
after invoking.

cl> $ psql -h localhost postgres
cl> postgres=# select 1 from generate_series(0, 9999999);
cl> ^Z
cl> [4]+ Stopped psql -h localhost postgres

2. Watch the server to stuck.

The server starts to print lines like following to console
after a while, then stops. The number enclosed by the square
brackets is PID of the server.

sv> #### [8809] (bare) rest = 0 / 81920 bytes, ProcDiePending = 0

3. Do pg_terminate_backend().

cl> $ psql postgres -c "select pg_terminate_backend(8809)"

The server will stuck like follows, PID=8811 is the another
session made by the command just above.

sv> #### [8809] (bare) rest = 0 / 81920 bytes, ProcDiePending = 0
sv> #### [8811] (bare) rest = 0 / 327 bytes, ProcDiePending = 0
sv> #### [8809] (bare) rest = 500 / 81920 bytes, ProcDiePending = 1
sv> #### [8811] (bare) rest = 0 / 78 bytes, ProcDiePending = 0

The server 8809 is blocked during sending the remaining 500
bytes and won't come back forever except for SIGKILL, or
possible data reading on the client (fg does it).

cl> $ fg

sv> #### [8809] (bare) rest = 0 / 500 bytes, ProcDiePending = 1
sv> FATAL: terminating connection due to administrator command
sv> STATEMENT: select 1 from generate_series(0, 9999999);
sv> #### [8809] (bare) rest = 0 / 116 bytes, ProcDiePending = 0
sv> #### [8883] (bare) rest = 0 / 327 bytes, ProcDiePending = 0

If you don't see the situation to occur, changing the value of
select clause (by its length, not by value:) would be
effective, or entering Ctrl-Z after some debug output also
would be effective.

For SSL connections, the debug output looks like the following,

sv> #### [20064] (bare) rest = 0 / 81920 bytes, ProcDiePending = 0
sv> #### [20064] (SSL) rest = 0 / 16413 bytes, ProcDiePending = 0
sv> #### [20064] (SSL) rest = 0 / 16413 bytes, ProcDiePending = 0
sv> #### [20064] (SSL) rest = 0 / 16413 bytes, ProcDiePending = 0
sv> #### [20064] (SSL) rest = 980 / 16413 bytes, ProcDiePending = 1
sv> #### [20064] (SSL) rest = 0 / 980 bytes, ProcDiePending = 1
sv> #### [20064] (SSL) rest = 1029 / 16413 bytes, ProcDiePending = 1

"bare" here in turn means the status of SSL_write and "SSL"
means the status of the underlying 'bare' socket of SSL
connection. (Sorry for the confising labelings..)

The (bare) line above is not corresponding to the following
bunch of (SSL) lines, but its precedents. At the fifth line,
send(2) exits by signal issued by pg_teminate_backend() then
retry (somehow) successfully at sixth line but SSL layer gave
another 16413 bytes and only 1029 bytes of that is sent by the
first try and the server stucked at the second try for the
seventh line. The control doesn't come back to secure_write()
during this sequence.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
break_socketblocking_on_termination_v1.patch	text/x-patch	2.0 KB
socket_block_test.patch	text/x-patch	1.3 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-06-30 15:27:47
Message-ID:	CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=Wc8PnQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 30, 2014 at 4:13 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello, I have received inquiries related to blocked communication
> several times for these weeks with different symptoms. Then I
> found this message from archive,
>
> http://postgresql.1045698.n5.nabble.com/Escaping-a-blocked-sendto-syscall-without-causing-a-restart-td5740855.html
>
>> Subject: Escaping a blocked sendto() syscall without causing a restart
>
> Mr. Tom Lane gave a comment replying it,
>
>> Offhand it looks to me like most signals would kick the backend off the
>> send() call ... but it would loop right back and try again. See
>> internal_flush() in pqcomm.c. (If you're using SSL, this diagnosis
>> may or may not apply.)
>>
>> We can't do anything except repeat the send attempt if the client
>> connection is to be kept in a sane state.
> (snipped)
>> And I'm not at all sure if we could get it to work in SSL mode...
>
> That's true for timeouts that should continue the connection,
> say, statement_timeout, but focusing on intentional backend
> termination, I think it does no harm to break it up abruptly,
> even if it was on SSL. On the other hand it seems still
> preferable to keep a connection when not blocked. The following
> expression would detects such a blocking state at just before
> next send(2) after the previous try exited by signals.
>
> (ProcDiePending && select(1, NULL, fd, NULL, '1 sec') == 0)
>
> Finally, pg_terminate_backend() works even when send is blocked
> for both SSL and non-SSL connections after 1 second delay with
> this patch (break_socket_blocking_on_termination_v1.patch).
>
> Nevetheless, of course statement_timeout cannot become effective
> by this method since it breaks the consistency in the client
> protocol. It needs change in client protocol to have "out of
> band" mechanism or something, maybe.
>
> Any suggestions?

You should probably add your patch here, so it doesn't get forgotten about:

https://commitfest.postgresql.org/action/commitfest_view/open

We're focused on reviewing patches for the current CommitFest, so your
patch might not get attention right away. A couple of general
thoughts on this topic:

1. I think it's the case that there are platforms around where a
signal won't cause send() to return EINTR.... and I'd be entirely
unsurprised if SSL_write() doesn't necessarily return EINTR in that
case. I'm not sure what, if anything, we can do about that.

2. I think it would be reasonable to try to kill off the connection
without notifying the client if we're unable to send the data to the
client in a reasonable period of time. But I'm unsure what "a
reasonable period of time" means. This patch would basically do it
after no delay at all, which seems like it might be too aggressive.
However, I'm not sure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-07-01 03:26:43
Message-ID:	20140701.122643.156649402.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello, The replies follow are mainly as a memo for myself so
please don't be bothered to answer until the time comes.

At Mon, 30 Jun 2014 11:27:47 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=Wc8PnQ(at)mail(dot)gmail(dot)com>
> You should probably add your patch here, so it doesn't get forgotten about:
>
> https://commitfest.postgresql.org/action/commitfest_view/open

Ok, I'll put this there.

> We're focused on reviewing patches for the current CommitFest, so your
> patch might not get attention right away. A couple of general
> thoughts on this topic:

Thank you for suggestions. I'll consider on them.

> 1. I think it's the case that there are platforms around where a
> signal won't cause send() to return EINTR.... and I'd be entirely
> unsurprised if SSL_write() doesn't necessarily return EINTR in that
> case. I'm not sure what, if anything, we can do about that.

man 2 send on FreeBSD has not description about EINTR.. And even
on linux, send won't return EINTR for most cases, at least I
haven't seen that. So send()=-1,EINTR seems to me as only an
equivalent of send() = 0. I have no idea about what the
implementer thought the difference is.

> 2. I think it would be reasonable to try to kill off the connection
> without notifying the client if we're unable to send the data to the
> client in a reasonable period of time. But I'm unsure what "a
> reasonable period of time" means. This patch would basically do it
> after no delay at all, which seems like it might be too aggressive.
> However, I'm not sure.

I think there's no such a reasonable time. The behavior might
should be determined from another point.. On alternative would be
let pg_terminate_backend() have a parameter instructing force
shutodwn (how to propagate it?), or make a forced shutdown on
duplicate invocation of pg_terminate_backend().

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-07-01 15:20:00
Message-ID:	CA+TgmobDRRCa8+zopatCemBaY+MO-Ov0t4jgQTu0r9-ZCBV02A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 30, 2014 at 11:26 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> 2. I think it would be reasonable to try to kill off the connection
>> without notifying the client if we're unable to send the data to the
>> client in a reasonable period of time. But I'm unsure what "a
>> reasonable period of time" means. This patch would basically do it
>> after no delay at all, which seems like it might be too aggressive.
>> However, I'm not sure.
>
> I think there's no such a reasonable time. The behavior might
> should be determined from another point.. On alternative would be
> let pg_terminate_backend() have a parameter instructing force
> shutodwn (how to propagate it?), or make a forced shutdown on
> duplicate invocation of pg_terminate_backend().

Well, I think that when people call pg_terminate_backend() just once,
they expect it to kill the target backend. I think people will
tolerate a short delay, like a few seconds; after all, there's no
guarantee, even today, that the backend will hit a
CHECK_FOR_INTERRUPTS() in less than a few hundred milliseconds. But
they are not going to want to have to take a second action to kill the
backend - killing it once should be sufficient.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-07-01 19:21:38
Message-ID:	20140701192138.GB20140@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jul 01, 2014 at 12:26:43PM +0900, Kyotaro HORIGUCHI wrote:
> > 1. I think it's the case that there are platforms around where a
> > signal won't cause send() to return EINTR.... and I'd be entirely
> > unsurprised if SSL_write() doesn't necessarily return EINTR in that
> > case. I'm not sure what, if anything, we can do about that.
>
> man 2 send on FreeBSD has not description about EINTR.. And even
> on linux, send won't return EINTR for most cases, at least I
> haven't seen that. So send()=-1,EINTR seems to me as only an
> equivalent of send() = 0. I have no idea about what the
> implementer thought the difference is.

Whether send() returns EINTR or not depends on whether the signal has
been marked restartable or not. This is configurable per signal, see
sigaction(). If the signal is marked to restart, the kernel returns
ERESTARTHAND (IIRC) and the libc will redo the call internally.

Default BSD does not return EINTR normally, but supports sigaction().

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	kleptog(at)svana(dot)org
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-07-04 09:45:35
Message-ID:	20140704.184535.120681779.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

At Tue, 1 Jul 2014 21:21:38 +0200, Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote in <20140701192138(dot)GB20140(at)svana(dot)org>
> On Tue, Jul 01, 2014 at 12:26:43PM +0900, Kyotaro HORIGUCHI wrote:
> > > 1. I think it's the case that there are platforms around where a
> > > signal won't cause send() to return EINTR.... and I'd be entirely
> > > unsurprised if SSL_write() doesn't necessarily return EINTR in that
> > > case. I'm not sure what, if anything, we can do about that.
> >
> > man 2 send on FreeBSD has not description about EINTR.. And even
> > on linux, send won't return EINTR for most cases, at least I
> > haven't seen that. So send()=-1,EINTR seems to me as only an
> > equivalent of send() = 0. I have no idea about what the
> > implementer thought the difference is.
>
> Whether send() returns EINTR or not depends on whether the signal has
> been marked restartable or not. This is configurable per signal, see
> sigaction(). If the signal is marked to restart, the kernel returns
> ERESTARTHAND (IIRC) and the libc will redo the call internally.

Wow, thank you for detailed information. I'll study that and take
it into future discussion.

> Default BSD does not return EINTR normally, but supports sigaction().

I guess it is for easiness-to-keep-compatibility, seems
reasonable.

have a nice day,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-07-04 09:54:04
Message-ID:	20140704.185404.135768374.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello, thank you for keeping this discussion moving.

> > I think there's no such a reasonable time. The behavior might
> > should be determined from another point.. On alternative would be
> > let pg_terminate_backend() have a parameter instructing force
> > shutodwn (how to propagate it?), or make a forced shutdown on
> > duplicate invocation of pg_terminate_backend().
>
> Well, I think that when people call pg_terminate_backend() just once,
> they expect it to kill the target backend. I think people will
> tolerate a short delay, like a few seconds; after all, there's no
> guarantee, even today, that the backend will hit a
> CHECK_FOR_INTERRUPTS() in less than a few hundred milliseconds.

Sure.

> But they are not going to want to have to take a second action
> to kill the backend - killing it once should be sufficient.

Hmm, it sounds persuasive. Well, do you think they tolerate
-force option? (Even though its technical practicality is not
clear)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <robertmhaas(at)gmail(dot)com>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-08-25 12:29:49
Message-ID:	53FB2C3D.6030303@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 07/01/2014 06:26 AM, Kyotaro HORIGUCHI wrote:
> At Mon, 30 Jun 2014 11:27:47 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=Wc8PnQ(at)mail(dot)gmail(dot)com>
>> 1. I think it's the case that there are platforms around where a
>> signal won't cause send() to return EINTR.... and I'd be entirely
>> unsurprised if SSL_write() doesn't necessarily return EINTR in that
>> case. I'm not sure what, if anything, we can do about that.

We use a custom "write" routine with SSL_write, where we call send()
ourselves, so that's not a problem as long as we put the check in the
right place (in secure_raw_write(), after my recent SSL refactoring -
the patch needs to be rebased).

> man 2 send on FreeBSD has not description about EINTR.. And even
> on linux, send won't return EINTR for most cases, at least I
> haven't seen that. So send()=-1,EINTR seems to me as only an
> equivalent of send() = 0. I have no idea about what the
> implementer thought the difference is.

As the patch stands, there's a race condition: if the SIGTERM arrives
*before* the send() call, the send() won't return EINTR anyway. So
there's a chance that you still block. Calling pq_terminate_backend()
again will dislodge it (assuming send() returns with EINTR on signal),
but I don't think we want to define the behavior as "usually,
pq_terminate_backend() will kill a backend that's blocked on sending to
the client, but sometimes you have to call it twice (or more!) to really
kill it".

A more robust way is to set ImmediateInterruptOK before calling send().
That wouldn't let you send data that can be sent without blocking
though. For that, you could put the socket to non-blocking mode, and
sleep with select(), also waiting for the process' latch at the same
time (die() sets the latch, so that will wake up the select() if a
termination request arrives).

Is it actually safe to process the die-interrupt where send() is called?
ProcessInterrupts() does "ereport(FATAL, ...)", which will attempt to
send a message to the client. If that happens in the middle of
constructing some other message, that will violate the protocol.

>> 2. I think it would be reasonable to try to kill off the connection
>> without notifying the client if we're unable to send the data to the
>> client in a reasonable period of time. But I'm unsure what "a
>> reasonable period of time" means. This patch would basically do it
>> after no delay at all, which seems like it might be too aggressive.
>> However, I'm not sure.
>
> I think there's no such a reasonable time.

I agree it's pretty hard to define any reasonable timeout here. I think
it would be fine to just cut the connection; even if you don't block
while sending, you'll probably reach a CHECK_FOR_INTERRUPT() somewhere
higher in the stack and kill the connection almost as abruptly anyway.
(you can't violate the protocol, however)

- Heikki

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	hlinnakangas(at)vmware(dot)com
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-08-26 06:17:08
Message-ID:	20140826.151708.233374120.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Sorry, I was absorbed by other tasks..

Thank you for reviewing thiis.

> On 07/01/2014 06:26 AM, Kyotaro HORIGUCHI wrote:
> > At Mon, 30 Jun 2014 11:27:47 -0400, Robert Haas
> > <robertmhaas(at)gmail(dot)com> wrote in
> > <CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=Wc8PnQ(at)mail(dot)gmail(dot)com>
> >> 1. I think it's the case that there are platforms around where a
> >> signal won't cause send() to return EINTR.... and I'd be entirely
> >> unsurprised if SSL_write() doesn't necessarily return EINTR in that
> >> case. I'm not sure what, if anything, we can do about that.
>
> We use a custom "write" routine with SSL_write, where we call send()
> ourselves, so that's not a problem as long as we put the check in the
> right place (in secure_raw_write(), after my recent SSL refactoring -
> the patch needs to be rebased).
>
> > man 2 send on FreeBSD has not description about EINTR.. And even
> > on linux, send won't return EINTR for most cases, at least I
> > haven't seen that. So send()=-1,EINTR seems to me as only an
> > equivalent of send() = 0. I have no idea about what the
> > implementer thought the difference is.
>
> As the patch stands, there's a race condition: if the SIGTERM arrives
> *before* the send() call, the send() won't return EINTR anyway. So
> there's a chance that you still block. Calling pq_terminate_backend()
> again will dislodge it (assuming send() returns with EINTR on signal),

Yes, that window would'nt be extinguished without introducing
something more. EINTR is set only when nothing sent by the
call. So AFAIS the chance of getting EINTR is far small than
expectation.

> but I don't think we want to define the behavior as "usually,
> pq_terminate_backend() will kill a backend that's blocked on sending
> to the client, but sometimes you have to call it twice (or more!) to
> really kill it".

I agree that it is desirable behavior, if any measure to avoid
that. But I think it's better than doing kill -9 engulfing all
innocent backends.

> A more robust way is to set ImmediateInterruptOK before calling
> send(). That wouldn't let you send data that can be sent without
> blocking though. For that, you could put the socket to non-blocking
> mode, and sleep with select(), also waiting for the process' latch at
> the same time (die() sets the latch, so that will wake up the select()
> if a termination request arrives).

I condiered it but select() frequently (rather in most cases when
send() blocks by send buffer exhaustion) fails to predict that
following send() will be blocked. (If my memory is correct.) So
the final problem would be blocked send()...

> Is it actually safe to process the die-interrupt where send() is
> called? ProcessInterrupts() does "ereport(FATAL, ...)", which will
> attempt to send a message to the client. If that happens in the middle
> of constructing some other message, that will violate the protocol.

So I strongly agree to you if select() works as the impression
when reading the man document.

> >> 2. I think it would be reasonable to try to kill off the connection
> >> without notifying the client if we're unable to send the data to the
> >> client in a reasonable period of time. But I'm unsure what "a
> >> reasonable period of time" means. This patch would basically do it
> >> after no delay at all, which seems like it might be too aggressive.
> >> However, I'm not sure.
> >
> > I think there's no such a reasonable time.
>
> I agree it's pretty hard to define any reasonable timeout here. I
> think it would be fine to just cut the connection; even if you don't
> block while sending, you'll probably reach a CHECK_FOR_INTERRUPT()
> somewhere higher in the stack and kill the connection almost as
> abruptly anyway. (you can't violate the protocol, however)

Yes, closing the blocked connection seems one of the most smarter
way, checking the occurred interrupt could avoid protocol
violation. But the problem for that is that there seems no means
to close sockets elsewhere the blocking handle. dup(2)'ed handle
cannot release the resource by only itself.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	<robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-08-26 06:55:28
Message-ID:	53FC2F60.6050907@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 08/26/2014 09:17 AM, Kyotaro HORIGUCHI wrote:
>> but I don't think we want to define the behavior as "usually,
>> pq_terminate_backend() will kill a backend that's blocked on sending
>> to the client, but sometimes you have to call it twice (or more!) to
>> really kill it".
>
> I agree that it is desirable behavior, if any measure to avoid
> that. But I think it's better than doing kill -9 engulfing all
> innocent backends.
>
>> A more robust way is to set ImmediateInterruptOK before calling
>> send(). That wouldn't let you send data that can be sent without
>> blocking though. For that, you could put the socket to non-blocking
>> mode, and sleep with select(), also waiting for the process' latch at
>> the same time (die() sets the latch, so that will wake up the select()
>> if a termination request arrives).
>
> I condiered it but select() frequently (rather in most cases when
> send() blocks by send buffer exhaustion) fails to predict that
> following send() will be blocked. (If my memory is correct.) So
> the final problem would be blocked send()...

My point was to put the socket in non-blocking mode, so that send() will
return immediately with EAGAIN instead of blocking, if the send buffer
is full. See WalSndWriteData for how that would work, it does something
similar.

>> Is it actually safe to process the die-interrupt where send() is
>> called? ProcessInterrupts() does "ereport(FATAL, ...)", which will
>> attempt to send a message to the client. If that happens in the middle
>> of constructing some other message, that will violate the protocol.
>
> So I strongly agree to you if select() works as the impression
> when reading the man document.

Not sure what you mean, but the above is a fatal problem with the patch
right now, regardless of how you do the sleeping.

>>>> 2. I think it would be reasonable to try to kill off the connection
>>>> without notifying the client if we're unable to send the data to the
>>>> client in a reasonable period of time. But I'm unsure what "a
>>>> reasonable period of time" means. This patch would basically do it
>>>> after no delay at all, which seems like it might be too aggressive.
>>>> However, I'm not sure.
>>>
>>> I think there's no such a reasonable time.
>>
>> I agree it's pretty hard to define any reasonable timeout here. I
>> think it would be fine to just cut the connection; even if you don't
>> block while sending, you'll probably reach a CHECK_FOR_INTERRUPT()
>> somewhere higher in the stack and kill the connection almost as
>> abruptly anyway. (you can't violate the protocol, however)
>
> Yes, closing the blocked connection seems one of the most smarter
> way, checking the occurred interrupt could avoid protocol
> violation. But the problem for that is that there seems no means
> to close sockets elsewhere the blocking handle. dup(2)'ed handle
> cannot release the resource by only itself.

I didn't understand that, surely you can just close() the socket? There
is no dup(2) involved. And we don't necessarily need to close the
socket, we just need to avoid writing to it when we're already in the
middle of sending a message.

I'm marking this as Waiting on Author in the commitfest app, because:
1. the protocol violation needs to be avoided one way or another, and
2. the behavior needs to be consistent so that a single
pg_terminate_backend() is enough to always kill the connection.

- Heikki

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	hlinnakangas(at)vmware(dot)com
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-08-26 07:11:31
Message-ID:	20140826.161131.252087554.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

> > I condiered it but select() frequently (rather in most cases when
> > send() blocks by send buffer exhaustion) fails to predict that
> > following send() will be blocked. (If my memory is correct.) So
> > the final problem would be blocked send()...
>
> My point was to put the socket in non-blocking mode, so that send()
> will return immediately with EAGAIN instead of blocking, if the send
> buffer is full. See WalSndWriteData for how that would work, it does
> something similar.

I confused it with what I did during writing this patch. select()
- blocking send(). Sorry for confusing the discussion. I
understand correctly what you mean and It sounds reasonable.

> >> I agree it's pretty hard to define any reasonable timeout here. I
> >> think it would be fine to just cut the connection; even if you don't
> >> block while sending, you'll probably reach a CHECK_FOR_INTERRUPT()
> >> somewhere higher in the stack and kill the connection almost as
> >> abruptly anyway. (you can't violate the protocol, however)
> >
> > Yes, closing the blocked connection seems one of the most smarter
> > way, checking the occurred interrupt could avoid protocol
> > violation. But the problem for that is that there seems no means
> > to close sockets elsewhere the blocking handle. dup(2)'ed handle
> > cannot release the resource by only itself.
>
> I didn't understand that, surely you can just close() the socket?
> There is no dup(2) involved. And we don't necessarily need to close
> the socket, we just need to avoid writing to it when we're already in
> the middle of sending a message.

My assumption there was select() and *blocking* send(). So the
sentence cited is out of point from the first.

> I'm marking this as Waiting on Author in the commitfest app, because:
> 1. the protocol violation needs to be avoided one way or another, and
> 2. the behavior needs to be consistent so that a single
> pg_terminate_backend() is enough to always kill the connection.

Thank you for the suggestion. I think I can go forward with that
and will come up with new patch.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	hlinnakangas(at)vmware(dot)com
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-08-28 12:47:04
Message-ID:	20140828.214704.93968088.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello, sorry for the dazed reply in the previous mail.

I made revised patch for this issue.

Attached patches are following,

- 0001_Revise_socket_emulation_for_win32_backend.patch

Revises socket emulation on win32 backend so that each socket
can have its own blocking mode state.

- 0002_Allow_backend_termination_during_write_blocking.patch

The patch to solve the issue. This patch depends on the 0001_
patch.

==========

> > I'm marking this as Waiting on Author in the commitfest app, because:
> > 1. the protocol violation needs to be avoided one way or another, and
> > 2. the behavior needs to be consistent so that a single
> > pg_terminate_backend() is enough to always kill the connection.

- Preventing protocol violation.

To prevent protocol violation, secure_write sets
ClientConnectionLost when SIGTERM detected, then
internal_flush() and ProcessInterrupts() follow the
instruction.

- Single pg_terminate_backend surely kills the backend.

secure_raw_write() uses non-blocking socket and a loop of
select() with timeout to surely detects received
signal(SIGTERM).

To avoid frequent switching of blocking mode, the bare socket
for Port is put to non-blocking mode from the first in
StreamConnection() and blocking mode is controlled only by
Port->noblock in secure_raw_read/write().

To make the code mentioned above (Patch 0002) tidy, rewrite the
socket emulation code for win32 backends so that each socket
can have its own non-blocking state. (patch 0001)

Some concern about this patch,

- This patch allows the number of non-blocking socket to be below
64 (FD_SETSIZE) on win32 backend but it seems to be sufficient.

- This patch introduced redundant socket emulation for win32
backend but win32 bare socket for Port is already nonblocking
as described so it donsn't seem to be a serious problem on
performance. Addition to it, since I don't know the reason why
win32/socket.c provides the blocking-mode socket emulation, I
decided to preserve win32/socket.c to have blocking socket
emulation. Possibly it can be removed.

Any suggestions?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001_Revise_socket_emulation_for_win32_backend.patch	text/x-patch	6.9 KB
0002_Allow_backend_termination_during_write_blocking.patch	text/x-patch	8.5 KB

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	<robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-02 18:22:29
Message-ID:	54060AE5.5020502@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
> To make the code mentioned above (Patch 0002) tidy, rewrite the
> socket emulation code for win32 backends so that each socket
> can have its own non-blocking state. (patch 0001)

The first patch that makes non-blocking sockets behave more sanely on
Windows seems like a good idea, independently of the second patch. I'm
looking at the first patch now, I'll make a separate post about the
second patch.

> Some concern about this patch,
>
> - This patch allows the number of non-blocking socket to be below
> 64 (FD_SETSIZE) on win32 backend but it seems to be sufficient.

Yeah, that's plenty.

> - This patch introduced redundant socket emulation for win32
> backend but win32 bare socket for Port is already nonblocking
> as described so it donsn't seem to be a serious problem on
> performance. Addition to it, since I don't know the reason why
> win32/socket.c provides the blocking-mode socket emulation, I
> decided to preserve win32/socket.c to have blocking socket
> emulation. Possibly it can be removed.

On Windows, the backend has an emulation layer for POSIX signals, which
uses threads and Windows events. The reason win32/socket.c always uses
non-blocking mode internally is that it needs to wait for the socket to
become readable/writeable, and for the signal-emulation event, at the
same time. So no, we can't remove it.

The approach taken in the first patch seems sensible. I changed it to
not use FD_SET, though. A custom array seems better, that way we don't
need the pgwin32_nonblockset_init() call, we can just use initialize the
variable. It's a little bit more code, but it's well-contained in
win32/socket.c. Please take a look, to double-check that I didn't screw up.

- Heikki

Attachment	Content-Type	Size
improve-nonblocking-sockets-on-windows-1.patch	text/x-diff	7.9 KB

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	<robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-02 18:46:29
Message-ID:	54061085.3050900@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
> - Preventing protocol violation.
>
> To prevent protocol violation, secure_write sets
> ClientConnectionLost when SIGTERM detected, then
> internal_flush() and ProcessInterrupts() follow the
> instruction.

Oh, hang on. Now that I look at pqcomm.c more closely, it already has a
mechanism to avoid writing a message in the middle of another message.
See pq_putmessage and PqCommBusy. Can we rely on that?

> - Single pg_terminate_backend surely kills the backend.
>
> secure_raw_write() uses non-blocking socket and a loop of
> select() with timeout to surely detects received
> signal(SIGTERM).

I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
second increment, but I see that WaitLatchOrSocket() doesn't currently
support waiting for a socket to become writeable, without also waiting
for it to become readable. I wonder how difficult it would be to lift
that restriction.

I also wonder if it would be simpler to keep the socket in blocking mode
after all, and just close() in the signal handler if PqCommBusy == true.
If the signal arrives while sleeping in send(), the effect would be the
same as with your patch. If the signal arrives while sending, but not
sleeping, you would not get the chance to send the already-buffered data
to the client. But maybe that's OK, whether or not you're blocked is not
very deterministic anyway.

- Heikki

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-02 19:01:44
Message-ID:	20140902190144.GC28635@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-09-02 21:46:29 +0300, Heikki Linnakangas wrote:
> I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
> second increment, but I see that WaitLatchOrSocket() doesn't currently
> support waiting for a socket to become writeable, without also waiting for
> it to become readable. I wonder how difficult it would be to lift that
> restriction.

It's imo not that difficult. I've a prototype patch for that
somewhere. I tested my poll() implementation and it worked, but didn't
yet get to select().

> I also wonder if it would be simpler to keep the socket in blocking mode
> after all, and just close() in the signal handler if PqCommBusy == true. If
> the signal arrives while sleeping in send(), the effect would be the same as
> with your patch. If the signal arrives while sending, but not sleeping, you
> would not get the chance to send the already-buffered data to the client.
> But maybe that's OK, whether or not you're blocked is not very deterministic
> anyway.

I've actually been working on a patch to make the whole interaction with
the client using sockets. The reason I started so is that we lots of far
to complex stuff in signal handlers, and using a latch would allow us to
instead interrupt send/recv. While still heavily WIP the reduction in
odd stuff (check e.g. HandleCatchupInterrupt()) made me rather happy.

I'm slightly worried about the added overhead due to the latch code. In
my implementation I only use latches after a nonblocking read, but
still. Every WaitLatchOrSocket() does a drainSelfPipe(). I wonder if
that can be made problematic.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-02 19:05:17
Message-ID:	20140902190517.GD28635@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-09-02 21:01:44 +0200, Andres Freund wrote:
> I've actually been working on a patch to make the whole interaction with
> the client using sockets. The reason I started so is that we lots of far
> to complex stuff in signal handlers, and using a latch would allow us to
> instead interrupt send/recv. While still heavily WIP the reduction in
> odd stuff (check e.g. HandleCatchupInterrupt()) made me rather happy.

Actually, the even more important reason is that that would allow us to
throw errors/fatals sanely in interrupts because we wouldn't possibly
jump through openssl anymore...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-02 21:21:03
Message-ID:	526.1409692863@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
> second increment, but I see that WaitLatchOrSocket() doesn't currently
> support waiting for a socket to become writeable, without also waiting
> for it to become readable. I wonder how difficult it would be to lift
> that restriction.

My recollection is that there was a reason for that, but I don't recall
details any more.

regards, tom lane

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-02 21:23:58
Message-ID:	20140902212358.GF28635@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-09-02 17:21:03 -0400, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> > I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
> > second increment, but I see that WaitLatchOrSocket() doesn't currently
> > support waiting for a socket to become writeable, without also waiting
> > for it to become readable. I wonder how difficult it would be to lift
> > that restriction.
>
> My recollection is that there was a reason for that, but I don't recall
> details any more.

http://git.postgresql.org/pg/commitdiff/e42a21b9e6c9b9e6346a34b62628d48ff2fc6ddf

In my prototype I've changed the API that errors set both
READABLE/WRITABLE. Seems to work....

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-03 12:09:54
Message-ID:	54070512.9010807@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 09/03/2014 12:23 AM, Andres Freund wrote:
> On 2014-09-02 17:21:03 -0400, Tom Lane wrote:
>> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
>>> I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
>>> second increment, but I see that WaitLatchOrSocket() doesn't currently
>>> support waiting for a socket to become writeable, without also waiting
>>> for it to become readable. I wonder how difficult it would be to lift
>>> that restriction.
>>
>> My recollection is that there was a reason for that, but I don't recall
>> details any more.
>
> http://git.postgresql.org/pg/commitdiff/e42a21b9e6c9b9e6346a34b62628d48ff2fc6ddf
>
> In my prototype I've changed the API that errors set both
> READABLE/WRITABLE. Seems to work....

Andres, would you mind posting the WIP patch you have? That could be a
better foundation for this patch.

- Heikki

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-04 12:49:22
Message-ID:	CA+Tgmoau7LkQ3C3_7B4Z6eiWREZpvBzvHDA1oue976UB-AoBpw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 2, 2014 at 3:01 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> I'm slightly worried about the added overhead due to the latch code. In
> my implementation I only use latches after a nonblocking read, but
> still. Every WaitLatchOrSocket() does a drainSelfPipe(). I wonder if
> that can be made problematic.

I think that's not the word you're looking for. Or if it is, then -
it's already problematic. At some point I hacked up a very crude
prototype that made LWLocks use latches to sleep instead of
semaphores. It was slow.

AIUI, the only reason why we need the self-pipe thing is because on
some platforms signals don't interrupt system calls. But my
impression was that those platforms were somewhat obscure. Could we
have a separate latch implementation for platforms where we know that
system calls will get interrupted by signals? Alternatively, should
we consider reimplementing latches using semaphores? I assume having
the signal handler up the semaphore would allow the attempt to down
the semaphore to succeed on return from the handler, so it would
accomplish the same thing as the self-pipe trick.

Basically, it doesn't feel like a good thing that we've got two sets
of primitives for making a backend wait that (1) don't really know
about each other and (2) use different operating system primitives.
Presumably one of the two systems is better; let's figure out which
one it is, use that one all the time, and get rid of the other one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-04 13:05:16
Message-ID:	5408638C.1080308@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 09/04/2014 03:49 PM, Robert Haas wrote:
> On Tue, Sep 2, 2014 at 3:01 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> I'm slightly worried about the added overhead due to the latch code. In
>> my implementation I only use latches after a nonblocking read, but
>> still. Every WaitLatchOrSocket() does a drainSelfPipe(). I wonder if
>> that can be made problematic.
>
> I think that's not the word you're looking for. Or if it is, then -
> it's already problematic. At some point I hacked up a very crude
> prototype that made LWLocks use latches to sleep instead of
> semaphores. It was slow.

Hmm. Perhaps we should call drainSelfPipe() only after poll/select
returns saying that there is something in the self-pipe. That would be a
win assuming it's more common for the self-pipe to be empty.

> AIUI, the only reason why we need the self-pipe thing is because on
> some platforms signals don't interrupt system calls.

That's not the only reason. It also eliminates the race condition that
someone might set the latch after we've checked that it's not set, but
before calling poll/select. The same reason that ppoll and pselect exist.

> But my
> impression was that those platforms were somewhat obscure. Could we
> have a separate latch implementation for platforms where we know that
> system calls will get interrupted by signals?

... and have ppoll or pselect. Yeah, seems reasonable, assuming that
ppoll/pselect is faster.

> Alternatively, should
> we consider reimplementing latches using semaphores? I assume having
> the signal handler up the semaphore would allow the attempt to down
> the semaphore to succeed on return from the handler, so it would
> accomplish the same thing as the self-pipe trick.

I don't think there's a function to wait for a file descriptor or
semaphore at the same time.

- Heikki

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-04 13:37:12
Message-ID:	CA+TgmobnF3TG_Vdm_mzZ=xXoRP=3rEgixCriKxWAsSOGP2AhLw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Sep 4, 2014 at 9:05 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> Hmm. Perhaps we should call drainSelfPipe() only after poll/select returns
> saying that there is something in the self-pipe. That would be a win
> assuming it's more common for the self-pipe to be empty.

Couldn't hurt.

>> But my
>> impression was that those platforms were somewhat obscure. Could we
>> have a separate latch implementation for platforms where we know that
>> system calls will get interrupted by signals?
>
> ... and have ppoll or pselect. Yeah, seems reasonable, assuming that
> ppoll/pselect is faster.

Hrm. So we'd have to block SIGUSR1, check the flag, then use
pselect() to temporarily unblock SIGUSR1 and wait, then on return
again unblock SIGUSR1? Doesn't seem very appealing. I think changing
the signal mask is fast on Linux, but quite slow on at least some
other UNIX-like platforms. And I've heard that pselect() isn't always
truly atomic, so we might run into platform-specific bugs, too. I
wonder if there's a better way e.g. using memory barriers.

WaitLatch: check is_set. if yes then done. otherwise, set signal_me.
memory barrier. recheck is_set. if not set then wait using
poll/select. memory barrier. clear signal_me.
SetLatch: check is_set. if yes then done. otherwise, set is_set.
memory barrier. check signal_me. if set, then send SIGUSR1.

>> Alternatively, should
>> we consider reimplementing latches using semaphores? I assume having
>> the signal handler up the semaphore would allow the attempt to down
>> the semaphore to succeed on return from the handler, so it would
>> accomplish the same thing as the self-pipe trick.
>
> I don't think there's a function to wait for a file descriptor or semaphore
> at the same time.

Oh, good point. So that's out, then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-04 13:53:21
Message-ID:	54086ED1.2060404@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 09/04/2014 04:37 PM, Robert Haas wrote:
> Hrm. So we'd have to block SIGUSR1, check the flag, then use
> pselect() to temporarily unblock SIGUSR1 and wait, then on return
> again unblock SIGUSR1? Doesn't seem very appealing. I think changing
> the signal mask is fast on Linux, but quite slow on at least some
> other UNIX-like platforms. And I've heard that pselect() isn't always
> truly atomic, so we might run into platform-specific bugs, too. I
> wonder if there's a better way e.g. using memory barriers.
>
> WaitLatch: check is_set. if yes then done. otherwise, set signal_me.
> memory barrier. recheck is_set. if not set then wait using
> poll/select. memory barrier. clear signal_me.
> SetLatch: check is_set. if yes then done. otherwise, set is_set.
> memory barrier. check signal_me. if set, then send SIGUSR1.

Doesn't work. No matter what you do, the process running WaitLatch might
receive the signal immediately before it calls poll/select. The signal
handler will run, and the poll/select call will then go to sleep. There
is no way to do this without support from the kernel, that is why
ppoll/pselect exist.

- Heikki

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-04 13:57:14
Message-ID:	CA+TgmoZvJeRDqxFuwQKStx6-F=O39kH4_y_yE1D-cztWxLMN4Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Sep 4, 2014 at 9:53 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 09/04/2014 04:37 PM, Robert Haas wrote:
>> Hrm. So we'd have to block SIGUSR1, check the flag, then use
>> pselect() to temporarily unblock SIGUSR1 and wait, then on return
>> again unblock SIGUSR1? Doesn't seem very appealing. I think changing
>> the signal mask is fast on Linux, but quite slow on at least some
>> other UNIX-like platforms. And I've heard that pselect() isn't always
>> truly atomic, so we might run into platform-specific bugs, too. I
>> wonder if there's a better way e.g. using memory barriers.
>>
>> WaitLatch: check is_set. if yes then done. otherwise, set signal_me.
>> memory barrier. recheck is_set. if not set then wait using
>> poll/select. memory barrier. clear signal_me.
>> SetLatch: check is_set. if yes then done. otherwise, set is_set.
>> memory barrier. check signal_me. if set, then send SIGUSR1.
>
> Doesn't work. No matter what you do, the process running WaitLatch might
> receive the signal immediately before it calls poll/select. The signal
> handler will run, and the poll/select call will then go to sleep. There is
> no way to do this without support from the kernel, that is why ppoll/pselect
> exist.

Eesh, I was confused there: ignore me. I was trying to optimize away
the signal handling but assuming we still had the self-pipe byte. But
of course in that case we don't need to change anything at all.

I'm going to go get some more caffeine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	hlinnakangas(at)vmware(dot)com
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-05 06:42:05
Message-ID:	20140905.154205.167219106.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

> > - This patch introduced redundant socket emulation for win32
> > backend but win32 bare socket for Port is already nonblocking
> > as described so it donsn't seem to be a serious problem on
> > performance. Addition to it, since I don't know the reason why
> > win32/socket.c provides the blocking-mode socket emulation, I
> > decided to preserve win32/socket.c to have blocking socket
> > emulation. Possibly it can be removed.
>
> On Windows, the backend has an emulation layer for POSIX signals,
> which uses threads and Windows events. The reason win32/socket.c
> always uses non-blocking mode internally is that it needs to wait for
> the socket to become readable/writeable, and for the signal-emulation
> event, at the same time. So no, we can't remove it.

I see, thank you.

> The approach taken in the first patch seems sensible. I changed it to
> not use FD_SET, though. A custom array seems better, that way we don't
> need the pgwin32_nonblockset_init() call, we can just use initialize
> the variable. It's a little bit more code, but it's well-contained in
> win32/socket.c. Please take a look, to double-check that I didn't
> screw up.

Thank you. I felt a bit qualm to abusing fd_set. A bit more code
is not a problem.

I had close look on your patch.

Both 'nonblocking' and 'noblock' are appears in function names,
pgwin32_set_socket_block/noblock/is_nonblocking(). I prefer
nonblocking/blocking pair but I'm satisfied they are in uniform
style anyway. (Though I also didn't so ;p)

pgwin32_set_socket_block() leaves garbage in
nonblocking_sockets[] but it's no problem practically. You also
removed blocking'ize(?) code but I agree that it is correct
because fds of nonclosed socket won't be reused anyway.

pg_set_block/noblock() made me laugh. Yes you're correct. Sorry
for the bronken (but workable) code.

After all, the patch looks pretty good.
I'll continue to fit the another patch onto this.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	hlinnakangas(at)vmware(dot)com
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-05 08:30:19
Message-ID:	20140905.173019.74674107.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello, attached is the new-and-far-simple version of this
patch. It no longer depends on win32 nonblocking patch since the
socket is in blocking mode again.

> On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
> > - Preventing protocol violation.
> >
> > To prevent protocol violation, secure_write sets
> > ClientConnectionLost when SIGTERM detected, then
> > internal_flush() and ProcessInterrupts() follow the
> > instruction.
>
> Oh, hang on. Now that I look at pqcomm.c more closely, it already has
> a mechanism to avoid writing a message in the middle of another
> message. See pq_putmessage and PqCommBusy. Can we rely on that?

Hmm, it gracefully returns up to ExecProcNode() and PqCommBusy is
turned off on the way at pq_putmessage() under current
implement. So PqCommBusy is already false before it runs into
ProcessInterrupts().

Allowing ImmediateInterruptOK on signalled during send(), setting
whereToSendOutput to DestNone if PqCommBusy is true will do. We
can also distinguish read and write by looking
DoingCommandRead. The ImmediateInterruptOK section can be defined
enclosing by prepare_for_client_read/client_read_end.

> > - Single pg_terminate_backend surely kills the backend.
> >
> > secure_raw_write() uses non-blocking socket and a loop of
> > select() with timeout to surely detects received
> > signal(SIGTERM).
>
> I was going to suggest using WaitLatchOrSocket instead of sleeping in
> 1 second increment, but I see that WaitLatchOrSocket() doesn't
> currently support waiting for a socket to become writeable, without
> also waiting for it to become readable. I wonder how difficult it
> would be to lift that restriction.

It seems quite difficult hearing the following discussion.

> I also wonder if it would be simpler to keep the socket in blocking
> mode after all, and just close() in the signal handler if PqCommBusy
> == true. If the signal arrives while sleeping in send(), the effect
> would be the same as with your patch. If the signal arrives while
> sending, but not sleeping, you would not get the chance to send the
> already-buffered data to the client. But maybe that's OK, whether or
> not you're blocked is not very deterministic anyway.

Hmm. We're back round to the my first patch, with immediately
close the socket, and became irrelevant to win32 layer
patch. Anyway, it sounds reasonable.

Attached patch is a quick hack patch, but it seems working as
expected at a glance.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Simplly-cutting-off-the-socket-if-signalled-during-s.patch	text/x-patch	5.0 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	hlinnakangas(at)vmware(dot)com
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-05 08:39:35
Message-ID:	20140905.173935.51370343.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Sorry, It tha patch contains a silly bug. Please find the
attatched one.

> Hello, attached is the new-and-far-simple version of this
> patch. It no longer depends on win32 nonblocking patch since the
> socket is in blocking mode again.
>
> > On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
> > > - Preventing protocol violation.
> > >
> > > To prevent protocol violation, secure_write sets
> > > ClientConnectionLost when SIGTERM detected, then
> > > internal_flush() and ProcessInterrupts() follow the
> > > instruction.
> >
> > Oh, hang on. Now that I look at pqcomm.c more closely, it already has
> > a mechanism to avoid writing a message in the middle of another
> > message. See pq_putmessage and PqCommBusy. Can we rely on that?
>
> Hmm, it gracefully returns up to ExecProcNode() and PqCommBusy is
> turned off on the way at pq_putmessage() under current
> implement. So PqCommBusy is already false before it runs into
> ProcessInterrupts().
>
> Allowing ImmediateInterruptOK on signalled during send(), setting
> whereToSendOutput to DestNone if PqCommBusy is true will do. We
> can also distinguish read and write by looking
> DoingCommandRead. The ImmediateInterruptOK section can be defined
> enclosing by prepare_for_client_read/client_read_end.
>
> > > - Single pg_terminate_backend surely kills the backend.
> > >
> > > secure_raw_write() uses non-blocking socket and a loop of
> > > select() with timeout to surely detects received
> > > signal(SIGTERM).
> >
> > I was going to suggest using WaitLatchOrSocket instead of sleeping in
> > 1 second increment, but I see that WaitLatchOrSocket() doesn't
> > currently support waiting for a socket to become writeable, without
> > also waiting for it to become readable. I wonder how difficult it
> > would be to lift that restriction.
>
> It seems quite difficult hearing the following discussion.
>
> > I also wonder if it would be simpler to keep the socket in blocking
> > mode after all, and just close() in the signal handler if PqCommBusy
> > == true. If the signal arrives while sleeping in send(), the effect
> > would be the same as with your patch. If the signal arrives while
> > sending, but not sleeping, you would not get the chance to send the
> > already-buffered data to the client. But maybe that's OK, whether or
> > not you're blocked is not very deterministic anyway.
>
> Hmm. We're back round to the my first patch, with immediately
> close the socket, and became irrelevant to win32 layer
> patch. Anyway, it sounds reasonable.
>
> Attached patch is a quick hack patch, but it seems working as
> expected at a glance.

Attachment	Content-Type	Size
0001-Simplly-cutting-off-the-socket-if-signalled-during-s.patch	text/x-patch	5.3 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	hlinnakangas(at)vmware(dot)com
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-09 10:31:04
Message-ID:	20140909.193104.141171221.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi, I added and edited some comments.

> Sorry, It tha patch contains a silly bug. Please find the
> attatched one.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Simplly-cutting-off-the-socket-if-signalled-during_v2.patch	text/x-patch	6.7 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	hlinnakangas(at)vmware(dot)com
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-10 00:04:21
Message-ID:	20140910.090421.66171022.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hmm. Sorry, I misunderstood the specification.

> You approach that coloring tokens seems right, but you have
> broken the parse logic by adding your code.
>
> Other than the mistakes others pointed, I found that
>
> - non-SQL-ident like tokens are ignored by their token style,
> quoted or not, so the following line works.
>
> | "local" All aLL trust
>
> I suppose this is not what you intended. This is because you have
> igonred the attribute of a token when comparing it as
> non-SQL-ident tokens.
>
>
> - '+' at the head of the sequence '+"' is treated as the first
> character of the *quoted* string. e.g. +"hoge" is tokenized as
> "+hoge":special_quoted.

I found this is what intended. This should be documented as
comments.

|2) users and user-groups only requires special handling and behavior as follows
| Normal user :
| A. unquoted ( USER ) will be treated as user ( downcase ).
| B. quoted ( "USeR" ) will be treated as USeR (case-sensitive).
| C. quoted ( "+USER" ) will be treated as normal user +USER (i.e. will not be considered as user-group) and case-sensitive as string is quoted.

This seems confising with the B below. This seems should be
rearranged.

| User Group :
| A. unquoted ( +USERGROUP ) will be treated as +usergruop ( downcase ).
| B. plus quoted ( +"UserGROUP" ) will be treated as +UserGROUP (case-sensitive).

> This is why you simply continued processing for '+"' without
> discarding and skipping the '+', and not setting in_quote so the
> following parser code works as it is not intended. You should
> understand what the original code does and insert or modify
> logics not braeking the assumptions.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	<robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-10 15:53:03
Message-ID:	541073DF.70902@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Wrong thread...

On 09/10/2014 03:04 AM, Kyotaro HORIGUCHI wrote:
> Hmm. Sorry, I misunderstood the specification.
>
>> You approach that coloring tokens seems right, but you have
>> broken the parse logic by adding your code.
>>
>> Other than the mistakes others pointed, I found that
>>
>> - non-SQL-ident like tokens are ignored by their token style,
>> quoted or not, so the following line works.
>>
>> | "local" All aLL trust
>>
>> I suppose this is not what you intended. This is because you have
>> igonred the attribute of a token when comparing it as
>> non-SQL-ident tokens.
>>
>>
>> - '+' at the head of the sequence '+"' is treated as the first
>> character of the *quoted* string. e.g. +"hoge" is tokenized as
>> "+hoge":special_quoted.
>
> I found this is what intended. This should be documented as
> comments.
>
> |2) users and user-groups only requires special handling and behavior as follows
> | Normal user :
> | A. unquoted ( USER ) will be treated as user ( downcase ).
> | B. quoted ( "USeR" ) will be treated as USeR (case-sensitive).
> | C. quoted ( "+USER" ) will be treated as normal user +USER (i.e. will not be considered as user-group) and case-sensitive as string is quoted.
>
> This seems confising with the B below. This seems should be
> rearranged.
>
> | User Group :
> | A. unquoted ( +USERGROUP ) will be treated as +usergruop ( downcase ).
> | B. plus quoted ( +"UserGROUP" ) will be treated as +UserGROUP (case-sensitive).
>
>
>
>> This is why you simply continued processing for '+"' without
>> discarding and skipping the '+', and not setting in_quote so the
>> following parser code works as it is not intended. You should
>> understand what the original code does and insert or modify
>> logics not braeking the assumptions.
>
> regards,
>

--
- Heikki

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	hlinnakangas(at)vmware(dot)com
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-17 09:15:43
Message-ID:	20140917.181543.198880029.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Sorry for the mistake...

At Wed, 10 Sep 2014 18:53:03 +0300, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote in <541073DF(dot)70902(at)vmware(dot)com>
> Wrong thread...
>
> On 09/10/2014 03:04 AM, Kyotaro HORIGUCHI wrote:
> > Hmm. Sorry, I misunderstood the specification.
> >
> >> You approach that coloring tokens seems right, but you have
> >> broken the parse logic by adding your code.
> >>
> >> Other than the mistakes others pointed, I found that
> >>
> >> - non-SQL-ident like tokens are ignored by their token style,
> >> quoted or not, so the following line works.
> >>
> >> | "local" All aLL trust

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	<robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-26 18:02:16
Message-ID:	5425AA28.60308@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 09/09/2014 01:31 PM, Kyotaro HORIGUCHI wrote:
> Hi, I added and edited some comments.
>
>> Sorry, It tha patch contains a silly bug. Please find the
>> attatched one.

I must say this scares the heck out of me. The current code goes through
some trouble to not throw an error while in a recv() send(). For
example, you removed the DoingCommandRead check from
prepare_for_client_read(). There's an comment in postgres.c that says this:

> /*
> * (2) Allow asynchronous signals to be executed immediately if they
> * come in while we are waiting for client input. (This must be
> * conditional since we don't want, say, reads on behalf of COPY FROM
> * STDIN doing the same thing.)
> */
> DoingCommandRead = true;

With the patch, we do allow asynchronous signals to be processed while
blocked in COPY FROM STDIN. Maybe that's OK, but I don't feel
comfortable just changing it. (the comment is now wrong, of course)

This patch also enables processing query cancel signals while blocked,
not just SIGTERM. That's not good; we might be in the middle of sending
a message, and we cannot just error out of that or we might violate the
fe/be protocol. That's OK with a SIGTERM as you're terminating the
connection anyway, and we have the PqCommBusy safeguard in place that
prevents us from sending broken messages to the client, but that's not
good enough if we wanted to keep the backend alive, as we won't be able
to send anything to the client anymore.

BTW, we've been talking about blocking in send(), but this patch also
let's a recv() in e.g. COPY FROM STDIN to be interrupted. That's
probably a good thing; surely you have exactly the same issues with that
as with send(). But I didn't realize we had a problem with that too.

In summary, this patch is not ready as it is, but I think we can fix it.
The key question is: is it safe to handle SIGTERM in the signal handler,
calling the exit-handlers and exiting the backend, when blocked in a
recv() or send()? It's a change in the pqcomm.c API; most pqcomm.c
functions have not thrown errors or processed interrupts before. But
looking at the callers, I think it's safe, and there isn't actually any
comments explicitly saying that pqcomm.c will never throw errors.

I propose the attached patch. It adds a new flag ImmediateDieOK, which
is a weaker form of ImmediateInterruptOK that only allows handling a
pending die-signal in the signal handler.

Robert, others, do you see a problem with this?

Over IM, Robert pointed out that it's not safe to jump out of a signal
handler with siglongjmp, when we're inside library calls, like in a
callback called by OpenSSL. But even with current master branch, that's
exactly what we do. In secure_raw_read(), we set ImmediateInterruptOK =
true, which means that any incoming signal will be handled directly in
the signal handler, which can mean elog(ERROR). Should we be worried?
OpenSSL might get confused if control never returns to the SSL_read() or
SSL_write() function that called secure_raw_read().

- Heikki

Attachment	Content-Type	Size
ImmediateDieOk-1.patch	text/x-diff	5.2 KB

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-27 19:12:43
Message-ID:	20140927191243.GD5423@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-09-03 15:09:54 +0300, Heikki Linnakangas wrote:
> On 09/03/2014 12:23 AM, Andres Freund wrote:
> >On 2014-09-02 17:21:03 -0400, Tom Lane wrote:
> >>Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> >>>I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
> >>>second increment, but I see that WaitLatchOrSocket() doesn't currently
> >>>support waiting for a socket to become writeable, without also waiting
> >>>for it to become readable. I wonder how difficult it would be to lift
> >>>that restriction.
> >>
> >>My recollection is that there was a reason for that, but I don't recall
> >>details any more.
> >
> >http://git.postgresql.org/pg/commitdiff/e42a21b9e6c9b9e6346a34b62628d48ff2fc6ddf
> >
> >In my prototype I've changed the API that errors set both
> >READABLE/WRITABLE. Seems to work....
>
> Andres, would you mind posting the WIP patch you have? That could be a
> better foundation for this patch.

Sorry, I missed this message and only cought up when reading your CF
status mail. I've attached three patches:

0001: Allows WaitLatchOrSocket(WL_WRITABLE) without WL_READABLE. I've
tested the poll() and select() implementations on linux and
blindly patched up windows.
0002: Put the socket the backend uses to communicate with the client
into nonblocking mode as soon as latches are ready and use latches
to wait. This probably doesn't work correctly without 0003, but
seems easier to review separately.
0003: Don't do sinval catchup and notify processing in signal
handlers. It's quite cool that it worked that well so far, but it
requires some complicated code and is rather fragile. 0002 allows
to move that out of signal handlers and just use a latch
there. This seems remarkably simpler:
4 files changed, 69 insertions(+), 229 deletions(-)

These aren't ready for commit, especially not 0003, but I think they are
quite a good foundation for getting rid of the blocking in send(). I
haven't added any interrupt processing after interrupted writes, but
marked the relevant places with XXXs.

With regard to 0002, I dislike the current need to do interrupt
processing both in be-secure.c and be-secure-openssl.c. I guess we could
solve that by returning something like EINTR from the ssl routines when
they need further reads/writes and do all the processing in one place in
be-secure.c.

There's also some cleanup in 0002/0003 needed:
prepare_for_client_read()/client_read_ended() aren't needed in that form
anymore and should probably rather be something like
CHECK_FOR_READ_INTERRUPT() or similar. Similarly the
EnableCatchupInterrupt()/DisableCatchupInterrupt() in autovacuum.c is
pretty ugly.

Btw, be-secure.c is really not a good name anymore...

What do you think?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-27 22:54:21
Message-ID:	20140927225421.GE5423@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-09-27 21:12:43 +0200, Andres Freund wrote:
> On 2014-09-03 15:09:54 +0300, Heikki Linnakangas wrote:
> > On 09/03/2014 12:23 AM, Andres Freund wrote:
> > >On 2014-09-02 17:21:03 -0400, Tom Lane wrote:
> > >>Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> > >>>I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
> > >>>second increment, but I see that WaitLatchOrSocket() doesn't currently
> > >>>support waiting for a socket to become writeable, without also waiting
> > >>>for it to become readable. I wonder how difficult it would be to lift
> > >>>that restriction.
> > >>
> > >>My recollection is that there was a reason for that, but I don't recall
> > >>details any more.
> > >
> > >http://git.postgresql.org/pg/commitdiff/e42a21b9e6c9b9e6346a34b62628d48ff2fc6ddf
> > >
> > >In my prototype I've changed the API that errors set both
> > >READABLE/WRITABLE. Seems to work....
> >
> > Andres, would you mind posting the WIP patch you have? That could be a
> > better foundation for this patch.
>
> Sorry, I missed this message and only cought up when reading your CF
> status mail. I've attached three patches:
>
> 0001: Allows WaitLatchOrSocket(WL_WRITABLE) without WL_READABLE. I've
> tested the poll() and select() implementations on linux and
> blindly patched up windows.
> 0002: Put the socket the backend uses to communicate with the client
> into nonblocking mode as soon as latches are ready and use latches
> to wait. This probably doesn't work correctly without 0003, but
> seems easier to review separately.
> 0003: Don't do sinval catchup and notify processing in signal
> handlers. It's quite cool that it worked that well so far, but it
> requires some complicated code and is rather fragile. 0002 allows
> to move that out of signal handlers and just use a latch
> there. This seems remarkably simpler:
> 4 files changed, 69 insertions(+), 229 deletions(-)
>
> These aren't ready for commit, especially not 0003, but I think they are
> quite a good foundation for getting rid of the blocking in send(). I
> haven't added any interrupt processing after interrupted writes, but
> marked the relevant places with XXXs.
>
> With regard to 0002, I dislike the current need to do interrupt
> processing both in be-secure.c and be-secure-openssl.c. I guess we could
> solve that by returning something like EINTR from the ssl routines when
> they need further reads/writes and do all the processing in one place in
> be-secure.c.
>
> There's also some cleanup in 0002/0003 needed:
> prepare_for_client_read()/client_read_ended() aren't needed in that form
> anymore and should probably rather be something like
> CHECK_FOR_READ_INTERRUPT() or similar. Similarly the
> EnableCatchupInterrupt()/DisableCatchupInterrupt() in autovacuum.c is
> pretty ugly.
>
> Btw, be-secure.c is really not a good name anymore...
>
> What do you think?

I've invested some more time in this:
0002 now makes sense on its own and doesn't change anything around the
interrupt handling. Oh, and it compiles without 0003.
0003 Sinval/notify processing got simplified further. There really isn't
any need for DisableNotifyInterrupt/DisableCatchupInterrupt
anymore. Also begin_client_read/client_read_ended don't make much
sense anymore. Instead introduce ProcessClientReadInterrupt (which
wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
set. All of that happens via the latch mechanism, nothing happens
inside signal handlers. So I do think it's quite an improvement
over what's been discussed in this thread.
But it (and the other approaches) do noticeably increase the
likelihood of clients not getting the error message if the client
isn't actually dead. The likelihood of write() being blocked
*temporarily* due to normal bandwidth constraints is quite high
when you consider COPY FROM and similar. Right now we'll wait till
we can put the error message into the socket afaics.

1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-Allow-to-wait-for-a-writable-socket-in-WaitLatchOrSo.patch	text/x-patch	4.7 KB
0002-Make-backends-use-a-nonblocking-socket-to-communicat.patch	text/x-patch	6.7 KB
0003-Move-sinval-catchup-and-notify-processing-out-of-sig.patch	text/x-patch	33.2 KB
0004-Heavily-WIP-Process-die-interrupts-while-reading-wri.patch	text/x-patch	4.0 KB

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-29 23:45:29
Message-ID:	20140929234529.GL2084@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-09-28 00:54:21 +0200, Andres Freund wrote:
> On 2014-09-27 21:12:43 +0200, Andres Freund wrote:
> > On 2014-09-03 15:09:54 +0300, Heikki Linnakangas wrote:
> > Sorry, I missed this message and only cought up when reading your CF
> > status mail. I've attached three patches:
> >
> > 0001: Allows WaitLatchOrSocket(WL_WRITABLE) without WL_READABLE. I've
> > tested the poll() and select() implementations on linux and
> > blindly patched up windows.
> > 0002: Put the socket the backend uses to communicate with the client
> > into nonblocking mode as soon as latches are ready and use latches
> > to wait. This probably doesn't work correctly without 0003, but
> > seems easier to review separately.
> > 0003: Don't do sinval catchup and notify processing in signal
> > handlers. It's quite cool that it worked that well so far, but it
> > requires some complicated code and is rather fragile. 0002 allows
> > to move that out of signal handlers and just use a latch
> > there. This seems remarkably simpler:
> > 4 files changed, 69 insertions(+), 229 deletions(-)
> >
> > These aren't ready for commit, especially not 0003, but I think they are
> > quite a good foundation for getting rid of the blocking in send(). I
> > haven't added any interrupt processing after interrupted writes, but
> > marked the relevant places with XXXs.
> >
> > With regard to 0002, I dislike the current need to do interrupt
> > processing both in be-secure.c and be-secure-openssl.c. I guess we could
> > solve that by returning something like EINTR from the ssl routines when
> > they need further reads/writes and do all the processing in one place in
> > be-secure.c.
> >
> > There's also some cleanup in 0002/0003 needed:
> > prepare_for_client_read()/client_read_ended() aren't needed in that form
> > anymore and should probably rather be something like
> > CHECK_FOR_READ_INTERRUPT() or similar. Similarly the
> > EnableCatchupInterrupt()/DisableCatchupInterrupt() in autovacuum.c is
> > pretty ugly.
> >
> > Btw, be-secure.c is really not a good name anymore...
> >
> > What do you think?
>
> I've invested some more time in this:
> 0002 now makes sense on its own and doesn't change anything around the
> interrupt handling. Oh, and it compiles without 0003.
> 0003 Sinval/notify processing got simplified further. There really isn't
> any need for DisableNotifyInterrupt/DisableCatchupInterrupt
> anymore. Also begin_client_read/client_read_ended don't make much
> sense anymore. Instead introduce ProcessClientReadInterrupt (which
> wants a better name).
> There's also a very WIP
> 0004 Allows secure_read/write be interrupted when ProcDiePending is
> set. All of that happens via the latch mechanism, nothing happens
> inside signal handlers. So I do think it's quite an improvement
> over what's been discussed in this thread.
> But it (and the other approaches) do noticeably increase the
> likelihood of clients not getting the error message if the client
> isn't actually dead. The likelihood of write() being blocked
> *temporarily* due to normal bandwidth constraints is quite high
> when you consider COPY FROM and similar. Right now we'll wait till
> we can put the error message into the socket afaics.
>
> 1-3 need some serious comment work, but I think the approach is
> basically sound. I'm much, much less sure about allowing send() to be
> interrupted.

Kyatoro, could you check whether you can achieve what you want using
0004?

It's imo pretty clear that a fair amount of base work needs to be done
and there's been a fair amount of progress made this fest. I think this
can now be marked returned with feedback.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	hlinnakangas(at)vmware(dot)com
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-30 06:35:39
Message-ID:	20140930.153539.49495386.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thank you for reviewing. I'll look close to the patch tomorrow.

> I must say this scares the heck out of me. The current code goes
> through some trouble to not throw an error while in a recv()
> send(). For example, you removed the DoingCommandRead check from
> prepare_for_client_read(). There's an comment in postgres.c that says
> this:
>
> > /*
> > * (2) Allow asynchronous signals to be executed immediately
> > * if they
> > * come in while we are waiting for client input. (This must
> > * be
> > * conditional since we don't want, say, reads on behalf of
> > * COPY FROM
> > * STDIN doing the same thing.)
> > */
> > DoingCommandRead = true;

Hmm. Sorry. That's my fault that I skipped over the issues about
"COPY FROM STDIN".

> With the patch, we do allow asynchronous signals to be processed while
> blocked in COPY FROM STDIN. Maybe that's OK, but I don't feel
> comfortable just changing it. (the comment is now wrong, of course)

I don't see actual problem but I agree that the behavior should
not be chenged.

> This patch also enables processing query cancel signals while blocked,
> not just SIGTERM. That's not good; we might be in the middle of
> sending a message, and we cannot just error out of that or we might
> violate the fe/be protocol. That's OK with a SIGTERM as you're
> terminating the connection anyway, and we have the PqCommBusy
> safeguard in place that prevents us from sending broken messages to
> the client, but that's not good enough if we wanted to keep the
> backend alive, as we won't be able to send anything to the client
> anymore.

Ok, since what I want is escaping from blocked send() only by
SIGTERM, it needs another mechanism from current
prepare_for_client_read().

> BTW, we've been talking about blocking in send(), but this patch also
> let's a recv() in e.g. COPY FROM STDIN to be interrupted. That's
> probably a good thing; surely you have exactly the same issues with
> that as with send(). But I didn't realize we had a problem with that
> too.

I see. (But it is mere a side-effect of my carelessness, as you know:)

> In summary, this patch is not ready as it is, but I think we can fix
> it. The key question is: is it safe to handle SIGTERM in the signal
> handler, calling the exit-handlers and exiting the backend, when
> blocked in a recv() or send()? It's a change in the pqcomm.c API; most
> pqcomm.c functions have not thrown errors or processed interrupts
> before. But looking at the callers, I think it's safe, and there isn't
> actually any comments explicitly saying that pqcomm.c will never throw
> errors.
>
> I propose the attached patch. It adds a new flag ImmediateDieOK, which
> is a weaker form of ImmediateInterruptOK that only allows handling a
> pending die-signal in the signal handler.
>
> Robert, others, do you see a problem with this?

The patch seems excluding all problems menthioned in the message,
I have no objection to it.

> Over IM, Robert pointed out that it's not safe to jump out of a signal
> handler with siglongjmp, when we're inside library calls, like in a
> callback called by OpenSSL. But even with current master branch,
> that's exactly what we do. In secure_raw_read(), we set
> ImmediateInterruptOK = true, which means that any incoming signal will
> be handled directly in the signal handler, which can mean
> elog(ERROR). Should we be worried? OpenSSL might get confused if
> control never returns to the SSL_read() or SSL_write() function that
> called secure_raw_read().

IMHO, it will soon die even if OpenSSL is confused. It seems a
bit brute that sudden cutoff occurs even when the socket is *not*
blocked, but the backend will soon die and frontend will
immediately get ECONNRESET (..hmm it is not seen in manpages of
recv/read(2)) and should safely exit from OpenSSL.

I cannot run this patch right now, but it seems to be no problem.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	andres(at)2ndquadrant(dot)com
Cc:	hlinnakangas(at)vmware(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-30 06:46:40
Message-ID:	20140930.154640.89059482.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Wow, thank you for the patch.

> > > 0001: Allows WaitLatchOrSocket(WL_WRITABLE) without WL_READABLE. I've
> > > tested the poll() and select() implementations on linux and
> > > blindly patched up windows.
> > > 0002: Put the socket the backend uses to communicate with the client
> > > into nonblocking mode as soon as latches are ready and use latches
> > > to wait. This probably doesn't work correctly without 0003, but
> > > seems easier to review separately.
> > > 0003: Don't do sinval catchup and notify processing in signal
> > > handlers. It's quite cool that it worked that well so far, but it
> > > requires some complicated code and is rather fragile. 0002 allows
> > > to move that out of signal handlers and just use a latch
> > > there. This seems remarkably simpler:
> > > 4 files changed, 69 insertions(+), 229 deletions(-)
> > >
> > > These aren't ready for commit, especially not 0003, but I think they are
> > > quite a good foundation for getting rid of the blocking in send(). I
> > > haven't added any interrupt processing after interrupted writes, but
> > > marked the relevant places with XXXs.
> > >
> > > With regard to 0002, I dislike the current need to do interrupt
> > > processing both in be-secure.c and be-secure-openssl.c. I guess we could
> > > solve that by returning something like EINTR from the ssl routines when
> > > they need further reads/writes and do all the processing in one place in
> > > be-secure.c.
> > >
> > > There's also some cleanup in 0002/0003 needed:
> > > prepare_for_client_read()/client_read_ended() aren't needed in that form
> > > anymore and should probably rather be something like
> > > CHECK_FOR_READ_INTERRUPT() or similar. Similarly the
> > > EnableCatchupInterrupt()/DisableCatchupInterrupt() in autovacuum.c is
> > > pretty ugly.
> > >
> > > Btw, be-secure.c is really not a good name anymore...
> > >
> > > What do you think?
> >
> > I've invested some more time in this:
> > 0002 now makes sense on its own and doesn't change anything around the
> > interrupt handling. Oh, and it compiles without 0003.
> > 0003 Sinval/notify processing got simplified further. There really isn't
> > any need for DisableNotifyInterrupt/DisableCatchupInterrupt
> > anymore. Also begin_client_read/client_read_ended don't make much
> > sense anymore. Instead introduce ProcessClientReadInterrupt (which
> > wants a better name).
> > There's also a very WIP
> > 0004 Allows secure_read/write be interrupted when ProcDiePending is
> > set. All of that happens via the latch mechanism, nothing happens
> > inside signal handlers. So I do think it's quite an improvement
> > over what's been discussed in this thread.
> > But it (and the other approaches) do noticeably increase the
> > likelihood of clients not getting the error message if the client
> > isn't actually dead. The likelihood of write() being blocked
> > *temporarily* due to normal bandwidth constraints is quite high
> > when you consider COPY FROM and similar. Right now we'll wait till
> > we can put the error message into the socket afaics.
> >
> > 1-3 need some serious comment work, but I think the approach is
> > basically sound. I'm much, much less sure about allowing send() to be
> > interrupted.
>
> Kyatoro, could you check whether you can achieve what you want using
> 0004?
>
> It's imo pretty clear that a fair amount of base work needs to be done
> and there's been a fair amount of progress made this fest. I think this
> can now be marked returned with feedback.

Myself is satisfied by Heikki's solution, and it seems ready for
commit. But I agree with the temporarily blocked state is seen
often and it breaks even non-blocked socket. If we want to/should
avoid breaking *temporarily or not* blocked socket even for
SIGTERM, this mechanism should be used.

Which way should we take?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	andres(at)2ndquadrant(dot)com
Cc:	hlinnakangas(at)vmware(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-30 07:05:19
Message-ID:	20140930.160519.222985399.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

By the way,

> Sorry, I missed this message and only cought up when reading your CF
> status mail. I've attached three patches:

Could let me know how to get the CF status mail?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <andres(at)2ndquadrant(dot)com>
Cc:	<tgl(at)sss(dot)pgh(dot)pa(dot)us>, <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-30 07:34:59
Message-ID:	542A5D23.4060004@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 09/30/2014 10:05 AM, Kyotaro HORIGUCHI wrote:
> By the way,
>
>> Sorry, I missed this message and only cought up when reading your CF
>> status mail. I've attached three patches:
>
> Could let me know how to get the CF status mail?

I think he meant this email I sent last weekend:

http://www.postgresql.org/message-id/542672D2.3060708@vmware.com

- Heikki

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-09-30 08:49:21
Message-ID:	20140930084921.GN2084@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-09-26 21:02:16 +0300, Heikki Linnakangas wrote:
> I propose the attached patch. It adds a new flag ImmediateDieOK, which is a
> weaker form of ImmediateInterruptOK that only allows handling a pending
> die-signal in the signal handler.
>
> Robert, others, do you see a problem with this?

Per se I don't have a problem with it. There does exist the problem that
the user doesn't get a error message in more cases though. On the other
hand it's bad if any user can prevent the database from restarting.

> Over IM, Robert pointed out that it's not safe to jump out of a signal
> handler with siglongjmp, when we're inside library calls, like in a callback
> called by OpenSSL. But even with current master branch, that's exactly what
> we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means
> that any incoming signal will be handled directly in the signal handler,
> which can mean elog(ERROR). Should we be worried? OpenSSL might get confused
> if control never returns to the SSL_read() or SSL_write() function that
> called secure_raw_read().

But this is imo prohibitive. Yes, we're doing it for a long while. But
no, that's not ok. It actually prompoted me into prototyping the latch
thing (in some other thread). I don't think existing practice justifies
expanding it further.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	hlinnakangas(at)vmware(dot)com
Cc:	andres(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-02 06:36:43
Message-ID:	20141002.153643.20806052.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> >> Sorry, I missed this message and only cought up when reading your CF
> >> status mail. I've attached three patches:
> >
> > Could let me know how to get the CF status mail?
>
> I think he meant this email I sent last weekend:
>
> http://www.postgresql.org/message-id/542672D2.3060708@vmware.com

I see, that's what I also received. Thank you.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	andres(at)2ndquadrant(dot)com
Cc:	hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-02 08:47:39
Message-ID:	20141002.174739.24593737.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

> > I propose the attached patch. It adds a new flag ImmediateDieOK, which is a
> > weaker form of ImmediateInterruptOK that only allows handling a pending
> > die-signal in the signal handler.
> >
> > Robert, others, do you see a problem with this?
>
> Per se I don't have a problem with it. There does exist the problem that
> the user doesn't get a error message in more cases though. On the other
> hand it's bad if any user can prevent the database from restarting.
>
> > Over IM, Robert pointed out that it's not safe to jump out of a signal
> > handler with siglongjmp, when we're inside library calls, like in a callback
> > called by OpenSSL. But even with current master branch, that's exactly what
> > we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means
> > that any incoming signal will be handled directly in the signal handler,
> > which can mean elog(ERROR). Should we be worried? OpenSSL might get confused
> > if control never returns to the SSL_read() or SSL_write() function that
> > called secure_raw_read().
>
> But this is imo prohibitive. Yes, we're doing it for a long while. But
> no, that's not ok. It actually prompoted me into prototyping the latch
> thing (in some other thread). I don't think existing practice justifies
> expanding it further.

I see, in that case, this approach seems basically
applicable. But if I understand correctly, this patch seems not
to return out of the openssl code even when latch was found to be
set in secure_raw_write/read. I tried setting errno = ECONNRESET
and it went well but seems a bad deed.

secure_raw_write(Port *port, const void *ptr, size_t len)
{
n = send(port->sock, ptr, len, 0);

if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
{
w = WaitLatchOrSocket(&MyProc->procLatch, ...

if (w & WL_LATCH_SET)
{
ResetLatch(&MyProc->procLatch);
/*
* Force a return, so interrupts can be processed when not
* (possibly) underneath a ssl library.
*/
errno = EINTR;
(return n; // n is negative)

my_sock_write(BIO *h, const char *buf, int size)
{
res = secure_raw_write(((Port *) h->ptr), buf, size);
BIO_clear_retry_flags(h);
if (res <= 0)
{
if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
{
BIO_set_retry_write(h);

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-02 09:02:50
Message-ID:	20141002090250.GF7158@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-10-02 17:47:39 +0900, Kyotaro HORIGUCHI wrote:
> Hello,
>
> > > I propose the attached patch. It adds a new flag ImmediateDieOK, which is a
> > > weaker form of ImmediateInterruptOK that only allows handling a pending
> > > die-signal in the signal handler.
> > >
> > > Robert, others, do you see a problem with this?
> >
> > Per se I don't have a problem with it. There does exist the problem that
> > the user doesn't get a error message in more cases though. On the other
> > hand it's bad if any user can prevent the database from restarting.
> >
> > > Over IM, Robert pointed out that it's not safe to jump out of a signal
> > > handler with siglongjmp, when we're inside library calls, like in a callback
> > > called by OpenSSL. But even with current master branch, that's exactly what
> > > we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means
> > > that any incoming signal will be handled directly in the signal handler,
> > > which can mean elog(ERROR). Should we be worried? OpenSSL might get confused
> > > if control never returns to the SSL_read() or SSL_write() function that
> > > called secure_raw_read().
> >
> > But this is imo prohibitive. Yes, we're doing it for a long while. But
> > no, that's not ok. It actually prompoted me into prototyping the latch
> > thing (in some other thread). I don't think existing practice justifies
> > expanding it further.
>
> I see, in that case, this approach seems basically
> applicable. But if I understand correctly, this patch seems not
> to return out of the openssl code even when latch was found to be
> set in secure_raw_write/read.

Correct. That's why I think it's the way forward. There's several
problems now where the inability to do real things while reading/writing
is a problem.

> I tried setting errno = ECONNRESET
> and it went well but seems a bad deed.

Where and why did you do that?

> secure_raw_write(Port *port, const void *ptr, size_t len)
> {
> n = send(port->sock, ptr, len, 0);
>
> if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
> {
> w = WaitLatchOrSocket(&MyProc->procLatch, ...
>
> if (w & WL_LATCH_SET)
> {
> ResetLatch(&MyProc->procLatch);
> /*
> * Force a return, so interrupts can be processed when not
> * (possibly) underneath a ssl library.
> */
> errno = EINTR;
> (return n; // n is negative)
>
>
> my_sock_write(BIO *h, const char *buf, int size)
> {
> res = secure_raw_write(((Port *) h->ptr), buf, size);
> BIO_clear_retry_flags(h);
> if (res <= 0)
> {
> if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
> {
> BIO_set_retry_write(h);

Hm, this seems, besides one comment, the code from the last patch in my
series. Do you have a particular question about it?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	andres(at)2ndquadrant(dot)com
Cc:	hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-02 10:11:21
Message-ID:	20141002.191121.13329779.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

> > > But this is imo prohibitive. Yes, we're doing it for a long while. But
> > > no, that's not ok. It actually prompoted me into prototyping the latch
> > > thing (in some other thread). I don't think existing practice justifies
> > > expanding it further.
> >
> > I see, in that case, this approach seems basically
> > applicable. But if I understand correctly, this patch seems not
> > to return out of the openssl code even when latch was found to be
> > set in secure_raw_write/read.
>
> Correct. That's why I think it's the way forward. There's several
> problems now where the inability to do real things while reading/writing
> is a problem.
>
> > I tried setting errno = ECONNRESET
> > and it went well but seems a bad deed.
>
> Where and why did you do that?

The patch of this message.

http://www.postgresql.org/message-id/20140828.214704.93968088.horiguchi.kyotaro@lab.ntt.co.jp

The reason for setting errno (instead of a variable for it) is to
trick openssl (or my_socck_write? I've forgot it..) into
recognizing as if the underneath send(2) have returned with any
uncontinueable error so it cannot be any of continueable errnos
(EINTR/EWOULDBLOCK/EAGAIN). Iy my faint memory, only avoiding
BIO_set_retry_write() in my_sock_write() dosn't work as expected
but it might be enough that my_sock_write returns -1 and doesn't
set BIO_set_retry_write().

The reason why ECONNNRESET is any of other errnos possible for
send(2)(*1) doesn't seem to fit the real situation, and the
blocked situation seems similar to resetted connection from the
view that it cannot continue to work due to external condition,
and it is used in be_tls_write() in a similary way.

Come to think of it, setting ECONNRESET is not so evil?

> > secure_raw_write(Port *port, const void *ptr, size_t len)
> > {
> > n = send(port->sock, ptr, len, 0);
> >
> > if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
> > {
> > w = WaitLatchOrSocket(&MyProc->procLatch, ...
> >
> > if (w & WL_LATCH_SET)
> > {
> > ResetLatch(&MyProc->procLatch);
> > /*
> > * Force a return, so interrupts can be processed when not
> > * (possibly) underneath a ssl library.
> > */
> > errno = EINTR;
> > (return n; // n is negative)
> >
> >
> > my_sock_write(BIO *h, const char *buf, int size)
> > {
> > res = secure_raw_write(((Port *) h->ptr), buf, size);
> > BIO_clear_retry_flags(h);
> > if (res <= 0)
> > {
> > if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
> > {
> > BIO_set_retry_write(h);
>
> Hm, this seems, besides one comment, the code from the last patch in my
> series. Do you have a particular question about it?

I didn't have a particluar qustion about it. This is cited only
in order to show the route to retrying.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-03 14:12:18
Message-ID:	542EAEC2.5060705@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 09/28/2014 01:54 AM, Andres Freund wrote:
> I've invested some more time in this:

Thanks!

In 0001, the select() codepath will not return (WL_SOCKET_READABLE |
WL_SOCKET_WRITEABLE) on EOF or error, like the comment says and like the
poll() path does. It only sets WL_SOCKET_READABLE if WL_SOCKET_READABLE
was requested as a wake-event, and likewise for writeable, while the
poll() codepath returns (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)
regardless of the requested wake-events. I'm not sure which is actually
better - a separate WL_SOCKET_ERROR code might be best - but it's
inconsistent as it is.

> 0002 now makes sense on its own and doesn't change anything around the
> interrupt handling. Oh, and it compiles without 0003.

WaitLatchOrSocket() can throw an error, so it's not totally safe to call
that underneath OpenSSL. Admittedly the cases where it throws an error
are "shouldn't happen" cases like "poll() failed" or "read() on
self-pipe failed", but still. Perhaps those errors should be
reclassified as FATAL; it's not clear you can just roll back and expect
to continue running if any of them happens.

In secure_raw_write(), you need to save and restore errno, as
WaitLatchOrSocket will not preserve it. If secure_raw_write() calls
WaitLatchOrSocket(), and it returns because the latch was set, and we
fall out of secure_raw_write, we will return -1 but the errno might not
be set to anything sensible anymore.

> 0003 Sinval/notify processing got simplified further. There really isn't
> any need for DisableNotifyInterrupt/DisableCatchupInterrupt
> anymore. Also begin_client_read/client_read_ended don't make much
> sense anymore. Instead introduce ProcessClientReadInterrupt (which
> wants a better name).
> There's also a very WIP
> 0004 Allows secure_read/write be interrupted when ProcDiePending is
> set. All of that happens via the latch mechanism, nothing happens
> inside signal handlers. So I do think it's quite an improvement
> over what's been discussed in this thread.
> But it (and the other approaches) do noticeably increase the
> likelihood of clients not getting the error message if the client
> isn't actually dead. The likelihood of write() being blocked
> *temporarily* due to normal bandwidth constraints is quite high
> when you consider COPY FROM and similar. Right now we'll wait till
> we can put the error message into the socket afaics.
>
> 1-3 need some serious comment work, but I think the approach is
> basically sound. I'm much, much less sure about allowing send() to be
> interrupted.

Yeah, 1-3 seem sane. 4 also looks OK to me at a quick glance. It
basically enables handling the "die" interrupt immediately, if we're
blocked in a read or write. It won't be handled in the signal handler,
but within the secure_read/write call anyway.

- Heikki

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-03 14:26:35
Message-ID:	20141003142635.GT7158@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
> On 09/28/2014 01:54 AM, Andres Freund wrote:
> >I've invested some more time in this:
>
> Thanks!
>
> In 0001, the select() codepath will not return (WL_SOCKET_READABLE |
> WL_SOCKET_WRITEABLE) on EOF or error, like the comment says and like the
> poll() path does. It only sets WL_SOCKET_READABLE if WL_SOCKET_READABLE was
> requested as a wake-event, and likewise for writeable, while the poll()
> codepath returns (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE) regardless of
> the requested wake-events. I'm not sure which is actually better - a
> separate WL_SOCKET_ERROR code might be best - but it's inconsistent as it
> is.

Hm. Right. I think we should only report the requested state. We can't
really discern wether it's a hangup, error or actually readable/writable
with select() - it just returns the socket as readable/writable as soon
as it doesn't block anymore. Where not blocking includes the connection
having gone bad.

It took me a while to figure out whether that's actually guaranteed by
the spec, but I'm pretty sure it is...

> >0002 now makes sense on its own and doesn't change anything around the
> > interrupt handling. Oh, and it compiles without 0003.
>
> WaitLatchOrSocket() can throw an error, so it's not totally safe to call
> that underneath OpenSSL.

Hm. Fair point.

> Admittedly the cases where it throws an error are
> "shouldn't happen" cases like "poll() failed" or "read() on self-pipe
> failed", but still. Perhaps those errors should be reclassified as FATAL;
> it's not clear you can just roll back and expect to continue running if any
> of them happens.

Fine with me.

> In secure_raw_write(), you need to save and restore errno, as
> WaitLatchOrSocket will not preserve it. If secure_raw_write() calls
> WaitLatchOrSocket(), and it returns because the latch was set, and we fall
> out of secure_raw_write, we will return -1 but the errno might not be set to
> anything sensible anymore.

Oops.

> >0003 Sinval/notify processing got simplified further. There really isn't
> > any need for DisableNotifyInterrupt/DisableCatchupInterrupt
> > anymore. Also begin_client_read/client_read_ended don't make much
> > sense anymore. Instead introduce ProcessClientReadInterrupt (which
> > wants a better name).
> >There's also a very WIP
> >0004 Allows secure_read/write be interrupted when ProcDiePending is
> > set. All of that happens via the latch mechanism, nothing happens
> > inside signal handlers. So I do think it's quite an improvement
> > over what's been discussed in this thread.
> > But it (and the other approaches) do noticeably increase the
> > likelihood of clients not getting the error message if the client
> > isn't actually dead. The likelihood of write() being blocked
> > *temporarily* due to normal bandwidth constraints is quite high
> > when you consider COPY FROM and similar. Right now we'll wait till
> > we can put the error message into the socket afaics.
> >
> >1-3 need some serious comment work, but I think the approach is
> >basically sound. I'm much, much less sure about allowing send() to be
> >interrupted.
>
> Yeah, 1-3 seem sane.

I think 3 also needs a careful look. Have you looked through it? While
imo much less complex than before, there's some complex interactions in
the touched code. And we have terrible coverage of both catchup
interrupts and notify stuff...

Tom, do you happen to have time to look at that bit?

There's also the concern that using a latch for client communication
increases the number of syscalls for the same work. We should at least
try to quantify that...

> 4 also looks OK to me at a quick glance. It basically
> enables handling the "die" interrupt immediately, if we're blocked in a read
> or write. It won't be handled in the signal handler, but within the
> secure_read/write call anyway.

What are you thinking about the concern that it'll reduce the likelihood
of transferring the error message to the client? I tried to reduce that
by only allowing errors when write() blocks, but that's not an
infrequent event.

Thanks for looking.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-03 15:29:23
Message-ID:	542EC0D3.6030209@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 10/03/2014 05:26 PM, Andres Freund wrote:
> On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
>> On 09/28/2014 01:54 AM, Andres Freund wrote:
>>> 0003 Sinval/notify processing got simplified further. There really isn't
>>> any need for DisableNotifyInterrupt/DisableCatchupInterrupt
>>> anymore. Also begin_client_read/client_read_ended don't make much
>>> sense anymore. Instead introduce ProcessClientReadInterrupt (which
>>> wants a better name).
>>> There's also a very WIP
>>> 0004 Allows secure_read/write be interrupted when ProcDiePending is
>>> set. All of that happens via the latch mechanism, nothing happens
>>> inside signal handlers. So I do think it's quite an improvement
>>> over what's been discussed in this thread.
>>> But it (and the other approaches) do noticeably increase the
>>> likelihood of clients not getting the error message if the client
>>> isn't actually dead. The likelihood of write() being blocked
>>> *temporarily* due to normal bandwidth constraints is quite high
>>> when you consider COPY FROM and similar. Right now we'll wait till
>>> we can put the error message into the socket afaics.
>>>
>>> 1-3 need some serious comment work, but I think the approach is
>>> basically sound. I'm much, much less sure about allowing send() to be
>>> interrupted.
>>
>> Yeah, 1-3 seem sane.
>
> I think 3 also needs a careful look. Have you looked through it? While
> imo much less complex than before, there's some complex interactions in
> the touched code. And we have terrible coverage of both catchup
> interrupts and notify stuff...

I only looked at the .patch, I didn't apply it, so I didn't look at the
context much. But I don't see any fundamental problem with it. I would
like to have a closer look before it's committed, though.

> There's also the concern that using a latch for client communication
> increases the number of syscalls for the same work. We should at least
> try to quantify that...

I'm not too concerned about that, since we only do extra syscalls when
the socket isn't immediately available for reading/writing, i.e. when we
have to sleep anyway.

>> 4 also looks OK to me at a quick glance. It basically
>> enables handling the "die" interrupt immediately, if we're blocked in a read
>> or write. It won't be handled in the signal handler, but within the
>> secure_read/write call anyway.
>
> What are you thinking about the concern that it'll reduce the likelihood
> of transferring the error message to the client? I tried to reduce that
> by only allowing errors when write() blocks, but that's not an
> infrequent event.

I'm not too concerned about that either. I mean, it's probably true that
it reduces the likelihood, but I don't particularly care myself. But if
we care, we could use a timeout there, so that if we receive a SIGTERM
while blocked on a send(), we wait for a few seconds to see if we can
send whatever we were sending, before terminating the backend.

What should we do with this patch in the commitfest? Are you planning to
clean up and commit these patches?

- Heikki

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-05 22:16:38
Message-ID:	20141005221638.GB3462@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-10-03 18:29:23 +0300, Heikki Linnakangas wrote:
> On 10/03/2014 05:26 PM, Andres Freund wrote:
> >On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
> >>On 09/28/2014 01:54 AM, Andres Freund wrote:
> >>>0003 Sinval/notify processing got simplified further. There really isn't
> >>> any need for DisableNotifyInterrupt/DisableCatchupInterrupt
> >>> anymore. Also begin_client_read/client_read_ended don't make much
> >>> sense anymore. Instead introduce ProcessClientReadInterrupt (which
> >>> wants a better name).
> >>>There's also a very WIP
> >>>0004 Allows secure_read/write be interrupted when ProcDiePending is
> >>> set. All of that happens via the latch mechanism, nothing happens
> >>> inside signal handlers. So I do think it's quite an improvement
> >>> over what's been discussed in this thread.
> >>> But it (and the other approaches) do noticeably increase the
> >>> likelihood of clients not getting the error message if the client
> >>> isn't actually dead. The likelihood of write() being blocked
> >>> *temporarily* due to normal bandwidth constraints is quite high
> >>> when you consider COPY FROM and similar. Right now we'll wait till
> >>> we can put the error message into the socket afaics.
> >>>
> >>>1-3 need some serious comment work, but I think the approach is
> >>>basically sound. I'm much, much less sure about allowing send() to be
> >>>interrupted.
> >>
> >>Yeah, 1-3 seem sane.
> >
> >I think 3 also needs a careful look. Have you looked through it? While
> >imo much less complex than before, there's some complex interactions in
> >the touched code. And we have terrible coverage of both catchup
> >interrupts and notify stuff...
>
> I only looked at the .patch, I didn't apply it, so I didn't look at the
> context much. But I don't see any fundamental problem with it. I would like
> to have a closer look before it's committed, though.

I'd appreciate that. I don't want to commit it without a careful review
of another committer.

> >There's also the concern that using a latch for client communication
> >increases the number of syscalls for the same work. We should at least
> >try to quantify that...
>
> I'm not too concerned about that, since we only do extra syscalls when the
> socket isn't immediately available for reading/writing, i.e. when we have to
> sleep anyway.

Well, kernels actually do some nice optimizations for blocking reads -
at least for local sockets. Like switching to the other process
immediately and such.
I'm not super concerned either, but I think we should try to measure
it. And if we're failing, we probably should try to address these
problems - if possible in the latch code.

> >>4 also looks OK to me at a quick glance. It basically
> >>enables handling the "die" interrupt immediately, if we're blocked in a read
> >>or write. It won't be handled in the signal handler, but within the
> >>secure_read/write call anyway.
> >
> >What are you thinking about the concern that it'll reduce the likelihood
> >of transferring the error message to the client? I tried to reduce that
> >by only allowing errors when write() blocks, but that's not an
> >infrequent event.
>
> I'm not too concerned about that either. I mean, it's probably true that it
> reduces the likelihood, but I don't particularly care myself. But if we
> care, we could use a timeout there, so that if we receive a SIGTERM while
> blocked on a send(), we wait for a few seconds to see if we can send
> whatever we were sending, before terminating the backend.
>
>
> What should we do with this patch in the commitfest?

I think feature should be ddeclare as 'returned with feedback' for this
commitfest. I've done so.

I don't see much of a reason waiting with patch 1 till the next
commitfest. It imo looks uncontroversial and doesn't have any far
reaching consequences.

> Are you planning to clean up and commit these patches?

I plan to do so one by one, yes. If you'd like to pick up any of them,
feel free to do (after telling me, to avoid duplicated efforts). I don't
feel proprietary about any of them. But I guess you have other stuff
you'd like to work on too ;)

I'll try to send out a version with the stuff you mentioned earlier in
the next couple days.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	andres(at)2ndquadrant(dot)com
Cc:	hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-09 05:06:35
Message-ID:	20141009.140635.18322240.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello, simplly inhibit set retry flag when ProcDiePending in
my_sock_write seems enough.

But it returns with SSL_ERROR_SYSCALL not SSL_ERROR_WANT_WRITE so
I modified the patch 4 as the attached patch.

Finally, the attached patch works as expected with Andres's patch
1-3.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0004-Process_die_intr_while_writing.patch	text/x-patch	2.5 KB

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-09 11:26:32
Message-ID:	20141009112632.GA29124@awork2.int
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-10-09 14:06:35 +0900, Kyotaro HORIGUCHI wrote:
> Hello, simplly inhibit set retry flag when ProcDiePending in
> my_sock_write seems enough.
>
> But it returns with SSL_ERROR_SYSCALL not SSL_ERROR_WANT_WRITE so
> I modified the patch 4 as the attached patch.

Why is that necessary? It seems really rather wrong to make
BIO_set_retry_write() dependant on ProcDiePending? Especially as, at
least in my testing, it's not even required because the be_tls_write()
can just check the error properly?

> diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
> index 6fc6903..2288fe2 100644
> --- a/src/backend/libpq/be-secure-openssl.c
> +++ b/src/backend/libpq/be-secure-openssl.c
> @@ -750,7 +750,8 @@ my_sock_write(BIO *h, const char *buf, int size)
> BIO_clear_retry_flags(h);
> if (res <= 0)
> {
> - if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
> + if (!ProcDiePending &&
> + (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN))
> {
> BIO_set_retry_write(h);
> }

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	andres(at)2ndquadrant(dot)com
Cc:	hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-10 06:48:56
Message-ID:	20141010.154856.252680989.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hmm.. Sorry for my stupidity.

> Why is that necessary? It seems really rather wrong to make
> BIO_set_retry_write() dependant on ProcDiePending? Especially as, at
> least in my testing, it's not even required because the be_tls_write()
> can just check the error properly?

I mistook the previous conversation as it doesn't work as
expected. I confirmed that it works fine.

After all, it works as I expected. The parameter for
ProcessClientWriteInterrupt() looks somewhat uneasy but the patch
4 looks fine as a whole. Do you have anything to worry about in
the patch?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-30 13:27:13
Message-ID:	54523CB1.7030808@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 10/03/2014 06:29 PM, Heikki Linnakangas wrote:
> On 10/03/2014 05:26 PM, Andres Freund wrote:
>> On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
>>> On 09/28/2014 01:54 AM, Andres Freund wrote:
>>>> 0003 Sinval/notify processing got simplified further. There really isn't
>>>> any need for DisableNotifyInterrupt/DisableCatchupInterrupt
>>>> anymore. Also begin_client_read/client_read_ended don't make much
>>>> sense anymore. Instead introduce ProcessClientReadInterrupt (which
>>>> wants a better name).
>>>> There's also a very WIP
>>>> 0004 Allows secure_read/write be interrupted when ProcDiePending is
>>>> set. All of that happens via the latch mechanism, nothing happens
>>>> inside signal handlers. So I do think it's quite an improvement
>>>> over what's been discussed in this thread.
>>>> But it (and the other approaches) do noticeably increase the
>>>> likelihood of clients not getting the error message if the client
>>>> isn't actually dead. The likelihood of write() being blocked
>>>> *temporarily* due to normal bandwidth constraints is quite high
>>>> when you consider COPY FROM and similar. Right now we'll wait till
>>>> we can put the error message into the socket afaics.
>>>>
>>>> 1-3 need some serious comment work, but I think the approach is
>>>> basically sound. I'm much, much less sure about allowing send() to be
>>>> interrupted.
>>>
>>> Yeah, 1-3 seem sane.
>>
>> I think 3 also needs a careful look. Have you looked through it? While
>> imo much less complex than before, there's some complex interactions in
>> the touched code. And we have terrible coverage of both catchup
>> interrupts and notify stuff...
>
> I only looked at the .patch, I didn't apply it, so I didn't look at the
> context much. But I don't see any fundamental problem with it. I would
> like to have a closer look before it's committed, though.

About patch 3:

Looking closer, this design still looks OK to me. As you said yourself,
the comments need some work (e.g. the step 5. in the top comment in
async.c needs updating). And then there are a couple of FIXME and XXX
comments that need to be addressed.

The comment on PGPROC.procLatch in storage/proc.h says just this:

> Latch procLatch; /* generic latch for process */

This needs a lot more explaining. It's now used by signal handlers to
interrupt a read or write to the socket; that should be mentioned. What
else is it used for? (for waking up a backend in synchronous
replication, at least) What are the rules on when to arm it and when to
reset it?

Would it be more clear to use a separate, backend-private, latch, for
the signals? I guess that won't work, though, because sometimes we need
need to wait for a wakeup from a different process or from a signal at
the same time (SyncRepWaitForLSN() in particular). Not without adding a
variant of WaitLatch that can wait on two latches simultaneously, anyway.

The assumption in secure_raw_read that MyProc exists is pretty
surprising. I understand why it's that way, and there's a comment in
PostgresMain explaining why the socket cannot be put into non-blocking
mode earlier, but it's still a bit whacky. Not sure what to do about that.

- Heikki

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-10-30 13:37:30
Message-ID:	20141030133730.GD8151@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-10-30 15:27:13 +0200, Heikki Linnakangas wrote:
> The comment on PGPROC.procLatch in storage/proc.h says just this:
>
> > Latch procLatch; /* generic latch for process */
>
> This needs a lot more explaining. It's now used by signal handlers to
> interrupt a read or write to the socket; that should be mentioned. What else
> is it used for? (for waking up a backend in synchronous replication, at
> least) What are the rules on when to arm it and when to reset it?

Hm. I agree it use expaned commentary, but I'm unsure if that much
detail is really warranted. Any such documentation seems to be almost
guaranteed to be out of date. As evidenced that there's none to date...

> Would it be more clear to use a separate, backend-private, latch, for the
> signals? I guess that won't work, though, because sometimes we need need to
> wait for a wakeup from a different process or from a signal at the same time
> (SyncRepWaitForLSN() in particular). Not without adding a variant of
> WaitLatch that can wait on two latches simultaneously, anyway.

I wondered the same, but I don't really see what it'd buy us during
normal running. It seems like it'd just make code more complex without
leading to relevantly fewer wakeups.

> The assumption in secure_raw_read that MyProc exists is pretty surprising. I
> understand why it's that way, and there's a comment in PostgresMain
> explaining why the socket cannot be put into non-blocking mode earlier, but
> it's still a bit whacky. Not sure what to do about that.

It makes me quite unhappy too. I looked what it'd take to make proclatch
available earlier, but it wasn't pretty. I wondered whether we could use
a 'early proc latch' in MyProcLatch that used until we're fully attached
to shared memory. Then, when attaching to shared memory we'd set the old
latch once, and reset MyProcLatch to the shared memory one.

But that's pretty ugly.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alex Shulgin <ash(at)commandprompt(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-11-17 15:22:54
Message-ID:	87y4r9n969.fsf@commandprompt.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>
> I've invested some more time in this:
> 0002 now makes sense on its own and doesn't change anything around the
> interrupt handling. Oh, and it compiles without 0003.

In this patch, the endif appears to be misplaced in PostgresMain:

+ if (MyProcPort != NULL)
+ {
+#ifdef WIN32
+ pgwin32_noblock = true;
+#else
+ if (!pg_set_noblock(MyProcPort->sock))
+ ereport(COMMERROR,
+ (errmsg("could not set socket to nonblocking mode: %m")));
+ }
+#endif
+
pqinitmask();

After re-reading these I don't see the rest of items I wanted to inqury
about anymore, so it just makes more sense now.

One thing I did try is sending a NOTICE to the client when in
ProcessInterrupts() and DoingCommandRead is true. I think[1] it was
expected to be delivered instantly, but actually the client (psql) only
displays it after sending the next statement.

While I'm reading on FE/BE protocol someone might want to share his
wisdom on this subject. My guess: psql blocks on readline/libedit call
and can't effectively poll the server socket before complete input from
user.

--
Alex

[1] http://www.postgresql.org/message-id/1262173040.19367.5015.camel@ebony

``AFAIK, NOTICE was suggested because it can be sent at any time,
whereas ERRORs are only associated with statements.''

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-12-15 03:37:34
Message-ID:	CAB7nPqQabVZMzwK75eGoHt05EuX9WiEu7VQBbn-i77faHqg19w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 30, 2014 at 10:27 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 10/03/2014 06:29 PM, Heikki Linnakangas wrote:
>> [blah]
> About patch 3:
> Looking closer, this design still looks OK to me. As you said yourself, the
> comments need some work (e.g. the step 5. in the top comment in async.c
> needs updating). And then there are a couple of FIXME and XXX comments that
> need to be addressed.
Those patches have not been updated in a while, and I am seeing some
feedback from several people, hence returning as returned with
feedback. Horiguchi-san, feel free to add new entries on the CF app in
2014-12 or move this entry if you feel overwise.
Regards,
--
Michael

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	andres(at)2ndquadrant(dot)com
Cc:	hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-12-15 09:19:26
Message-ID:	20141215.181926.72089914.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Since I don't have clear idea how to promote this, I will remake
and be back with new patch based on Andres' for patches.

> Hmm.. Sorry for my stupidity.
>
> > Why is that necessary? It seems really rather wrong to make
> > BIO_set_retry_write() dependant on ProcDiePending? Especially as, at
> > least in my testing, it's not even required because the be_tls_write()
> > can just check the error properly?
>
> I mistook the previous conversation as it doesn't work as
> expected. I confirmed that it works fine.
>
> After all, it works as I expected. The parameter for
> ProcessClientWriteInterrupt() looks somewhat uneasy but the patch
> 4 looks fine as a whole. Do you have anything to worry about in
> the patch?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-12-15 09:58:37
Message-ID:	20141215095837.GD5023@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-12-15 18:19:26 +0900, Kyotaro HORIGUCHI wrote:
> Since I don't have clear idea how to promote this, I will remake
> and be back with new patch based on Andres' for patches.

Do my patches miss any functionality you want?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	andres(at)2ndquadrant(dot)com
Cc:	hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2014-12-17 03:39:22
Message-ID:	20141217.123922.69865806.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

> On 2014-12-15 18:19:26 +0900, Kyotaro HORIGUCHI wrote:
> > Since I don't have clear idea how to promote this, I will remake
> > and be back with new patch based on Andres' for patches.
>
> Do my patches miss any functionality you want?

The patch satisfies what I want, as I said upthread. What I don't
know is how I can go on with it in this CF topic, in other word
what should I do in order to put it to "ready for committer"?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-08 13:17:32
Message-ID:	20150108131732.GD12509@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-10-03 16:26:35 +0200, Andres Freund wrote:
> On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
> > >0002 now makes sense on its own and doesn't change anything around the
> > > interrupt handling. Oh, and it compiles without 0003.
> >
> > WaitLatchOrSocket() can throw an error, so it's not totally safe to call
> > that underneath OpenSSL.
>
> Hm. Fair point.

I think we should fix this by simply prohibiting
WaitLatch/WaitLatchOrSocket from ERRORing out. The easiest, and imo
acceptable, thing is to simply convert the relevant ERRORs to FATAL. I
think that'd be perfectly fine as it seems very unlikely that we
continue sanely afterwards.

It would really be nice if we had a simple way to raise a FATAL that
won't go to the client for situations like this. I'd proposed
elog(FATAL | COMERROR, ...) in the past...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Alex Shulgin <ash(at)commandprompt(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-10 01:59:07
Message-ID:	20150110015907.GE12509@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-11-17 18:22:54 +0300, Alex Shulgin wrote:
>
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> >
> > I've invested some more time in this:
> > 0002 now makes sense on its own and doesn't change anything around the
> > interrupt handling. Oh, and it compiles without 0003.
>
> In this patch, the endif appears to be misplaced in PostgresMain:
>
> + if (MyProcPort != NULL)
> + {
> +#ifdef WIN32
> + pgwin32_noblock = true;
> +#else
> + if (!pg_set_noblock(MyProcPort->sock))
> + ereport(COMMERROR,
> + (errmsg("could not set socket to nonblocking mode: %m")));
> + }
> +#endif
> +

Uh. Odd. Anyway, that bit of code is now somewhere else anyway...

> One thing I did try is sending a NOTICE to the client when in
> ProcessInterrupts() and DoingCommandRead is true. I think[1] it was
> expected to be delivered instantly, but actually the client (psql) only
> displays it after sending the next statement.

Yea, that should be psql specific though. I hope ;)

> While I'm reading on FE/BE protocol someone might want to share his
> wisdom on this subject. My guess: psql blocks on readline/libedit call
> and can't effectively poll the server socket before complete input from
> user.

I'm not sure if it's actually a "can't". It doesn't at least ;)

Greetings,

Andres Freund

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-10 02:25:42
Message-ID:	20150110022542.GF12509@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-09-28 00:54:21 +0200, Andres Freund wrote:
> I've invested some more time in this:

And yet round of time spent.

The major change since last round is that I've introduced a local latch
that exists in every process. If InitProcess() is run, that latch is
replaced by the shared process latch and the reverse happens in
ProcKill. To implement this, I decided to remove some code duplication
around process initialization.

The major reason to do is that this allows us to get rid of the special
case where MyProc isn't available yet during early process startup,
where the socket still had to be in blocking mode. Instead we can now
rely on the latch all the time.

Other than that I've significantly cleaned up and tested the
patchset. Unfortunately that testing has brought a significant number of
SSL bugs to light, but they all turned out to be independent of these
changes. I'll try to detail them in a separate email early next week.

I've done some performance measurements, to verify this doesn't cause a
regression. When testing with 'SELECT 1' as a trivial statement I
couldn't measure any regression at 32 pgbench clients/threads on my 4
core (i7-4800MQ) laptop. At 390 clients I saw about 1.6%. With statement
that actually does something that difference vanished. Personally I'm
ok with that.

The patches are:

0001-Allow-latches-to-wait-for-socket-writability-without.patch
Imo pretty close to commit and can be committed independently.

0002-Commonalize-process-startup-code.patch
The above mentioned deduplication. Needs a review (completely new),
but it's imo a clear improvement and allows for a fair amount of
future/further deduplication.

0003-Add-a-default-local-latch-for-use-in-signal-handlers.patch
Adds the default latch + the logic to switch to shared latch while
attached. I think it's a good idea, but I'd like some feedback.

0004-Use-a-nonblocking-socket-for-FE-BE-communication-and.patch
The earlier patch that converts the communication to use latches
with some added cleanup/improvements. Specifically we don't have to
rely on MyProc->procLatch anymore due to 0003 which gets rid of
some ugly code in be-secure.c and postgres.c

I think this patch might not be safe without 0005 because I can't
convince myself that it's safe to interrupt latch waits with work
that actually might also use the same latch (sinval
interrupts). But it's easier to review this way.

0005-Introduce-and-use-infrastructure-for-interrupt-proce.patch
Actually move most of sinval.c/async.c's interrupt handling out of
the signal handlers and use the latches. This is the cleaned up
version of the earlier commit.

0006-WIP-Process-die-interrupts-while-reading-writing-fro.patch
Not much has changed, needs at least some comments. I want to get
the other stuff done first.

There remains one 'FIXME' after all the patches, which is
interactive_getc() won't react to catchup/notify interrupts - which, as
far as I can see, is fine, as there are none.

Comments?

Greetings,

Andres Freund

Attachment	Content-Type	Size
0001-Allow-latches-to-wait-for-socket-writability-without.patch	text/x-patch	5.8 KB
0002-Commonalize-process-startup-code.patch	text/x-patch	22.1 KB
0003-Add-a-default-local-latch-for-use-in-signal-handlers.patch	text/x-patch	20.3 KB
0004-Use-a-nonblocking-socket-for-FE-BE-communication-and.patch	text/x-patch	7.5 KB
0005-Introduce-and-use-infrastructure-for-interrupt-proce.patch	text/x-patch	38.0 KB
0006-WIP-Process-die-interrupts-while-reading-writing-fro.patch	text/x-patch	4.4 KB

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-10 16:35:02
Message-ID:	20150110163502.GM12509@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-09-04 08:49:22 -0400, Robert Haas wrote:
> On Tue, Sep 2, 2014 at 3:01 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I'm slightly worried about the added overhead due to the latch code. In
> > my implementation I only use latches after a nonblocking read, but
> > still. Every WaitLatchOrSocket() does a drainSelfPipe(). I wonder if
> > that can be made problematic.
>
> I think that's not the word you're looking for.

There's a "less" missing...

> At some point I hacked up a very crude prototype that made LWLocks use
> latches to sleep instead of semaphores. It was slow.

Interesting. I dimly remembered you mentioning this, that's how I
rediscovered this message.

Do you remember any details?

My guess that's not so much the overhead of the latch itself, but the
lack of the directed wakeup stuff the OS provides for semaphores.

If we could replace all usages of semaphores that set immediate
interrupts to ok, we could quite easily make the deadlock detector
et. al. run outside of signal handlers. That would imo make it more
robust, and easier to understand - right now the correctness of locking
done in the deadlock detector isn't obvious. With the infrastructure in
place it'd also allow your new parallelism code to run outside of signal
handlers.

Unfortunately currently sempahores can't be unlocked in a signal handler
(as sysv semaphores aren't signal safe)... It'd also be not so nice to
set both a latch and semaphores in every signal handler.

> AIUI, the only reason why we need the self-pipe thing is because on
> some platforms signals don't interrupt system calls. But my
> impression was that those platforms were somewhat obscure.

To the contrary, I think it's only very obscure platforms where signals
still interrupt syscalls - we set SA_RESTART for pretty much
everything. There's a couple of system calls that ignore SA_RESTART. For
some that's defined in posix, for others it's operating system
specific. E.g. on linux semop(), poll(), select() are defined to always
return EINTR when interrupted.

Anyway, the discussion since cleared up that we need the self byte to
handle a race, anyway.

> Basically, it doesn't feel like a good thing that we've got two sets
> of primitives for making a backend wait that (1) don't really know
> about each other and (2) use different operating system primitives.
> Presumably one of the two systems is better; let's figure out which
> one it is, use that one all the time, and get rid of the other one.

I think the latch interface is clearly better for what we use
sema/latches for as it allows to wait for signals (latch sets), a socket
and timeouts. So let's try to figure out how to make it perform
comparably or better than semaphores.

There's imo only one semaphore user that can't trivially be replaced by
latches: the semaphore spinlock emulation. Both proc.c and and lwlock.c
can be converted quite easily - in the latter case, it might actually
end up saving some code.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-11 21:36:07
Message-ID:	20150111213607.GA2722746@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 10, 2015 at 03:25:42AM +0100, Andres Freund wrote:
> 0001-Allow-latches-to-wait-for-socket-writability-without.patch
> Imo pretty close to commit and can be committed independently.

The key open question is whether all platforms of interest can reliably detect
end-of-file when poll()ing or select()ing for write only. Older GNU/Linux
select() cannot; see attached test program. We use poll() there anyway, so
the bug in that configuration does not affect PostgreSQL. Is it a bellwether
of similar bugs in other implementations, bugs that will affect PostgreSQL?

> This previously had explicitly been forbidden in e42a21b9e6c9, as
> there was no use case at that point. We now are looking into making
> FE/BE communication use latches, so it

Truncated sentence.

> + if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
> + {
> + /* EOF/error condition */
> + if (wakeEvents & WL_SOCKET_READABLE)
> + result |= WL_SOCKET_READABLE;
> + if (wakeEvents & WL_SOCKET_WRITEABLE)
> + result |= WL_SOCKET_WRITEABLE;
> + }

With some poll() implementations (e.g. OS X), this can wrongly report
WL_SOCKET_WRITEABLE if the peer used shutdown(SHUT_WR). I tentatively think
that's acceptable. libpq does not use shutdown(), and other client interfaces
would do so at their own risk. Should we worry about hostile clients creating
a denial-of-service by causing a server send() to block unexpectedly?
Probably not; a user able to send arbitrary TCP traffic to the postmaster port
can already achieve that.

> + if (resEvents.lNetworkEvents & FD_CLOSE)
> + {
> + if (wakeEvents & WL_SOCKET_READABLE)
> + result |= WL_SOCKET_READABLE;
> + if (wakeEvents & WL_SOCKET_WRITEABLE)
> + result |= WL_SOCKET_WRITEABLE;
> + }
> +
> }

Extra blank line.

Attachment	Content-Type	Size
select-writeonly.c	text/plain	3.0 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-11 21:47:53
Message-ID:	CA+TgmoaEOURG53jH9w76Ec8-tOiW2m2ATm1Y340hp3TAs5dj-w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 10, 2015 at 11:35 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Interesting. I dimly remembered you mentioning this, that's how I
> rediscovered this message.
>
> Do you remember any details?

No, not really.

> My guess that's not so much the overhead of the latch itself, but the
> lack of the directed wakeup stuff the OS provides for semaphores.

That's possible.

> If we could replace all usages of semaphores that set immediate
> interrupts to ok, we could quite easily make the deadlock detector
> et. al. run outside of signal handlers. That would imo make it more
> robust, and easier to understand - right now the correctness of locking
> done in the deadlock detector isn't obvious. With the infrastructure in
> place it'd also allow your new parallelism code to run outside of signal
> handlers.

Yes, I would be very happy to see ImmediateInterruptOK die in a fire.

> Unfortunately currently sempahores can't be unlocked in a signal handler
> (as sysv semaphores aren't signal safe)... It'd also be not so nice to
> set both a latch and semaphores in every signal handler.

Agreed.

>> AIUI, the only reason why we need the self-pipe thing is because on
>> some platforms signals don't interrupt system calls. But my
>> impression was that those platforms were somewhat obscure.
>
> To the contrary, I think it's only very obscure platforms where signals
> still interrupt syscalls - we set SA_RESTART for pretty much
> everything. There's a couple of system calls that ignore SA_RESTART. For
> some that's defined in posix, for others it's operating system
> specific. E.g. on linux semop(), poll(), select() are defined to always
> return EINTR when interrupted.

The recent problems with test_shm_mq test failing on anole were caused
by the fact that a signal doesn't abort select() on that platform, but
does reset the timer. So a steady stream of signals results in never
reaching the timeout.

> Anyway, the discussion since cleared up that we need the self byte to
> handle a race, anyway.

Eh?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-11 23:40:50
Message-ID:	20150111234050.GA459@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2015-01-11 16:36:07 -0500, Noah Misch wrote:
> On Sat, Jan 10, 2015 at 03:25:42AM +0100, Andres Freund wrote:
> > 0001-Allow-latches-to-wait-for-socket-writability-without.patch
> > Imo pretty close to commit and can be committed independently.
>
> The key open question is whether all platforms of interest can reliably detect
> end-of-file when poll()ing or select()ing for write only. Older GNU/Linux
> select() cannot; see attached test program.

Yuck. By my reading that's a violation of posix.

I did test it a bit, and I didn't see problems, but that obviously
doesn't say much about old versions.

Afaics we interestingly don't have any poll-less buildfarm animals that
use unix_latch.c...

> We use poll() there anyway, so the bug in that configuration does not
> affect PostgreSQL. Is it a bellwether of similar bugs in other
> implementations, bugs that will affect PostgreSQL?

Hm. I can think of two stopgap measures we could add:

1) If we're using select() and WL_SOCKET_WRITEABLE is set without
_READABLE, add a timeout of Min(1s, Max(passed_timeout, 1s)). As the
time spent waiting only for writable normally shouldn't be very long,
that shouldn't be noticeably bad for power usage.
2) Add a SIGPIPE handler that just does a SetLatch(MyLach).

> > This previously had explicitly been forbidden in e42a21b9e6c9, as
> > there was no use case at that point. We now are looking into making
> > FE/BE communication use latches, so it
>
> Truncated sentence.

Fixed in what I've since pushed (as Heikki basically was ok with the
patch sent a couple months back, modulo some fixes)...

> > + if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
> > + {
> > + /* EOF/error condition */
> > + if (wakeEvents & WL_SOCKET_READABLE)
> > + result |= WL_SOCKET_READABLE;
> > + if (wakeEvents & WL_SOCKET_WRITEABLE)
> > + result |= WL_SOCKET_WRITEABLE;
> > + }
>
> With some poll() implementations (e.g. OS X), this can wrongly report
> WL_SOCKET_WRITEABLE if the peer used shutdown(SHUT_WR). I tentatively think
> that's acceptable. libpq does not use shutdown(), and other client interfaces
> would do so at their own risk. Should we worry about hostile clients creating
> a denial-of-service by causing a server send() to block unexpectedly?
> Probably not; a user able to send arbitrary TCP traffic to the postmaster port
> can already achieve that.

Yea, this doesn't seem particularly concerning.

a) They can just stop consuming writes and use noticeable amounts of
memory by doing output intensive queries. That uses significant os
resources and is much harder to detect - today.

b) does accepting WL_SOCKET_WRITEABLE without _READABLE change anything
here? We already allow _WRITABLE... What happens if you write/send() in
that state, btw?

Greetings,

Andres Freund

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-11 23:53:55
Message-ID:	20150111235355.GB459@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2015-01-11 16:47:53 -0500, Robert Haas wrote:
> > My guess that's not so much the overhead of the latch itself, but the
> > lack of the directed wakeup stuff the OS provides for semaphores.
>
> That's possible.

> On Sat, Jan 10, 2015 at 11:35 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Interesting. I dimly remembered you mentioning this, that's how I
> > rediscovered this message.
> >
> > Do you remember any details?
>
> No, not really.

I've done a hackish conversion of proc.c to use semaphores and in a
sleeping heavy workload (pgbench -j 390 against a scale 1 db) there
originally was about a %5 regression (fluctuating a fair bit).

I've played with hacking unix_latch.c to

a) use eventfd() for the local latch and (no performance difference
anymore)

b) Adding a eventfd to struct Latch for latches that are created from
postmaster. That fd can directly be written to by other processes
(combined slightly faster than semaphores).

Not sure how much b) is acceptable, due to using MaxBackend fds. And
both only help linux.

> > If we could replace all usages of semaphores that set immediate
> > interrupts to ok, we could quite easily make the deadlock detector
> > et. al. run outside of signal handlers. That would imo make it more
> > robust, and easier to understand - right now the correctness of locking
> > done in the deadlock detector isn't obvious. With the infrastructure in
> > place it'd also allow your new parallelism code to run outside of signal
> > handlers.
>
> Yes, I would be very happy to see ImmediateInterruptOK die in a fire.

Yea, I think it's a absolutely horrible idea.

> > Unfortunately currently sempahores can't be unlocked in a signal handler
> > (as sysv semaphores aren't signal safe)... It'd also be not so nice to
> > set both a latch and semaphores in every signal handler.
>
> Agreed.

I've since (re-)realised that we've actually relied on semaphore being
signal safe for years. The PGSemaphoreLock() in proc.c allows
interrupts. And the deadlock detector then uses lwlocks, which in turn
use semaphores. That sucks.

> > Anyway, the discussion since cleared up that we need the self byte to
> > handle a race, anyway.
>
> Eh?

I'm referring to the point Heikki has made in his second paragraph in
5408638C(dot)1080308(at)vmware(dot)com .

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-12 00:37:53
Message-ID:	20150112003753.GB2722746@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 12, 2015 at 12:40:50AM +0100, Andres Freund wrote:
> On 2015-01-11 16:36:07 -0500, Noah Misch wrote:
> > On Sat, Jan 10, 2015 at 03:25:42AM +0100, Andres Freund wrote:
> > > 0001-Allow-latches-to-wait-for-socket-writability-without.patch
> > > Imo pretty close to commit and can be committed independently.
> >
> > The key open question is whether all platforms of interest can reliably detect
> > end-of-file when poll()ing or select()ing for write only. Older GNU/Linux
> > select() cannot; see attached test program.
>
> Yuck. By my reading that's a violation of posix.

Agreed.

> I did test it a bit, and I didn't see problems, but that obviously
> doesn't say much about old versions.
>
> Afaics we interestingly don't have any poll-less buildfarm animals that
> use unix_latch.c...

More likely is that some system will have a poll() exhibiting the same bug,
possibly via poll() being a wrapper around select(). Systems without poll()
are hard to find; the gnulib manual lists only Windows, BeOS and HP NonStop
OS. HP NonStop OS is the one possibly of interest here. I have never
personally seen a machine running it.

I recommend either (a) taking no action or (b) adding a regression test
verifying WaitLatchOrSocket() conformance in this scenario. Then we can
decide what more to do if failure evidence emerges.

> > We use poll() there anyway, so the bug in that configuration does not
> > affect PostgreSQL. Is it a bellwether of similar bugs in other
> > implementations, bugs that will affect PostgreSQL?
>
> Hm. I can think of two stopgap measures we could add:
>
> 1) If we're using select() and WL_SOCKET_WRITEABLE is set without
> _READABLE, add a timeout of Min(1s, Max(passed_timeout, 1s)). As the
> time spent waiting only for writable normally shouldn't be very long,
> that shouldn't be noticeably bad for power usage.
> 2) Add a SIGPIPE handler that just does a SetLatch(MyLach).

I'm having trouble visualizing those proposed measures in detail, but I trust
that a decent workaround would emerge.

> > > + if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
> > > + {
> > > + /* EOF/error condition */
> > > + if (wakeEvents & WL_SOCKET_READABLE)
> > > + result |= WL_SOCKET_READABLE;
> > > + if (wakeEvents & WL_SOCKET_WRITEABLE)
> > > + result |= WL_SOCKET_WRITEABLE;
> > > + }
> >
> > With some poll() implementations (e.g. OS X), this can wrongly report
> > WL_SOCKET_WRITEABLE if the peer used shutdown(SHUT_WR). I tentatively think
> > that's acceptable. libpq does not use shutdown(), and other client interfaces
> > would do so at their own risk. Should we worry about hostile clients creating
> > a denial-of-service by causing a server send() to block unexpectedly?
> > Probably not; a user able to send arbitrary TCP traffic to the postmaster port
> > can already achieve that.
>
> Yea, this doesn't seem particularly concerning.
>
> a) They can just stop consuming writes and use noticeable amounts of
> memory by doing output intensive queries. That uses significant os
> resources and is much harder to detect - today.

If there's anything to worry about here (unlikely), it would be with respect
to not-yet-authenticated connections only.

> b) does accepting WL_SOCKET_WRITEABLE without _READABLE change anything
> here? We already allow _WRITABLE...

Today's code translates POLLHUP to WL_SOCKET_READABLE; it must see POLLOUT to
set WL_SOCKET_WRITEABLE. Your patch changes that.

> What happens if you write/send() in
> that state, btw?

write() reports EAGAIN.

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-12 00:45:41
Message-ID:	20150112004541.GA20690@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2015-01-11 19:37:53 -0500, Noah Misch wrote:
> I recommend either (a) taking no action or (b) adding a regression test
> verifying WaitLatchOrSocket() conformance in this scenario.

Do you have a good idea how to test b) save a C function in regress.c
that does what your test does using latches?

> Then we cane decide what more to do if failure evidence emerges.

Seems fine to me.

> > Hm. I can think of two stopgap measures we could add:
> >
> > 1) If we're using select() and WL_SOCKET_WRITEABLE is set without
> > _READABLE, add a timeout of Min(1s, Max(passed_timeout, 1s)). As the
> > time spent waiting only for writable normally shouldn't be very long,
> > that shouldn't be noticeably bad for power usage.
> > 2) Add a SIGPIPE handler that just does a SetLatch(MyLach).
>
> I'm having trouble visualizing those proposed measures in detail, but I trust
> that a decent workaround would emerge.

For 1) I'm thinking of just regularly causing a spurious
WL_SOCKET_WRITEABLE event via timeouts if it's the only parameter. Latch
users have to deal with spurious wakeups anyway, so that should be
mostly unproblematic.

For 2) I was thinking that for now the problem only arises for the main
FE/BE socket. So we can install a sigpipe handler that does a SetLatch()
- that will trigger WaitLatch() to return and, after checking for
interrupts, retry the actual send() - which then'd return ECONNRESET.

> > What happens if you write/send() in
> > that state, btw?
>
> write() reports EAGAIN.

Grand.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-12 00:55:33
Message-ID:	20150112005533.GC2722746@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 12, 2015 at 01:45:41AM +0100, Andres Freund wrote:
> On 2015-01-11 19:37:53 -0500, Noah Misch wrote:
> > I recommend either (a) taking no action or (b) adding a regression test
> > verifying WaitLatchOrSocket() conformance in this scenario.
>
> Do you have a good idea how to test b) save a C function in regress.c
> that does what your test does using latches?

No, that's what I had in mind. You could probably achieve it with a libpq
program that lets input accumulate, but that's trickier.

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-13 12:11:31
Message-ID:	20150113121131.GA12272@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2015-01-12 00:40:50 +0100, Andres Freund wrote:
> Fixed in what I've since pushed (as Heikki basically was ok with the
> patch sent a couple months back, modulo some fixes)...

I'd not actually pushed that patch... I had pushed some patches
(barriers, atomics), but had decided to hold off on this. I've now done
so.

I've mentioned the portability concerns over select() bugs in the commit
message & a comment. ATM I'm not inclined to add a relatively elaborate
test for the bug on pretty fringe platforms.

Thanks for looking at this!

I plan to continue with committing
1) Commonalize process startup code
2) Add a default local latch for use in signal handlers
3) Use a nonblocking socket for FE/BE communication and block using latches

pretty soon.

As we already seem to assume that WaitLatch() is signal safe/reentrant
(c.f. walsender.c), I'm fine with committing 3) in isolation, without
4). I need a test that properly exercises catchup interrupts before
committing that.

Once I have that test I plan to commit
4) Introduce and use infrastructure for interrupt processing during
client reads.

I'd like some input from others what they think about the problem that
5) "Process 'die' interrupts while reading/writing from a socket."

can reduce the likelihood of clients getting the error message. I
personally think that's more than outweighed by not having backends
stuck (save quickdie) for a long time when the client is gone/stuck. I
think the middleground in the patch to only process die events when
actually blocked in writes reduces the likelihood of this sufficiently.

I have hacks ontop this to get rid of ImmediateInterrupt alltogether,
although I'm not sure how well this will work for some parts of
auth/crypt.c. Everything else, including the deadlock checker, seems
quite doable.

Andres

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Escaping from blocked send() reprised.
Date:	2015-01-16 11:39:17
Message-ID:	20150116113917.GA16991@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Heikki,

On 2014-09-02 21:22:29 +0300, Heikki Linnakangas wrote:
> On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
> > To make the code mentioned above (Patch 0002) tidy, rewrite the
> > socket emulation code for win32 backends so that each socket
> > can have its own non-blocking state. (patch 0001)
>
> The first patch that makes non-blocking sockets behave more sanely on
> Windows seems like a good idea, independently of the second patch. I'm
> looking at the first patch now, I'll make a separate post about the second
> patch.

> On Windows, the backend has an emulation layer for POSIX signals, which uses
> threads and Windows events. The reason win32/socket.c always uses
> non-blocking mode internally is that it needs to wait for the socket to
> become readable/writeable, and for the signal-emulation event, at the same
> time. So no, we can't remove it.
>
> The approach taken in the first patch seems sensible. I changed it to not
> use FD_SET, though. A custom array seems better, that way we don't need the
> pgwin32_nonblockset_init() call, we can just use initialize the variable.
> It's a little bit more code, but it's well-contained in win32/socket.c.
> Please take a look, to double-check that I didn't screw up.

Heikki, what's your plan about this patch? Do you plan to commit it?

Greetings,

Andres Freund