walreceiver is uninterruptible on win32

Lists: pgsql-hackers
From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: walreceiver is uninterruptible on win32
Date: 2010-03-10 09:09:57
Message-ID: 3f0b79eb1003100109u4e5131c1hc26c1db709ab3418@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

http://archives.postgresql.org/pgsql-hackers/2010-01/msg01672.php
On win32, the blocking libpq functions like PQconnectdb() and
PQexec() are uninterruptible since they use the vanilla select()
instead of our signal emulation layer compatible select().
Nevertheless, currently walreceiver uses them to establish a
connection, send a handshake message and wait for the reply.
So walreceiver also becomes uninterruptible for a while. This
is the must-fix problem for 9.0.

I replaced the blocking libpq functions currently used with
asynchronous ones, and used the emulated version of select()
to wait, to make walreceiver interruptible. Here is the patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
win32_interruptible_walreceiver_v1.patch text/x-patch 10.3 KB

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-03-12 11:13:43
Message-ID: 9837222c1003120313l2a1ec911vaf229a4f7890f010@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 10, 2010 at 10:09, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Hi,
>
> http://archives.postgresql.org/pgsql-hackers/2010-01/msg01672.php
> On win32, the blocking libpq functions like PQconnectdb() and
> PQexec() are uninterruptible since they use the vanilla select()
> instead of our signal emulation layer compatible select().
> Nevertheless, currently walreceiver uses them to establish a
> connection, send a handshake message and wait for the reply.
> So walreceiver also becomes uninterruptible for a while. This
> is the must-fix problem for 9.0.
>
> I replaced the blocking libpq functions currently used with
> asynchronous ones, and used the emulated version of select()
> to wait, to make walreceiver interruptible. Here is the patch.

These are issues that affect other things running libpq in the backend
as well, right? Such as dblink? Perhaps we can factor out most of this
into functions in backend/port/win32 so that we can re-use it fro
there?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-03-15 09:14:07
Message-ID: 3f0b79eb1003150214v5427d107u60a60b394000d648@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 12, 2010 at 8:13 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Wed, Mar 10, 2010 at 10:09, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Hi,
>>
>> http://archives.postgresql.org/pgsql-hackers/2010-01/msg01672.php
>> On win32, the blocking libpq functions like PQconnectdb() and
>> PQexec() are uninterruptible since they use the vanilla select()
>> instead of our signal emulation layer compatible select().
>> Nevertheless, currently walreceiver uses them to establish a
>> connection, send a handshake message and wait for the reply.
>> So walreceiver also becomes uninterruptible for a while. This
>> is the must-fix problem for 9.0.
>>
>> I replaced the blocking libpq functions currently used with
>> asynchronous ones, and used the emulated version of select()
>> to wait, to make walreceiver interruptible. Here is the patch.
>
> These are issues that affect other things running libpq in the backend
> as well, right? Such as dblink?

Yes. So Heikki wrote the patch for dblink.
http://archives.postgresql.org/pgsql-hackers/2010-01/msg02072.php

> Perhaps we can factor out most of this
> into functions in backend/port/win32 so that we can re-use it fro
> there?

Sorry. I couldn't get your point. Could you explain it in detail?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-03-15 09:42:03
Message-ID: 9837222c1003150242p7733c7c6je440d528f04b3a01@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 15, 2010 at 10:14, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Fri, Mar 12, 2010 at 8:13 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Wed, Mar 10, 2010 at 10:09, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Hi,
>>>
>>> http://archives.postgresql.org/pgsql-hackers/2010-01/msg01672.php
>>> On win32, the blocking libpq functions like PQconnectdb() and
>>> PQexec() are uninterruptible since they use the vanilla select()
>>> instead of our signal emulation layer compatible select().
>>> Nevertheless, currently walreceiver uses them to establish a
>>> connection, send a handshake message and wait for the reply.
>>> So walreceiver also becomes uninterruptible for a while. This
>>> is the must-fix problem for 9.0.
>>>
>>> I replaced the blocking libpq functions currently used with
>>> asynchronous ones, and used the emulated version of select()
>>> to wait, to make walreceiver interruptible. Here is the patch.
>>
>> These are issues that affect other things running libpq in the backend
>> as well, right? Such as dblink?
>
> Yes. So Heikki wrote the patch for dblink.
> http://archives.postgresql.org/pgsql-hackers/2010-01/msg02072.php

IIRC that was never applied.

>
>> Perhaps we can factor out most of this
>> into functions in backend/port/win32 so that we can re-use it fro
>> there?
>
> Sorry. I couldn't get your point. Could you explain it in detail?

What I'm referring to is the part that Heikki writes as "The
implementation should be shared between the two, but I'm not sure
how". I think we should try to factor out things that can be shared
into separate functions and stick those in port/win32 (assuming
they're win32-specific, otherwise, in another suitable location), and
then call them from both. There seems to be a lot of things that
should be doable that way.

I notice for example that the dblink patch doesn't have the code for
timeout handling - shouldn't it?

I think we need to look at this as a single problem needing to be
solved, and then have the same solution applied to dblink and
walreceiver.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-03-15 10:56:10
Message-ID: 3f0b79eb1003150356k7d7cc10at6472bc29ad15c71a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 15, 2010 at 6:42 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> Perhaps we can factor out most of this
>>> into functions in backend/port/win32 so that we can re-use it fro
>>> there?
>>
>> Sorry. I couldn't get your point. Could you explain it in detail?
>
> What I'm referring to is the part that Heikki writes as "The
> implementation should be shared between the two, but I'm not sure
> how". I think we should try to factor out things that can be shared
> into separate functions and stick those in port/win32 (assuming
> they're win32-specific, otherwise, in another suitable location), and
> then call them from both. There seems to be a lot of things that
> should be doable that way.
>
> I notice for example that the dblink patch doesn't have the code for
> timeout handling - shouldn't it?
>
> I think we need to look at this as a single problem needing to be
> solved, and then have the same solution applied to dblink and
> walreceiver.

Thanks for the explanation. I agree that the code should be shared,
but am not sure how, too.

Something like libpq_select() which waits for the socket to become
ready would be required for walreceiver and dblink. But it's necessary
for walreceiver on not only win32 but also the other, so some functions
might need to be placed in the location other than port/win32.

I'll think of this issue for a while.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-03-15 15:32:58
Message-ID: 4B9E532A.5020307@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao wrote:
> On Mon, Mar 15, 2010 at 6:42 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> I think we need to look at this as a single problem needing to be
>> solved, and then have the same solution applied to dblink and
>> walreceiver.

Agreed.

> Something like libpq_select() which waits for the socket to become
> ready would be required for walreceiver and dblink. But it's necessary
> for walreceiver on not only win32 but also the other, ...

Really, why? I thought this is a purely Windows specific problem.

Just replacing PQexec() with PQsendQuery() is pretty straightforward, we
could put that replacement in a file in port/win32. Replacing
PQconnectdb() is more complicated because you need to handle connection
timeout. I suggest that we only add the replacement for PQexec(), and
live with the situation for PQconnectdb(), that covers 99% of the
scenarios anyway.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Joe Conway <mail(at)joeconway(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-03-15 15:49:50
Message-ID: 4B9E571E.5000306@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/15/2010 02:42 AM, Magnus Hagander wrote:
>
> I think we need to look at this as a single problem needing to be
> solved, and then have the same solution applied to dblink and
> walreceiver.
>

+1

Joe


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-03-16 01:35:26
Message-ID: 3f0b79eb1003151835g44ab5c16g38b794cd386a8eb9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 16, 2010 at 12:32 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Something like libpq_select() which waits for the socket to become
>> ready would be required for walreceiver and dblink. But it's necessary
>> for walreceiver on not only win32 but also the other, ...
>
> Really, why? I thought this is a purely Windows specific problem.

Because, on all the platforms, libpq_receive() needs to call libpq_select().
Or you mean that we should leave the existing libpq_select() as it is, and
create new win32 specific function which waits for the socket to become ready,
for this issue?

> Just replacing PQexec() with PQsendQuery() is pretty straightforward, we
> could put that replacement in a file in port/win32. Replacing
> PQconnectdb() is more complicated because you need to handle connection
> timeout. I suggest that we only add the replacement for PQexec(), and
> live with the situation for PQconnectdb(), that covers 99% of the
> scenarios anyway.

I'll try to replace PQexec() first, and PQconnectdb() second if I have
enough time.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-03-25 14:33:24
Message-ID: 3f0b79eb1003250733l527881a5oef2c0c19958b71c5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 16, 2010 at 10:35 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Just replacing PQexec() with PQsendQuery() is pretty straightforward, we
>> could put that replacement in a file in port/win32. Replacing
>> PQconnectdb() is more complicated because you need to handle connection
>> timeout. I suggest that we only add the replacement for PQexec(), and
>> live with the situation for PQconnectdb(), that covers 99% of the
>> scenarios anyway.
>
> I'll try to replace PQexec() first, and PQconnectdb() second if I have
> enough time.

Sorry for the delay. The attached patch replaces PQexec() used by dblink
and libpqwalreceiver with pgwin32_PQexec() which is the win32 version of
PQexec().

pgwin32_PQexec() is provided as the library 'libpqbe.dll', which is created
only on win32. dblink.dll and libpqwalreceiver.dll refer to libpqbe.dll.
Also libpqbe.dll refers to libpq.dll.

I'm not sure if my patch is in the right way. If you notice anything,
please feel free to comment!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
pgwin32_PQexec_v1.patch application/octet-stream 8.8 KB

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-02 14:11:20
Message-ID: m2p9837222c1004020711j763b02e6r37e4caba514bb82b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 25, 2010 at 15:33, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Mar 16, 2010 at 10:35 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Just replacing PQexec() with PQsendQuery() is pretty straightforward, we
>>> could put that replacement in a file in port/win32. Replacing
>>> PQconnectdb() is more complicated because you need to handle connection
>>> timeout. I suggest that we only add the replacement for PQexec(), and
>>> live with the situation for PQconnectdb(), that covers 99% of the
>>> scenarios anyway.
>>
>> I'll try to replace PQexec() first, and PQconnectdb() second if I have
>> enough time.
>
> Sorry for the delay. The attached patch replaces PQexec() used by dblink
> and libpqwalreceiver with pgwin32_PQexec() which is the win32 version of
> PQexec().
>
> pgwin32_PQexec() is provided as the library 'libpqbe.dll', which is created
> only on win32. dblink.dll and libpqwalreceiver.dll refer to libpqbe.dll.
> Also libpqbe.dll refers to libpq.dll.
>
> I'm not sure if my patch is in the right way. If you notice anything,
> please feel free to comment!

Well, first of all, it would require the addition of the new target to
the msvc build system, but that's easy - I can do that for you.

More to the point, I'm not sure I like the creation of yet another DLL
to deal with this. The reason this isn't just exported from the main
backend is the same reason we created the libpqwalreceiver library I'm
sure - bt that means we already have one.

How about we just use this same source file, but compile and link it
directly into both dblink and libpqwalreceiver? That'd leave us with
one DLL less, making life easier.

The downside would be that other third-party modules who need to call
libpq wouldn't be able to access this without copying the function.
But is this really something that's useful for third party modules?

As for the code itself, don't we need a CHECK_FOR_INTERRUPTS in there
for it to be actually useful?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-02 15:26:51
Message-ID: 24130.1270222011@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Thu, Mar 25, 2010 at 15:33, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Sorry for the delay. The attached patch replaces PQexec() used by dblink
>> and libpqwalreceiver with pgwin32_PQexec() which is the win32 version of
>> PQexec().
>>
>> pgwin32_PQexec() is provided as the library 'libpqbe.dll', which is created
>> only on win32. dblink.dll and libpqwalreceiver.dll refer to libpqbe.dll.
>> Also libpqbe.dll refers to libpq.dll.

> [ assorted objections ]

I disapprove of the whole approach, actually. The right way to fix this
is to not touch or replace libpq at all, but to change walreceiver to
use libpq's async-query facilities directly. Instead of PQexec, use
PQsendQuery and then a loop involving PQisBusy, PQgetResult, etc.
You've more or less done that loop, but you've put it in the wrong
place.

The larger point is that I don't believe this issue exists only on
Windows. I think we're going to want something like this on all
platforms, and that implies supporting poll() not just select() for the
waiting part.

The patch also seems confused about whether it's intending to be a
general-purpose solution or not. You can maybe take some shortcuts
if it's only going to be for walreceiver, but if you're going to put
it in dblink it is *not* acceptable to take shortcuts like not
processing errors completely.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-02 15:40:28
Message-ID: l2p9837222c1004020840n5417588an44b49e3a59a3e286@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 2, 2010 at 17:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Thu, Mar 25, 2010 at 15:33, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Sorry for the delay. The attached patch replaces PQexec() used by dblink
>>> and libpqwalreceiver with pgwin32_PQexec() which is the win32 version of
>>> PQexec().
>>>
>>> pgwin32_PQexec() is provided as the library 'libpqbe.dll', which is created
>>> only on win32. dblink.dll and libpqwalreceiver.dll refer to libpqbe.dll.
>>> Also libpqbe.dll refers to libpq.dll.
>
>> [ assorted objections ]
>
> I disapprove of the whole approach, actually.  The right way to fix this
> is to not touch or replace libpq at all, but to change walreceiver to
> use libpq's async-query facilities directly.  Instead of PQexec, use
> PQsendQuery and then a loop involving PQisBusy, PQgetResult, etc.
> You've more or less done that loop, but you've put it in the wrong
> place.

Any particular reason not to wrap that in a function? Not called
pgwin32_PQexec() then, but something more generic? And not doing any
#defines to change PQexec, but call that wrapper directly?

> The larger point is that I don't believe this issue exists only on
> Windows.  I think we're going to want something like this on all
> platforms, and that implies supporting poll() not just select() for the
> waiting part.

The most important part of the issue doesn't (because PQexec will be
interrupted by a signal), but there may certainly be others.

> The patch also seems confused about whether it's intending to be a
> general-purpose solution or not.  You can maybe take some shortcuts
> if it's only going to be for walreceiver, but if you're going to put
> it in dblink it is *not* acceptable to take shortcuts like not
> processing errors completely.

Yeah, good point.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-02 15:46:18
Message-ID: 26614.1270223178@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Fri, Apr 2, 2010 at 17:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I disapprove of the whole approach, actually. The right way to fix this
>> is to not touch or replace libpq at all, but to change walreceiver to
>> use libpq's async-query facilities directly. Instead of PQexec, use
>> PQsendQuery and then a loop involving PQisBusy, PQgetResult, etc.
>> You've more or less done that loop, but you've put it in the wrong
>> place.

> Any particular reason not to wrap that in a function? Not called
> pgwin32_PQexec() then, but something more generic? And not doing any
> #defines to change PQexec, but call that wrapper directly?

Yeah, that's fine. I just think it's easier to deal with this as a
local issue in walreceiver and dblink than to try to pretend we're
changing libpq's API.

>> The larger point is that I don't believe this issue exists only on
>> Windows. I think we're going to want something like this on all
>> platforms, and that implies supporting poll() not just select() for the
>> waiting part.

> The most important part of the issue doesn't (because PQexec will be
> interrupted by a signal), but there may certainly be others.

Really? As you pointed out yourself, if control doesn't reach a
CHECK_FOR_INTERRUPTS then we have a problem. We also know that select
isn't interrupted by signals on all platforms.

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-05 06:18:06
Message-ID: v2o3f0b79eb1004042318u3aa4f546l8f2a14b01df07cb1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 2, 2010 at 11:11 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> More to the point, I'm not sure I like the creation of yet another DLL
> to deal with this. The reason this isn't just exported from the main
> backend is the same reason we created the libpqwalreceiver library I'm
> sure - bt that means we already have one.
>
> How about we just use this same source file, but compile and link it
> directly into both dblink and libpqwalreceiver? That'd leave us with
> one DLL less, making life easier.

ISTM that we cannot compile dblink using USE_PGXS=1, if that DLL doesn't
exist in the installation directory. No?

> As for the code itself, don't we need a CHECK_FOR_INTERRUPTS in there
> for it to be actually useful?

Yes. I'll add a CHECK_FOR_INTERRUPTS in the loop.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-05 07:00:14
Message-ID: r2w3f0b79eb1004050000yde9e010etb82eb320bec456e9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Apr 3, 2010 at 12:26 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I disapprove of the whole approach, actually.  The right way to fix this
> is to not touch or replace libpq at all, but to change walreceiver to
> use libpq's async-query facilities directly.  Instead of PQexec, use
> PQsendQuery and then a loop involving PQisBusy, PQgetResult, etc.

Yes.

> You've more or less done that loop, but you've put it in the wrong
> place.

OK. I'll reconsider about how to use those asynchronous libpq functions.
But, if possible, could you point out where "the right place" is?

> The larger point is that I don't believe this issue exists only on
> Windows.  I think we're going to want something like this on all
> platforms, and that implies supporting poll() not just select() for the
> waiting part.

OK. I'll change the part so that poll() is used if HAVE_POLL is defined,
select() otherwise.

> The patch also seems confused about whether it's intending to be a
> general-purpose solution or not.  You can maybe take some shortcuts
> if it's only going to be for walreceiver, but if you're going to put
> it in dblink it is *not* acceptable to take shortcuts like not
> processing errors completely.

OK. I'll address this problem. Since PGconn->errorMessage cannot be
updated from outside of libpq, I'm thinking of making the caller give
the StringInfo variable as a parameter to pgwin32_PQexec(), and
putting error messages in it. Then the caller use the StringInfo
instead of PQerrorMessage(PGconn), to get the error messages.

And ISTM that dblink needs to hold the StringInfo using hash as
do the PGconn.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-06 12:25:49
Message-ID: x2h3f0b79eb1004060525o22ba3e9ap7ba7ab66de7e6766@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Apr 5, 2010 at 3:18 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Fri, Apr 2, 2010 at 11:11 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> More to the point, I'm not sure I like the creation of yet another DLL
>> to deal with this. The reason this isn't just exported from the main
>> backend is the same reason we created the libpqwalreceiver library I'm
>> sure - bt that means we already have one.
>>
>> How about we just use this same source file, but compile and link it
>> directly into both dblink and libpqwalreceiver? That'd leave us with
>> one DLL less, making life easier.
>
> ISTM that we cannot compile dblink using USE_PGXS=1, if that DLL doesn't
> exist in the installation directory. No?

I might have misinterpreted your point. You mean that the same source
file defining something like pgwin32_PQexec should be placed in both
contrib/dblink and src/backend/replication/libpqwalreceiver? If so,
we can compile dblink using USE_PGXS without the DLL.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-06 16:45:49
Message-ID: w2h9837222c1004060945j21fbaa12ued6ddd3d0c9332c5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 6, 2010 at 2:25 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Apr 5, 2010 at 3:18 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Fri, Apr 2, 2010 at 11:11 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> More to the point, I'm not sure I like the creation of yet another DLL
>>> to deal with this. The reason this isn't just exported from the main
>>> backend is the same reason we created the libpqwalreceiver library I'm
>>> sure - bt that means we already have one.
>>>
>>> How about we just use this same source file, but compile and link it
>>> directly into both dblink and libpqwalreceiver? That'd leave us with
>>> one DLL less, making life easier.
>>
>> ISTM that we cannot compile dblink using USE_PGXS=1, if that DLL doesn't
>> exist in the installation directory. No?
>
> I might have misinterpreted your point. You mean that the same source
> file defining something like pgwin32_PQexec should be placed in both
> contrib/dblink and src/backend/replication/libpqwalreceiver? If so,
> we can compile dblink using USE_PGXS without the DLL.

No, I don't mean that. I mean store it in one place, and copy/link it
into where it's used. Look at for example how crypt.c and
getaddrinfo.c are handled in libpq.

Not sure how that will play with PGXS, though, but I'm not entirely
sure we care if it can be built that way? If it does, there should be
some way to get PGXS to execute that rule as well, I'm sure.

Also note that per Tom's comments this is not a win32 only fix, so it
shouldn't be called pgwin32_*().

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-08 08:01:03
Message-ID: o2m3f0b79eb1004080101j287fd613uad9cc81033af0df7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Apr 7, 2010 at 1:45 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> No, I don't mean that. I mean store it in one place, and copy/link it
> into where it's used. Look at for example how crypt.c and
> getaddrinfo.c are handled in libpq.

Thanks for the advice!

> Not sure how that will play with PGXS, though, but I'm not entirely
> sure we care if it can be built that way?

Probably Yes.

> If it does, there should be
> some way to get PGXS to execute that rule as well, I'm sure.

If we can copy/link the source file defining "new PQexec" when
we compile the dblink, DLL doesn't seem to be required. So I
stop creating new DLL for PGXS.

> Also note that per Tom's comments this is not a win32 only fix, so it
> shouldn't be called pgwin32_*().

Yep.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-12 11:54:07
Message-ID: y2u3f0b79eb1004120454lde94595cldf5521a6a9b8fab@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Apr 8, 2010 at 5:01 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> If it does, there should be
>> some way to get PGXS to execute that rule as well, I'm sure.
>
> If we can copy/link the source file defining "new PQexec" when
> we compile the dblink, DLL doesn't seem to be required. So I
> stop creating new DLL for PGXS.

On second thought, ISTM that we cannot use any source files which exist
in places other than contrib/dblink and installation directory when we
compile dblink under USE_PGXS=1. But we can put the file implementing
new PQexec on those neither. So I'm thinking again that it should be
provided as the shared library and be linked from walreceiver and dblink.
Is this right?

If adding new shared library is too big change at this point, I think
that we should postpone the fix only for dblink to 9.1 or later. Since
no one has complained about this long-term problem of dblink, I'm not
sure it really should be fixed right now. Thought?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Joseph Conway <mail(at)joeconway(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-12 14:26:04
Message-ID: 4BC32D7C.70004@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao wrote:
> If adding new shared library is too big change at this point, I think
> that we should postpone the fix only for dblink to 9.1 or later. Since
> no one has complained about this long-term problem of dblink, I'm not
> sure it really should be fixed right now. Thought?

I would agree with this. No one has ever complained that I am aware of.

Joe


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-12 16:56:51
Message-ID: x2y9837222c1004120956zef64d9esf766e3f0fa247a03@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Apr 12, 2010 at 13:54, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Apr 8, 2010 at 5:01 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> If it does, there should be
>>> some way to get PGXS to execute that rule as well, I'm sure.
>>
>> If we can copy/link the source file defining "new PQexec" when
>> we compile the dblink, DLL doesn't seem to be required. So I
>> stop creating new DLL for PGXS.
>
> On second thought, ISTM that we cannot use any source files which exist
> in places other than contrib/dblink and installation directory when we
> compile dblink under USE_PGXS=1. But we can put the file implementing
> new PQexec on those neither. So I'm thinking again that it should be
> provided as the shared library and be linked from walreceiver and dblink.
> Is this right?
>
> If adding new shared library is too big change at this point, I think
> that we should postpone the fix only for dblink to 9.1 or later. Since
> no one has complained about this long-term problem of dblink, I'm not
> sure it really should be fixed right now. Thought?

+1. Let's fix walreceiver for now, and we can revisit dblink later.
Since we haven't had any complaints so far...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-13 00:21:16
Message-ID: k2n3f0b79eb1004121721m7169f31di35ebd57d74c55457@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 13, 2010 at 1:56 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> If adding new shared library is too big change at this point, I think
>> that we should postpone the fix only for dblink to 9.1 or later. Since
>> no one has complained about this long-term problem of dblink, I'm not
>> sure it really should be fixed right now. Thought?
>
> +1. Let's fix walreceiver for now, and we can revisit dblink later.
> Since we haven't had any complaints so far...

OK. I'll focus on walreceiver now.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-14 08:02:18
Message-ID: v2u3f0b79eb1004140102w4f052d86y64286a28fdab53e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 13, 2010 at 9:21 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Apr 13, 2010 at 1:56 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> If adding new shared library is too big change at this point, I think
>>> that we should postpone the fix only for dblink to 9.1 or later. Since
>>> no one has complained about this long-term problem of dblink, I'm not
>>> sure it really should be fixed right now. Thought?
>>
>> +1. Let's fix walreceiver for now, and we can revisit dblink later.
>> Since we haven't had any complaints so far...
>
> OK. I'll focus on walreceiver now.

The attached patch changes walreceiver so that it uses new function
libpqrcv_PQexec() instead of PQexec() for sending handshake message.
libpqrcv_PQexec() sends a query and wait for the results by using
the asynchronous libpq functions and the backend version of select().
It's interruptible by signals. You would be able to shut the standby
server down in the middle of handshake for replication connection.

http://archives.postgresql.org/pgsql-hackers/2010-03/msg00625.php
> Just replacing PQexec() with PQsendQuery() is pretty straightforward, we
> could put that replacement in a file in port/win32. Replacing
> PQconnectdb() is more complicated because you need to handle connection
> timeout. I suggest that we only add the replacement for PQexec(), and
> live with the situation for PQconnectdb(), that covers 99% of the
> scenarios anyway.

According to the suggestion, I replaced only the PQexec() and didn't
add the replacement of PQconnect() for now.

http://archives.postgresql.org/pgsql-hackers/2010-04/msg00077.php
> As for the code itself, don't we need a CHECK_FOR_INTERRUPTS in there
> for it to be actually useful?

Since the backend version of select() receives any incoming signals
on Win32, CHECK_FOR_INTERRUPTS seems not to be needed in the loop,
at least in walreceiver. No? The patch doesn't put it in there, and
I was able to interrupt walreceiver executing libpqrcv_PQexec() when
I tested the patch on Win32.

http://archives.postgresql.org/pgsql-hackers/2010-04/msg00084.php
> The larger point is that I don't believe this issue exists only on
> Windows. I think we're going to want something like this on all
> platforms, and that implies supporting poll() not just select() for the
> waiting part.

The patch uses libpq_select() to check for the socket. It supports
poll() and select().

> The patch also seems confused about whether it's intending to be a
> general-purpose solution or not. You can maybe take some shortcuts
> if it's only going to be for walreceiver, but if you're going to put
> it in dblink it is *not* acceptable to take shortcuts like not
> processing errors completely.

The patch still takes some shortcuts since we decided to postpone
the fix for dblink to 9.1 or later.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
libpqrcv_PQexec_v1.patch application/octet-stream 5.2 KB

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-14 14:15:43
Message-ID: r2n9837222c1004140715m9bf986f0w384c3631c18d11ed@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Apr 14, 2010 at 10:02 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Apr 13, 2010 at 9:21 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Tue, Apr 13, 2010 at 1:56 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> If adding new shared library is too big change at this point, I think
>>>> that we should postpone the fix only for dblink to 9.1 or later. Since
>>>> no one has complained about this long-term problem of dblink, I'm not
>>>> sure it really should be fixed right now. Thought?
>>>
>>> +1. Let's fix walreceiver for now, and we can revisit dblink later.
>>> Since we haven't had any complaints so far...
>>
>> OK. I'll focus on walreceiver now.
>
> The attached patch changes walreceiver so that it uses new function
> libpqrcv_PQexec() instead of PQexec() for sending handshake message.
> libpqrcv_PQexec() sends a query and wait for the results by using
> the asynchronous libpq functions and the backend version of select().
> It's interruptible by signals. You would be able to shut the standby
> server down in the middle of handshake for replication connection.
>
> http://archives.postgresql.org/pgsql-hackers/2010-03/msg00625.php
>> Just replacing PQexec() with PQsendQuery() is pretty straightforward, we
>> could put that replacement in a file in port/win32. Replacing
>> PQconnectdb() is more complicated because you need to handle connection
>> timeout. I suggest that we only add the replacement for PQexec(), and
>> live with the situation for PQconnectdb(), that covers 99% of the
>> scenarios anyway.
>
> According to the suggestion, I replaced only the PQexec() and didn't
> add the replacement of PQconnect() for now.
>
> http://archives.postgresql.org/pgsql-hackers/2010-04/msg00077.php
>> As for the code itself, don't we need a CHECK_FOR_INTERRUPTS in there
>> for it to be actually useful?
>
> Since the backend version of select() receives any incoming signals
> on Win32, CHECK_FOR_INTERRUPTS seems not to be needed in the loop,
> at least in walreceiver. No? The patch doesn't put it in there, and
> I was able to interrupt walreceiver executing libpqrcv_PQexec() when
> I tested the patch on Win32.

It will call the signal handler, yes. Normally, the signal handler
just sets a flag somewhere, which needs to be checked with
CHECK_FOR_INTERRUPTS.

From how I read the walreceiver.c code (which I'm not that familiar
with), the signal handlers call ProcessWalRcvInterrupts() which in
turn has CHECK_FOR_INTERRUPTS in it, and this is where it ends up
being called.

>> The patch also seems confused about whether it's intending to be a
>> general-purpose solution or not.  You can maybe take some shortcuts
>> if it's only going to be for walreceiver, but if you're going to put
>> it in dblink it is *not* acceptable to take shortcuts like not
>> processing errors completely.
>
> The patch still takes some shortcuts since we decided to postpone
> the fix for dblink to 9.1 or later.

Given those shortcuts, can't we simplify it even further like
attached? (If nothing else, your code did PQclear() on an
uninitialized pointer - must've been pure luck that it worked)

Looking at the call-sites, there are bugs now - if PQexec() returns
NULL, we don't deal with it. It also doesn't always free the result
properly. I've added checks for that.

Finally, I've updated some of the comments.

Note: I've only tested this patch as far as that it compiles. I don't
have a SR env around right now, so I'll have to complete with that
later. But if you have a chance to test that it fixes your test case,
please do!

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Attachment Content-Type Size
libpqrcv_PQexec_v2.patch application/octet-stream 4.6 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-15 02:17:39
Message-ID: 20733.1271297859@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Looking at the call-sites, there are bugs now - if PQexec() returns
> NULL, we don't deal with it. It also doesn't always free the result
> properly. I've added checks for that.

I think you're just adding useless complexity there. PQresultStatus
defends itself just fine against a NULL input, and most other libpq
functions likewise.

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-15 03:13:29
Message-ID: p2h3f0b79eb1004142013heb708f31v52d7f20220f4f1df@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Apr 14, 2010 at 11:15 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> http://archives.postgresql.org/pgsql-hackers/2010-04/msg00077.php
>>> As for the code itself, don't we need a CHECK_FOR_INTERRUPTS in there
>>> for it to be actually useful?
>>
>> Since the backend version of select() receives any incoming signals
>> on Win32, CHECK_FOR_INTERRUPTS seems not to be needed in the loop,
>> at least in walreceiver. No? The patch doesn't put it in there, and
>> I was able to interrupt walreceiver executing libpqrcv_PQexec() when
>> I tested the patch on Win32.
>
> It will call the signal handler, yes. Normally, the signal handler
> just sets a flag somewhere, which needs to be checked with
> CHECK_FOR_INTERRUPTS.
>
> From how I read the walreceiver.c code (which I'm not that familiar
> with), the signal handlers call ProcessWalRcvInterrupts() which in
> turn has CHECK_FOR_INTERRUPTS in it, and this is where it ends up
> being called.

Yes. While establishing replication connection (i.e., executing
walrcv_connect function), the SIGTERM signal handler directly calls
ProcessWalRcvInterrupts() which does CHECK_FOR_INTERRUPTS() and
elog(FATAL).

>>> The patch also seems confused about whether it's intending to be a
>>> general-purpose solution or not.  You can maybe take some shortcuts
>>> if it's only going to be for walreceiver, but if you're going to put
>>> it in dblink it is *not* acceptable to take shortcuts like not
>>> processing errors completely.
>>
>> The patch still takes some shortcuts since we decided to postpone
>> the fix for dblink to 9.1 or later.
>
> Given those shortcuts, can't we simplify it even further like
> attached?

No, we need to repeat PQgetResult() at least two times. The first call
of it reads the RowDescription, DataRow and CommandComplete messages
and switches the state to PGASYNC_READY. The second one reads the
ReadyForQuery message and switches the state to PGASYNC_IDLE. So if we
don't repeat it, libpqrcv_PQexec() would end in a half-finished state.

> (If nothing else, your code did PQclear() on an
> uninitialized pointer - must've been pure luck that it worked)

PQclear(NULL) might be called in my patch, but this is not a problem
since PQclear() does nothing if the specified PGresult argument is NULL.

> Looking at the call-sites, there are bugs now - if PQexec() returns
> NULL, we don't deal with it. It also doesn't always free the result
> properly. I've added checks for that.

As Tom pointed out in another post, we don't need to treat the
"result is NULL" case as special. OTOH, as you pointed out, I
forgot calling PQclear() when the second call of libpqrcv_PQexec()
in libpqrcv_connect() fails. I added it to the patch. Thanks!

> Finally, I've updated some of the comments.

Thanks a lot! I applied that improvements to the patch.

I attached the revised patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
libpqrcv_PQexec_v3.patch application/octet-stream 5.5 KB

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-15 07:45:55
Message-ID: o2g9837222c1004150045h32da37dcv975c7ad70f576143@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Apr 15, 2010 at 4:17 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Looking at the call-sites, there are bugs now - if PQexec() returns
>> NULL, we don't deal with it. It also doesn't always free the result
>> properly. I've added checks for that.
>
> I think you're just adding useless complexity there.  PQresultStatus
> defends itself just fine against a NULL input, and most other libpq
> functions likewise.

Yeah, I realized that after posting it. I was still stuck in ancient
times when at least some of those functions couldn't be called with
NULL pointers, so I just put that in there by default :-)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-15 07:52:13
Message-ID: t2o9837222c1004150052ibbf2d08o8ee65ca584b163c3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Apr 15, 2010 at 5:13 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Apr 14, 2010 at 11:15 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> The patch also seems confused about whether it's intending to be a
>>>> general-purpose solution or not.  You can maybe take some shortcuts
>>>> if it's only going to be for walreceiver, but if you're going to put
>>>> it in dblink it is *not* acceptable to take shortcuts like not
>>>> processing errors completely.
>>>
>>> The patch still takes some shortcuts since we decided to postpone
>>> the fix for dblink to 9.1 or later.
>>
>> Given those shortcuts, can't we simplify it even further like
>> attached?
>
> No, we need to repeat PQgetResult() at least two times. The first call
> of it reads the RowDescription, DataRow and CommandComplete messages
> and switches the state to PGASYNC_READY. The second one reads the
> ReadyForQuery message and switches the state to PGASYNC_IDLE. So if we
> don't repeat it, libpqrcv_PQexec() would end in a half-finished state.

Ah, ok. That's what I get for not testing it :-)

I still think that could be implemented in a much clearer way though.
Just calling PQgetResult() twice, and checking the return values
sequentially would be much easier to read, imho. Looking through taht
set of "break" statements at the end of the loop is just confusing. If
nothing else, it needs more comments.

But maybe I'm just bikeshedding ;)

>> (If nothing else, your code did PQclear() on an
>> uninitialized pointer - must've been pure luck that it worked)
>
> PQclear(NULL) might be called in my patch, but this is not a problem
> since PQclear() does nothing if the specified PGresult argument is NULL.

Ah, I missed that you initialized it to NULL.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-15 14:26:05
Message-ID: z2g603c8f071004150726rf33e1c27vf48503eaf9630a6d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Apr 14, 2010 at 11:13 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Apr 14, 2010 at 11:15 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> http://archives.postgresql.org/pgsql-hackers/2010-04/msg00077.php
>>>> As for the code itself, don't we need a CHECK_FOR_INTERRUPTS in there
>>>> for it to be actually useful?
>>>
>>> Since the backend version of select() receives any incoming signals
>>> on Win32, CHECK_FOR_INTERRUPTS seems not to be needed in the loop,
>>> at least in walreceiver. No? The patch doesn't put it in there, and
>>> I was able to interrupt walreceiver executing libpqrcv_PQexec() when
>>> I tested the patch on Win32.
>>
>> It will call the signal handler, yes. Normally, the signal handler
>> just sets a flag somewhere, which needs to be checked with
>> CHECK_FOR_INTERRUPTS.
>>
>> From how I read the walreceiver.c code (which I'm not that familiar
>> with), the signal handlers call ProcessWalRcvInterrupts() which in
>> turn has CHECK_FOR_INTERRUPTS in it, and this is where it ends up
>> being called.
>
> Yes. While establishing replication connection (i.e., executing
> walrcv_connect function), the SIGTERM signal handler directly calls
> ProcessWalRcvInterrupts() which does CHECK_FOR_INTERRUPTS() and
> elog(FATAL).
>
>>>> The patch also seems confused about whether it's intending to be a
>>>> general-purpose solution or not.  You can maybe take some shortcuts
>>>> if it's only going to be for walreceiver, but if you're going to put
>>>> it in dblink it is *not* acceptable to take shortcuts like not
>>>> processing errors completely.
>>>
>>> The patch still takes some shortcuts since we decided to postpone
>>> the fix for dblink to 9.1 or later.
>>
>> Given those shortcuts, can't we simplify it even further like
>> attached?
>
> No, we need to repeat PQgetResult() at least two times. The first call
> of it reads the RowDescription, DataRow and CommandComplete messages
> and switches the state to PGASYNC_READY. The second one reads the
> ReadyForQuery message and switches the state to PGASYNC_IDLE. So if we
> don't repeat it, libpqrcv_PQexec() would end in a half-finished state.
>
>> (If nothing else, your code did PQclear() on an
>> uninitialized pointer - must've been pure luck that it worked)
>
> PQclear(NULL) might be called in my patch, but this is not a problem
> since PQclear() does nothing if the specified PGresult argument is NULL.
>
>> Looking at the call-sites, there are bugs now - if PQexec() returns
>> NULL, we don't deal with it. It also doesn't always free the result
>> properly. I've added checks for that.
>
> As Tom pointed out in another post, we don't need to treat the
> "result is NULL" case as special. OTOH, as you pointed out, I
> forgot calling PQclear() when the second call of libpqrcv_PQexec()
> in libpqrcv_connect() fails. I added it to the patch. Thanks!
>
>> Finally, I've updated some of the comments.
>
> Thanks a lot! I applied that improvements to the patch.
>
> I attached the revised patch.

I have to admit to finding this confusing. According to the comments:

+ /*
+ * Don't emulate the PQexec()'s behavior of returning the last
+ * result when there are many, since walreceiver never sends a
+ * query returning multiple results.
+ */

...but it looks like the code actually is implementing some sort of
loop-that-returns-the-last result.

...Robert


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-16 07:03:36
Message-ID: s2k3f0b79eb1004160003ndc57ccebr729ffd9c3f42eb57@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Apr 15, 2010 at 11:26 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I have to admit to finding this confusing.  According to the comments:
>
> +               /*
> +                * Don't emulate the PQexec()'s behavior of returning the last
> +                * result when there are many, since walreceiver never sends a
> +                * query returning multiple results.
> +                */
>
> ...but it looks like the code actually is implementing some sort of
> loop-that-returns-the-last result.

Yeah, it's not a very accurate description. And I found another problem:
libpqrcv_PQexec() ends as soon as an error result arrives even if its
state has not been PGASYNC_IDLE yet.

So I changed libpqrcv_PQexec() so that it emulates the PQexec()'s behavior
except the concatenation of error messages. How about the attached patch?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
libpqrcv_PQexec_v4.patch application/octet-stream 5.6 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-17 01:47:00
Message-ID: j2x603c8f071004161847r4c11fb01yfbdcdbe4b45c07f9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 16, 2010 at 3:03 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Apr 15, 2010 at 11:26 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I have to admit to finding this confusing.  According to the comments:
>>
>> +               /*
>> +                * Don't emulate the PQexec()'s behavior of returning the last
>> +                * result when there are many, since walreceiver never sends a
>> +                * query returning multiple results.
>> +                */
>>
>> ...but it looks like the code actually is implementing some sort of
>> loop-that-returns-the-last result.
>
> Yeah, it's not a very accurate description. And I found another problem:
> libpqrcv_PQexec() ends as soon as an error result arrives even if its
> state has not been PGASYNC_IDLE yet.
>
> So I changed libpqrcv_PQexec() so that it emulates the PQexec()'s behavior
> except the concatenation of error messages. How about the attached patch?

Well, the comment definitely makes more sense in this version. I
can't speak to the behavior.

...Robert


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-19 09:37:38
Message-ID: z2v3f0b79eb1004190237yd371d587j52243c9f0e9e67dd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 16, 2010 at 4:03 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Apr 15, 2010 at 11:26 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I have to admit to finding this confusing.  According to the comments:
>>
>> +               /*
>> +                * Don't emulate the PQexec()'s behavior of returning the last
>> +                * result when there are many, since walreceiver never sends a
>> +                * query returning multiple results.
>> +                */
>>
>> ...but it looks like the code actually is implementing some sort of
>> loop-that-returns-the-last result.
>
> Yeah, it's not a very accurate description. And I found another problem:
> libpqrcv_PQexec() ends as soon as an error result arrives even if its
> state has not been PGASYNC_IDLE yet.
>
> So I changed libpqrcv_PQexec() so that it emulates the PQexec()'s behavior
> except the concatenation of error messages. How about the attached patch?

BTW, even if you apply the patch, you would not be able to interrupt the
walreceiver by using smart shutdown because of the bug reported in another
thread. Please be careful about that when you test the patch.
http://archives.postgresql.org/pgsql-hackers/2010-04/msg00592.php

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-19 14:11:29
Message-ID: t2l9837222c1004190711n7d3e3c7cy4601c646f367de9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 16, 2010 at 09:03, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Apr 15, 2010 at 11:26 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I have to admit to finding this confusing.  According to the comments:
>>
>> +               /*
>> +                * Don't emulate the PQexec()'s behavior of returning the last
>> +                * result when there are many, since walreceiver never sends a
>> +                * query returning multiple results.
>> +                */
>>
>> ...but it looks like the code actually is implementing some sort of
>> loop-that-returns-the-last result.
>
> Yeah, it's not a very accurate description. And I found another problem:
> libpqrcv_PQexec() ends as soon as an error result arrives even if its
> state has not been PGASYNC_IDLE yet.
>
> So I changed libpqrcv_PQexec() so that it emulates the PQexec()'s behavior
> except the concatenation of error messages. How about the attached patch?

Applied with only minor stylistic changes. Thanks!

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/