Re: Streaming replication and non-blocking I/O

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Streaming replication and non-blocking I/O
Date: 2009-12-08 14:23:16
Message-ID: 4B1E6154.5000302@enterprisedb.com

I find the backend libpq changes related to non-blocking I/O quite
complex. Can we find a simpler solution?

The problem we're trying to solve is that while the walsender backend
sends a lot of WAL records to the client, the client can send a lot of
messages to the backend. If the volume of messages from client to server
exceeds both the input buffer in the server and the output buffer in the
client, the client will block until the server has read some data. But
if the client is blocked, it will not process incoming data from the
server, and eventually the server will block too. And we have a
deadlock. This:
http://florin.bjdean.id.au/docs/omnimark/omni55/docs/html/concept/717.htm
is a pretty good description of the problem.

The first question is: do we really need to be prepared for that? The
XLogRecPtr acknowledgment messages the client sends are very small, and
if the client is mindful about not sending them too often - perhaps max
1 ack per 1 received XLOG message - the receive buffer in the backend
should never fill up in practice.

If that's deemed not good enough, we could modify just internal_flush()
so that it uses secure_poll to wait for the possibility to either read
or write, instead of blocking for just write. Whenever there's incoming
data, read them into PqRecvBuffer for later processing, which keeps the
OS input buffer from filling up. If PqRecvBuffer fills up, it can be
extended, or we can start dropping old XLogRecPtr messages from it.

In any case, we'll need something like pq_wait to check if a message can
be read without blocking, but that's just a small additional function as
opposed to a whole new API for assembling and sending messages without
blocking.
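
A minimal sketch of the "wait for either read or write" idea, not the
actual backend code. It uses plain poll() on a non-blocking socket
instead of secure_poll, and RecvBuf, recvbuf_append(),
set_nonblocking() and flush_while_draining() are hypothetical names
standing in for PqRecvBuffer and internal_flush():

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical stand-in for PqRecvBuffer: a growable byte buffer. */
typedef struct
{
    char   *data;
    size_t  len;
    size_t  cap;
} RecvBuf;

/* Append n bytes, doubling the capacity as needed ("extend the buffer"). */
static int
recvbuf_append(RecvBuf *rb, const char *src, size_t n)
{
    if (rb->len + n > rb->cap)
    {
        size_t  newcap = rb->cap ? rb->cap : 1024;
        char   *p;

        while (newcap < rb->len + n)
            newcap *= 2;
        p = realloc(rb->data, newcap);
        if (p == NULL)
            return -1;
        rb->data = p;
        rb->cap = newcap;
    }
    memcpy(rb->data + rb->len, src, n);
    rb->len += n;
    return 0;
}

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Send all of out[0..outlen) on a non-blocking socket. Instead of waiting
 * for writability only, wait for read OR write and stash any incoming
 * bytes in rb, so the peer never blocks on a full pipe.
 * Returns 0 on success, -1 on error or if the peer closed the connection.
 */
static int
flush_while_draining(int fd, const char *out, size_t outlen, RecvBuf *rb)
{
    size_t sent = 0;

    assert(out != NULL && rb != NULL);
    while (sent < outlen)
    {
        struct pollfd pfd = {.fd = fd, .events = POLLIN | POLLOUT, .revents = 0};

        if (poll(&pfd, 1, -1) < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (!(pfd.revents & (POLLIN | POLLOUT)))
            return -1;          /* only POLLERR/POLLHUP/POLLNVAL reported */
        if (pfd.revents & POLLIN)
        {
            char    tmp[4096];
            ssize_t r = recv(fd, tmp, sizeof(tmp), 0);

            if (r == 0)
                return -1;      /* peer closed the connection */
            if (r > 0 && recvbuf_append(rb, tmp, (size_t) r) < 0)
                return -1;
            if (r < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
                return -1;
        }
        if (pfd.revents & POLLOUT)
        {
            ssize_t w = send(fd, out + sent, outlen - sent, 0);

            if (w > 0)
                sent += (size_t) w;
            else if (w < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
                return -1;
        }
    }
    return 0;
}
```

The stashed bytes would then be parsed later by whatever normally reads
from the receive buffer; this sketch only shows how the deadlock is
avoided.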

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-09 01:42:11
Message-ID: 3f0b79eb0912081742v14b7e516n29be35e7ef66b868@mail.gmail.com

On Tue, Dec 8, 2009 at 11:23 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> The first question is: do we really need to be prepared for that? The
> XLogRecPtr acknowledgment messages the client sends are very small, and
> if the client is mindful about not sending them too often - perhaps max
> 1 ack per 1 received XLOG message - the receive buffer in the backend
> should never fill up in practice.

It's OK to drop that feature.

> If that's deemed not good enough, we could modify just internal_flush()
> so that it uses secure_poll to wait for the possibility to either read
> or write, instead of blocking for just write. Whenever there's incoming
> data, read them into PqRecvBuffer for later processing, which keeps the
> OS input buffer from filling up. If PqRecvBuffer fills up, it can be
> extended, or we can start dropping old XLogRecPtr messages from it.

Extending PqRecvBuffer seems better, because XLogRecPtr messages come in
several types (i.e., we cannot just drop old messages without parsing
all the messages in the buffer).

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-09 06:58:44
Message-ID: 4B1F4AA4.2030305@enterprisedb.com

Fujii Masao wrote:
> On Tue, Dec 8, 2009 at 11:23 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> If that's deemed not good enough, we could modify just internal_flush()
>> so that it uses secure_poll to wait for the possibility to either read
>> or write, instead of blocking for just write. Whenever there's incoming
>> data, read them into PqRecvBuffer for later processing, which keeps the
>> OS input buffer from filling up. If PqRecvBuffer fills up, it can be
>> extended, or we can start dropping old XLogRecPtr messages from it.
>
> Extending PqRecvBuffer seems better because XLogRecPtr message
> has some types (i.e., we cannot just drop old message without parsing
> all messages in the buffer).

True. Another idea I had was to introduce a callback that backend libpq
can call when the buffer fills. Walsender would set the callback to
ProcessStreamMsgs().

But if everyone is happy with just relying on the OS buffer to not fill
up, let's just drop it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-09 07:55:41
Message-ID: 3f0b79eb0912082355kc122d5ai6af78354f380047b@mail.gmail.com

On Wed, Dec 9, 2009 at 3:58 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> True. Another idea I had was to introduce a callback that backend libpq
> can call when the buffer fills. Walsender would set the callback to
> ProcessStreamMsgs().
>
> But if everyone is happy with just relying on the OS buffer to not fill
> up, let's just drop it.

The OS buffer is expected to be able to store a large number of
XLogRecPtr messages, because each message is small. So it's also OK
to just drop it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-09 15:00:30
Message-ID: 1948.1260370830@sss.pgh.pa.us

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> On Wed, Dec 9, 2009 at 3:58 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> But if everyone is happy with just relying on the OS buffer to not fill
>> up, let's just drop it.

> The OS buffer is expected to be able to store a large number of
> XLogRecPtr messages, because its size is small. So it's also OK
> to just drop it.

It certainly seems to be something we could improve later, when and
if evidence emerges that it's a real-world problem. For now,
simple is beautiful.

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-10 06:41:12
Message-ID: 3f0b79eb0912092241t77449d0bu52a46d7ad128e0fc@mail.gmail.com

On Thu, Dec 10, 2009 at 12:00 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The OS buffer is expected to be able to store a large number of
>> XLogRecPtr messages, because its size is small. So it's also OK
>> to just drop it.
>
> It certainly seems to be something we could improve later, when and
> if evidence emerges that it's a real-world problem.  For now,
> simple is beautiful.

I just dropped the backend libpq changes related to non-blocking I/O.

git://git.postgresql.org/git/users/fujii/postgres.git
branch: replication

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-12 08:09:06
Message-ID: 4B234FA2.8000900@enterprisedb.com

Fujii Masao wrote:
> On Thu, Dec 10, 2009 at 12:00 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The OS buffer is expected to be able to store a large number of
>>> XLogRecPtr messages, because its size is small. So it's also OK
>>> to just drop it.
>> It certainly seems to be something we could improve later, when and
>> if evidence emerges that it's a real-world problem. For now,
>> simple is beautiful.
>
> I just dropped the backend libpq changes related to non-blocking I/O.
>
> git://git.postgresql.org/git/users/fujii/postgres.git
> branch: replication

Thanks, much simpler now.

Changing the finish_time argument to pqWaitTimed into timeout_ms changes
the behavior of the connect_timeout option to PQconnectdb. It should wait for
max connect_timeout seconds in total, but now it is waiting for
connect_timeout seconds at each step in the connection process: opening
a socket, authenticating etc.
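
To illustrate the difference, here is a small sketch, assuming a
finish_time-style absolute deadline where (time_t) -1 means "no limit".
remaining_ms() and steps_finished() are hypothetical helpers, and the
step durations are simulated rather than measured:

```c
#include <assert.h>
#include <time.h>

/*
 * Convert an absolute deadline into the timeout for the next wait.
 * Returns -1 for "wait forever", 0 if the deadline has already passed.
 */
static int
remaining_ms(time_t finish_time, time_t now)
{
    if (finish_time == (time_t) -1)
        return -1;
    if (now >= finish_time)
        return 0;
    return (int) (finish_time - now) * 1000;
}

/*
 * Simulate a connection made of several steps, each taking step_secs[i]
 * seconds, under ONE overall deadline of connect_timeout seconds.
 * Returns how many steps complete before the deadline would be overrun.
 */
static int
steps_finished(const int *step_secs, int nsteps, int connect_timeout)
{
    time_t  now = 0;                        /* simulated clock */
    time_t  finish_time = connect_timeout;  /* computed once, up front */
    int     i;

    for (i = 0; i < nsteps; i++)
    {
        int budget = remaining_ms(finish_time, now);

        if (budget >= 0 && step_secs[i] * 1000 > budget)
            break;              /* this step would exceed the total limit */
        now += step_secs[i];
    }
    return i;
}
```

With three steps of 4 seconds each and connect_timeout = 10, only two
steps fit under the shared deadline. If each step were handed a fresh
10-second timeout instead, all three would pass and the connection
attempt would take 12 seconds in total.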

Could we change the API of PQgetXLogData to be more like PQgetCopyData?
I'm thinking of removing the timeout argument, and instead looping with
select/poll and PQconsumeInput in the caller. That probably means
introducing a new state analogous to PGASYNC_COPY_IN. I haven't thought
this fully through yet, but it seems like it would be good to have a
consistent API.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-12 15:19:53
Message-ID: 5514.1260631193@sss.pgh.pa.us

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Changing the finish_time argument to pqWaitTimed into timeout_ms changes
> the behavior of the connect_timeout option to PQconnectdb. It should wait for
> max connect_timeout seconds in total, but now it is waiting for
> connect_timeout seconds at each step in the connection process: opening
> a socket, authenticating etc.

Refresh my memory as to why this patch is touching any of that code at
all?

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-12 20:42:25
Message-ID: 4B240031.7000608@enterprisedb.com

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Changing the finish_time argument to pqWaitTimed into timeout_ms changes
>> the behavior of the connect_timeout option to PQconnectdb. It should wait for
>> max connect_timeout seconds in total, but now it is waiting for
>> connect_timeout seconds at each step in the connection process: opening
>> a socket, authenticating etc.
>
> Refresh my memory as to why this patch is touching any of that code at
> all?

Walreceiver wants to wait for data to arrive from the master or a
signal. PQgetXLogData(), which is the libpq function to read a piece of
WAL, takes a timeout argument to support that. Walreceiver calls
PQgetXLogData() in an endless loop, checking for a received sighup or
death of postmaster at every iteration.

In the synchronous replication mode, I presume it's also going to listen
for a signal from the startup process, so that it can send an
acknowledgment to the master as soon as a COMMIT record has been
replayed that a backend on the master is waiting for.

To implement the timeout in PQgetXLogData(), pqWaitTimed() was changed
to take a timeout instead of the finish_time argument. That is a mistake
because it breaks PQconnectdb, and as I said I don't think
PQgetXLogData() should have a timeout argument to begin with. Instead,
it should have a boolean 'async' argument to return immediately if
there's no data, and the walreceiver main loop should call
poll()/select() to wait, i.e. just like PQgetCopyData() works.
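
A rough sketch of that division of labor, using a plain non-blocking
file descriptor rather than libpq so it is self-contained. get_data()
and receive_loop() are hypothetical; the "return 0 when nothing is
available in async mode" convention is borrowed from PQgetCopyData(),
but the other return codes here are simplified:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

/* Would be set by a SIGHUP handler (not shown). */
static volatile sig_atomic_t got_sighup = 0;

/*
 * Read from a non-blocking fd.
 * Returns the byte count (> 0), 0 if async and nothing is available yet,
 * or -1 on EOF or error. With async == 0 it waits inside the call.
 */
static int
get_data(int fd, char *buf, size_t buflen, int async)
{
    assert(buf != NULL && buflen > 0);
    for (;;)
    {
        ssize_t n = read(fd, buf, buflen);

        if (n > 0)
            return (int) n;
        if (n == 0)
            return -1;          /* EOF */
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;
        if (async)
            return 0;           /* no data yet, caller decides how to wait */

        struct pollfd pfd = {.fd = fd, .events = POLLIN, .revents = 0};

        if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
            return -1;
    }
}

/*
 * Caller-side loop: the waiting happens here, so the loop can also react
 * to signals between waits. Returns the total number of bytes received.
 */
static long
receive_loop(int fd, int poll_timeout_ms)
{
    char    buf[8192];
    long    total = 0;

    while (!got_sighup)
    {
        int n = get_data(fd, buf, sizeof(buf), 1);

        if (n > 0)
        {
            total += n;         /* a real receiver would write out the WAL */
            continue;
        }
        if (n < 0)
            break;              /* EOF or error */

        struct pollfd pfd = {.fd = fd, .events = POLLIN, .revents = 0};

        if (poll(&pfd, 1, poll_timeout_ms) < 0 && errno != EINTR)
            break;
    }
    return total;
}
```

Because the wait lives in the caller, the timeout no longer needs to be
threaded through the library function at all.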

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-14 02:20:21
Message-ID: 3f0b79eb0912131820y253f6878r5990d643fee8a95b@mail.gmail.com

On Sun, Dec 13, 2009 at 5:42 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Walreceiver wants to wait for data to arrive from the master or a
> signal. PQgetXLogData(), which is the libpq function to read a piece of
> WAL, takes a timeout argument to support that. Walreceiver calls
> PQgetXLogData() in an endless loop, checking for a received sighup or
> death of postmaster at every iteration.
>
> In the synchronous replication mode, I presume it's also going to listen
> for a signal from the startup process, so that it can send an
> acknowledgment to the master as soon as a COMMIT record has been
> replayed that a backend on the master is waiting for.

Right.

> To implement the timeout in PQgetXLogData(), pqWaitTimed() was changed
> to take a timeout instead of finishing_time argument. Which is a mistake
> because it breaks PQconnectdb, and as I said I don't think
> PQgetXLogData() should have a timeout argument to begin with. Instead,
> it should have a boolean 'async' argument to return immediately if
> there's no data, and walreceiver main loop should call poll()/select()
> to wait. Ie. just like PQgetCopyData() works.

Seems good. I'll revise the code.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-14 02:38:55
Message-ID: 10108.1260758335@sss.pgh.pa.us

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> On Sun, Dec 13, 2009 at 5:42 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> To implement the timeout in PQgetXLogData(), pqWaitTimed() was changed
>> to take a timeout instead of finishing_time argument. Which is a mistake
>> because it breaks PQconnectdb, and as I said I don't think
>> PQgetXLogData() should have a timeout argument to begin with. Instead,
>> it should have a boolean 'async' argument to return immediately if
>> there's no data, and walreceiver main loop should call poll()/select()
>> to wait. Ie. just like PQgetCopyData() works.

> Seems good. I'll revise the code.

Do we need a new "PQgetXLogData" function at all? Seems like you could
shove the data through the COPY protocol and not have to touch libpq
at all, rather than duplicating a nontrivial amount of code there.

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-14 03:56:07
Message-ID: 3f0b79eb0912131956t2912a4b0ua46e8b3acfe0c5fd@mail.gmail.com

On Mon, Dec 14, 2009 at 11:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Do we need a new "PQgetXLogData" function at all?  Seems like you could
> shove the data through the COPY protocol and not have to touch libpq
> at all, rather than duplicating a nontrivial amount of code there.

Yeah, I also think that all the data (the WAL data itself, its LSN and
the flag bits) which PQgetXLogData handles could be shoved
through the COPY protocol. But outside libpq, it's somewhat messy
to extract the LSN and the flag bits from the data buffer which
PQgetCopyData returns, by using ntohs(). So I provided the new
libpq function only for replication. That is, I didn't want to expose
the low-level network layer which libpq should handle.

I think that such a friendly function would be useful for implementing
a standby program (e.g., a stand-alone walreceiver tool) outside
the core.
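
For reference, the extraction in question could look roughly like the
sketch below. The 10-byte header layout (xlogid, xrecoff, flags, all in
network byte order) is an assumption made up for illustration, not the
format used by the patch, and WalChunk and parse_wal_chunk() are
hypothetical names:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed header: 4-byte xlogid, 4-byte xrecoff, 2-byte flags. */
#define WAL_HDR_LEN 10

typedef struct
{
    uint32_t    xlogid;         /* high half of the LSN */
    uint32_t    xrecoff;        /* low half of the LSN */
    uint16_t    flags;          /* flag bits */
    const char *payload;        /* points into the caller's buffer */
    size_t      payload_len;
} WalChunk;

/*
 * Decode one message as returned in a COPY data buffer.
 * Returns 0 on success, -1 if the buffer is too short for the header.
 */
static int
parse_wal_chunk(const char *buf, size_t len, WalChunk *out)
{
    uint32_t n32;
    uint16_t n16;

    assert(buf != NULL && out != NULL);
    if (len < WAL_HDR_LEN)
        return -1;

    /* memcpy avoids unaligned reads from the byte buffer */
    memcpy(&n32, buf, 4);
    out->xlogid = ntohl(n32);
    memcpy(&n32, buf + 4, 4);
    out->xrecoff = ntohl(n32);
    memcpy(&n16, buf + 8, 2);
    out->flags = ntohs(n16);

    out->payload = buf + WAL_HDR_LEN;
    out->payload_len = len - WAL_HDR_LEN;
    return 0;
}
```

A caller would pass in the buffer and length it got from the COPY
stream and then write out payload[0..payload_len) at the decoded LSN.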

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-14 13:43:58
Message-ID: 3f0b79eb0912140543i67ea696fv1088e14a9f929858@mail.gmail.com

On Sat, Dec 12, 2009 at 5:09 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Could we change the API of PQgetXLogData to be more like PQgetCopyData?
> I'm thinking of removing the timeout argument, and instead looping with
> select/poll and PQconsumeInput in the caller. That probably means
> introducing a new state analogous to PGASYNC_COPY_IN. I haven't thought
> this fully through yet, but it seems like it would be good to have a
> consistent API.

On a related issue, so far I haven't considered how to output notice
messages at all :( In the current SR, they are always written to
stderr by the defaultNoticeProcessor using fprintf, whether
log_destination is specified or not. This is bizarre, and would need to
be fixed.

I'm going to install a new function that calls ereport as the notice
processor, using PQsetNoticeProcessor. But the problem is that only the
complete message text, like "NOTICE: xxx", is passed to the notice
processor, i.e., the error level itself is not passed separately.

So I wonder which error level should be used to output the notice message.
There are some approaches to address this:

1. Always use a specific level without regard to the actual one
2. Reverse-engineer the level from the complete message
3. Change some libpq functions so as to pass the error level to the notice
processor

But nothing really stands out. Do you have another good idea?
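
As an illustration of approach 2, a sketch that maps the text before the
first colon to a level. NoticeLevel and level_from_message() are
hypothetical names, and the whole idea is fragile: it assumes the
severity word is untranslated, which does not hold when the server
sends localized messages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum
{
    LVL_UNKNOWN,
    LVL_DEBUG,
    LVL_LOG,
    LVL_INFO,
    LVL_NOTICE,
    LVL_WARNING
} NoticeLevel;

/*
 * Guess the severity from a formatted message such as "NOTICE:  xxx".
 * Returns LVL_UNKNOWN if there is no recognizable prefix.
 */
static NoticeLevel
level_from_message(const char *message)
{
    static const struct
    {
        const char *prefix;
        NoticeLevel level;
    } table[] = {
        {"DEBUG", LVL_DEBUG},
        {"LOG", LVL_LOG},
        {"INFO", LVL_INFO},
        {"NOTICE", LVL_NOTICE},
        {"WARNING", LVL_WARNING}
    };
    const char *colon;
    size_t      n;
    size_t      i;

    assert(message != NULL);
    colon = strchr(message, ':');
    if (colon == NULL)
        return LVL_UNKNOWN;
    n = (size_t) (colon - message);

    /* the prefix must match exactly, not just as a leading substring */
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
    {
        if (strlen(table[i].prefix) == n &&
            strncmp(message, table[i].prefix, n) == 0)
            return table[i].level;
    }
    return LVL_UNKNOWN;
}
```

A message with a translated severity word, or with no prefix at all,
falls through to LVL_UNKNOWN, so a caller would still need a default
level as in approach 1.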

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-14 14:33:52
Message-ID: 28589.1260801232@sss.pgh.pa.us

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> On Mon, Dec 14, 2009 at 11:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Do we need a new "PQgetXLogData" function at all? Seems like you could
>> shove the data through the COPY protocol and not have to touch libpq
>> at all, rather than duplicating a nontrivial amount of code there.

> Yeah, I also think that all data (the WAL data itself, its LSN and
> the flag bits) which the "PQgetXLogData" handles could be shoved
> through the COPY protocol. But, outside libpq, it's somewhat messy
> to extract the LSN and the flag bits from the data buffer which
> "PQgetCopyData" returns, by using ntohs(). So I provided the new
> libpq function only for replication. That is, I didn't want to expose
> the low layer of network which libpq should handle.

I find that a completely unconvincing division of labor. Who is to say
that the LSN is the only part of the data that needs special treatment?

The very, very large practical problem with this is that if you decide
to change the behavior at any time, the only way to be sure that the WAL
receiver is using the right libpq version is to perform a soname major
version bump. The transformations done by libpq will essentially become
part of its ABI, and not a very visible part at that.

I am going to insist that no such logic be placed in libpq. From a
packager's standpoint that's insanity.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-14 15:56:13
Message-ID: 143.1260806173@sss.pgh.pa.us

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> I'm going to set the new function calling ereport as the current notice
> processor by using PQsetNoticeProcessor. But the problem is that only the
> completed message like "NOTICE: xxx" is passed to such notice processor,
> i.e., the error level itself is not passed.

Use PQsetNoticeReceiver. The other one is just there for backwards
compatibility.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-14 18:47:09
Message-ID: 4B26882D.8020708@enterprisedb.com

Tom Lane wrote:
> The very, very large practical problem with this is that if you decide
> to change the behavior at any time, the only way to be sure that the WAL
> receiver is using the right libpq version is to perform a soname major
> version bump. The transformations done by libpq will essentially become
> part of its ABI, and not a very visible part at that.

Not having to change the libpq API would certainly be a big advantage.

It's going to be a bit more complicated in walsender/walreceiver to work
with the libpq COPY API. We're going to need a WAL sending/receiving
protocol on top of it, defined in terms of rows and columns passed
through the COPY protocol.

One problem is that the standby is supposed to send back acknowledgments
to the master, telling it how far it has received/replayed the WAL. Is
there any way to send information back to the server, while a COPY OUT
is in progress? That's not absolutely necessary with asynchronous
replication, but will be with synchronous.

One idea is to stop/start the COPY between every batch of WAL records
sent, giving the client (= walreceiver) a chance to send messages back.
But that will lead to extra round trips.

BTW, something that's been bothering me a bit with this patch is that we
now have to link the backend with libpq. I don't see an immediate
problem with that, but I'm not a packager. Does anyone see a problem
with that?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-14 19:01:17
Message-ID: 12052.1260817277@sss.pgh.pa.us

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> It's going to be a bit more complicated in walsender/walreceiver to work
> with the libpq COPY API. We're going to need a WAL sending/receiving
> protocol on top of it, defined in terms of rows and columns passed
> through the COPY protocol.

AFAIR, libpq knows essentially nothing of the data being passed through
COPY --- it just treats that as a byte stream. I think you can define
any data format you want, it doesn't need to look exactly like a COPY
of a table would. In fact it's probably a lot better if it DOESN'T
look like COPY data once it gets past libpq, so that you can check
that it is WAL and not COPY data.

> One problem is that the standby is supposed to send back acknowledgments
> to the master, telling it how far it has received/replayed the WAL. Is
> there any way to send information back to the server, while a COPY OUT
> is in progress? That's not absolutely necessary with asynchronous
> replication, but will be with synchronous.

Well, a real COPY would of course not stop to look for incoming
messages, but I don't think that's inherent in the protocol. You
would likely need some libpq adjustments so it didn't throw error
when you tried that, but it would be a small and one-time adjustment.

> BTW, something that's been bothering me a bit with this patch is that we
> now have to link the backend with libpq. I don't see an immediate
> problem with that, but I'm not a packager. Does anyone see a problem
> with that?

Yeah, I have a problem with that. What's the backend doing with libpq?
It's not receiving this data, it's sending it.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-14 19:05:30
Message-ID: 4B268C7A.6080500@enterprisedb.com

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> BTW, something that's been bothering me a bit with this patch is that we
>> now have to link the backend with libpq. I don't see an immediate
>> problem with that, but I'm not a packager. Does anyone see a problem
>> with that?
>
> Yeah, I have a problem with that. What's the backend doing with libpq?
> It's not receiving this data, it's sending it.

walreceiver is a postmaster subprocess too.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-14 19:11:45
Message-ID: 12218.1260817905@sss.pgh.pa.us

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> Yeah, I have a problem with that. What's the backend doing with libpq?
>> It's not receiving this data, it's sending it.

> walreceiver is a postmaster subprocess too.

Hm. Perhaps it should be a loadable plugin and not hard-linked into the
backend? Compare dblink.

The main concern I have with hard-linking libpq is that it has a lot of
symbol conflicts with the backend --- and at least the ones from
src/port/ aren't easily removed. I foresee problems that will be very
difficult to fix on platforms where we can't filter the set of link
symbols exposed by libpq. Linking a thread-enabled libpq into the
backend could also create problems on some platforms --- it would likely
cause a thread-capable libc to get linked, which is not what we want.

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-15 06:07:20
Message-ID: 3f0b79eb0912142207s30c7176fnb4b7553e7f07100e@mail.gmail.com

On Tue, Dec 15, 2009 at 4:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hm.  Perhaps it should be a loadable plugin and not hard-linked into the
> backend?  Compare dblink.

You mean that such a plugin is listed in shared_preload_libraries,
a new process is forked, and the shared memory related to walreceiver
is created by using shmem_startup_hook? Since this approach would
solve the problem discussed previously, ISTM this makes sense.
http://archives.postgresql.org/pgsql-hackers/2009-11/msg00031.php

Some additional code might be required to control the termination
of walreceiver.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-16 03:28:26
Message-ID: 3f0b79eb0912151928w4002a79cu9616d310ab76dd93@mail.gmail.com

On Tue, Dec 15, 2009 at 3:47 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Tom Lane wrote:
>> The very, very large practical problem with this is that if you decide
>> to change the behavior at any time, the only way to be sure that the WAL
>> receiver is using the right libpq version is to perform a soname major
>> version bump.  The transformations done by libpq will essentially become
>> part of its ABI, and not a very visible part at that.
>
> Not having to change the libpq API would certainly be a big advantage.

Done; I replaced PQgetXLogData and PQputXLogRecPtr with PQgetCopyData and
PQputCopyData.

git://git.postgresql.org/git/users/fujii/postgres.git
branch: replication

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-16 09:53:49
Message-ID: 4B28AE2D.2030609@enterprisedb.com

Fujii Masao wrote:
> On Tue, Dec 15, 2009 at 3:47 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Tom Lane wrote:
>>> The very, very large practical problem with this is that if you decide
>>> to change the behavior at any time, the only way to be sure that the WAL
>>> receiver is using the right libpq version is to perform a soname major
>>> version bump. The transformations done by libpq will essentially become
>>> part of its ABI, and not a very visible part at that.
>> Not having to change the libpq API would certainly be a big advantage.
>
> Done; I replaced PQgetXLogData and PQputXLogRecPtr with PQgetCopyData and
> PQputCopyData.

Great! The logical next step is to move the handling of TimelineID and
system identifier out of libpq as well.

I'm thinking of refactoring the protocol along these lines:

0. Begin by connecting to the master just like a normal backend does. We
don't necessarily need the new ProtocolVersion code either, though it's
probably still a good idea to reject connections to older server versions.

1. Get the system identifier of the master.

Slave -> Master: Query message, with a query string like
"GET_SYSTEM_IDENTIFIER"

Master -> Slave: RowDescription, DataRow, CommandComplete, and
ReadyForQuery messages. The system identifier is returned in the DataRow
message.

This is identical to what happens when a query is executed against a
normal backend using the simple query protocol, so walsender can use
PQexec() for this.

2. Another query exchange like above, for timeline ID. (or these two
steps can be joined into one query, to eliminate one round-trip).

3. Request a backup history file, if needed:

Slave -> Master: Query message, with a query string like
"GET_BACKUP_HISTORY_FILE XXX" where XXX is XLogRecPtr or file name.

Master -> Slave: RowDescription, DataRow, CommandComplete, and
ReadyForQuery messages as usual. The file contents are returned in the
DataRow message.

4. Start replication

Slave -> Master: Query message, with query string "START REPLICATION:
XXXX", where XXXX is the RecPtr of the starting point.

Master -> Slave: CopyOutResponse followed by a continuous stream of
CopyData messages with WAL contents.

This minimizes the changes to the protocol and libpq, with a clear way
of extending by adding new commands. Similar to what you did a long time
ago, connecting as an actual backend at first and then switching to
walsender mode after running a few queries, but this would all be
handled in a separate loop in walsender instead of running as a
full-blown backend. We'll still need small changes to libpq to allow
sending messages back to the server in COPY_IN mode (maybe add a new
COPY_IN_OUT mode for that).
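
To make the "extend by adding new commands" point concrete, here is a
minimal sketch in C of how a walsender loop might classify the incoming
query strings with a lookup table instead of the SQL parser. The enum,
struct, and function names are hypothetical and come from no patch; only
the command strings are taken from the steps above, and the exact spelling
(a single space after the colon in "START REPLICATION:") is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command kinds; these names are illustrative only. */
typedef enum
{
    REPL_CMD_UNKNOWN,
    REPL_CMD_GET_SYSTEM_IDENTIFIER,
    REPL_CMD_GET_BACKUP_HISTORY_FILE,
    REPL_CMD_START_REPLICATION
} ReplCommand;

typedef struct
{
    const char *prefix;     /* keyword as it appears in the Query message */
    int         takes_arg;  /* nonzero if argument text must follow */
    ReplCommand cmd;
} ReplCommandEntry;

/* Adding a new command means adding one row to this table. */
static const ReplCommandEntry repl_commands[] = {
    {"GET_SYSTEM_IDENTIFIER", 0, REPL_CMD_GET_SYSTEM_IDENTIFIER},
    {"GET_BACKUP_HISTORY_FILE ", 1, REPL_CMD_GET_BACKUP_HISTORY_FILE},
    {"START REPLICATION: ", 1, REPL_CMD_START_REPLICATION}
};

/*
 * Classify a query string. For commands that take an argument, *arg is set
 * to point at the argument text inside "query"; otherwise it is NULL.
 */
static ReplCommand
classify_repl_command(const char *query, const char **arg)
{
    size_t      i;

    assert(query != NULL && arg != NULL);
    *arg = NULL;

    for (i = 0; i < sizeof(repl_commands) / sizeof(repl_commands[0]); i++)
    {
        const ReplCommandEntry *e = &repl_commands[i];
        size_t      len = strlen(e->prefix);

        if (e->takes_arg)
        {
            /* Prefix must match and must be followed by some argument. */
            if (strncmp(query, e->prefix, len) == 0 && query[len] != '\0')
            {
                *arg = query + len;
                return e->cmd;
            }
        }
        else if (strcmp(query, e->prefix) == 0)
            return e->cmd;
    }
    return REPL_CMD_UNKNOWN;
}
```

With this layout, a call such as
classify_repl_command("START REPLICATION: 0/3000020", &arg) returns
REPL_CMD_START_REPLICATION and leaves arg pointing at "0/3000020", while
anything unrecognized falls through to REPL_CMD_UNKNOWN.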

Thoughts?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Greg Stark <stark(at)mit(dot)edu>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-16 10:23:45
Message-ID: 407d949e0912160223y1e6745e8i3f4d792af947f9ec@mail.gmail.com
Lists: pgsql-hackers

I'm interested in abstracting out features of replication from libpq too. It
would be nice if we could implement different communication bus modules.

For example if you have dozens of replicas you may want to use something
like spread to distribute the records using multicast.

Sorry for top posting -- I haven't yet figured out how not to in this
client.


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-17 09:12:49
Message-ID: 3f0b79eb0912170112w69e9d98eke0b774ca3208ffd4@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 16, 2009 at 6:53 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Great! The logical next step is move the handling of TimelineID and
> system identifier out of libpq as well.

All right.

> 0. Begin by connecting to the master just like a normal backend does. We
> don't necessarily need the new ProtocolVersion code either, though it's
> probably still a good idea to reject connections to older server versions.

I also think such a backend should switch to walsender mode as soon as the
startup packet arrives. Otherwise we would have to authenticate it twice, in
two different contexts (as a normal backend and as a walsender), and
pg_hba.conf would need settings for each context. That seems odd to me.
Thoughts?

> 1. Get the system identifier of the master.
>
> Slave -> Master: Query message, with a query string like
> "GET_SYSTEM_IDENTIFIER"
>
> Master -> Slave: RowDescription, DataRow CommandComplete, and
> ReadyForQuery messages. The system identifier is returned in the DataRow
> message.
>
> This is identical to what happens when a query is executed against a
> normal backend using the simple query protocol, so walsender can use
> PQexec() for this.

s/walsender/walreceiver ?

A signal cannot cancel PQexec() while it is waiting for a message from the
server. We might need to change walreceiver's SIGTERM handler so that it
calls proc_exit() immediately if the signal arrives during PQexec().
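
As a rough sketch only, the handler change could look something like the
following: a flag marks the window in which the process is blocked, and the
handler exits at once when the signal lands inside that window. All names
here are hypothetical. _Exit() stands in for proc_exit(), which exists only
in the backend, and plain signal() stands in for however the backend
installs its handlers.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Nonzero while the process is inside the blocking call (e.g. PQexec()). */
static volatile sig_atomic_t in_blocking_call = 0;

/* Set by the handler; a main loop would check it between blocking calls. */
static volatile sig_atomic_t got_sigterm = 0;

static void
sigterm_handler(int signo)
{
    (void) signo;
    got_sigterm = 1;

    /* Blocked in the library call: nobody will see the flag, so exit now. */
    if (in_blocking_call)
        _Exit(1);           /* stand-in for proc_exit() */
}

static void
install_sigterm_handler(void)
{
    void        (*prev) (int) = signal(SIGTERM, sigterm_handler);

    assert(prev != SIG_ERR);
    (void) prev;
}

/* Call immediately before the blocking call. */
static void
enter_blocking_call(void)
{
    in_blocking_call = 1;

    /* The signal may have arrived just before the flag was set. */
    if (got_sigterm)
        _Exit(1);
}

/* Call immediately after the blocking call returns. */
static void
leave_blocking_call(void)
{
    in_blocking_call = 0;
}
```

The check inside enter_blocking_call() is there to close the small race in
which SIGTERM is delivered after the caller decides to block but before the
flag is set.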

> 2. Another query exchange like above, for timeline ID. (or these two
> steps can be joined into one query, to eliminate one round-trip).
>
> 3. Request a backup history file, if needed:
>
> Slave -> Master: Query message, with a query string like
> "GET_BACKUP_HISTORY_FILE XXX" where XXX is XLogRecPtr or file name.
>
> Master -> Slave: RowDescription, DataRow CommandComplete and
> ReadyForQuery messages as usual. The file contents are returned in the
> DataRow message.
>
> 4. Start replication
>
> Slave -> Master: Query message, with query string "START REPLICATION:
> XXXX", where XXXX is the RecPtr of the starting point.
>
> Master -> Slave: CopyOutResponse followed by a continuous stream of
> CopyData messages with WAL contents.

Seems OK.

> This minimizes the changes to the protocol and libpq, with a clear way
> of extending by adding new commands. Similar to what you did a long time
> ago, connecting as an actual backend at first and then switching to
> walsender mode after running a few queries, but this would all be
> handled in a separate loop in walsender instead of running as a
> full-blown backend.

Agreed. Only walsender should be allowed to handle the query strings that
you proposed, so that we can avoid touching the parser.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-17 12:02:29
Message-ID: 4B2A1DD5.1050506@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> On Wed, Dec 16, 2009 at 6:53 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> 0. Begin by connecting to the master just like a normal backend does. We
>> don't necessarily need the new ProtocolVersion code either, though it's
>> probably still a good idea to reject connections to older server versions.
>
> And, I think that such backend should switch to walsender mode when the startup
> packet arrives. Otherwise, we would have to authenticate such backend twice
> on different context, i.e., a normal backend and walsender. So the settings for
> each context would be required in pg_hba.conf. This is odd, I think. Thought?

True.

>> This is identical to what happens when a query is executed against a
>> normal backend using the simple query protocol, so walsender can use
>> PQexec() for this.
>
> s/walsender/walreceiver ?

Right.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-17 13:00:55
Message-ID: 3f0b79eb0912170500y24b5a1ag4f9440459b2b5bf7@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 17, 2009 at 9:02 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> And, I think that such backend should switch to walsender mode when the startup
>> packet arrives. Otherwise, we would have to authenticate such backend twice
>> on different context, i.e., a normal backend and walsender. So the settings for
>> each context would be required in pg_hba.conf. This is odd, I think. Thought?
>
> True.

Currently this switch depends on whether the standby sends
XLOG_STREAMING_CODE, which in turn depends on whether PQstartXLogStreaming()
is called. As the next step, we should get rid of these libpq changes as well.

I'm thinking of making the standby send the "walsender-switch-code" the same
way as application_name: walreceiver always specifies an option like
"replication=on" in the conninfo string and calls PQconnectdb(), which sends
the code as part of the startup packet. I also think no environment variable
should be defined for this option, to avoid user misconfiguration.
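
For illustration, a small sketch of how walreceiver might build that
conninfo string before connecting. The helper name is hypothetical, and the
option spelling "replication=on" is just the example value from the
paragraph above, not a settled keyword.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append the replication request to a user-supplied conninfo string.
 * Returns 0 on success, or -1 if the result does not fit in "out".
 */
static int
build_replication_conninfo(const char *user_conninfo, char *out, size_t outlen)
{
    int         n;

    assert(user_conninfo != NULL && out != NULL);

    n = snprintf(out, outlen, "%s replication=on", user_conninfo);
    if (n < 0 || (size_t) n >= outlen)
        return -1;          /* truncated or formatting error */
    return 0;
}
```

The resulting string would then be handed to PQconnectdb() unchanged, so
the request travels in the startup packet and nothing new is needed in the
libpq API.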

Thoughts? Better ideas?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-17 13:25:02
Message-ID: 4B2A312E.1030504@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> I'm thinking of making the standby send the "walsender-switch-code" the same way
> as application_name; walreceiver always specifies the option like
> "replication=on"
> in conninfo string and calls PQconnectdb(), which sends the code as a part of
> startup packet. And, the environment variable for that should not be defined to
> avoid user's mis-configuration, I think.

Sounds good.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-18 02:42:19
Message-ID: 3f0b79eb0912171842l76eda1bdj703a933fccbacd24@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 17, 2009 at 10:25 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Fujii Masao wrote:
>> I'm thinking of making the standby send the "walsender-switch-code" the same way
>> as application_name; walreceiver always specifies the option like
>> "replication=on"
>> in conninfo string and calls PQconnectdb(), which sends the code as a part of
>> startup packet. And, the environment variable for that should not be defined to
>> avoid user's mis-configuration, I think.
>
> Sounds good.

Okay. Design clarification again:

0. Begin by connecting to the master using PQconnectdb() with a new conninfo
option that requests replication. The startup packet carrying the request is
sent to the master, and the backend then switches to walsender mode. The
walsender enters its main loop and waits for requests from the walreceiver.

1. Get the system identifier of the master.

Slave -> Master: Query message, with a query string like
"GET_SYSTEM_IDENTIFIER"

Master -> Slave: RowDescription, DataRow, CommandComplete, and
ReadyForQuery messages. The system identifier is returned in the DataRow
message.

2. Another query exchange like above, for timeline ID.

Slave -> Master: Query message, with a query string like
"GET_TIMELINE"

Master -> Slave: RowDescription, DataRow, CommandComplete, and
ReadyForQuery messages. The timeline ID is returned in the DataRow
message.

3. Request a backup history file, if needed:

Slave -> Master: Query message, with a query string like
"GET_BACKUP_HISTORY_FILE XXX" where XXX is XLogRecPtr.

Master -> Slave: RowDescription, DataRow, CommandComplete, and
ReadyForQuery messages as usual. The file contents are returned in the
DataRow message.

In steps 1, 2, and 3, the walreceiver uses PQexec() to send the Query
message and receive the results.

4. Start replication

Slave -> Master: Query message, with query string "START REPLICATION:
XXXX", where XXXX is the RecPtr of the starting point.

Master -> Slave: CopyOutResponse followed by a continuous stream of
CopyData messages with WAL contents.
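
As a sketch of step 4 only: the code below formats and parses the
"START REPLICATION:" query string. The struct is a simplified stand-in for
the backend's XLogRecPtr, and writing the position as two hexadecimal
numbers separated by a slash is an assumption; the steps above do not fix
the text format of XXXX. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for XLogRecPtr: log file id plus byte offset. */
typedef struct
{
    uint32_t    xlogid;
    uint32_t    xrecoff;
} RecPtrSketch;

/* Standby side: build the query string. Returns 0, or -1 on overflow. */
static int
format_start_replication(RecPtrSketch start, char *out, size_t outlen)
{
    int         n = snprintf(out, outlen, "START REPLICATION: %X/%X",
                             (unsigned int) start.xlogid,
                             (unsigned int) start.xrecoff);

    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}

/* Master side: extract the start position. Returns 0, or -1 if malformed. */
static int
parse_start_replication(const char *query, RecPtrSketch *start)
{
    unsigned int hi;
    unsigned int lo;
    char        extra;

    /* Exactly two conversions must succeed; trailing text is rejected. */
    if (sscanf(query, "START REPLICATION: %X/%X%c", &hi, &lo, &extra) != 2)
        return -1;

    start->xlogid = (uint32_t) hi;
    start->xrecoff = (uint32_t) lo;
    return 0;
}
```

Note that sscanf() is lenient about whitespace, so a real implementation
would probably want stricter matching; this only shows the round trip.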

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-21 12:56:13
Message-ID: 3f0b79eb0912210456m73871b88pf6cab723c2e0b4dc@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 18, 2009 at 11:42 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Okey. Design clarification again;
>
> 0. Begin by connecting to the master using PQconnectdb() with new conninfo
> option specifying the request of replication. The startup packet with the
> request is sent to the master, then the backend switches to the walsender
> mode. The walsender goes into the main loop and wait for the request from
> the walreceiver.
<snip>
> 4. Start replication
>
> Slave -> Master: Query message, with query string "START REPLICATION:
> XXXX", where XXXX is the RecPtr of the starting point.
>
> Master -> Slave: CopyOutResponse followed by a continuous stream of
> CopyData messages with WAL contents.

Done. Currently there is no new libpq function for replication. The
walreceiver uses only existing functions like PQconnectdb, PQexec,
PQgetCopyData, etc.

git://git.postgresql.org/git/users/fujii/postgres.git
branch: replication

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-21 17:26:16
Message-ID: 4B2FAFB8.3060603@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> On Fri, Dec 18, 2009 at 11:42 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Okey. Design clarification again;
>>
>> 0. Begin by connecting to the master using PQconnectdb() with new conninfo
>> option specifying the request of replication. The startup packet with the
>> request is sent to the master, then the backend switches to the walsender
>> mode. The walsender goes into the main loop and wait for the request from
>> the walreceiver.
> <snip>
>> 4. Start replication
>>
>> Slave -> Master: Query message, with query string "START REPLICATION:
>> XXXX", where XXXX is the RecPtr of the starting point.
>>
>> Master -> Slave: CopyOutResponse followed by a continuous stream of
>> CopyData messages with WAL contents.
>
> Done. Currently there is no new libpq function for replication. The
> walreceiver uses only existing functions like PQconnectdb, PQexec,
> PQgetCopyData, etc.

Ok thanks, sounds good, I'll take a look.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-21 17:31:01
Message-ID: 4B2FB0D5.5080703@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> On Tue, Dec 15, 2009 at 4:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Hm. Perhaps it should be a loadable plugin and not hard-linked into the
>> backend? Compare dblink.
>
> You mean that such plugin is supplied in shared_preload_libraries,
> a new process is forked and the shared-memory related to walreceiver
> is created by using shmem_startup_hook? Since this approach would
> solve the problem discussed previously, ISTM this makes sense.
> http://archives.postgresql.org/pgsql-hackers/2009-11/msg00031.php
>
> Some additional code might be required to control the termination
> of walreceiver.

I'm not sure which problem in that thread you're referring to, but I can
see two options:

1. Use dlopen()/dlsym() in walreceiver to use libpq. A bit awkward,
though we could write a bunch of macros to hide that and make the libpq
calls look normal.

2. Move walreceiver altogether into a loadable module, which is linked
as usual to libpq, like contrib/dblink.

Thoughts? Both seem reasonable to me. I tested the 2nd option (see
'replication' branch in my git repository), splitting walreceiver.c into
two: the functions that run in the walreceiver process, and the
functions that are called from other processes to control walreceiver.
That's a quite nice separation, though of course we could do that with
the 1st approach as well.

PS. I just merged with CVS HEAD. Streaming replication is pretty awesome
with Hot Standby!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-21 18:21:28
Message-ID: 18174.1261419688@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Fujii Masao wrote:
> I'm not sure which problem in that thread you're referring to, but I can
> see two options:

> 1. Use dlopen()/dlsym() in walreceiver to use libpq. A bit awkward,
> though we could write a bunch of macros to hide that and make the libpq
> calls look normal.

> 2. Move walreceiver altogether into a loadable module, which is linked
> as usual to libpq. Like e.g contrib/dblink.

> Thoughts? Both seem reasonable to me.

From a packager's standpoint the second is much saner. If you want to
use dlopen() then you will have to know the exact name of the .so file
(e.g. libpq.so.5.3) and possibly its location too. Or you will have to
persuade packagers that they should ship bare "libpq.so" symlinks, which
is contrary to packaging standards on most Linux distros.
(walreceiver.so wouldn't be subject to those standards, but libpq is
because it's a regular library that can also be hard-linked by
applications.)

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-22 03:18:22
Message-ID: 3f0b79eb0912211918h6edc58c9o9bf68f79fdc6203f@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 22, 2009 at 2:31 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> 2. Move walreceiver altogether into a loadable module, which is linked
> as usual to libpq. Like e.g contrib/dblink.
>
> Thoughts? Both seem reasonable to me. I tested the 2nd option (see
> 'replication' branch in my git repository), splitting walreceiver.c into
> two: the functions that run in the walreceiver process, and the
> functions that are called from other processes to control walreceiver.
> That's a quite nice separation, though of course we could do that with
> the 1st approach as well.

I may not fully understand what a loadable module means here, but I wonder
how the walreceiver module would be loaded. AFAIK, we need to manually install
the dblink functions by executing dblink.sql before using them. Likewise,
if we choose the 2nd option, must we manually install the walreceiver
module before starting replication?

Or would we install it automatically by executing system_views.sql, as with
pg_start_backup? I'd like to reduce the number of installation steps
as much as possible. Is my concern beside the point?

> PS. I just merged with CVS HEAD. Streaming replication is pretty awesome
> with Hot Standby!

Thanks!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-22 04:46:05
Message-ID: 29118.1261457165@sss.pgh.pa.us
Lists: pgsql-hackers

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> Though I seem not to understand what a loadable module means, I wonder
> how the walreceiver module is loaded.

Put it in shared_preload_libraries, perhaps.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-22 06:30:53
Message-ID: 4B30679D.2080700@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> On Tue, Dec 22, 2009 at 2:31 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> 2. Move walreceiver altogether into a loadable module, which is linked
>> as usual to libpq. Like e.g contrib/dblink.
>>
>> Thoughts? Both seem reasonable to me. I tested the 2nd option (see
>> 'replication' branch in my git repository), splitting walreceiver.c into
>> two: the functions that run in the walreceiver process, and the
>> functions that are called from other processes to control walreceiver.
>> That's a quite nice separation, though of course we could do that with
>> the 1st approach as well.
>
> Though I seem not to understand what a loadable module means, I wonder
> how the walreceiver module is loaded. AFAIK, we need to manually install
> the dblink functions by executing dblink.sql before using them. Likewise,
> if we choose the 2nd option, we must manually install the walreceiver
> module before starting replication?

I think we can just use load_external_function() to load the library and
call WalReceiverMain from AuxiliaryProcessMain(). I.e., hard-code the
library name. Walreceiver is quite tightly coupled with the rest of the
backend anyway, so I don't think we need to come up with a pluggable API
at the moment.

That's the way I did it yesterday, see 'replication' branch in my git
repository, but it looks like I fumbled the commit so that some of the
changes were committed as part of the merge commit with origin/master
(=CVS HEAD). Sorry about that.

shared_preload_libraries seems like a bad place because the library
doesn't need to be loaded in all backends. Just the walreceiver process.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-22 07:21:06
Message-ID: 3f0b79eb0912212321v2086ca85k871ebcb76df5c814@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 22, 2009 at 3:30 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I think we can just use load_external_function() to load the library and
> call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the
> library name. Walreceiver is quite tightly coupled with the rest of the
> backend anyway, so I don't think we need to come up with a pluggable API
> at the moment.
>
> That's the way I did it yesterday, see 'replication' branch in my git
> repository, but it looks like I fumbled the commit so that some of the
> changes were committed as part of the merge commit with origin/master
> (=CVS HEAD). Sorry about that.

Umm, I still cannot find the place where the walreceiver module is
loaded by using load_external_function() in your 'replication' branch.
Also the compilation of that branch fails. Is the 'pushed' branch the
latest? Sorry if I'm missing something.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-22 11:36:03
Message-ID: 407d949e0912220336u595a05e0x20bd91b9fbc08d4d@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 22, 2009 at 6:30 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I think we can just use load_external_function() to load the library and
> call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the
> library name. Walreceiver is quite tightly coupled with the rest of the
> backend anyway, so I don't think we need to come up with a pluggable API
> at the moment.

Please? I am really interested in replacing walsender and walreceiver
with something which uses a communication bus like spread instead of a
single point to point connection.

ISTM if we start with something tightly coupled it'll be hard to
decouple later. Whereas if we start with a limited interface we'll
learn just how much information is really required by the modules and
will have fewer surprises later when we find surprising
interdependencies.

--
greg


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-22 11:49:14
Message-ID: 4B30B23A.3080605@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> On Tue, Dec 22, 2009 at 3:30 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> I think we can just use load_external_function() to load the library and
>> call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the
>> library name. Walreceiver is quite tightly coupled with the rest of the
>> backend anyway, so I don't think we need to come up with a pluggable API
>> at the moment.
>>
>> That's the way I did it yesterday, see 'replication' branch in my git
>> repository, but it looks like I fumbled the commit so that some of the
>> changes were committed as part of the merge commit with origin/master
>> (=CVS HEAD). Sorry about that.
>
> Umm.., I still cannot find the place where the walreceiver module is
> loaded by using load_external_function() in your 'replication' branch.
> Also the compilation of that branch fails. Is the 'pushed' branch the
> latest? Sorry if I'm missing something.

Ah, I see. The changes were not included in the merge commit after all;
I had simply forgotten to "git add" them. Sorry about that, they should be
there now.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-22 12:47:59
Message-ID: 4B30BFFF.9080505@enterprisedb.com
Lists: pgsql-hackers

Greg Stark wrote:
> On Tue, Dec 22, 2009 at 6:30 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> I think we can just use load_external_function() to load the library and
>> call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the
>> library name. Walreceiver is quite tightly coupled with the rest of the
>> backend anyway, so I don't think we need to come up with a pluggable API
>> at the moment.
>
> Please? I am really interested in replacing walsender and walreceiver
> with something which uses a communication bus like spread instead of a
> single point to point connection.

I think you'd still need to be able to request older WAL segments to
resync after a lost connection, restore from base backup etc., which
don't really fit into a publish/subscribe style communication bus. I'm
sure it could all be solved though. It would be a pretty cool feature,
for scaling to a large number of slaves.

> ISTM if we start with something tightly coupled it'll be hard to
> decouple later. Whereas if we start with a limited interface we'll
> learn just how much information is really required by the modules and
> will have fewer surprises later when we find surprising
> interdependencies.

I'm all ears if you have a concrete proposal.

I'm not too worried about it being hard to decouple later. The interface
is actually quite limited already, as the communication between
processes is done via shared memory. It probably wouldn't be hard to
turn it into an API, but I don't think there's a hurry to do that until
someone actually steps up to write an alternative walreceiver/walsender.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2009-12-22 14:21:10
Message-ID: 3f0b79eb0912220621g7be883f3wc091d3a48b98b862@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 22, 2009 at 8:49 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Ah, I see. The changes were not included in the merge commit after all,
> but I had simply forgotten to "git add" them. Sorry about that, should be
> there now.

Thanks for doing "git push" again!

But the compilation still fails.
Attached patch addresses this problem.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
fix_makefile_bug.patch text/x-patch 1.3 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-04 15:22:53
Message-ID: 4B4207CD.9090301@enterprisedb.com
Lists: pgsql-hackers

I've merged the replication branch with PostgreSQL CVS HEAD now,
including the patch for end-of-backup WAL records I committed earlier
today. See 'replication' branch in my git repository.

There's also a couple of other small changes: I believe the SSL stuff
isn't really necessary, so I removed it. I also moved the
START_REPLICATION phase from the walreceiver main loop to WalRcvConnect,
as it's simpler that way.

I will continue reviewing..

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-06 02:31:15
Message-ID: 3f0b79eb1001051831l5b65c4fbt2abc7989030d66a2@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 5, 2010 at 12:22 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I've merged the replication branch with PostgreSQL CVS HEAD now,
> including the patch for end-of-backup WAL records I committed earlier
> today. See 'replication' branch in my git repository.
>
> There's also a couple of other small changes: I believe the SSL stuff
> isn't really necessary, so I removed it. I also moved the
> START_REPLICATION phase from the walreceiver main loop to WalRcvConnect,
> as it's simpler that way.

I also fixed a couple of small bugs:

* The ErrorResponse message from the primary server had been ignored
* The segment-boundary had been wrongly handled
* Valid replication starting location had been wrongly regarded as invalid

git://git.postgresql.org/git/users/fujii/postgres.git
branch: replication

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-12 16:58:35
Message-ID: 3f0b79eb1001120858v11d2c0bbq72d096eb06a1e905@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 22, 2009 at 8:49 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Umm.., I still cannot find the place where the walreceiver module is
>> loaded by using load_external_function() in your 'replication' branch.
>> Also the compilation of that branch fails. Is the 'pushed' branch the
>> latest? Sorry if I'm missing something.
>
> Ah, I see. The changes were not included in the merge commit after all,
> but I had simply forgotten to "git add" them. Sorry about that, should be
> there now.

This change which moves walreceiver process into a dynamically loaded
module caused the following compile error on my MinGW environment.

---------------------------
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -I. -I../../../../src/interfaces/libpq
-I../../../../src/include -I./src/include/port/win32 -DEXEC_BACKEND
"-I../../../../src/include/port/win32" -DBUILDING_DLL -c -o
walreceiverproc.o walreceiverproc.c
dlltool --export-all --output-def libwalreceiverprocdll.def walreceiverproc.o
dllwrap -o walreceiverproc.dll --dllname walreceiverproc.dll --def
libwalreceiverprocdll.def walreceiverproc.o -L../../../../src/backend
-lpostgres -L../../../../src/interfaces/libpq -L../../../../src/port
-lpq
Info: resolving _pg_signal_mask by linking to __imp__pg_signal_mask
(auto-import)
Info: resolving _pg_signal_queue by linking to __imp__pg_signal_queue
(auto-import)
Info: resolving _InterruptPending by linking to
__imp__InterruptPending (auto-import)
Info: resolving _assert_enabled by linking to __imp__assert_enabled
(auto-import)
Info: resolving _WalRcv by linking to __imp__WalRcv (auto-import)
Info: resolving _proc_exit_inprogress by linking to
__imp__proc_exit_inprogress (auto-import)
Info: resolving _BlockSig by linking to __imp__BlockSig (auto-import)
Info: resolving _sync_method by linking to __imp__sync_method (auto-import)
Info: resolving _MyProcPid by linking to __imp__MyProcPid (auto-import)
Info: resolving _CurrentResourceOwner by linking to
__imp__CurrentResourceOwner (auto-import)
Info: resolving _TopMemoryContext by linking to
__imp__TopMemoryContext (auto-import)
Info: resolving _CurrentMemoryContext by linking to
__imp__CurrentMemoryContext (auto-import)
Info: resolving _PG_exception_stack by linking to
__imp__PG_exception_stack (auto-import)
Info: resolving _UnBlockSig by linking to __imp__UnBlockSig (auto-import)
Info: resolving _ThisTimeLineID by linking to __imp__ThisTimeLineID
(auto-import)
Info: resolving _error_context_stack by linking to
__imp__error_context_stack (auto-import)
Info: resolving _InterruptHoldoffCount by linking to
__imp__InterruptHoldoffCount (auto-import)
c:\MinGW\bin\..\lib\gcc\mingw32\3.4.2\..\..\..\..\mingw32\bin\ld.exe:
warning: auto-importing has been activated without
--enable-auto-import specified on the command line.
This should work unless it involves constant data structures
referencing symbols from auto-imported DLLs.
fu000001.o:(.idata$2+0xc): undefined reference to `libpostgres_a_iname'
fu000003.o:(.idata$2+0xc): undefined reference to `libpostgres_a_iname'
fu000005.o:(.idata$2+0xc): undefined reference to `libpostgres_a_iname'
fu000006.o:(.idata$2+0xc): undefined reference to `libpostgres_a_iname'
fu000008.o:(.idata$2+0xc): undefined reference to `libpostgres_a_iname'
fu000009.o:(.idata$2+0xc): more undefined references to
`libpostgres_a_iname' follow
nmth000000.o:(.idata$4+0x0): undefined reference to `_nm__pg_signal_mask'
nmth000002.o:(.idata$4+0x0): undefined reference to `_nm__pg_signal_queue'
nmth000004.o:(.idata$4+0x0): undefined reference to `_nm__InterruptPending'
nmth000007.o:(.idata$4+0x0): undefined reference to `_nm__assert_enabled'
nmth000012.o:(.idata$4+0x0): undefined reference to `_nm__WalRcv'
nmth000018.o:(.idata$4+0x0): undefined reference to `_nm__proc_exit_inprogress'
nmth000020.o:(.idata$4+0x0): undefined reference to `_nm__BlockSig'
nmth000023.o:(.idata$4+0x0): undefined reference to `_nm__sync_method'
nmth000026.o:(.idata$4+0x0): undefined reference to `_nm__MyProcPid'
nmth000028.o:(.idata$4+0x0): undefined reference to `_nm__CurrentResourceOwner'
nmth000030.o:(.idata$4+0x0): undefined reference to `_nm__TopMemoryContext'
nmth000032.o:(.idata$4+0x0): undefined reference to `_nm__CurrentMemoryContext'
nmth000035.o:(.idata$4+0x0): undefined reference to `_nm__PG_exception_stack'
nmth000037.o:(.idata$4+0x0): undefined reference to `_nm__UnBlockSig'
nmth000039.o:(.idata$4+0x0): undefined reference to `_nm__ThisTimeLineID'
nmth000041.o:(.idata$4+0x0): undefined reference to `_nm__error_context_stack'
nmth000043.o:(.idata$4+0x0): undefined reference to `_nm__InterruptHoldoffCount'
collect2: ld returned 1 exit status
c:\MinGW\bin\dllwrap.exe: c:\MinGW\bin\gcc exited with status 1
make[2]: *** [walreceiverproc.dll] Error 1
make[2]: Leaving directory
`/c/postgres/mmm/src/backend/postmaster/walreceiverproc'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/c/postgres/mmm/src'
make: *** [all] Error 2
---------------------------

Though I marked the variables shown in the above message as PGDLLIMPORT,
the "make" still fails in the same way. I struggled with this issue
for some time, but could not fix it yet :(

Frankly I'm not familiar with that area. So it would be nice if
someone could analyze this issue.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-12 18:37:40
Message-ID: 9837222c1001121037o68ef98abs853e621644fed15f@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 12, 2010 at 17:58, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Dec 22, 2009 at 8:49 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Umm.., I still cannot find the place where the walreceiver module is
>>> loaded by using load_external_function() in your 'replication' branch.
>>> Also the compilation of that branch fails. Is the 'pushed' branch the
>>> latest? Sorry if I'm missing something.
>>
>> Ah, I see. The changes were not included in the merge commit after all,
>> but I had simply forgotten to "git add" them. Sorry about that, should be
>> there now.
>
> This change which moves walreceiver process into a dynamically loaded
> module caused the following compile error on my MinGW environment.

That sounds strange - it should pick those up from the -lpostgres. Any
chance you have an old postgres binary around from a non-syncrep build
or something?

> ---------------------------
>
> Though I marked the variables shown in the above message as PGDLLIMPORT,
> the "make" still fails in the same way. I struggled with this issue
> for some time, but
> could not fix it yet :(
>
> Frankly I'm not familiar with that area. So it would be nice if
> someone could analyze
> this issue.

Do you have an environment to try to build it under msvc? In my
experience, that gives you easier-to-understand error messages in a
lot of cases like this - it removes the mingw black magic.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-13 08:47:48
Message-ID: 3f0b79eb1001130047r336a9b95y725187eaa5cd152d@mail.gmail.com
Lists: pgsql-hackers

Thanks for your advice!

On Wed, Jan 13, 2010 at 3:37 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> This change which moves walreceiver process into a dynamically loaded
>> module caused the following compile error on my MinGW environment.
>
> That sounds strange - it should pick those up from the -lpostgres. Any
> chance you have an old postgres binary around from a non-syncrep build
> or something?

No, there is no old postgres binary.

> Do you have an environment to try to build it under msvc?

No, unfortunately.

> in my
> experience, that gives you easier-to-understand error messages in a
> lot of cases like this - it removes the mingw black magic.

OK. I'll try to build it under msvc.

But since there seems to be a long way to go before I can do that,
I would appreciate it if someone could give me some advice.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-13 10:27:10
Message-ID: 4B4D9FFE.6090401@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> Done. Currently there is no new libpq function for replication. The
> walreceiver uses only existing functions like PQconnectdb, PQexec,
> PQgetCopyData, etc.
>
> git://git.postgresql.org/git/users/fujii/postgres.git
> branch: replication

Thanks!

I'm afraid we haven't quite nailed the select/poll issue yet. You copied
pq_wait() from the libpq pqSocketCheck(), but there's one big difference
between the backend and the frontend: the frontend always puts the
connection to non-blocking mode, while the backend uses blocking mode.
At least with SSL, I think it's possible for pq_wait() to return false
positives, if the SSL layer decides to renegotiate the connection
causing data to flow in the other direction in the underlying TCP
connection. A false positive would cause walsender to block
indefinitely on the pq_getbyte() call.

I don't even want to think about the changes required to put the backend
socket to non-blocking mode, I don't know that code well enough. Maybe
we could temporarily put it to non-blocking mode, read to see if there's
any data available, and put it back to blocking mode. But even then I
think we'd need to modify at least secure_read() to work correctly with
SSL in non-blocking mode.

Another idea is to use poll() to check for POLLHUP, on those platforms
that have poll(). AFAICS there is no equivalent for that in select(), so
for platforms that don't have poll() we would have to simply ignore the
issue or write some other platform-specific work-around (Windows
WSAEventSelect() seems to have a FD_CLOSE event for that). That would be
a quite localized change.
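
For reference, the poll() idea could look roughly like the sketch below.
This is a hypothetical helper, not code from the patch: the name
poll_revents() is made up for illustration, and it simply returns whatever
flags poll() reports so that the caller can test for POLLHUP. Whether a
given platform actually sets POLLHUP when the peer disconnects is an
assumption that would have to be verified per platform.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): ask poll() about 'fd'
 * without waiting and without consuming any data.
 *
 * Returns the revents bitmask reported by poll(), or -1 if poll() failed.
 * POLLHUP and POLLERR are reported even though only POLLIN is requested.
 */
static int
poll_revents(int fd)
{
	struct pollfd pfd;

	assert(fd >= 0);

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	/* timeout 0: return immediately with the current state */
	if (poll(&pfd, 1, 0) < 0)
		return -1;

	return pfd.revents;
}
```

A caller would check (poll_revents(fd) & POLLHUP) to detect a disconnect.
On platforms that report only POLLIN at end-of-stream, that check stays
false and a read is still needed to see the EOF.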

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-14 09:02:46
Message-ID: 3f0b79eb1001140102n10fdf409id404c6d656aa245b@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 13, 2010 at 7:27 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> the frontend always puts the
> connection to non-blocking mode, while the backend uses blocking mode.

Really? By default (i.e., without explicitly calling
PQsetnonblocking()), the connection is set to blocking mode even
in the frontend. Am I missing something?

> At least with SSL, I think it's possible for pq_wait() to return false
> positives, if the SSL layer decides to renegotiate the connection
> causing data to flow in the other direction in the underlying TCP
> connection. A false positive would cause walsender to block
> indefinitely on the pq_getbyte() call.

Sorry, I could not follow that scenario. Could you explain
it in more detail?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-14 10:04:24
Message-ID: 4B4EEC28.1040103@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> On Wed, Jan 13, 2010 at 7:27 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> the frontend always puts the
>> connection to non-blocking mode, while the backend uses blocking mode.
>
> Really? By default (i.e., without explicitly calling
> PQsetnonblocking()), the connection is set to blocking mode even
> in the frontend. Am I missing something?

That's right. The underlying socket is always put to non-blocking mode
in libpq. PQsetnonblocking() only affects whether libpq commands wait
and retry if the output buffer is full.

>> At least with SSL, I think it's possible for pq_wait() to return false
>> positives, if the SSL layer decides to renegotiate the connection
>> causing data to flow in the other direction in the underlying TCP
>> connection. A false positive would cause walsender to block
>> indefinitely on the pq_getbyte() call.
>
> Sorry, I could not follow that scenario. Could you explain
> it in more detail?

1. Walsender calls pq_wait() which calls select(), waiting for timeout,
or data to become available for reading in the underlying socket.

2. Client issues an SSL renegotiation by sending a message to the server

3. Server receives the message, and select() returns indicating that
data has arrived

4. Walsender calls HandleEndOfRep() which calls pq_getbyte().
pq_getbyte() ends up calling SSL_read(), which receives the renegotiation
message and handles it. No application data has arrived, however, so
SSL_read() blocks waiting for some to arrive. It never does.

I don't understand enough of SSL to know if renegotiation can actually
happen like that, but the man page of SSL_read() suggests so. But a
similar thing can happen if an SSL record is broken into two TCP
packets. select() returns immediately as the first packet arrives, but
SSL_read() will block until the 2nd packet arrives.
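
The second case can be reproduced without SSL. The sketch below is only a
stand-in: it assumes a fixed-size record in place of a real SSL record, and
both helper names are hypothetical. It shows select() reporting the socket
as readable as soon as the first fragment arrives, while a whole record is
not yet buffered, so a blocking read of the full record would wait. FIONREAD
is used to count buffered bytes; it is widely available on Unix-like systems
but platform-specific.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if select() reports 'fd' readable right now, 0 otherwise. */
static int
select_says_readable(int fd)
{
	fd_set		rfds;
	struct timeval tv = {0, 0};	/* zero timeout: just poll the state */

	assert(fd >= 0);

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	return select(fd + 1, &rfds, NULL, NULL, &tv) > 0;
}

/*
 * Returns 1 if at least record_len bytes are already buffered on 'fd',
 * i.e. a blocking read of one whole record would return immediately.
 * Returns 0 otherwise, or if the byte count cannot be determined.
 */
static int
whole_record_buffered(int fd, int record_len)
{
	int			avail = 0;

	if (ioctl(fd, FIONREAD, &avail) < 0)
		return 0;
	return avail >= record_len;
}
```

With an 8-byte record of which only 4 bytes have arrived,
select_says_readable() returns 1 while whole_record_buffered(fd, 8) returns
0, which is exactly the window in which a blocking reader gets stuck.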

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-14 10:09:45
Message-ID: 9837222c1001140209w7229bf12oe7a974299160a416@mail.gmail.com
Lists: pgsql-hackers

2010/1/14 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
> Fujii Masao wrote:
>> On Wed, Jan 13, 2010 at 7:27 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> the frontend always puts the
>>> connection to non-blocking mode, while the backend uses blocking mode.
>>
>> Really? By default (i.e., without explicitly calling
>> PQsetnonblocking()), the connection is set to blocking mode even
>> in the frontend. Am I missing something?
>
> That's right. The underlying socket is always put to non-blocking mode
> in libpq. PQsetnonblocking() only affects whether libpq commands wait
> and retry if the output buffer is full.
>
>>> At least with SSL, I think it's possible for pq_wait() to return false
>>> positives, if the SSL layer decides to renegotiate the connection
>>> causing data to flow in the other direction in the underlying TCP
>>> connection. A false positive would cause walsender to block
>>> indefinitely on the pq_getbyte() call.
>>
>> Sorry, I could not follow that scenario. Could you explain
>> it in more detail?
>
> 1. Walsender calls pq_wait() which calls select(), waiting for timeout,
> or data to become available for reading in the underlying socket.
>
> 2. Client issues an SSL renegotiation by sending a message to the server
>
> 3. Server receives the message, and select() returns indicating that
> data has arrived
>
> 4. Walsender calls HandleEndOfRep() which calls pq_getbyte().
> pq_getbyte() ends up calling SSL_read(), which receives the renegotiation
> message and handles it. No application data has arrived, however, so
> SSL_read() blocks waiting for some to arrive. It never does.
>
> I don't understand enough of SSL to know if renegotiation can actually
> happen like that, but the man page of SSL_read() suggests so. But a
> similar thing can happen if an SSL record is broken into two TCP
> packets. select() returns immediately as the first packet arrives, but
> SSL_read() will block until the 2nd packet arrives.

I *think* renegotiation happens based on amount of content, not amount
of time. But it could still happen in corner cases I think. If the
renegotiation happens right after a complete packet has been sent
(which would be the logical place), but not fast enough that the SSL
library gets it in one read() from the socket, you could end up in
that situation. (if the SSL library gets the renegotiation request as
part of the first read(), it would probably do the renegotiation
before returning from that call to SSL_read(), in which case the
socket would be in the correct state before you call select)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-14 12:14:44
Message-ID: 4B4F0AB4.4010306@enterprisedb.com
Lists: pgsql-hackers

After reading up on SSL_read() and SSL_pending(), it seems that there is
unfortunately no reliable way of checking if there is incoming data that
can be read using SSL_read() without blocking, short of putting the
socket to non-blocking mode. It also seems that we can't rely on poll()
returning POLLHUP if the remote end has disconnected; it's not doing
that at least on my laptop.

So, the only solution I can see is to put the socket to non-blocking
mode. But to keep the change localized, let's switch to non-blocking
mode only temporarily, just when polling to see if there's data to read
(or EOF), and switch back immediately afterwards.

I've added a pq_getbyte_if_available() function to pqcomm.c to do that.
The API to the upper levels is quite nice, the function returns a byte
if one is available without blocking. Only minimal changes are required
elsewhere.
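
To make the idea concrete, here is a minimal sketch of the "switch to
non-blocking mode only temporarily" approach on a plain socket. It is not
the code in pqcomm.c: the function name, the use of fcntl() and recv(), and
the return convention (1 = got a byte, 0 = nothing available, EOF =
end-of-stream or error) are assumptions made for illustration, and the SSL
path is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>				/* for EOF */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: try to read one byte from 'fd' without blocking.
 *
 * The socket is put into non-blocking mode only for the duration of the
 * read, and its original flags are restored before returning.
 *
 * Returns 1 and stores the byte in *c if one was read, 0 if no data is
 * available right now, and EOF if the peer closed the connection or an
 * error occurred.
 */
static int
getbyte_if_available(int fd, unsigned char *c)
{
	int			flags;
	int			result;
	ssize_t		r;

	assert(c != NULL);

	flags = fcntl(fd, F_GETFL);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return EOF;

	r = recv(fd, c, 1, 0);
	if (r == 1)
		result = 1;
	else if (r < 0 &&
			 (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
		result = 0;				/* nothing to read at the moment */
	else
		result = EOF;			/* r == 0 means EOF; otherwise hard error */

	/* Switch back to the original (blocking) mode immediately. */
	if (fcntl(fd, F_SETFL, flags) < 0)
		return EOF;

	return result;
}
```

A caller such as the walsender loop can then test for a pending message or
a closed connection between sends, without any other code having to know
that the socket was briefly non-blocking.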

See that in my git repository. Attached is a new version of the whole
streaming replication patch, for the benefit of archives and git non-users.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
sr-20100114.patch.gz application/x-gzip 47.2 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-14 12:46:07
Message-ID: 3f0b79eb1001140446h47a6e27er87caeee25359bfff@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 14, 2010 at 9:14 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> After reading up on SSL_read() and SSL_pending(), it seems that there is
> unfortunately no reliable way of checking if there is incoming data that
> can be read using SSL_read() without blocking, short of putting the
> socket to non-blocking mode. It also seems that we can't rely on poll()
> returning POLLHUP if the remote end has disconnected; it's not doing
> that at least on my laptop.
>
> So, the only solution I can see is to put the socket to non-blocking
> mode. But to keep the change localized, let's switch to non-blocking
> mode only temporarily, just when polling to see if there's data to read
> (or EOF), and switch back immediately afterwards.

Agreed. Though I also read some pages about that issue,
I was not able to find any better approach than temporarily
switching the blocking mode.

> I've added a pq_getbyte_if_available() function to pqcomm.c to do that.
> The API to the upper levels is quite nice, the function returns a byte
> if one is available without blocking. Only minimal changes are required
> elsewhere.

Great! Thanks a lot!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 19:10:58
Message-ID: 4B50BDC2.5070304@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> On Wed, Jan 13, 2010 at 3:37 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> This change which moves walreceiver process into a dynamically loaded
>>> module caused the following compile error on my MinGW environment.
>> That sounds strange - it should pick those up from the -lpostgres. Any
>> chance you have an old postgres binary around from a non-syncrep build
>> or something?
>
> No, there is no old postgres binary.
>
>> Do you have an environment to try to build it under msvc?
>
> No, unfortunately.
>
>> in my
>> experience, that gives you easier-to-understand error messages in a
>> lot of cases like this - it removes the mingw black magic.
>
> OK. I'll try to build it under msvc.
>
> But since there seems to be a long way to go before doing that,
> I would appreciate if someone could give me some advice.

It looks like dawn_bat is experiencing the same problem. I don't think
we want to sprinkle all those variables with PGDLLIMPORT, and it didn't
fix the problem for you earlier anyway. Is there some other way to fix this?

Do people still use MinGW for any real work? Could we just drop
walreceiver support from MinGW builds?

Or maybe we should consider splitting walreceiver into two parts after
all. Only the bare minimum that needs to access libpq would go into the
shared object, and the rest would be linked with the backend as usual.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 19:15:01
Message-ID: 9837222c1001151115r299387e3g95f2ad08678fc9@mail.gmail.com
Lists: pgsql-hackers

2010/1/15 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
> Fujii Masao wrote:
>> On Wed, Jan 13, 2010 at 3:37 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> This change which moves walreceiver process into a dynamically loaded
>>>> module caused the following compile error on my MinGW environment.
>>> That sounds strange - it should pick those up from the -lpostgres. Any
>>> chance you have an old postgres binary around from a non-syncrep build
>>> or something?
>>
>> No, there is no old postgres binary.
>>
>>> Do you have an environment to try to build it under msvc?
>>
>> No, unfortunately.
>>
>>> in my
>>> experience, that gives you easier-to-understand error messages in a
>>> lot of cases like this - it removes the mingw black magic.
>>
>> OK. I'll try to build it under msvc.
>>
>> But since there seems to be a long way to go before doing that,
>> I would appreciate if someone could give me some advice.
>
> It looks like dawn_bat is experiencing the same problem. I don't think
> we want to sprinkle all those variables with PGDLLIMPORT, and it didn't
> fix the problem for you earlier anyway. Is there some other way to fix this?
>
> Do people still use MinGW for any real work? Could we just drop
> walreceiver support from MinGW builds?

We don't know if this works on MSVC, because MSVC doesn't actually try
to build the walreceiver. I'm going to look at that tomorrow.

If we get the same issues there, we have a problem in our code. If not, we
need to figure out what's up with mingw.

> Or maybe we should consider splitting walreceiver into two parts after
> all. Only the bare minimum that needs to access libpq would go into the
> shared object, and the rest would be linked with the backend as usual.

That would certainly be one option.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 19:48:20
Message-ID: 4B50C684.30405@dunslane.net
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Do people still use MinGW for any real work? Could we just drop
> walreceiver support from MinGW builds?
>
> Or maybe we should consider splitting walreceiver into two parts after
> all. Only the bare minimum that needs to access libpq would go into the
> shared object, and the rest would be linked with the backend as usual.
>
>

I use MinGW when doing Windows work (e.g. the threading piece in
parallel pg_restore). And I think it is generally desirable to be able
to build on Windows using an open source tool chain. I'd want a damn
good reason to abandon its use. And I don't like the idea of not
supporting walreceiver on it either. Please find another solution if
possible.

cheers

andrew


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 19:51:14
Message-ID: 9837222c1001151151s2eb0fae7p7b2b3826f9506f2c@mail.gmail.com
Lists: pgsql-hackers

2010/1/15 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>
>
> Heikki Linnakangas wrote:
>>
>> Do people still use MinGW for any real work? Could we just drop
>> walreceiver support from MinGW builds?
>>
>> Or maybe we should consider splitting walreceiver into two parts after
>> all. Only the bare minimum that needs to access libpq would go into the
>> shared object, and the rest would be linked with the backend as usual.
>>
>>
>
> I use MinGW when doing Windows work (e.g. the threading piece in parallel pg_restore).  And I think it is generally desirable to be able to build on Windows using an open source tool chain. I'd want a damn good reason to abandon its use. And I don't like the idea of not supporting walreceiver on it either. Please find another solution if possible.
>

Yeah. FWIW, I don't use mingw to do any windows development, but
definitely +1 on working hard to keep support for it if at all
possible.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 20:19:54
Message-ID: 4B50CDEA.7080504@enterprisedb.com
Lists: pgsql-hackers

Magnus Hagander wrote:
> 2010/1/15 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>>
>> Heikki Linnakangas wrote:
>>> Do people still use MinGW for any real work? Could we just drop
>>> walreceiver support from MinGW builds?
>>>
>>> Or maybe we should consider splitting walreceiver into two parts after
>>> all. Only the bare minimum that needs to access libpq would go into the
>>> shared object, and the rest would be linked with the backend as usual.
>>>
>> I use MinGW when doing Windows work (e.g. the threading piece in parallel pg_restore). And I think it is generally desirable to be able to build on Windows using an open source tool chain. I'd want a damn good reason to abandon its use. And I don't like the idea of not supporting walreceiver on it either. Please find another solution if possible.
>
> Yeah. FWIW, I don't use mingw to do any windows development, but
> definitely +1 on working hard to keep support for it if at all
> possible.

Ok. I'll look at splitting walreceiver code between the shared module
and backend binary slightly differently. At first glance, it doesn't
seem that hard after all, and will make the code more modular anyway.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 20:25:08
Message-ID: 19931.1263587108@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Magnus Hagander wrote:
>> Yeah. FWIW, I don't use mingw to do any windows development, but
>> definitely +1 on working hard to keep support for it if at all
>> possible.

> Ok. I'll look at splitting walreceiver code between the shared module
> and backend binary slightly differently. At first glance, it doesn't
> seem that hard after all, and will make the code more modular anyway.

This is probably going in the wrong direction. There is no good reason
why that module should be failing to link, and I don't think it's going
to be "more modular" if you're forced to avoid any global variable
references at all in some arbitrary portion of the code.

I think it's a tools/build process problem and should be attacked that
way.

regards, tom lane


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 20:27:10
Message-ID: 20100115202710.GU18076@oak.highrise.ca
Lists: pgsql-hackers

* Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> [100115 15:20]:

> Ok. I'll look at splitting walreceiver code between the shared module
> and backend binary slightly differently. At first glance, it doesn't
> seem that hard after all, and will make the code more modular anyway.

Maybe an insane question, but why can't postmaster just "exec"
walreceiver? I mean, because of windows, we already have that code
around, and then walreceiver could link directly to libpq and not have
to worry at all about linking all of postmaster backends to libpq...

But I do understand that's a radical change...

a.
--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 20:49:28
Message-ID: 20379.1263588568@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> I think it's a tools/build process problem and should be attacked that
> way.

Specifically, I think you missed out $(BE_DLLLIBS) in SHLIB_LINK.
We'll find out at the next mingw build...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 20:53:16
Message-ID: 20437.1263588796@sss.pgh.pa.us
Lists: pgsql-hackers

Aidan Van Dyk <aidan(at)highrise(dot)ca> writes:
> Maybe an insane question, but why can't postmaster just "exec"
> walreceiver?

It'd greatly complicate access to shared memory.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 21:09:48
Message-ID: 4B50D99C.4050907@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> I wrote:
>> I think it's a tools/build process problem and should be attacked that
>> way.
>
> Specifically, I think you missed out $(BE_DLLLIBS) in SHLIB_LINK.
> We'll find out at the next mingw build...

Thanks. But what is BE_DLLLIBS? I can't find any description of it.

I suspect the MinGW build will fail because of the missing PGDLLIMPORTs.
Before we sprinkle all the global variables it touches with that, let me
explain what I meant by dividing walreceiver code differently between
dynamically loaded module and backend code. Right now I have to go to
sleep, though, but I'll try to get back to it during the weekend.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 21:47:18
Message-ID: 21219.1263592038@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> Specifically, I think you missed out $(BE_DLLLIBS) in SHLIB_LINK.
>> We'll find out at the next mingw build...

> Thanks. But what is BE_DLLLIBS? I can't find any description of it.

It was the wrong theory anyway --- it already is included (in
Makefile.shlib). But what it does is provide -lpostgres on platforms
where that is needed, such as mingw.

> I suspect the MinGW build will fail because of the missing PGDLLIMPORTs.

Yeah. On closer investigation the problem seems to be -DBUILDING_DLL,
which flips the meaning of PGDLLIMPORT. contrib/dblink, which surely
works and has the same linkage requirements as walreceiver, does *not*
use that. I've committed a patch to change that, we'll soon see if it
works...
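
For readers unfamiliar with the mechanism, here is a minimal, portable sketch of how a -DBUILDING_DLL switch can flip a PGDLLIMPORT-style marker. The names PGDLLIMPORT_DEMO, pgdllimport_demo_kind and demo_imports_backend_globals are hypothetical, and the bare tokens stand in for the Windows-only export/import attributes so that the sketch compiles anywhere; it illustrates the idea only and is not the actual header.

```c
#include <assert.h>
#include <string.h>

/* Two-step stringification so the macro is expanded before being quoted. */
#define DEMO_STR_(x) #x
#define DEMO_STR(x)  DEMO_STR_(x)

/*
 * Hypothetical portable stand-in for the Windows-only attribute. On a real
 * Windows build the two branches would be the compiler's dllexport and
 * dllimport attributes rather than bare tokens.
 */
#ifdef BUILDING_DLL
#define PGDLLIMPORT_DEMO dllexport
#else
#define PGDLLIMPORT_DEMO dllimport
#endif

/* Reports which way the marker was resolved in this translation unit. */
const char *pgdllimport_demo_kind(void)
{
    return DEMO_STR(PGDLLIMPORT_DEMO);
}

/* Returns 1 if a variable marked this way would be imported from the backend. */
int demo_imports_backend_globals(void)
{
    const char *kind = pgdllimport_demo_kind();

    assert(kind != NULL);
    return strcmp(kind, "dllimport") == 0;
}
```

Compiled without -DBUILDING_DLL, the marker resolves to the import side, which is what a loadable module referencing backend globals would need. Compiled with it, the same declarations resolve to the export side instead.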

> Before we sprinkle all the global variables it touches with that, let me
> explain what I meant by dividing walreceiver code differently between
> dynamically loaded module and backend code. Right now I have to go to
> sleep, though, but I'll try to get back to it during the weekend.

Yeah, nothing to be done till we get another buildfarm cycle anyway.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 23:30:44
Message-ID: 4B50FAA4.3010608@dunslane.net
Lists: pgsql-hackers

Tom Lane wrote:
>> Before we sprinkle all the global variables it touches with that, let me
>> explain what I meant by dividing walreceiver code differently between
>> dynamically loaded module and backend code. Right now I have to go to
>> sleep, though, but I'll try to get back to it during the weekend.
>>
>
> Yeah, nothing to be done till we get another buildfarm cycle anyway.
>
>
>

I ran an extra cycle. Still a bit of work to do:
<http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dawn_bat&dt=2010-01-15%2023:04:54>

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-15 23:59:28
Message-ID: 24088.1263599968@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I ran an extra cycle. Still a bit of work to do:
> <http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dawn_bat&dt=2010-01-15%2023:04:54>

Well, at least now we're down to the variables that haven't got
PGDLLIMPORT, rather than wondering what's wrong with the build ...

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-16 07:49:42
Message-ID: 4B516F96.4060309@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Before we sprinkle all the global variables it touches with that, let me
>> explain what I meant by dividing walreceiver code differently between
>> dynamically loaded module and backend code. Right now I have to go to
>> sleep, though, but I'll try to get back to it during the weekend.
>
> Yeah, nothing to be done till we get another buildfarm cycle anyway.

Ok, looks like you did that anyway, let's see if it fixed it. Thanks.

So what I'm playing with is to pull walreceiver back into the backend
executable. To avoid the link dependency, walreceiver doesn't access
libpq directly, but loads a module dynamically which implements this
interface:

bool walrcv_connect(char *conninfo, XLogRecPtr startpoint)

Establishes a connection to the primary and starts streaming from 'startpoint'.
Returns true on success.

bool walrcv_receive(int timeout, XLogRecPtr *recptr, char **buffer, int *len)

Retrieves any WAL record available through the connection, blocking for
a maximum of 'timeout' ms.

void walrcv_disconnect(void);

Disconnect.

This is the kind of API Greg Stark requested earlier
(http://archives.postgresql.org/message-id/407d949e0912220336u595a05e0x20bd91b9fbc08d4d@mail.gmail.com),
though I'm not planning to make it pluggable for 3rd party
implementations yet.
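
As a rough illustration of that interface, the sketch below expresses the three calls as function pointers and pairs them with a stub module that never touches libpq. Everything beyond the three signatures is an assumption made for the example: the XLogRecPtr typedef is a stand-in for the backend type, WalRcvModule and load_stub_walrcv_module are hypothetical names, and walrcv_receive is assumed to return true when it produced data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the backend's XLogRecPtr; the real type comes from the backend headers. */
typedef uint64_t XLogRecPtr;

/* Function-pointer types matching the three calls described above. */
typedef bool (*walrcv_connect_type) (char *conninfo, XLogRecPtr startpoint);
typedef bool (*walrcv_receive_type) (int timeout, XLogRecPtr *recptr,
                                     char **buffer, int *len);
typedef void (*walrcv_disconnect_type) (void);

/* Hypothetical table the backend side would fill in after loading the module. */
typedef struct WalRcvModule
{
    walrcv_connect_type connect;
    walrcv_receive_type receive;
    walrcv_disconnect_type disconnect;
} WalRcvModule;

/* Stub "module" state: hands out a fake record, no libpq involved. */
static bool stub_connected = false;
static XLogRecPtr stub_pos = 0;
static char stub_data[] = "fake-wal";

static bool
stub_connect(char *conninfo, XLogRecPtr startpoint)
{
    assert(conninfo != NULL);
    stub_connected = true;
    stub_pos = startpoint;
    return true;
}

static bool
stub_receive(int timeout, XLogRecPtr *recptr, char **buffer, int *len)
{
    (void) timeout;             /* a real module would block up to 'timeout' ms */
    if (!stub_connected)
        return false;
    *recptr = stub_pos;
    *buffer = stub_data;
    *len = (int) strlen(stub_data);
    stub_pos += (XLogRecPtr) *len;  /* advance past the record just returned */
    return true;
}

static void
stub_disconnect(void)
{
    stub_connected = false;
}

/* Stands in for the dynamic-load step; a real backend would resolve the
 * symbols from the shared object instead of taking static functions. */
WalRcvModule
load_stub_walrcv_module(void)
{
    WalRcvModule m = {stub_connect, stub_receive, stub_disconnect};

    return m;
}
```

With a table like this, the walreceiver code in the backend only ever calls m.connect, m.receive and m.disconnect, so the backend executable itself carries no link-time dependency on libpq.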

The module doesn't need to touch backend internals much at all, no
tinkering with shared memory for example, so I would feel much better
about moving that out of src/backend. Not sure where, though; it's not
an executable, so src/bin is hardly the right place, but I wouldn't want
to put it in contrib either, because it should still be built and
installed by default. So I'm inclined to still leave it in
src/backend/replication/.

I've pushed that 'replication-dynmodule' branch to my git repo. The diff
is hard to read, because it mostly just moves code around, but I've
attached libpqwalreceiver.c here, which is the dynamic module part. You
can also browse the tree via the web interface
(http://git.postgresql.org/gitweb?p=users/heikki/postgres.git;a=tree;h=refs/heads/replication-dynmodule;hb=replication-dynmodule)

I like this division of labor much more than making the whole
walreceiver process a dynamically loaded module, so barring objections I
will review and test this more, and commit next week.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
libpqwalreceiver.c text/x-csrc 8.7 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-16 07:54:28
Message-ID: 4B5170B4.9020008@enterprisedb.com
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> I've pushed that 'replication-dynmodule' branch in my git repo. The diff
> is hard to read, because it mostly just moves code around, but I've
> attached libpqwalreceiver.c here, which is the dynamic module part. You
> can also browse the tree via the web interface
> (http://git.postgresql.org/gitweb?p=users/heikki/postgres.git;a=tree;h=refs/heads/replication-dynmodule;hb=replication-dynmodule)

I just noticed that the comment at the top of libpqwalreceiver.c is a
leftover and no longer relevant to the contents of the file: all the
signal handling and interaction with the startup process is now in
src/backend/replication/walreceiver.c. That obviously needs to be
fixed before committing.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-16 12:55:12
Message-ID: m2eilqp5u7.fsf@hi-media.com
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> The module doesn't need to touch backend internals much at all, no
> tinkering with shared memory for example, so I would feel much better
> about moving that out of src/backend. Not sure where, though; it's not
> an executable, so src/bin is hardly the right place, but I wouldn't want
> to put it in contrib either, because it should still be built and
> installed by default. So I'm inclined to still leave it in
> src/backend/replication/

It should be possible for it to be in contrib and still installed by
default, even with the current tool set, by tweaking initdb to install
the contrib module into template1. But I guess that would then be a
packaging / dependency issue.

Of course the extension system would ideally "create extension foo;" for
all foo in contrib at initdb time, then a user would have to "install
extension foo;" and be done with it.

Regards,
--
dim


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-16 13:21:24
Message-ID: 4B51BD54.6080504@timbira.com
Lists: pgsql-hackers

Dimitri Fontaine wrote:
> It should be possible to be in contrib and installed by default, even
>
And it could be uninstalled too. Let's not do that for core functionality.

--
Euler Taveira de Oliveira
http://www.timbira.com/