Re: Synchronous replication patch v1

Lists:	pgsql-hackers

From:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Synchronous replication patch v1
Date:	2008-10-31 11:36:39
Message-ID:	3f0b79eb0810310436w360f0afdy76ff1499b177ce0d@mail.gmail.com

Attached is a patch for a synchronous log-shipping replication which
was discussed just a month ago. I would like you to review this patch
in Nov commit fest.

The outline of this patch is as follows:

1) Walsender

This is a new process dedicated to sending xlog up to the position
which a backend requests on commit. Walsender calculates the area
of xlog to be replicated using logic similar to XLogWrite.

At first, walsender is forked as a normal backend by postmaster (i.e.
the standby connects to postmaster just like a normal frontend).
The backend starts working as walsender after receiving a
"mimic-walsender" message. From then on, walsender is handled
differently from a backend.

Currently, the number of walsenders is limited to one.

2) Communication between backends and walsender

On commit, a backend tells walsender the position (LSN) to be
replicated via shmem, and wakes it up by signaling if needed.
Then, the backend sleeps until the requested replication is completed.
At that point, walsender might signal the backend to wake up.

Synchronous and asynchronous replication modes are supported.
In the async case, a backend basically doesn't need to sleep for
replication.

The user can tune a backend's max sleep time as a replication timeout.
Currently, the timeout closes the connection to the standby and
terminates walsender, but the other postgres processes continue to work.
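
As a rough illustration of the handshake described in 2) (this is not
code from the patch), the sketch below models the shmem area with a
Python condition variable. The class name, method names and the plain
integer LSNs are all hypothetical. On timeout it only returns False; it
does not model closing the standby connection or terminating walsender.

```python
import threading


class ReplicationState:
    """Stand-in for the shared-memory area (hypothetical names throughout)."""

    def __init__(self):
        self._cond = threading.Condition()
        self.requested_lsn = 0   # highest LSN any backend has asked for
        self.replicated_lsn = 0  # highest LSN walsender has shipped

    def wait_for_replication(self, lsn, synchronous=True, timeout=None):
        """Called by a backend at commit.

        Returns False only if a synchronous wait hit the timeout.
        """
        with self._cond:
            # Publish the requested position and wake walsender if it is idle.
            self.requested_lsn = max(self.requested_lsn, lsn)
            self._cond.notify_all()
            if not synchronous:
                return True  # async case: the backend does not sleep
            # Sleep until walsender reports that lsn has been replicated.
            return self._cond.wait_for(
                lambda: self.replicated_lsn >= lsn, timeout=timeout)

    def walsender_step(self, wait=None):
        """One walsender iteration: ship everything requested so far."""
        with self._cond:
            self._cond.wait_for(
                lambda: self.requested_lsn > self.replicated_lsn, timeout=wait)
            # A real walsender would send the xlog here; we only advance the LSN.
            self.replicated_lsn = self.requested_lsn
            self._cond.notify_all()  # wake any sleeping backends
```

A walsender thread would call walsender_step() in a loop, while each
committing backend calls wait_for_replication() with its commit LSN and
the configured timeout.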

3) Management of the xlog positions for replication

XLog positions are managed consistently. Particular care is needed
in AdvanceXLInsertBuffer and in the xlog_switch case.

4) Walreceiver

This is a new contrib program dedicated to receiving xlog and writing it.
The user can specify the xlog location (where walreceiver writes xlog
just after receiving it), and the archive location (where walreceiver
archives a filled xlog file). These options are used to cooperate with
pg_standby (they prevent pg_standby from reading an xlog file that
walreceiver is still writing).
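
One way the two locations could cooperate is sketched below; this is an
assumption for illustration, not the patch's implementation. Data is
appended to a file in the xlog location, and the file is renamed into
the archive location only once it is full, so a reader of the archive
never sees a partially written file. The class name, the short file
names and the tiny segment size are hypothetical, and the rename assumes
both directories are on the same filesystem.

```python
import os


class SegmentWriter:
    """Write received xlog into fixed-size files, archiving each when full."""

    def __init__(self, xlog_dir, archive_dir, segment_size):
        self.xlog_dir = xlog_dir
        self.archive_dir = archive_dir
        self.segment_size = segment_size
        self.seg_no = 0    # hypothetical segment counter used as the file name
        self.written = 0   # bytes written to the current segment

    def _path(self, directory):
        return os.path.join(directory, f"{self.seg_no:08X}")

    def write(self, data):
        while data:
            room = self.segment_size - self.written
            piece, data = data[:room], data[room:]
            with open(self._path(self.xlog_dir), "ab") as f:
                f.write(piece)
                f.flush()
                os.fsync(f.fileno())  # make the received xlog durable
            self.written += len(piece)
            if self.written == self.segment_size:
                # Segment is filled: move it to the archive in one rename.
                os.replace(self._path(self.xlog_dir),
                           self._path(self.archive_dir))
                self.seg_no += 1
                self.written = 0
```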

The above is the necessary minimum functionality, and some requests
which came out of the discussion have not been implemented yet.
If any other function is indispensable, please let me know.

Also, there are some problems in this patch:

* This patch is somewhat big, though it should be subdivided for
review.

* Source code comments and documents are insufficient.

Is it against the rules of the commit fest to add a patch in such a
state to the review queue? If so, I would aim for 8.5. Otherwise,
I will also deal with the problems during the commit fest.
What is your opinion?

To compile
----------------
* apply sync_replication_v1.patch to HEAD
* place walsender.c in src/backend/postmaster
* place walsender.h in src/include/postmaster
* place walreceiver in contrib

How to use
---------------
1) Start postgres normally (no parameters need to be configured)
2) Start walreceiver and connect to postmaster just like psql.
WAL streaming starts automatically.

Currently, there are three configurable parameters in postgresql.conf.

> synchronous_replication = on # immediate replication at commit
> replication_timeout = 0ms # 0 is disabled
> wal_sender_delay = 200ms # 1-10000 milliseconds

The usage of walreceiver is as follows.

> Usage:
> walreceiver [OPTION]... [XLOGLOCATION] [ARCHIVELOCATION]
>
> Options:
> -h HOSTNAME database server host or socket directory (default: local socket)
> -p PORT database server port (default: "5432")
> -U NAME database user name (default: postgres)
> -? show usage

If you want to do replication by using walreceiver and pg_standby,
both must be given the same archive location.

Regards;

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment	Content-Type	Size
sync_replication_v1.tgz	application/x-gzip	26.6 KB

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-10-31 13:15:35
Message-ID:	490B04F7.90405@enterprisedb.com

Fujii Masao wrote:
> And, there are some problems in this patch;
>
> * This patch is somewhat big, though it should be subdivided for
> review.
>
> * Source code comments and documents are insufficient.
>
> Is it against the rule of commit fest to add such a status patch
> into review-queue? If so, I would aim for 8.5. Otherwise,
> I will deal with the problems also during commit fest.
> What is your opinion?

You can add work-in-progress patches and even just design docs to the
commitfest queue. That's perfectly OK. They will be reviewed as any
other work, but naturally if it's not a patch that's ready to be
committed without major work, it won't be committed.

I haven't looked at the patch yet, but if you think there's chances to
get it into shape for inclusion to 8.4, before the commit fest is over,
you can and should keep working on it and submit updated patches during
the commit fest. However, help with reviewing other patches would also
be very much appreciated. The idea of commitfests is that everyone stops
working on their own stuff, except for cleaning up and responding to
review comments on one's own patches that are in the queue, and helps to
review other people's patches.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-10-31 14:12:53
Message-ID:	490B1265.7010607@enterprisedb.com

Fujii Masao wrote:
> Attached is a patch for a synchronous log-shipping replication which
> was discussed just a month ago. I would like you to review this patch
> in Nov commit fest.

Here's some first quick comments:

AFAICS, there's no security, at all. Anyone that can log in, can become
a WAL sender, and receive all WAL for the whole cluster.

If the connection is jammed for a while, or just slow, is there
something that prevents the slave from falling so much behind that the
master checkpoints, archives, and deletes some WAL segments that are
still needed for the replication?

> The outline of this patch is as follow;
>
> 1) Walsender
>
> This is new process to focus on sending xlog through the position
> which a backend requests on commit. Walsender calculates the area
> of xlog to be replicated by a logic similar to XLogWrite.
>
> At first, walsender is forked as a normal backend by postmaster (i.e.
> the standby connects to postmaster just like normal frontend).
> A backend works as walsender after receiving "mimic-walsender"
> message. Then, walsender is handled differently from a backend.
>
> Now, the number of walsenders is restricted to one.

That feels kinda weird. I think it would be better if the client
indicated in the startup message that it wants to become WAL sender.
It'll be needed for the authentication.

> And, there are some problems in this patch;
>
> * This patch is somewhat big, though it should be subdivided for
> review.

I've seen bigger :-). The signal handling changes might be a candidate
for splitting into a separate patch.

> * Source code comments and documents are insufficient.

Sure. (though I've seen worse :-)).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-11-04 09:51:42
Message-ID:	3f0b79eb0811040151o50b74714p729a19b10cc8cb60@mail.gmail.com

On Fri, Oct 31, 2008 at 10:15 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Fujii Masao wrote:
>>
>> And, there are some problems in this patch;
>>
>> * This patch is somewhat big, though it should be subdivided for
>> review.
>>
>> * Source code comments and documents are insufficient.
>>
>> Is it against the rule of commit fest to add such a status patch
>> into review-queue? If so, I would aim for 8.5. Otherwise,
>> I will deal with the problems also during commit fest.
>> What is your opinion?
>
> You can add work-in-progress patches and even just design docs to the
> commitfest queue. That's perfectly OK. They will be reviewed as any other
> work, but naturally if it's not a patch that's ready to be committed without
> major work, it won't be committed.
>
> I haven't looked at the patch yet, but if you think there's chances to get
> it into shape for inclusion to 8.4, before the commit fest is over, you can
> and should keep working on it and submit updated patches during the commit
> fest. However, help with reviewing other patches would also be very much
> appreciated. The idea of commitfests is that everyone stops working on their
> own stuff, except for cleaning up and responding to review comments on one's
> own patches that are in the queue, and helps to review other people's
> patches.

OK, thanks Heikki. I will keep working on Synch Rep during the commit fest.

First, as you suggest, I'll split the signal handling changes into a
separate patch ASAP.

Regards;

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-11-04 13:59:58
Message-ID:	3f0b79eb0811040559q4c4483bdoc69528fefa3ebb37@mail.gmail.com

Hi, thank you for taking the time to review the patch.

On Fri, Oct 31, 2008 at 11:12 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Fujii Masao wrote:
>>
>> Attached is a patch for a synchronous log-shipping replication which
>> was discussed just a month ago. I would like you to review this patch
>> in Nov commit fest.
>
> Here's some first quick comments:
>
> AFAICS, there's no security, at all. Anyone that can log in, can become a
> WAL sender, and receive all WAL for the whole cluster.

One simple solution is to define a database used only for replication.
With this solution, we can handle authentication for replication just
like usual database access. That is, pg_hba.conf, cooperation with
database roles, etc. are also supported for replication, so a user can
set up the authentication rules easily. ISTM that there is no advantage
in separating authentication for replication from the existing
mechanism.

How about this solution?

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-11-04 15:51:41
Message-ID:	49106F8D.9060204@enterprisedb.com

Fujii Masao wrote:
> On Fri, Oct 31, 2008 at 11:12 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> AFAICS, there's no security, at all. Anyone that can log in, can become a
>> WAL sender, and receive all WAL for the whole cluster.
>
> One simple solution is to define the database only for replication. In
> this solution, we can handle the authentication for replication like
> the usual database access. That is, pg_hba.conf, the cooperation with
> a database role, etc are supported also in replication. So, a user can
> set up the authentication rules easily.

You mean like a pseudo database name in pg_hba.conf, and in the startup
message, that actually means "connect for replication"? Yeah, something
like that sounds reasonable to me.

> ISTM that there is no advantage which separates authentication for
> replication from the existing mechanism.

Agreed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-11-05 07:13:07
Message-ID:	3f0b79eb0811042313s7ecb493en23fbc860c5901395@mail.gmail.com

On Wed, Nov 5, 2008 at 12:51 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Fujii Masao wrote:
>>
>> On Fri, Oct 31, 2008 at 11:12 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>
>>> AFAICS, there's no security, at all. Anyone that can log in, can become a
>>> WAL sender, and receive all WAL for the whole cluster.
>>
>> One simple solution is to define the database only for replication. In
>> this solution, we can handle the authentication for replication like
>> the usual database access. That is, pg_hba.conf, the cooperation with
>> a database role, etc are supported also in replication. So, a user can
>> set up the authentication rules easily.
>
> You mean like a pseudo database name in pg_hba.conf, and in the startup
> message, that actually means "connect for replication"? Yeah, something like
> that sounds reasonable to me.

Yes, I would define a pseudo database name for replication.

A backend works as walsender only if it received a startup packet that
includes the database name for replication. But authentication and
initialization continue until ReadyForQuery is sent. So, I assume that
walsender starts replication after sending ReadyForQuery and receiving
a message for replication. In this design, some features (e.g.
post_auth_delay) are supported as they are. Another advantage is that a
client can use libpq functions, such as PQconnectdb, as they are for the
replication connection.

Between ReadyForQuery and the message for replication, a client can
issue queries. At least, my walreceiver would query the timeline ID and
request an xlog switch (in my previous patch, these are exchanged after
walsender starts, but that has little flexibility). Of course, I have to
create a new function which returns the current timeline ID.

Initial sequence of walsender
----------------
1) process the startup packet
1-1) if the database name for replication is specified, the backend
declares to postmaster that it is a walsender (removes itself from
BackendList, etc).
2) authentication and initialization (BackendRun, PostgresMain)
3) walsender sends ReadyForQuery
4) a client queries timeline ID and requests xlog-switch
5) a client requests the start of WAL streaming
5-1) if a backend is not walsender, it refuses the request.
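
The sequence above can be sketched as a small state machine. This is
only an illustration of the proposed ordering, not code from the patch:
the pseudo database name "replication", the class and the method names
are all hypothetical, and authentication is reduced to a stub.

```python
REPLICATION_DBNAME = "replication"  # hypothetical pseudo database name


class Backend:
    """Toy model of a backend that may turn into a walsender."""

    def __init__(self):
        self.is_walsender = False
        self.ready = False
        self.streaming = False

    def process_startup_packet(self, dbname):
        # The database name in the startup packet decides the role.
        self.is_walsender = (dbname == REPLICATION_DBNAME)

    def authenticate_and_initialize(self):
        # Ordinary authentication and initialization, then ReadyForQuery.
        self.ready = True
        return "ReadyForQuery"

    def request_streaming(self):
        # Streaming may only start after ReadyForQuery, and only in a
        # walsender; an ordinary backend refuses the request.
        if not self.ready:
            raise RuntimeError("streaming requested before ReadyForQuery")
        if not self.is_walsender:
            return False
        self.streaming = True
        return True
```

Between authenticate_and_initialize() and request_streaming(), the
client would be free to run ordinary queries, such as asking for the
timeline ID.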

I will correct the code and post it ASAP.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-11-05 10:07:24
Message-ID:	1225879644.17744.57.camel@ebony.2ndQuadrant

On Tue, 2008-11-04 at 22:59 +0900, Fujii Masao wrote:
> Hi, thank you for taking time to review the patch.
>
> On Fri, Oct 31, 2008 at 11:12 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> > Fujii Masao wrote:
> >>
> >> Attached is a patch for a synchronous log-shipping replication which
> >> was discussed just a month ago. I would like you to review this patch
> >> in Nov commit fest.
> >
> > Here's some first quick comments:
> >
> > AFAICS, there's no security, at all. Anyone that can log in, can become a
> > WAL sender, and receive all WAL for the whole cluster.
>
> One simple solution is to define the database only for replication. In
> this solution, we can handle the authentication for replication like
> the usual database access. That is, pg_hba.conf, the cooperation with
> a database role, etc are supported also in replication. So, a user can
> set up the authentication rules easily. ISTM that there is no
> advantage which separates authentication for replication from the
> existing mechanism.

Would it be easier to use libpq directly? That would make it easier because
whatever connection method you have configured will work for replication
also.

We already have a protocol message for streaming data: COPY.

If you implemented the send as a new command, similar to COPY, it would
all work very easily. SENDFILE?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

From:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-11-05 12:59:36
Message-ID:	3f0b79eb0811050459u74ea77b9uf3d2a59c6953ccd9@mail.gmail.com

Hi, Simon,

On Wed, Nov 5, 2008 at 7:07 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Tue, 2008-11-04 at 22:59 +0900, Fujii Masao wrote:
>> Hi, thank you for taking time to review the patch.
>>
>> On Fri, Oct 31, 2008 at 11:12 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> > Fujii Masao wrote:
>> >>
>> >> Attached is a patch for a synchronous log-shipping replication which
>> >> was discussed just a month ago. I would like you to review this patch
>> >> in Nov commit fest.
>> >
>> > Here's some first quick comments:
>> >
>> > AFAICS, there's no security, at all. Anyone that can log in, can become a
>> > WAL sender, and receive all WAL for the whole cluster.
>>
>> One simple solution is to define the database only for replication. In
>> this solution, we can handle the authentication for replication like
>> the usual database access. That is, pg_hba.conf, the cooperation with
>> a database role, etc are supported also in replication. So, a user can
>> set up the authentication rules easily. ISTM that there is no
>> advantage which separates authentication for replication from the
>> existing mechanism.
>
> Would it be easier to use libpq directly? That would make it easier because
> whatever connection method you have configured will work for replication
> also.
>
> We already have a protocol message for streaming data: COPY.
>
> If you implemented the send as a new command, similar to COPY, it would
> all work very easily. SENDFILE?

Thank you for the suggestion. I will reconsider the protocol of WAL streaming
based on your suggestion.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-11-05 14:01:58
Message-ID:	1225893718.17744.95.camel@ebony.2ndQuadrant

Hi Fujii,

Here's some initial thoughts on the structure of this. I've deliberately
not yet read other comments, so we have some independent viewpoints.
Sorry if that means we end up saying the same thing twice.

On Fri, 2008-10-31 at 20:36 +0900, Fujii Masao wrote:

> 1) Walsender
>
> This is new process to focus on sending xlog through the position
> which a backend requests on commit. Walsender calculates the area
> of xlog to be replicated by a logic similar to XLogWrite.
>
> At first, walsender is forked as a normal backend by postmaster (i.e.
> the standby connects to postmaster just like normal frontend).
> A backend works as walsender after receiving "mimic-walsender"
> message. Then, walsender is handled differently from a backend.
>
> Now, the number of walsenders is restricted to one.

I would think we would want this integrated into the server as an
additional special backend, similar to WALWriter. If it works for now,
that's fine for other testing. This is not an especially difficult
change, I can help with this.

> 2) Communication between backends and walsender
>
> On commit, a backend tells walsender the position (LSN) to be
> replicated via shmem, and wakes it up by signaling if needed.
> Then, a backend sleeps until requested replication is completed.
> At this time, walsender might signal a backend to wake up.
>
> Synchronous and asynchronous replication mode are supported.
> In async case, a backend basically don't need to sleep for replication.
>
> User can tune a backend's max sleep time as a replication timeout.
> Now, the timeout closes the connection to the standby, terminates
> walsender, but the other postgres process continue to work.

No comments until I've read the code.

> 3) Management of the xlog positions for replication
>
> XLog positions are managed consistent. It's necessary to be careful
> especially in AdvanceXLInsertBuffer and xlog_switch case.

Sounds good.

> 4) Walreceiver
>
> This is new contrib program to focus on receiving xlog and writing it.
> User can specify the xlog location (where walreceiver writes xlog in
> just after receiving), and the archive location (where walreceiver
> archives a filled xlog file). This options are used to cooperate with
> pg_standby (prevents pg_standby from reading the xlog file under
> walreceiver writing)

Again, I would expect this to be integrated with server. I would expect
code to live in src/postmaster/walreceiver.c, with main logic in a file
alongside xlog.c, perhaps xreceive.c. We would start WALReceiver when we
enter archive recovery mode - I already have logic for this state
change. After that you would be able to use the archive location
specified via recovery.conf.

The logic need not be any further integrated than you have here.

> The above is a necessary minimum function, and some requests
> which came out in the discussion have not been implemented yet.
> If there is other indispensable function, please let me know.
>
> And, there are some problems in this patch;
>
> * This patch is somewhat big, though it should be subdivided for
> review.
>
> * Source code comments and documents are insufficient.

Source code comments are essential. I try to put enough comments so that
each chunk of the patch has a comment to explain why that change is a
necessary part of the whole patch. Doing that is a good way to find
chunks that you can remove.

> Now, there are three configurable parameter in postgresql.conf.
>
> > synchronous_replication = on # immediate replication at commit
> > replication_timeout = 0ms # 0 is disabled
> > wal_sender_delay = 200ms # 1-10000 milliseconds

Could you write some docs for this? I just want to check how you think
it will work.

Does synchronous_replication = off mean
a) asynchronous replication or
b) no replication at all

I want to be able to specify synch, asynch or no replication.

We need an explanation and example of how to set this up when performing
a large initial base backup. Earlier we discussed using archiver to
transfer initial files and then switching to streaming mode later. How
does all that work now?
http://archives.postgresql.org/pgsql-hackers/2008-09/msg01208.php

I'll be looking at this a lot more over next few weeks/months, so this
is just a few short initial comments.

Well done for getting this together so quickly, especially with your
visit to hospital taking away time.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

From:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-11-06 06:59:04
Message-ID:	3f0b79eb0811052259w57e8cd6esde1f7780de4bf565@mail.gmail.com

Hi Simon,

On Wed, Nov 5, 2008 at 11:01 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I would think we would want this integrated into the server as an
> additional special backend, similar to WALWriter. If it works for now,
> that's fine for other testing. This is not an especially difficult
> change, I can help with this.

I integrated walsender into the server as a special backend.
Please check "walsender process patch v1"

http://archives.postgresql.org/pgsql-hackers/2008-11/msg00294.php

> Again, I would expect this to be integrated with server. I would expect
> code to live in src/postmaster/walreceiver.c, with main logic in a file
> alongside xlog.c, perhaps xreceive.c. We would start WALReceiver when we
> enter archive recovery mode - I already have logic for this state
> change. After that you would be able to use the archive location
> specified via recovery.conf.

OK. I will try to integrate walreceiver into the server. But I'm not
familiar with the Hot Standby patch, which includes the logic for such
a state change. Which patch do I need to check?

And we have to decide where a user specifies the host name and port
number. I think that recovery.conf is suitable for specifying them.
If they are not specified in recovery.conf, walreceiver would not be
invoked.

Is any other parameter required for walreceiver in addition to them?
(additional info for authentication?)

>> > synchronous_replication = on # immediate replication at commit
>> > replication_timeout = 0ms # 0 is disabled
>> > wal_sender_delay = 200ms # 1-10000 milliseconds
>
> Could you write some docs for this? I just want to check how you think
> it will work.
>
> Does synchronous_replication = off mean
> a) asynchronous replication or
> b) no replication at all
>
> I want to be able to specify synch, asynch or no replication.

synchronous_replication is very similar to synchronous_commit.
The docs are as follows.

8<------------------------------
Specifies whether transaction commit will wait for WAL records to be
replicated to the standby before the command returns a "success"
indication to the client. The default, and safe, setting is on. When
off, there can be a delay between when success is reported to the
client and when the transaction is really guaranteed to be safe in the
standby against a server crash. (The maximum delay is the same as
wal_sender_delay.) Unlike synchronous_commit, setting this parameter to
off might cause inconsistency between the database in the primary and
the transaction logs in the standby.

This parameter can be changed at any time; the behavior for any one
transaction is determined by the setting in effect when it writes
transaction logs. It is therefore possible, and useful, to have some
transactions replicate synchronously and others asynchronously. For
example, to make a single multi-statement transaction replicate
asynchronously when the default is the opposite, issue
SET LOCAL synchronous_replication TO OFF within the transaction.
8<------------------------------

I will also write docs for the other parameters.

> We need an explanation and example of how to set this up when performing
> a large initial base backup. Earlier we discussed using archiver to
> transfer initial files and then switching to streaming mode later. How
> does all that work now?
> http://archives.postgresql.org/pgsql-hackers/2008-09/msg01208.php

I assume the following procedure:

1) Start postgres in the primary
2) Get an online-backup in the primary
3) Place the online-backup in the standby
4) Start postgres (with walreceiver) in the standby
# Configure restore_command, host of the primary and port in recovery.conf
5) Manual operation
# If there are missing files for PITR in the standby, copy them from
# somewhere (archive location of the primary, tape backup, etc).
# The missing files might be xlog or history files. Since the xlog file
# segment is switched when replication starts, the missing xlog files
# would basically exist in the archive location of the primary.

I will detail this procedure and write it up in the docs.

In the previous discussion, there was a difference of opinion about who
copies the missing files: postgres (walsender and walreceiver) or
something outside of postgres. Since we cannot accurately predict where
the missing files are, I think that it's unsuitable for postgres to
copy them.

> I'll be looking at this a lot more over next few weeks/months, so this
> is just a few short initial comments.

Thank you for taking time to review the design!!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-11-06 08:42:48
Message-ID:	3f0b79eb0811060042p760eb90ap61abf25da1a093c5@mail.gmail.com

On Thu, Nov 6, 2008 at 3:59 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> 1) Start postgres in the primary
> 2) Get an online-backup in the primary
> 3) Locate the online-backup in the standby
> 4) Start postgres (with walreceiver) in the standby
> # Configure restore_command, host of the primary and port in recovery.conf
> 5) Manual operation
> # If there are missing files for PITR in the standby, copy them from
> # somewhere (archive location of the primary, tape backup..etc).
> # The missing files might be xlog or history file. Since xlog file
> # segment is switched when replication starts, the missing xlog files
> # would basically exist in the archive location of the primary.

More precisely, since the startup process and walreceiver determine
the timeline ID from the history files, all of them need to exist in
the standby (copy them if missing) before step 4), starting postgres.

If a database whose timeline is the same as the primary's already
exists in the standby, steps 2) and 3), getting a new online-backup,
are not necessary. For example, after the standby goes down, the
database as of that time can be used to restart it.
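
As an illustration of that check (a sketch under stated assumptions,
not code from the patch), the snippet below derives the newest timeline
ID from the history files in a directory and compares it with the
standby's timeline. It assumes history files are named with an
8-hex-digit timeline ID followed by ".history", and that timeline 1 has
no history file. The function names and the idea of scanning the
primary's archive location are hypothetical.

```python
import os
import re

# Assumed naming: 8 hex digits of timeline ID followed by ".history".
HISTORY_FILE = re.compile(r"^([0-9A-Fa-f]{8})\.history$")


def latest_timeline(directory):
    """Return the highest timeline ID found among the history files."""
    latest = 1  # assumed: timeline 1 never has a history file
    for name in os.listdir(directory):
        match = HISTORY_FILE.match(name)
        if match:
            latest = max(latest, int(match.group(1), 16))
    return latest


def can_reuse_standby(standby_timeline, primary_archive_dir):
    """True if the standby's database is on the primary's newest timeline."""
    return standby_timeline == latest_timeline(primary_archive_dir)
```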

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
Cc:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-11-06 12:35:22
Message-ID:	2e78013d0811060435u10e6542v65c21a6759a31001@mail.gmail.com

On Thu, Nov 6, 2008 at 2:12 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> If the database whose timeline is the same as the primary's
> exists in the standby, 2)3) getting new online-backup is not
> necessary. For example, after the standby falls down, the
> database at that time is applicable to restart it.
>
>

If I remember correctly, when postgres finishes its recovery, it
increments the timeline. If this is true, whenever ACT fails and SBY
becomes primary, SBY would increment its timeline. So when the former
ACT comes back and joins the replication as SBY, would it need to get
a fresh backup before it can join as SBY ?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

From:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To:	pavan(dot)deolasee(at)gmail(dot)com
Cc:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-11-10 09:45:27
Message-ID:	3f0b79eb0811100145w3d19add2q38c576eab4a6e619@mail.gmail.com

Hi, Pavan,

On Thu, Nov 6, 2008 at 9:35 PM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> On Thu, Nov 6, 2008 at 2:12 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> If the database whose timeline is the same as the primary's
>> exists in the standby, 2)3) getting new online-backup is not
>> necessary. For example, after the standby falls down, the
>> database at that time is applicable to restart it.
>>
>>
>
> If I remember correctly, when postgres finishes its recovery, it
> increments the timeline. If this is true, whenever ACT fails and SBY
> becomes primary, SBY would increment its timeline. So when the former
> ACT comes back and joins the replication as SBY, would it need to get
> a fresh backup before it can join as SBY ?

PITR from something other than an online backup is tricky in the first
place. We might not be able to officially support catch-up without a
fresh online backup.

Furthermore, there is another problem. Please see the following mail.
http://archives.postgresql.org/pgsql-hackers/2008-09/msg00964.php

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Synchronous replication patch v1
Date:	2008-11-11 04:27:16
Message-ID:	3f0b79eb0811102027t3cf3fd94x406671baaa2188dc@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On Thu, Nov 6, 2008 at 3:59 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Again, I would expect this to be integrated with server. I would expect
>> code to live in src/postmaster/walreceiver.c, with main logic in a file
>> alongside xlog.c, perhaps xreceive.c. We would start WALReceiver when we
>> enter archive recovery mode - I already have logic for this state
>> change. After that you would be able to use the archive location
>> specified via recovery.conf.
>
> OK. I will try to integrate walreceiver into the server.

Here is the current status of the coding: I'm going to post the next
version of the patch tomorrow. Please wait a little longer ;)

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center