Quick Links

Re: Cascade replication

Lists:	pgsql-hackers

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Cascade replication
Date:	2011-06-14 05:08:26
Message-ID:	BANLkTi=wZAR5DN28ZEyU6295+453_OeSAQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 25, 2011 at 2:01 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> I'd like to propose cascade replication feature (i.e., allow the
> standby to accept
> replication connection from another standby) for 9.2. This feature is useful to
> reduce the overhead of the master since by using that we can decrease the
> number of standbys directly connecting to the master.
>
> I attached the WIP patch, which changes walsender so that it starts replication
> even during recovery. Then, the walsender attempts to send all WAL that's
> already been fsync'd to the standby's disk (i.e., send WAL up to the bigger
> location between the receive location and the replay one). When the standby is
> promoted, all walsenders in that standby end because they cannot continue
> replication any more in that case because of the timeline mismatch.
>
> The standby must not accept replication connection from that standby itself.
> Otherwise, since any new WAL data would not appear in that standby,
> replication cannot advance any more. As a safeguard against this, I introduced
> new ID to identify each instance. The walsender sends that ID as the fourth
> field of the reply of IDENTIFY_SYSTEM, and then walreceiver checks whether
> the IDs are the same between two servers. If they are the same, which means
> that the standby is just connecting to that standby itself, so walreceiver
> emits ERROR.
>
> One remaining problem which I'll have to tackle is that: Even while walreceiver
> is not in progress (i.e., the startup process is retrieving WAL file from the
> archive), the cascading walsender should continuously send new WAL data.
> This means that the walsender should send the WAL file restored from the
> archive. The problem is that the name of such a restored WAL file is always
> "RECOVERYXLOG". For now, walsender cannot handle the WAL file with such
> a name.
>
> To address the above problem, I'm thinking to make the startup process restore
> the WAL file with its real name instead of "RECOVERYXLOG". Then, like in the
> master, the walsender can read and send the restored WAL file. The required
> WAL file can be recycled before being sent. So we might need to enable
> wal_keep_segments setting even in the standby.

Done.

Updated patch attached.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment	Content-Type	Size
cascade_replication_v1.patch	text/x-patch	43.2 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-04 09:24:15
Message-ID:	CA+U5nMKciZ6gZeQ060mn_UsmS45QdjAva-uBRMzjHpV7pU1mOA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 14, 2011 at 6:08 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

>> The standby must not accept replication connection from that standby itself.
>> Otherwise, since any new WAL data would not appear in that standby,
>> replication cannot advance any more. As a safeguard against this, I introduced
>> new ID to identify each instance. The walsender sends that ID as the fourth
>> field of the reply of IDENTIFY_SYSTEM, and then walreceiver checks whether
>> the IDs are the same between two servers. If they are the same, which means
>> that the standby is just connecting to that standby itself, so walreceiver
>> emits ERROR.

Thanks for waiting for review.

This part of the patch is troubling me. I think you have identified an
important problem, but this solution doesn't work fully.

If we allow standbys to connect to other standbys then we have
problems with standbys not being connected to master. This can occur
with a 1-step connection, as you point out, but it could also occur
with a 2-step, 3-step or more connection, where a circle of standbys
are all depending upon each other. Your solution only works for 1-step
connections. "Solving" that problem in a general sense might be more
dangerous than leaving it alone. I think we should think some more
about the issues there and approach them as a separate problem.

I think we should remove that and just focus on the main problem, for
now. That will make it a simpler patch and easier to commit.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-05 03:34:58
Message-ID:	CAHGQGwFnY1FWvUz-b83MucyKF-t+syNZA_qHB4nRtHtoHtWong@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jul 4, 2011 at 6:24 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Tue, Jun 14, 2011 at 6:08 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>>> The standby must not accept replication connection from that standby itself.
>>> Otherwise, since any new WAL data would not appear in that standby,
>>> replication cannot advance any more. As a safeguard against this, I introduced
>>> new ID to identify each instance. The walsender sends that ID as the fourth
>>> field of the reply of IDENTIFY_SYSTEM, and then walreceiver checks whether
>>> the IDs are the same between two servers. If they are the same, which means
>>> that the standby is just connecting to that standby itself, so walreceiver
>>> emits ERROR.
>
> Thanks for waiting for review.

Thanks for the review!

> This part of the patch is troubling me. I think you have identified an
> important problem, but this solution doesn't work fully.
>
> If we allow standbys to connect to other standbys then we have
> problems with standbys not being connected to master. This can occur
> with a 1-step connection, as you point out, but it could also occur
> with a 2-step, 3-step or more connection, where a circle of standbys
> are all depending upon each other. Your solution only works for 1-step
> connections. "Solving" that problem in a general sense might be more
> dangerous than leaving it alone. I think we should think some more
> about the issues there and approach them as a separate problem.
>
> I think we should remove that and just focus on the main problem, for
> now. That will make it a simpler patch and easier to commit.

I agree to focus on the main problem first. I removed that. Attached
is the updated version.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment	Content-Type	Size
cascade_replication_v2.patch	text/x-patch	32.1 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-05 11:08:11
Message-ID:	CA+U5nMJRCfU-crF68MR+O_37uTXPz8NHFGtGV8ofS2=btdi6Og@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jul 5, 2011 at 4:34 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Jul 4, 2011 at 6:24 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On Tue, Jun 14, 2011 at 6:08 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>>>> The standby must not accept replication connection from that standby itself.
>>>> Otherwise, since any new WAL data would not appear in that standby,
>>>> replication cannot advance any more. As a safeguard against this, I introduced
>>>> new ID to identify each instance. The walsender sends that ID as the fourth
>>>> field of the reply of IDENTIFY_SYSTEM, and then walreceiver checks whether
>>>> the IDs are the same between two servers. If they are the same, which means
>>>> that the standby is just connecting to that standby itself, so walreceiver
>>>> emits ERROR.
>>
>> Thanks for waiting for review.
>
> Thanks for the review!

> I agree to focus on the main problem first. I removed that. Attached
> is the updated version.

Now for the rest of the review...

I'd rather not include another chunk of code related to
wal_keep_segments. The existing code in CreateCheckPoint() should be
refactored so that we call the same code from both CreateCheckPoint()
and CreateRestartPoint().

IMHO it's time to get rid of RECOVERYXLOG as an initial target for
de-archived files. That made sense once, but now we have streaming it
makes more sense for us to de-archive straight onto the correct file
name and let the file be cleaned up later. So de-archiving it and then
copying to the new location doesn't seem the right thing to do
(especially not to copy rather than rename). RECOVERYXLOG allowed us
to de-archive the file without removing a pre-existing file, so we
must handle that still - the current patch would fail if a
pre-existing WAL file were there.

Those changes will make this code cleaner for the long term.

I don't think we should simply shutdown a WALSender when we startup.
That is indistinguishable from a failure, which is going to be very
worrying if we do a switchover. Is there another way to do this? Or if
not, at least a log message to explain it was normal that we requested
this.

It would be possible to have synchronous cascaded replication but that
is probably another patch :-)

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-06 01:13:35
Message-ID:	CAHGQGwHMy0-nj-MO0Lr2vm_deheb=Le86y6U1jdkR+ERjmg7kw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jul 5, 2011 at 8:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Now for the rest of the review...

Thanks!

> I'd rather not include another chunk of code related to
> wal_keep_segments. The existing code in CreateCheckPoint() should be
> refactored so that we call the same code from both CreateCheckPoint()
> and CreateRestartPoint().

This makes sense. Will do.

> IMHO it's time to get rid of RECOVERYXLOG as an initial target for
> de-archived files. That made sense once, but now we have streaming it
> makes more sense for us to de-archive straight onto the correct file
> name and let the file be cleaned up later. So de-archiving it and then
> copying to the new location doesn't seem the right thing to do
> (especially not to copy rather than rename). RECOVERYXLOG allowed us
> to de-archive the file without removing a pre-existing file, so we
> must handle that still - the current patch would fail if a
> pre-existing WAL file were there.

You mean that we must keep a pre-existing file? If so, why do we need to
do that? Since it's older than an archived file, it seems to be OK to overwrite
it with an archived file. Or is there corner case that an archived file is older
than a pre-existing one?

If we don't need to keep a pre-existing file, I'll change the way to de-archive
according to your suggestion, as follows;

1. Rename a pre-existing file to EXISTINGXLOG
2. De-archive the file onto the correct name
3. If the de-archived file is invalid (i.e., its size is not 16MB),
remove it and
rename EXISTINGXLOG to the correct name
4. If the de-archived file is valid, remove EXISTINGXLOG
5. Replay the file with the correct name

1. De-archive the file to RECOVERYXLOG
2. If RECOVERYXLOG is valid, remove a pre-existing one and rename
RECOVERYXLOG to the correct name
3. Replay the file with the correct name

> Those changes will make this code cleaner for the long term.
>
> I don't think we should simply shutdown a WALSender when we startup.
> That is indistinguishable from a failure, which is going to be very
> worrying if we do a switchover. Is there another way to do this? Or if
> not, at least a log message to explain it was normal that we requested
> this.

What about outputing something like the following message in that case?

if ("walsender receives SIGUSR2")
ereport(LOG, "terminating walsender process due to
administrator command");

> It would be possible to have synchronous cascaded replication but that
> is probably another patch :-)

Yeah, right. You'll try? ;)

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-06 05:44:10
Message-ID:	CA+U5nM+OfZr7aLs7-DYy0V-DeC-QcPArKWqB-kiKR+_rQHzJ0Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jul 6, 2011 at 2:13 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

>> IMHO it's time to get rid of RECOVERYXLOG as an initial target for
>> de-archived files. That made sense once, but now we have streaming it
>> makes more sense for us to de-archive straight onto the correct file
>> name and let the file be cleaned up later. So de-archiving it and then
>> copying to the new location doesn't seem the right thing to do
>> (especially not to copy rather than rename). RECOVERYXLOG allowed us
>> to de-archive the file without removing a pre-existing file, so we
>> must handle that still - the current patch would fail if a
>> pre-existing WAL file were there.
>

<snip>

> If we don't need to keep a pre-existing file, I'll change the way to de-archive
> according to your suggestion, as follows;
>
> 1. Rename a pre-existing file to EXISTINGXLOG
> 2. De-archive the file onto the correct name
> 3. If the de-archived file is invalid (i.e., its size is not 16MB),
> remove it and
> rename EXISTINGXLOG to the correct name
> 4. If the de-archived file is valid, remove EXISTINGXLOG
> 5. Replay the file with the correct name

I'm laughing quite hard here... :-)

> Or
>
> 1. De-archive the file to RECOVERYXLOG
> 2. If RECOVERYXLOG is valid, remove a pre-existing one and rename
> RECOVERYXLOG to the correct name
> 3. Replay the file with the correct name

Yes please, that makes sense.

>> Those changes will make this code cleaner for the long term.
>>
>> I don't think we should simply shutdown a WALSender when we startup.
>> That is indistinguishable from a failure, which is going to be very
>> worrying if we do a switchover. Is there another way to do this? Or if
>> not, at least a log message to explain it was normal that we requested
>> this.
>
> What about outputing something like the following message in that case?
>
> if ("walsender receives SIGUSR2")
> ereport(LOG, "terminating walsender process due to
> administrator command");

...which doesn't explain the situation because we don't know why
SIGUSR2 was sent.

I was thinking of something like this

LOG: requesting walsenders for cascading replication reconnect to
update timeline

but then I ask: Why not simply send a new message type saying "new
timeline id is X" and that way we don't need to restart the connection
at all?

>> It would be possible to have synchronous cascaded replication but that
>> is probably another patch :-)
>
> Yeah, right. You'll try? ;)

I'll wait for a request...

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-06 07:53:46
Message-ID:	CAHGQGwHJQQe9kFO8mvz6-9NOnOtVq1rg1JXFhK9UF7ds-1sWUA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jul 6, 2011 at 2:44 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> 1. De-archive the file to RECOVERYXLOG
>> 2. If RECOVERYXLOG is valid, remove a pre-existing one and rename
>> RECOVERYXLOG to the correct name
>> 3. Replay the file with the correct name
>
> Yes please, that makes sense.

Will do.

>>> Those changes will make this code cleaner for the long term.
>>>
>>> I don't think we should simply shutdown a WALSender when we startup.
>>> That is indistinguishable from a failure, which is going to be very
>>> worrying if we do a switchover. Is there another way to do this? Or if
>>> not, at least a log message to explain it was normal that we requested
>>> this.
>>
>> What about outputing something like the following message in that case?
>>
>> if ("walsender receives SIGUSR2")
>> ereport(LOG, "terminating walsender process due to
>> administrator command");
>
> ...which doesn't explain the situation because we don't know why
> SIGUSR2 was sent.
>
> I was thinking of something like this
>
> LOG: requesting walsenders for cascading replication reconnect to
> update timeline

Looks better than my proposal.

> but then I ask: Why not simply send a new message type saying "new
> timeline id is X" and that way we don't need to restart the connection
> at all?

Yeah, that's very useful. But I'd like to implement that as a separate patch.

I'm thinking that in that case walsender should send the timeline history file
and walreceiver should write it down to the disk, instead of just sending
timeline ID. Otherwise, when the standby in receive side restarts, it cannot
calculate the latest timeline ID correctly.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-06 08:32:13
Message-ID:	CA+U5nM+sZ2ZngGen=BzE7KsRWZ1N-fe53hM5CJuVdyoC1SvVdg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jul 6, 2011 at 8:53 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

>>> What about outputing something like the following message in that case?
>>>
>>> if ("walsender receives SIGUSR2")
>>> ereport(LOG, "terminating walsender process due to
>>> administrator command");
>>
>> ...which doesn't explain the situation because we don't know why
>> SIGUSR2 was sent.
>>
>> I was thinking of something like this
>>
>> LOG: requesting walsenders for cascading replication reconnect to
>> update timeline
>
> Looks better than my proposal.
>
>> but then I ask: Why not simply send a new message type saying "new
>> timeline id is X" and that way we don't need to restart the connection
>> at all?
>
> Yeah, that's very useful. But I'd like to implement that as a separate patch.
>
> I'm thinking that in that case walsender should send the timeline history file
> and walreceiver should write it down to the disk, instead of just sending
> timeline ID. Otherwise, when the standby in receive side restarts, it cannot
> calculate the latest timeline ID correctly.

OK, happy to do that part as an additional patch. That way we can get
this committed soon - in this CF.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-06 11:27:39
Message-ID:	CAHGQGwFMJoryh0f3xSRfhb-rw0Pid3E8+fm11xeAiuD0=Dn8zQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jul 6, 2011 at 4:53 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Jul 6, 2011 at 2:44 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> 1. De-archive the file to RECOVERYXLOG
>>> 2. If RECOVERYXLOG is valid, remove a pre-existing one and rename
>>> RECOVERYXLOG to the correct name
>>> 3. Replay the file with the correct name
>>
>> Yes please, that makes sense.

In #2, if the server is killed with SIGKILL just after removing a pre-existing
file and before renaming RECOVERYXLOG, we lose the file with correct name.
Even in this case, we would be able to restore it from the archive, but what if
unfortunately the archive is unavailable? We would lose the file infinitely. So
we should introduce the following safeguard?

2'. If RECOVERYXLOG is valid, move a pre-existing file to pg_xlog/backup,
rename RECOVERYXLOG to the correct name, and remove the pre-existing
file from pg_xlog/backup

Currently we give up a recovery if there is the target file in
neither the
archive nor pg_xlog. But, if we adopt the above safeguard, in that case,
we should try to read the file from also pg_xlog/backup.

In #2, there is another problem; walsender might have the pre-existing file
open, so the startup process would need to request walsenders to close the
file before removing (or renaming) it, wait for new file to appear and open it
again. This might make the code complicated. Does anyone have better
approach?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-06 12:35:22
Message-ID:	CA+U5nMLf==9gHmXZW3ioCb06r-nw6Lwcsybk75C6LpBbnnqOSQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jul 6, 2011 at 12:27 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Jul 6, 2011 at 4:53 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Wed, Jul 6, 2011 at 2:44 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>> 1. De-archive the file to RECOVERYXLOG
>>>> 2. If RECOVERYXLOG is valid, remove a pre-existing one and rename
>>>> RECOVERYXLOG to the correct name
>>>> 3. Replay the file with the correct name
>>>
>>> Yes please, that makes sense.
>
> In #2, if the server is killed with SIGKILL just after removing a pre-existing
> file and before renaming RECOVERYXLOG, we lose the file with correct name.
> Even in this case, we would be able to restore it from the archive, but what if
> unfortunately the archive is unavailable? We would lose the file infinitely. So
> we should introduce the following safeguard?
>
> 2'. If RECOVERYXLOG is valid, move a pre-existing file to pg_xlog/backup,
> rename RECOVERYXLOG to the correct name, and remove the pre-existing
> file from pg_xlog/backup
>
> Currently we give up a recovery if there is the target file in
> neither the
> archive nor pg_xlog. But, if we adopt the above safeguard, in that case,
> we should try to read the file from also pg_xlog/backup.
>
> In #2, there is another problem; walsender might have the pre-existing file
> open, so the startup process would need to request walsenders to close the
> file before removing (or renaming) it, wait for new file to appear and open it
> again. This might make the code complicated. Does anyone have better
> approach?

The risk you describe already exists in current code.

I regard it as a non-risk. The unlink() and the rename() are executed
consecutively, so the gap between them is small, so the chance of a
SIGKILL in that gap at the same time as losing the archive seems low,
and we can always get that file from the master again if we are
streaming. Any code you add to "fix" this will get executed so rarely
it probably won't work when we need it to.

In the current scheme we restart archiving from the last restartpoint,
which exists only on the archive. This new patch improves upon this by
keeping the most recent files locally, so we are less expose in the
case of archive unavailability. So this patch already improves things
and we don't need any more than that. No extra code please, IMHO.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Cascade replication
Date:	2011-07-10 18:30:59
Message-ID:	4E19EFE3.7050303@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Fujii,

> In the current scheme we restart archiving from the last restartpoint,
> which exists only on the archive. This new patch improves upon this by
> keeping the most recent files locally, so we are less expose in the
> case of archive unavailability. So this patch already improves things
> and we don't need any more than that. No extra code please, IMHO.

Do you think you'll submit a new version of the patch this commitfest?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Cascade replication
Date:	2011-07-11 01:26:44
Message-ID:	CAHGQGwHdC4j=OgCXs3U-wtNT_3qMD54HTTh8wLEM3s4Z+1=2JA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jul 11, 2011 at 3:30 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Fujii,
>
>> In the current scheme we restart archiving from the last restartpoint,
>> which exists only on the archive. This new patch improves upon this by
>> keeping the most recent files locally, so we are less expose in the
>> case of archive unavailability. So this patch already improves things
>> and we don't need any more than that. No extra code please, IMHO.
>
> Do you think you'll submit a new version of the patch this commitfest?

Yes. I'm now updating the patch according to Simon's comments.
I will submit it today.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Cascade replication
Date:	2011-07-11 06:28:12
Message-ID:	CAHGQGwGWKkvkoj88H1dnq0eWJfU1MRyzzA12neenEPRKYtMKSw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jul 11, 2011 at 10:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Jul 11, 2011 at 3:30 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Do you think you'll submit a new version of the patch this commitfest?
>
> Yes. I'm now updating the patch according to Simon's comments.
> I will submit it today.

Attached is the updated version which addresses all the issues raised by Simon.

> The risk you describe already exists in current code.
>
> I regard it as a non-risk. The unlink() and the rename() are executed
> consecutively, so the gap between them is small, so the chance of a
> SIGKILL in that gap at the same time as losing the archive seems low,
> and we can always get that file from the master again if we are
> streaming. Any code you add to "fix" this will get executed so rarely
> it probably won't work when we need it to.
>
> In the current scheme we restart archiving from the last restartpoint,
> which exists only on the archive. This new patch improves upon this by
> keeping the most recent files locally, so we are less expose in the
> case of archive unavailability. So this patch already improves things
> and we don't need any more than that. No extra code please, IMHO.

Yes, I added no extra code for the risk I raised upthread.

> In #2, there is another problem; walsender might have the pre-existing file
> open, so the startup process would need to request walsenders to close the
> file before removing (or renaming) it, wait for new file to appear and open it
> again.

I implemented this.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment	Content-Type	Size
cascade_replication_v3.patch	text/x-patch	32.1 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-19 08:58:33
Message-ID:	CA+U5nMJTMcQWvUkSp_W2U1jbyv1YaRudpzD4vGNaOLZyWuFTTg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jul 11, 2011 at 7:28 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> Attached is the updated version which addresses all the issues raised by
> Simon.

Is there any reason why we disallow cascading unless hot standby is enabled?

ISTM we can just alter the postmaster path for walsenders, patch attached.

Some people might be happier if a sync standby were not HS enabled,
yet able to cascade to other standbys for reading.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
allow_cascading_without_hot_standby.v1.patch	application/octet-stream	1.1 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-19 11:19:40
Message-ID:	CAHGQGwEH1iz4m5B8ievOZUzxt1WBbhn8n49fom3Ktz66asoM7g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jul 19, 2011 at 5:58 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Mon, Jul 11, 2011 at 7:28 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>> Attached is the updated version which addresses all the issues raised by
>> Simon.
>
> Is there any reason why we disallow cascading unless hot standby is enabled?
>
> ISTM we can just alter the postmaster path for walsenders, patch attached.
>
> Some people might be happier if a sync standby were not HS enabled,
> yet able to cascade to other standbys for reading.

- return CAC_STARTUP; /* normal startup */
+ {
+ if (am_walsender)
+ return CAC_OK;
+ else
+ return CAC_STARTUP; /* normal startup */
+ }

In canAcceptConnections(), am_walsender is always false, so the above CAC_OK
is never returned. You should change ProcessStartupPacket() as follows, instead.

switch (port->canAcceptConnections)
{
case CAC_STARTUP:
+ if (am_walsender)
+ {
+ port->canAcceptConnections = CAC_OK;
+ break;
+ }
ereport(FATAL,

When I fixed the above, compile the code and set up the cascading replication
environment (disable hot_standby), I got the following assertion error:

TRAP: FailedAssertion("!(slot > 0 && slot <=
PMSignalState->num_child_flags)", File: "pmsignal.c", Line: 227)

So we would still have some code to change.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-19 12:09:23
Message-ID:	CA+U5nMLERKPY9BWnV7sN3_z6wAtU1zgx6jPMNh09rTm9eO2Wuw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jul 19, 2011 at 12:19 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> So we would still have some code to change.

Sigh, yes, of course.

The question was whether there is any reason we need to disallow cascading?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-19 12:38:02
Message-ID:	CAHGQGwG8wC=KrcNErPewPFAqOuNqttZEe4j94PMTpy6i=4_P3w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jul 19, 2011 at 9:09 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Tue, Jul 19, 2011 at 12:19 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>> So we would still have some code to change.
>
> Sigh, yes, of course.
>
> The question was whether there is any reason we need to disallow cascading?

No, at least I have no clear reason for now.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cascade replication
Date:	2011-07-19 13:03:25
Message-ID:	CA+U5nM+_kQzLuFc=e+0=8-kkAG6+DKPjdEO4ikTh-sz1K=Cv2A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jul 19, 2011 at 1:38 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Jul 19, 2011 at 9:09 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On Tue, Jul 19, 2011 at 12:19 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
>> wrote:
>>
>>> So we would still have some code to change.
>>
>> Sigh, yes, of course.
>>
>> The question was whether there is any reason we need to disallow
>> cascading?
>
> No, at least I have no clear reason for now.

I'll work up a proper patch. Thanks for your earlier review,

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services