Re: Switching timeline over streaming replication

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: "'PostgreSQL-development'" <pgsql-hackers(at)postgreSQL(dot)org>, "'Thom Brown'" <thom(at)linux(dot)com>
Subject: Re: Switching timeline over streaming replication
Date: 2012-11-20 13:33:48
Message-ID: 006901cdc723$ac6a41c0$053ec540$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, November 19, 2012 10:54 PM Heikki Linnakangas wrote:
> On 10.10.2012 17:54, Thom Brown wrote:
> > Hmm... I get something different. When I promote standby B, standby
> > C's log shows:
> >

> > The following problems are observed while testing of the patch.
> > Defect-1:
> >
> > 1. start primary A
> > 2. start standby B following A
> > 3. start cascade standby C following B.
> > 4. Promote standby B.
> > 5. After successful time line switch in cascade standby C, stop
> C.
> > 6. Restart C, startup is failing with the following error.
> >
> > LOG: database system was shut down in recovery at 2012-11-16
> > 16:26:29 IST
> > FATAL: requested timeline 2 does not contain minimum
> > recovery point 0/30143A0 on timeline 1
> > LOG: startup process (PID 415) exited with exit code 1
> > LOG: aborting startup due to startup process failure
> >
> > The above defect is already discussed in the following link.
> > http://archives.postgresql.org/message-id/00a801cda6f3$4aba27b0$e02e77
> > 10$(at)ka
> > pila(at)huawei(dot)com
>
> Fixed now, sorry for neglecting this earlier. The problem was that if
> the primary switched to a new timeline at position X, and the standby
> followed that switch, on restart it would set minRecoveryPoint to X, and
> the new

Not sure, if above is fixed as I don't see any code change for this and in
test also it again fails.

Below is result of further testing:

Some strange logs are observed during testing.

Note: Stop the node means doing a smart shutdown.

Scenario-1:
1. start primary A
2. start standby B following A
3. start cascade standby C following B.
4. Execute the following commands in the primary A.
create table tbl(f int);
insert into tbl values(generate_series(1,1000));
5. Promote standby B.
6. Execute the following commands in the primary B.
insert into tbl values(generate_series(1001,2000));
insert into tbl values(generate_series(2001,3000));

The following logs are presents on the following standby C.
please check these are proper or not?

LOG: restarted WAL streaming at position 0/B000000 on tli 2
LOG: record with zero length at 0/B024C68
LOG: record with zero length at 0/B035528
LOG: out-of-sequence timeline ID 1 (after 2) in log segment
00000002000000000000000B, offset 0

Following two defects are found while testing the new patch.

Defect - 1:

1. start primary A
2. start standby B following A
3. start cascade standby C following B.
4. start another standby D following C.
5. Promote standby B.
6. After successful time line switch in cascade standby C & D, stop D.
7. Restart D, Startup is successful and connecting to standby C.
8. Stop C.
9. Restart C, startup is failing.

Defect-2:
1. start primary A
2. start standby B following A
3. start cascade standby C following B.
4. Start another standby D following C.
5. Execute the following commands in the primary A.
create table tbl(f int);
insert into tbl values(generate_series(1,1000));
6. Promote standby B.
7. Execute the following commands in the primary B.
insert into tbl values(generate_series(1001,2000));
insert into tbl values(generate_series(2001,3000));

The following logs are observed on standby C:

LOG: restarted WAL streaming at position 0/7000000 on tli 2
ERROR: requested WAL segment 000000020000000000000007 has already been
removed
LOG: record with zero length at 0/7028190
LOG: record with zero length at 0/7048540
LOG: out-of-sequence timeline ID 1 (after 2) in log segment
000000020000000000000007, offset 0

The following logs are observed on standby D:

LOG: restarted WAL streaming at position 0/7000000 on tli 2
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 2
FATAL: error reading result of streaming command: ERROR: requested WAL
segment 000000020000000000000007 has already been removed

LOG: streaming replication successfully connected to primary

8. Stop standby D normally and restart D. Restart is failing.

Code Review
------------------
1.
> Agreed. Cleaning up at end-of-xact won't help walsender or other
non-backend processes, though, because they don't do
> transactions. But a top-level ResourceOwner that's reset in the
sigsetjmp() cleanup routine would work.

Do you think cleanup of files be done as part of this patch or should it be
handled separately,
as it already exists in other paths of code. In that case may be one ToDo
item can be added.

2. Also for forbidden_in_wal_sender(firstchar);, instead of handling it as
part of each message,
isn't it better if we call only once, something like
is_command_allowed(firstchar);

switch (firstchar)

3. For function
WalRcvStreaming(void)
{
..
if (state == WALRCV_STREAMING || state == WALRCV_STARTING)
}
I think in above check, it should check the new state as well
WALRCV_RESTARTING

4. In function WalReceiverMain(),
can we have Log message similar to below for even "started WAL streaming
at position", means when it is even first time.
if (!first_stream)
ereport(LOG,
(errmsg("restarted WAL
streaming at position %X/%X on tli %u",
(uint32)
(startpoint >> 32), (uint32) startpoint,

startpointTLI)));

5. In function WalReceiverMain(),
if (walrcv_startstreaming(startpointTLI, startpoint))
{
..
}
else
ereport(LOG,
(errmsg("primary server contains no
more WAL on requested timeline %u",
startpointTLI)));
I think walrcv_startstreaming() can return false even if it cannot start
streaming due to non-availablity of WAL
at requested timeline. So "no more" in Log message may be misleading in some
cases. How about something similar to
"primary server doesn't have WAL for requested timeline"

6. *** a/src/bin/pg_controldata/pg_controldata.c
--- b/src/bin/pg_controldata/pg_controldata.c
***************
*** 237,242 **** main(int argc, char *argv[])
--- 237,244 ----
printf(_("Minimum recovery ending location: %X/%X\n"),
(uint32) (ControlFile.minRecoveryPoint >> 32),
(uint32) ControlFile.minRecoveryPoint);
+ printf(_("Recovery ending timeline: %u\n"),
+ ControlFile.minRecoveryPointTLI);

shouldn't Message "Recovery ending timeline" be "Minimum Recovery ending
timeline"

One Doubt
---------------
when pg_receivexlog is receiving xlog file from the standby server, if the
standby server got promoted
as master the connection between pg_receivexlog and standby server is
broken, because of this reason
the current xlog file is not renamed as actual file.

Whether such a scenario needs any handling?

With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kohei KaiGai 2012-11-20 13:43:09 Re: [v9.3] OAT_POST_ALTER object access hooks
Previous Message Kohei KaiGai 2012-11-20 13:23:32 Re: FDW for PostgreSQL