Re: [HACKERS] Check that streaming replica received all data after master shutdown

From: Vladimir Borodin <root(at)simply(dot)name>
To: pgsql-general(at)postgresql(dot)org
Subject: Check that streaming replica received all data after master shutdown
Date: 2015-01-05 15:15:52
Message-ID: A7683985-2EC2-40AD-AAAC-B44BD0F29723@simply.name
Lists: pgsql-general pgsql-hackers

Hi all.

I have a simple script for a planned switchover of a PostgreSQL (9.3 and 9.4) master to one of its replicas. The script checks a lot of things before switching, and one of them is that all data from the master has been received by the replica that is going to be promoted. Right now the check is done as follows:

On the master:

postgres(at)pgtest03d ~ $ psql -t -A -c 'select pg_current_xlog_location();'
0/33000090
postgres(at)pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
postgres(at)pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_controldata | head
pg_control version number: 937
Catalog version number: 201306121
Database system identifier: 6061800518091528182
Database cluster state: shut down
pg_control last modified: Mon 05 Jan 2015 06:47:57 PM MSK
Latest checkpoint location: 0/34000028
Prior checkpoint location: 0/33000028
Latest checkpoint's REDO location: 0/34000028
Latest checkpoint's REDO WAL file: 0000001B0000000000000034
Latest checkpoint's TimeLineID: 27
postgres(at)pgtest03d ~ $

On the replica (after shutdown of master):

postgres(at)pgtest03g ~ $ psql -t -A -c "select pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/34000028');"
104
postgres(at)pgtest03g ~ $

These 104 bytes seem to be the size of the shutdown checkpoint record (as far as I can tell from the pg_xlogdump output).

postgres(at)pgtest03g ~/9.3/data/pg_xlog $ /usr/pgsql-9.3/bin/pg_xlogdump -s 0/33000090 -t 27
rmgr: XLOG len (rec/tot): 0/ 32, tx: 0, lsn: 0/33000090, prev 0/33000028, bkp: 0000, desc: xlog switch
rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 0/34000028, prev 0/33000090, bkp: 0000, desc: checkpoint: redo 0/34000028; tli 27; prev tli 27; fpw true; xid 0/6010; oid 54128; multi 1; offset 0; oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
pg_xlogdump: FATAL: error in WAL record at 0/34000028: record with zero length at 0/34000090

postgres(at)pgtest03g ~/9.3/data/pg_xlog $
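
The arithmetic behind the 104 can be checked by hand. A minimal Python sketch, assuming the 9.3+ representation in which a WAL location "X/Y" is a 64-bit byte position with X as the high 32 bits (the helper names lsn_to_int and lsn_diff are made up for illustration):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an 'X/Y' WAL location (two hex fields) to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Byte distance a - b, the quantity pg_xlog_location_diff reports."""
    return lsn_to_int(a) - lsn_to_int(b)


# Values taken from the output above.
checkpoint_start = "0/34000028"  # lsn of the shutdown checkpoint record
end_of_wal = "0/34000090"        # where pg_xlogdump found a zero-length record

print(lsn_diff(end_of_wal, checkpoint_start))  # 104
```

So the replica has replayed up to the point where the WAL on disk ends, and the 104 is simply the distance from the start of the shutdown checkpoint record to that point.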

I’m not sure this record will always be exactly 104 bytes, so checking for strict equality looks fragile. Could the size change in the future? Or is there a better way to confirm that a streaming replica has received all data after the master shuts down? Checking that pg_xlog_location_diff returns 104 bytes seems a bit strange.

Thanks.

--
May the force be with you...
http://simply.name


From: Vladimir Borodin <root(at)simply(dot)name>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Check that streaming replica received all data after master shutdown
Date: 2015-01-13 10:11:22
Message-ID: BC251D94-5366-410B-97DA-8FD6088F1903@simply.name
Lists: pgsql-general pgsql-hackers


On 5 Jan 2015, at 18:15, Vladimir Borodin <root(at)simply(dot)name> wrote:

> I’m not sure this record will always be exactly 104 bytes, so checking for strict equality looks fragile. Could the size change in the future? Or is there a better way to confirm that a streaming replica has received all data after the master shuts down? Checking that pg_xlog_location_diff returns 104 bytes seems a bit strange.
>

+hackers

Could anyone help?

Thanks.


--
May the force be with you...
http://simply.name


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Vladimir Borodin <root(at)simply(dot)name>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Check that streaming replica received all data after master shutdown
Date: 2015-01-13 16:42:27
Message-ID: 20150113164227.GG1663@alvh.no-ip.org
Lists: pgsql-general pgsql-hackers

Vladimir Borodin wrote:

> I’m not sure this record will always be exactly 104 bytes, so checking
> for strict equality looks fragile. Could the size change in the future?

There is no promise that WAL record format stays unchanged. Sometimes
we change a WAL record in a minor release.

> Or is there a better way to confirm that a streaming replica has
> received all data after the master shuts down? Checking that
> pg_xlog_location_diff returns 104 bytes seems a bit strange.

I guess you could pg_xlogdump the difference and verify that it is a
shutdown checkpoint record. As far as I remember there should always be
one at the end of recovery.
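
One way to script that check is to parse the pg_xlogdump text and look at the last record it printed. A rough Python sketch, assuming the 9.3 output format quoted earlier in this thread (the function name and the regex are illustrative only, and the format may well differ in other versions):

```python
import re

# Matches record lines such as:
#   rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 0/34000028, ... desc: checkpoint: ...; shutdown
RECORD_RE = re.compile(r"^rmgr:\s*(?P<rmgr>\S+).*?desc:\s*(?P<desc>.*)$")


def last_record_is_shutdown_checkpoint(dump_text: str) -> bool:
    """Return True if the last record printed by pg_xlogdump is a shutdown checkpoint."""
    last = None
    for line in dump_text.splitlines():
        match = RECORD_RE.match(line.strip())
        if match:
            last = match  # remember the most recent record line
    if last is None:
        return False
    desc = last.group("desc").strip()
    return (
        last.group("rmgr") == "XLOG"
        and desc.startswith("checkpoint:")
        and desc.endswith("shutdown")
    )
```

Lines that are not records, such as the trailing "FATAL: ... record with zero length" message, are ignored, so only the final "rmgr:" line decides the result.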

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Vladimir Borodin <root(at)simply(dot)name>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Check that streaming replica received all data after master shutdown
Date: 2015-01-13 18:11:31
Message-ID: 54B55FD3.8070302@vmware.com
Lists: pgsql-general pgsql-hackers

On 01/13/2015 12:11 PM, Vladimir Borodin wrote:
>> I’m not sure this record will always be exactly 104 bytes, so checking for strict equality looks fragile. Could the size change in the future? Or is there a better way to confirm that a streaming replica has received all data after the master shuts down? Checking that pg_xlog_location_diff returns 104 bytes seems a bit strange.

Don't rely on it being 104 bytes. It can vary across versions, and
across different architectures.

You could simply check that the standby's pg_last_xlog_replay_location()
is greater than the master's "Latest checkpoint location", and not care
about the exact difference.
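
A rough Python sketch of that comparison, assuming the master's pg_controldata output and the standby's pg_last_xlog_replay_location() value have already been collected as strings (the function names are made up for illustration, and the "X/Y" parsing assumes the 9.3+ 64-bit WAL position):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an 'X/Y' WAL location (two hex fields) to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def latest_checkpoint_location(controldata_text: str) -> str:
    """Extract the 'Latest checkpoint location' value from pg_controldata output."""
    for line in controldata_text.splitlines():
        key, sep, value = line.partition(":")
        # Exact key match, so "Latest checkpoint's REDO location" is not picked up.
        if sep and key.strip() == "Latest checkpoint location":
            return value.strip()
    raise ValueError("no 'Latest checkpoint location' line found")


def standby_passed_checkpoint(replay_lsn: str, controldata_text: str) -> bool:
    """True if the standby replayed past the start of the master's last checkpoint."""
    checkpoint = latest_checkpoint_location(controldata_text)
    return lsn_to_int(replay_lsn) > lsn_to_int(checkpoint)
```

With the values from this thread, a replay location of 0/34000090 against a checkpoint location of 0/34000028 passes, with no dependence on the record being exactly 104 bytes.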

- Heikki


From: Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Vladimir Borodin <root(at)simply(dot)name>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL General Discussion Forum <pgsql-general(at)postgresql(dot)org>
Subject: Re: Check that streaming replica received all data after master shutdown
Date: 2015-01-15 08:43:10
Message-ID: CADp-Sm7LWp-9Ngk5BHcfR1WSwnTtCNmdFkqROA0pD50fjGF5kw@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Wed, Jan 14, 2015 at 2:11 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> Don't rely on it being 104 bytes. It can vary across versions, and across
> different architectures.
>
> You could simply check that the standby's pg_last_xlog_replay_location()
> is greater than the master's "Latest checkpoint location", and not care
> about the exact difference.

I believe some changes were made in v9.3 so that a fast or smart shutdown
of the master waits for pending WAL to be replicated before it closes the
replication connection.

http://git.postgresql.org/pg/commitdiff/985bd7d49726c9f178558491d31a570d47340459


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sameer(dot)kumar(at)ashnik(dot)com
Cc: hlinnakangas(at)vmware(dot)com, root(at)simply(dot)name, pgsql-hackers(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: [HACKERS] Check that streaming replica received all data after master shutdown
Date: 2015-01-15 10:19:01
Message-ID: 20150115.191901.06458609.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-general pgsql-hackers

Hi,

> On Wed, Jan 14, 2015 at 2:11 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>
> > Don't rely on it being 104 bytes. It can vary across versions, and across
> > different architectures.
> >
> > You could simply check that the standby's pg_last_xlog_replay_location()
> > is greater than the master's "Latest checkpoint location", and not care
> > about the exact difference.
> >
>
> I believe some changes were made in v9.3 so that a fast or smart shutdown
> of the master waits for pending WAL to be replicated before it closes the
> replication connection.
>
> http://git.postgresql.org/pg/commitdiff/985bd7d49726c9f178558491d31a570d47340459

I don't understand how that relates to the 104 bytes. The commit
says the change was backpatched to 9.1, and it ensures that all
xlog records are transferred as long as nothing goes wrong. If
you rely on that mechanism, you don't need this check, provided
the master is known to have shut down gracefully and nothing went
wrong in the environment. Since you want this check, I suppose
that either you cannot be sure nothing went wrong or you don't
trust the mechanism itself.

Given that, as Alvaro said upthread, verifying that the last
record is a shutdown checkpoint should greatly raise the chance
that all records have been received, except in the extreme case
where the master was started and stopped again while the
replication connection could not be made. For that case, I think
there is no way to confirm it from the standby alone; you should
at least compare the LSN following the last xlog record with that
of the old master, by whatever means. Or perhaps run some sanity
check of the database on the standby that exploits the nature of
the data?
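
A hedged Python sketch of the LSN comparison, assuming pg_xlogdump has been run against the old master's pg_xlog after shutdown and that it ends with the "record with zero length at X/Y" message shown earlier in this thread (that wording comes from the 9.3 output above and is not guaranteed in other versions; the function names are hypothetical):

```python
import re

# End-of-WAL marker as printed by the 9.3 pg_xlogdump output quoted in this thread.
END_RE = re.compile(r"record with zero length at ([0-9A-Fa-f]+/[0-9A-Fa-f]+)")


def lsn_to_int(lsn: str) -> int:
    """Convert an 'X/Y' WAL location (two hex fields) to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def end_of_wal(xlogdump_output: str):
    """Return the LSN following the last record on the old master, or None."""
    match = END_RE.search(xlogdump_output)
    return match.group(1) if match else None


def standby_caught_up(replay_lsn: str, master_xlogdump_output: str) -> bool:
    """True if the standby replayed exactly up to the old master's end of WAL."""
    end = end_of_wal(master_xlogdump_output)
    if end is None:
        raise ValueError("could not find the end of WAL in pg_xlogdump output")
    return lsn_to_int(replay_lsn) == lsn_to_int(end)
```

In the example from the original message the end of WAL is 0/34000090, which equals the standby's replay location (0/34000028 plus 104 bytes), so the check would pass.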

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Vladimir Borodin <root(at)simply(dot)name>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL General Discussion Forum <pgsql-general(at)postgresql(dot)org>
Subject: Re: Check that streaming replica received all data after master shutdown
Date: 2015-01-16 04:16:28
Message-ID: CADp-Sm5UK7-VQWNTQXAfnMtdmx-yzCmPXussVwPz3--KgBkiPQ@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Thu, Jan 15, 2015 at 6:19 PM, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> > On Wed, Jan 14, 2015 at 2:11 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> >
> > > Don't rely on it being 104 bytes. It can vary across versions, and
> > > across different architectures.
> > >
> > > You could simply check that the standby's pg_last_xlog_replay_location()
> > > is greater than the master's "Latest checkpoint location", and not care
> > > about the exact difference.
> > >
> >
> > I believe some changes were made in v9.3 so that a fast or smart shutdown
> > of the master waits for pending WAL to be replicated before it closes the
> > replication connection.
> >
> > http://git.postgresql.org/pg/commitdiff/985bd7d49726c9f178558491d31a570d47340459
>
> I don't understand how that relates to the 104 bytes. The commit
> says the change was backpatched to 9.1, and it ensures that all
> xlog records are transferred as long as nothing goes wrong. If
> you rely on that mechanism, you don't need this check, provided
> the master is known to have shut down gracefully and nothing went
> wrong in the environment. Since you want this check, I suppose
> that either you cannot be sure nothing went wrong or you don't
> trust the mechanism itself.
>
Right! My point was that if the master has shut down gracefully, you
don't really need such checks on the standby (it is supposed to receive
the pending WAL before the master goes down).

This obviously (as you rightly pointed out) would not work if the master
has not shut down gracefully, or if there is a connection issue between
master and slave while the master is being shut down (even with a smart
or fast shutdown).
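
Whether the master shut down gracefully can be read from the "Database cluster state" line of pg_controldata, which shows "shut down" in the output at the top of this thread. A small Python sketch, assuming the pg_controldata text has already been captured (the function names are made up for illustration):

```python
def cluster_state(controldata_text: str) -> str:
    """Extract the 'Database cluster state' value from pg_controldata output."""
    for line in controldata_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Database cluster state":
            return value.strip()
    raise ValueError("no 'Database cluster state' line found")


def master_shut_down_cleanly(controldata_text: str) -> bool:
    """True only for a clean shutdown; any other state is treated as suspect."""
    return cluster_state(controldata_text) == "shut down"
```

This only tells you the master wrote its shutdown checkpoint. It says nothing about whether the standby received it, so it complements the LSN comparison rather than replacing it.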

> Given that, as Alvaro said upthread, verifying that the last record is a
> shutdown checkpoint should greatly raise the chance that all records have
> been received, except in the extreme case where the master was started and
> stopped again while the replication connection could not be made.

I am not sure if this would cover the cases where the master has gone down
abruptly or has crashed (or the service has been killed).

> For that case, I think there is no way to confirm it from the standby
> alone; you should at least compare the LSN following the last xlog record
> with that of the old master, by whatever means.

That is the method that occurred to me as well while reading the first
part of your comments. :)

> Or perhaps run some sanity check of the database on the standby that
> exploits the nature of the data?
>

Best Regards,

*Sameer Kumar | Database Consultant*

*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533

M: *+65 8110 0350* T: +65 6438 3504 | www.ashnik.com
