Standby node using replication slot not visible in pg_stat_replication while catching up

Lists: pgsql-hackers
From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Standby node using replication slot not visible in pg_stat_replication while catching up
Date: 2014-03-10 12:06:53
Message-ID: CAB7nPqRxLcBc7CAiYrOD3-gMLLJO12xnA-XRboGoVr_VEPqUxg@mail.gmail.com

Hi all,

I have been playing a bit with the replication slots, and I noticed a
weird behavior in such a scenario:
1) Create a master/slave cluster, and have the slave use a replication slot
2) Stop the slave
3) Generate a certain amount of WAL on the master; during my tests I played with 4-5GB of WAL
4) Restart the slave; it catches up with the WAL that the master has
retained in pg_xlog.
I noticed that while the standby using the replication slot catches
up, it is not visible in pg_stat_replication on the master. This makes
replication lag difficult to monitor, particularly in the case where
the standby has disconnected from the master. Once the standby has
caught up, it reappears in pg_stat_replication.
I haven't looked at the code to see what is happening, but is this
behavior expected?
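While the standby is missing from pg_stat_replication, the slot itself still records progress: pg_replication_slots.restart_lsn marks the oldest WAL the slot requires the master to keep. Subtracting it from the master's current WAL position gives a rough, conservative estimate of how far behind the standby is (PostgreSQL's pg_xlog_location_diff() performs the same subtraction in SQL). A minimal Python sketch of that arithmetic, assuming both LSNs have already been fetched as text; the function names and sample values are hypothetical:

```python
"""Estimate how much WAL a replication slot is holding back, from two LSN strings."""


def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into an absolute byte position.

    The part before the slash is the high 32 bits and the part after it is
    the low 32 bits, both written in hexadecimal.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slot_lag_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between the master's current position and the slot's restart_lsn."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(restart_lsn)


if __name__ == "__main__":
    # Hypothetical values; on a 9.4 master they would come from
    # pg_current_xlog_location() and pg_replication_slots.restart_lsn.
    lag = slot_lag_bytes("1/40000000", "0/10000000")
    print(f"{lag} bytes ({lag / 1024**3:.2f} GiB)")
    # prints: 5100273664 bytes (4.75 GiB)
```

Because restart_lsn only advances when a connected standby confirms progress, the figure stays frozen while the standby replays from elsewhere, so it overstates the lag rather than understating it.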
Regards,
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standby node using replication slot not visible in pg_stat_replication while catching up
Date: 2014-03-10 12:24:13
Message-ID: 20140310122413.GA27167@awork2.anarazel.de

Hi,

On 2014-03-10 21:06:53 +0900, Michael Paquier wrote:
> I have been playing a bit with the replication slots, and I noticed a
> weird behavior in such a scenario:
> 1) Create a master/slave cluster, and have slave use a replication slot
> 2) Stop the master
> 3) Create a certain amount of WAL, during my tests I played with 4~5GB of WAL
> 4) Restart the slave, it catches up with the WALs that master has
> retained in pg_xlog.
> I noticed that while the standby using the replication slot catches
> up, it is not visible in pg_stat_replication on master. This makes
> monitoring of the replication lag difficult to follow, particularly in
> the case where the standby disconnects from the master. Once the
> standby has caught up, it reappears once again in pg_stat_replication.
> I didn't have a look at the code to see what is happening, but is this
> behavior expected?

Does the use of replication slots actually alter the behaviour? I don't
see how the slot code could influence things to that degree here. Could
it be that it's just restoring WAL from the standby's pg_xlog or using
restore_command?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standby node using replication slot not visible in pg_stat_replication while catching up
Date: 2014-03-10 12:47:27
Message-ID: CAB7nPqQL++uAP=enVKMizwO=Y=mQHfVTCiT4rqGHc3mZUN2bkw@mail.gmail.com

On Mon, Mar 10, 2014 at 9:24 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> On 2014-03-10 21:06:53 +0900, Michael Paquier wrote:
>> I have been playing a bit with the replication slots, and I noticed a
>> weird behavior in such a scenario:
>> 1) Create a master/slave cluster, and have slave use a replication slot
>> 2) Stop the master
>> 3) Create a certain amount of WAL, during my tests I played with 4~5GB of WAL
>> 4) Restart the slave, it catches up with the WALs that master has
>> retained in pg_xlog.
>> I noticed that while the standby using the replication slot catches
>> up, it is not visible in pg_stat_replication on master. This makes
>> monitoring of the replication lag difficult to follow, particularly in
>> the case where the standby disconnects from the master. Once the
>> standby has caught up, it reappears once again in pg_stat_replication.
>> I didn't have a look at the code to see what is happening, but is this
>> behavior expected?
>
> Does the use of replication slots actually alter the behaviour? I don't
> see how the slot code could influence things to that degree here. Could
> it be that it's just restoring code from the standby's pg_xlog or using
> restore_command?

Sorry for the noise, I'm feeling stupid. Yes, the standby was using a
restore_command, so it recovered the WAL from the archives before
reporting activity back to the master.
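This matches the documented order in which a standby looks for WAL: it first restores from the archive via restore_command, then tries pg_xlog, and only then connects to the master for streaming. Only the streaming phase has a walsender on the master, so only then does a row appear in pg_stat_replication. The toy model below is plain Python, not PostgreSQL code; the segment numbers and the replay_order helper are hypothetical, and it ignores the retry cycle that follows a streaming failure:

```python
"""Toy model of the order in which a standby consumes WAL from its sources."""

from __future__ import annotations

# Tried in this order; a source is abandoned once it cannot supply the next segment.
SOURCES = ("restore_command", "pg_xlog", "streaming")


def replay_order(first: int, last: int, available: dict[str, set[int]]) -> list[tuple[int, str]]:
    """Return (segment, source) pairs for segments first..last.

    available maps each source name in SOURCES to the segments it can supply.
    """
    phase = 0
    seg = first
    order: list[tuple[int, str]] = []
    while seg <= last:
        source = SOURCES[phase]
        if seg in available[source]:
            order.append((seg, source))
            seg += 1
        elif phase < len(SOURCES) - 1:
            phase += 1  # current source exhausted: fall through to the next one
        else:
            raise RuntimeError(f"segment {seg} is not available from any source")
    return order


if __name__ == "__main__":
    # Hypothetical scenario: the archive holds segments 1-8, and the slot
    # makes the master retain everything, so it could stream 1-10.
    available = {
        "restore_command": set(range(1, 9)),
        "pg_xlog": set(),
        "streaming": set(range(1, 11)),
    }
    for seg, source in replay_order(1, 10, available):
        # A walsender, and hence a pg_stat_replication row, exists only while streaming.
        shown = "yes" if source == "streaming" else "no"
        print(f"segment {seg:2}: {source:<15} in pg_stat_replication: {shown}")
```

Even though the slot lets the master keep every segment, the standby in this model streams only segments 9 and 10, which is why it stays invisible on the master for most of the catch-up.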
--
Michael