Re: equivalent to "replication_timeout" on standby server

Lists: pgsql-general
From: Samba <saasira(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: equivalent to "replication_timeout" on standby server
Date: 2011-11-02 15:25:32
Message-ID: CAKgWO9Lpu5A9M-hu=WdxH7rboetpHFCi9bWjChPQZ3bhmdWeBw@mail.gmail.com

Hi all,

The PostgreSQL manual describes "replication_timeout" as a setting used to

"Terminate replication connections that are inactive longer than the
specified number of milliseconds. This is useful for the primary server to
detect a standby crash or network outage"

Is there a similar configuration parameter that lets the WAL receiver
process terminate idle connections on the standby server?

It would be very useful (for monitoring purposes) if the termination of such
an idle connection on either the master or the standby were logged with an
appropriate message.

Could someone explain whether this is possible with PostgreSQL 9.1.1?

Thanks and Regards,
Samba


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Samba <saasira(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: equivalent to "replication_timeout" on standby server
Date: 2011-11-04 01:55:58
Message-ID: CAHGQGwF-MmK9sLZTCu715KO4i+fKuhJJDb9sP93j97SrkYQEZQ@mail.gmail.com

On Thu, Nov 3, 2011 at 12:25 AM, Samba <saasira(at)gmail(dot)com> wrote:
> The postgres manual explains the "replication_timeout" to be used to
>
> "Terminate replication connections that are inactive longer than the
> specified number of milliseconds. This is useful for the primary server to
> detect a standby crash or network outage"
>
> Is there a similar configuration parameter that helps the WAL receiver
> processes to terminate the idle connections on the standby servers?

No.

But setting the libpq keepalive parameters in primary_conninfo might help
the standby server detect that the connection has been terminated.
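
As a rough illustration of the suggestion above, here is a minimal Python
sketch that assembles a primary_conninfo line carrying the libpq keepalive
parameters (keepalives, keepalives_idle, keepalives_interval,
keepalives_count). The host name, user name and timing values are
assumptions chosen for the example, not recommendations, and the detection
time it prints is only an approximation of how TCP keepalive probing behaves.

```python
def build_primary_conninfo(host, port, user, idle=60, interval=10, count=5):
    """Return a primary_conninfo line with TCP keepalive settings.

    host, port and user are placeholders; idle/interval/count are
    illustrative values in seconds (count is a number of probes).
    Values containing spaces or quotes would need escaping, which this
    sketch does not attempt.
    """
    params = {
        "host": host,
        "port": port,
        "user": user,
        "keepalives": 1,                  # enable TCP keepalives
        "keepalives_idle": idle,          # idle seconds before the first probe
        "keepalives_interval": interval,  # seconds between probes
        "keepalives_count": count,        # lost probes before giving up
    }
    conninfo = " ".join(f"{key}={value}" for key, value in params.items())
    return f"primary_conninfo = '{conninfo}'"


def approx_detection_seconds(idle=60, interval=10, count=5):
    """Approximate time for a dead peer to be noticed on an idle connection."""
    return idle + interval * count


if __name__ == "__main__":
    # "master.example.com" and "replicator" are hypothetical names.
    print(build_primary_conninfo("master.example.com", 5432, "replicator"))
    print(approx_detection_seconds(), "seconds (approximate)")
```

On 9.1 the generated line would go into recovery.conf on the standby. Lower
values shorten the time before a dead connection is noticed, at the cost of
more probe traffic and a higher chance of dropping a connection during a
brief network hiccup.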

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Samba <saasira(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: equivalent to "replication_timeout" on standby server
Date: 2011-11-04 13:58:00
Message-ID: CAKgWO9Lp_F+jSmxkXWRmQkoC2p6r7=wJwbiFZm5y=ftmf1i9EA@mail.gmail.com

Thanks, Fujii, for that hint.

I searched around on the internet for that trick, and it looks like we can
make the standby close its connection to the master much earlier than it
otherwise would; that is good enough for me for now.

But there still seem to be two problem areas that could be improved over
time:

- Although both the master (with replication_timeout) and the slave (with
the TCP timeout option in the primary_conninfo parameter) close the
connection quickly (based on the TCP idle connection timeout), as of now
they do not log such information. It would be really helpful if such
disconnects were logged with appropriate severity, so that the problem can
be identified early and the patterns and history of such issues can be
tracked.
- Presently, neither the master nor the standby server attempts to resume
streaming replication when they can see each other again after a prolonged
disconnect. It would be better if the master, the slave, or both made
periodic checks to find out whether the other is reachable and resumed
replication (if possible, or else logged a message that a full sync may be
required).

Thanks and Regards,
Samba



From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Samba <saasira(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: equivalent to "replication_timeout" on standby server
Date: 2011-11-07 02:51:04
Message-ID: CAHGQGwHWYxgXnnsDSL7UifSyDSdZNPZWSTWQX7kXdDw5SUOfzQ@mail.gmail.com

On Fri, Nov 4, 2011 at 10:58 PM, Samba <saasira(at)gmail(dot)com> wrote:
> although both master(with replication_timeout)  and slave (with tcp timeout
> option in primary_conninfo parameter) closes the connection in quick time
> (based on tcp idle connection  timeout), as of now they do not log such
> information. It would be really helpful if such disconnects are logged with
> appropriate severity so that the problem can identified early and help in
> keeping track of patterns and history of such issues.

Oh, really? Unless I'm missing something, when a replication timeout occurs,
the following message would be logged on the master:

terminating walsender process due to replication timeout

On the other hand, something like the following would be logged on the standby:

could not receive data from WAL stream......
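
For monitoring, the two messages quoted above could be picked out of the
server logs with a small script. The following Python sketch is only an
illustration: the sample lines and their timestamp prefix are made up, the
labels "master_timeout" and "standby_receive_error" are hypothetical names,
and the real log line layout depends on the log_line_prefix setting.

```python
import re

# Message texts taken from the messages quoted above; the standby message
# is matched only up to the point where it is elided.
PATTERNS = {
    "master_timeout": re.compile(
        r"terminating walsender process due to replication timeout"
    ),
    "standby_receive_error": re.compile(
        r"could not receive data from WAL stream"
    ),
}


def classify(line):
    """Return the label of the first pattern found in line, or None."""
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return None


def scan(lines):
    """Return (line_number, label) pairs for lines that match a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        label = classify(line)
        if label is not None:
            hits.append((number, label))
    return hits


# Hypothetical sample input, not real server output.
SAMPLE = [
    "2011-11-07 02:00:00 checkpoint starting",
    "2011-11-07 02:01:00 terminating walsender process due to replication timeout",
    "2011-11-07 02:01:05 could not receive data from WAL stream",
]

if __name__ == "__main__":
    for number, label in scan(SAMPLE):
        print(number, label)
```

In practice the lines would come from the master's and the standby's log
files respectively, and the hits could be fed into whatever alerting is
already in place.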

> Presently, neither master nor standby server attempts to resume streaming
> replication when they happen to see each other after some prolonged
> disconnect. It would be better if either master or slave or both the servers
> makes periodic checks to find if the other is reachable and resume the
> replication( if possible, or else log the message that a full sync may be
> required).

The standby periodically tries to reconnect to the master after it detects
that the replication connection has been terminated. So even after a
prolonged disconnect, replication can resume automatically.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center