Re: streaming replication breaks horribly if master crashes

From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: streaming replication breaks horribly if master crashes
Date: 2010-06-17 07:02:54
Message-ID: 4C19C89E.7050705@usit.uio.no
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Heikki Linnakangas wrote:

>
> We're not talking about a timeout for promoting standby to master. The
> problem is that the standby doesn't notice that from the master's point
> of view, the connection has been broken. Whether it's because of a
> network error or because the master server crashed doesn't matter, the
> standby should reconnect in any case. TCP keepalives are a perfect fit,
> as long as you can tune the keepalive time short enough. Where "Short
> enough" is up to the admin to decide depending on the application.
>
>

I tested this yesterday and could not get any reaction from the wal
receiver, even after setting values far below the defaults.

The default values in Linux for tcp_keepalive_time, tcp_keepalive_intvl
and tcp_keepalive_probes are 7200, 75 and 9. I reduced these values to
60, 3 and 3, and nothing happened: the connection was still in status
ESTABLISHED after 60+3*3 seconds.
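
For reference, here is a minimal Python sketch of the arithmetic above
and of the per-socket equivalents of these settings. The function names
are made up for illustration. It assumes Linux, where TCP_KEEPIDLE,
TCP_KEEPINTVL and TCP_KEEPCNT are available, and it relies on the fact
that the kernel only sends keepalive probes on sockets that have
SO_KEEPALIVE enabled. It says nothing about what the wal receiver
itself does.

```python
import socket


def keepalive_detection_seconds(idle, interval, probes):
    """Approximate time until an idle, dead connection is dropped:
    the idle time before the first probe, plus one interval per
    unanswered probe."""
    return idle + interval * probes


def enable_keepalive(sock, idle=60, interval=3, probes=3):
    """Turn on keepalives for one socket (hypothetical helper).

    Without SO_KEEPALIVE the kernel sends no probes on this socket,
    whatever the system-wide tcp_keepalive_* values are.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides of the system-wide values (Linux-specific,
    # so only set them where the constants exist).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


print(keepalive_detection_seconds(7200, 75, 9))  # defaults: 7875
print(keepalive_detection_seconds(60, 3, 3))     # reduced values: 69
```

With the reduced values, a broken connection should be dropped after
roughly 69 seconds of silence, but only on a socket where keepalives
are actually enabled.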

I did not restart the network after I changed these values on the fly
via /proc. I wonder if this is the reason the connection did not die,
even with the new keepalive values, after the connection was broken. I
will check this later today.
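
As a sanity check that the values written via /proc are the ones the
kernel now reports, something like the following sketch could be used.
The helper name is hypothetical, and the default path assumes the usual
Linux location of these settings under /proc/sys/net/ipv4.

```python
from pathlib import Path

KEEPALIVE_SYSCTLS = (
    "tcp_keepalive_time",
    "tcp_keepalive_intvl",
    "tcp_keepalive_probes",
)


def read_keepalive_sysctls(base="/proc/sys/net/ipv4"):
    """Return the keepalive settings found under `base` as a dict.

    Files that do not exist (for example on a non-Linux system) are
    skipped rather than treated as an error.
    """
    values = {}
    for name in KEEPALIVE_SYSCTLS:
        path = Path(base) / name
        if path.exists():
            values[name] = int(path.read_text().strip())
    return values


if __name__ == "__main__":
    for name, value in read_keepalive_sysctls().items():
        print(name, value)
```

This only confirms the system-wide settings. It does not show whether
a given connection has keepalives enabled.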

regards,
- --
Rafael Martinez, <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAkwZyJ4ACgkQBhuKQurGihT3kgCgn4iQkZ8YKr/nAk5/QqpwYfnc
4lsAn2CKvgeeIOon+lWRHe908hbJ+zK6
=VymH
-----END PGP SIGNATURE-----
