Re: streaming replication breaks horribly if master crashes

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: streaming replication breaks horribly if master crashes
Date: 2010-06-16 23:32:21
Message-ID: AANLkTinbiu_r0cYp1BNsdRNQXnwcEPWbymQQUrB_ZLWk@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 17, 2010 at 12:22 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
>> It sounds like it behaves just fine except for not detecting a
>> broken connection.
>
> Of course I meant in terms of the slave's attempts at retrieving
> more WAL, not in terms of it applying a second time line.  TCP
> keepalive timeouts don't help with that part of it, just the failure
> to recognize the broken connection.  I suppose someone could argue
> that's a *feature*, since it gives you two hours to manually
> intervene before it does something stupid, but that hardly seems
> like a solution....

It's certainly a design goal of TCP that you should be able to
disconnect the network and reconnect it, and everything should recover.
If no data was sent, it should be able to withstand arbitrarily long
disconnections. TCP keepalives break that, but they should only break
it in the case where the network connection has definitely exceeded
the retry timeouts, not when it merely hasn't responded fast enough
for the application's requirements.
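
To make the timing concrete, here is a minimal sketch in Python (an
illustration only, not PostgreSQL code) of enabling keepalives on a
socket and estimating how long a silent connection survives before it
is declared dead. The function names are hypothetical, and the
TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are assumed to be
available as on Linux; elsewhere the sketch falls back to the system
defaults.

```python
import socket


def keepalive_detection_seconds(idle, interval, probes):
    """Seconds of silence before keepalive gives up on the peer.

    The first probe goes out after `idle` seconds, then `probes`
    unanswered probes are sent `interval` seconds apart.
    """
    return idle + interval * probes


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive and tune its timers where possible."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These option names are Linux-style; platforms that lack them keep
    # their system-wide defaults (assumption: that is acceptable here).
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


# Common Linux defaults: 7200 s idle, 75 s between probes, 9 probes.
print(keepalive_detection_seconds(7200, 75, 9))  # 7875
```

With those defaults the two-hour idle period dominates, which is where
the "two hours" figure quoted above comes from. Shortening the timers
detects a dead peer sooner, but also gives up sooner on a network that
would have come back.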

--
greg
