Re: Standalone synchronous master

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:57:02
Message-ID: CAOuzzgr8AxUQgshF-g9DdEkYOx3+yjn-gP_W8S28nJf_eKZZ4g@mail.gmail.com
Lists: pgsql-hackers

Greetings,

On Friday, January 10, 2014, Andres Freund wrote:

> Hi,
>
> On 2014-01-10 17:28:55 -0500, Stephen Frost wrote:
> > > How do you know that you didn't lose any transactions? Trivial network
> > > hiccups, a restart of a standby, IO overload on the standby all can
> > > cause very short interruptions in the walsender connection - leading
> > > to degradation.
>
> > You know that you haven't *lost* any by virtue of the master still being
> > up. The case you describe is a double-failure scenario: the link between
> > the master and slave has to go away AND the master must accept a
> > transaction and then fail independently.
>
> Unfortunately network outages do correlate with other system
> faults. What you're wishing for really is the "I like the world to be
> friendly to me" mode.
> Even if you have only disk problems, quite often if your disks die, you
> can continue to write (especially with a BBU), but uncached reads
> fail. So the walsender connection errors out because a read failed, and
> you're degrading into async mode. *Because* your primary is about to die.

That can happen, sure, but I don't agree that cases such as people using a
single drive with a BBU, or having both drives in a RAID-1 die at the same
time, are reasonable arguments against this option. Not to mention that,
today, if the master has an issue then we're SOL anyway. Also, if the
network fails then there likely aren't any new transactions happening.

> > > > As pointed out by someone
> > > > previously, that's how RAID-1 works (which I imagine quite a few of us
> > > > use).
> > >
> > > I don't think that argument makes much sense. Raid-1 isn't safe
> > > as-is. It's only safe if you use some sort of journaling or similar
> > > on top. If you issued a write during a crash you normally will just get
> > > either the version from before or the version after the last write back,
> > > depending on the state on the individual disks and which disk is treated
> > > as authoritative by the raid software.
>
> > Uh, you need a decent raid controller then and we're talking about after a
> > transaction commit/sync.
>
> Yes, if you have a BBU that memory is authoritative in most cases. But
> in that case the argument of having two disks is pretty much pointless,
> the SPOF suddenly became the battery + ram.
>

If that is a concern then use multiple controllers. That's certainly not
unheard of; look at SANs...

Thanks,

Stephen
