Re: Sync Rep for 2011CF1

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sync Rep for 2011CF1
Date: 2011-01-21 18:32:57
Message-ID: AANLkTikG8WMhOocX9AYsRHYPc-PgxPaG6miFDD9QH3i1@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 21, 2011 at 1:09 PM, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:
> On Fri, Jan 21, 2011 at 1:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> On Fri, Jan 21, 2011 at 12:23 PM, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:
>>>> When no sync slave is connected, yes, I want to stop things hard.
>>
>>> What you're proposing is to fail things earlier than absolutely
>>> necessary (when they try to XLOG, rather than at commit) but still
>>> later than what I think Simon is proposing (not even letting them log
>>> in).
>>
>> I can't see a reason to disallow login, because read-only transactions
>> can still run in such a situation --- and, indeed, might be fairly
>> essential if you need to inspect the database state on the way to fixing
>> the replication problem.  (Of course, we've already had the discussion
>> about it being a terrible idea to configure replication from inside the
>> database, but that doesn't mean there might not be views or status you
>> would wish to look at.)
>
> And just disallowing new logins is probably not even enough, because
> it allows current logged in clients "forward progress", leading
> towards an eventual hang (with now committed data on the master).
>
> Again, I'm trying to stop "forward progress" as soon as possible when
> a sync slave isn't replicating.  And I'd like clients to fail with
> errors sooner (hopefully they get to the commit point) rather than
> accumulate the WAL synced to the master and just wait at the commit.
>
> So I think that's a more complete picture of my quick "not do anything
> with no synchronous slave replicating" that I think was what led to
> the no-login approach.

Well, stopping all WAL activity with an error sounds *more* reasonable
than refusing all logins, but I'm not personally sold on it. For
example, a brief network disruption on the connection between master
and standby would cause the master to grind to a halt... and then
almost immediately resume operations. More generally, if you have
short-running transactions, there's not much difference between
wait-at-commit and wait-at-WAL, and if you have long-running
transactions, then wait-at-WAL might be gumming up the works more than
necessary.

One idea might be to wait both before and after commit. If
allow_standalone_primary is off, and a commit is attempted, we check
whether there's a slave connected, and if not, wait for one to
connect. Then, we write and sync the commit WAL record. Next, we
wait for the WAL to be ack'd. Of course, the standby might disappear
between the first check and the second, but it would greatly reduce
the possibility of the master being ahead of the standby after a
crash, which might be useful for some people.
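
To make the ordering concrete, here is a toy sketch of that commit path in Python. It is only an illustration of the idea above, not code from any patch: the Standby and Primary classes, wait_for(), the after_write hook, and the returned status strings are all made up for the example. The waits are bounded polls so the sketch terminates, whereas a real server would block. How allow_standalone_primary = on should behave when no standby is connected is my assumption here (commit locally and skip the ack wait).

```python
class Standby:
    """Toy standby: tracks whether it is connected and what it has ack'd."""

    def __init__(self, connected=True):
        self.connected = connected
        self.acked_lsn = 0

    def try_ack(self, lsn):
        # Only a connected standby can receive and acknowledge WAL.
        if self.connected:
            self.acked_lsn = max(self.acked_lsn, lsn)
        return self.acked_lsn >= lsn


def wait_for(condition, max_polls):
    """Poll condition up to max_polls times (a real server would block)."""
    for _ in range(max_polls):
        if condition():
            return True
    return False


class Primary:
    def __init__(self, standby, allow_standalone_primary=False, max_polls=3):
        self.standby = standby
        self.allow_standalone_primary = allow_standalone_primary
        self.max_polls = max_polls
        self.flushed_lsn = 0  # last commit record written and synced locally

    def commit(self, after_write=None):
        # 1. Before writing anything, wait for a standby to be connected.
        if not self.allow_standalone_primary:
            if not wait_for(lambda: self.standby.connected, self.max_polls):
                return "no-standby"  # nothing was written locally

        # 2. Write and sync the commit WAL record.
        self.flushed_lsn += 1
        lsn = self.flushed_lsn
        if after_write is not None:
            after_write()  # hypothetical hook, used to simulate a standby loss

        # 3. Wait for the standby to acknowledge the record.
        if self.allow_standalone_primary and not self.standby.connected:
            return "committed-standalone"  # assumption: no ack wait in this mode
        if wait_for(lambda: self.standby.try_ack(lsn), self.max_polls):
            return "acked"
        return "unacked"  # primary is now ahead of the standby
```

With no standby connected, commit() returns "no-standby" and flushed_lsn stays at 0, so the master never gets ahead. If the standby drops between steps 1 and 3 (simulated with after_write), the result is "unacked" with flushed_lsn at 1, which is the remaining window mentioned above.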

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
