Re: Keepalive for max_standby_delay

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Keepalive for max_standby_delay
Date: 2010-06-03 17:18:24
Message-ID: 1275585504.21465.3023.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2010-06-03 at 12:47 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On Wed, 2010-06-02 at 13:14 -0400, Tom Lane wrote:
> >> This patch seems to me to be going in fundamentally the wrong direction.
> >> It's adding complexity and overhead (far more than is needed), and it's
> >> failing utterly to resolve the objections that I raised to start with.
>
> > Having read your proposal, it seems changing from time-on-sender to
> > time-on-receiver is a one line change to the patch.
>
> > What else are you thinking of removing, if anything?
>
> Basically, we need to get rid of everything that feeds timestamps from
> the WAL content into the kill-delay logic.

I understand your wish to do this, though it isn't always accurate to do
that in the case where there is a backlog.

The patch already does that *mostly* for the streaming case. The only
time it does use the WAL content timestamp is when the WAL content
timestamp is later than the oldest receive time, so in that case it is
used as a correction. (see code comments and comments below also)

> >> In particular, Simon seems to be basically refusing to do anything about
> >> the complaint that the code fails unless master and standby clocks are
> >> in close sync. I do not believe that this is acceptable, and since he
> >> won't fix it, I guess I'll have to.
>
> > Syncing two servers in replication is common practice, as has been
> > explained here; I'm still surprised people think otherwise.
>
> Doesn't affect the complaint in the least: I do not find it acceptable
> to have that be *mandated* in order for our code to work sensibly.

OK, accepted.

> I would be OK with having something approaching what you want as a
> non-default optional behavior (with a clearly-documented dependency
> on having synchronized clocks).

Yes, that's what I'd been thinking also. So that gives us a way
forwards.

standby_delay_base = apply (default) | send

determines whether the standby_delay is calculated solely with reference
to the standby server (apply) or whether times from the sending server
are used (send). Use of send implies that the clocks on primary and
standby are synchronised to within a useful accuracy, in which case it
is usual to enforce that with time synchronisation software such as ntp.

> But in any case the current behavior is
> still quite broken as regards reading stale timestamps from WAL.

Agreed. That wasn't the objective of this patch or a priority.

If you're reading from an archive, you need to set max_standby_delay
appropriately, current recommendation is -1.

We can address that concern once the main streaming case is covered, or
we can add that now.

Are you planning to work on these things now as you said? How about I
apply my patch now, then you do another set of changes afterwards to add
the other items you mentioned, since that is now 2 additional parameters
and related code?

--
Simon Riggs www.2ndQuadrant.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marc Munro 2010-06-03 17:23:12 Re: [pgsql-hackers] Daily digest v1.10705 (13 messages)
Previous Message Tom Lane 2010-06-03 17:00:49 Re: Did we really want to force an initdb in beta2?