Re: Keepalive for max_standby_delay

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Keepalive for max_standby_delay
Date: 2010-06-01 11:04:48
Message-ID: 1275390288.21465.279.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the review.

On Tue, 2010-06-01 at 13:36 +0300, Heikki Linnakangas wrote:

> If we really want to try to salvage max_standby_delay with a meaning
> similar to what it has now, I think we should go with the idea some
> people bashed around earlier and define the grace period as the
> difference between a WAL record becoming available to the standby for
> replay, and between replaying it. An approximation of that is to do
> "lastIdle = gettimeofday()" in XLogPageRead() whenever it needs to wait
> for new WAL to arrive, whether that's via streaming replication or by a
> success return code from restore_command, and compare the difference of
> that with current timestamp in WaitExceedsMaxStandbyDelay().

That wouldn't cope with a continuous stream of records arriving, unless
you also include the second half of the patch.

> That's very simple, doesn't require synchronized clocks, and works the
> same with file- and stream-based setups.

Nor does it provide a mechanism for monitoring of SR. standby_delay is
explicitly defined in terms of the gap between two servers, so is a
useful real world concept. apply_delay is somewhat less interesting.

I'm sure most people would rather have monitoring and therefore the
requirement for synchronised-ish clocks, than no monitoring. If you
think no monitoring is OK, I don't, but there are other ways, so its not
a point to fight about.

> This certainly alleviates some of the problems. You still need to ensure
> that master and standby have synchronized clocks, and you still get zero
> grace time after a long period of inactivity when not using streaming
> replication, however.

Second issue can be added once we approve the rest of this if you like.

> Sending a keep-alive message every 100ms seems overly aggressive to me.

It's sent every wal_sender_delay. Why is that a negative?

--
Simon Riggs www.2ndQuadrant.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hardik Belani 2010-06-01 11:10:49 Trigger function in a multi-threaded environment behavior
Previous Message Heikki Linnakangas 2010-06-01 10:53:29 Re: [RFC] A tackle to the leaky VIEWs for RLS