Re: Time-Delayed Standbys

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fabrízio Mello <fabriziomello(at)gmail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Time-Delayed Standbys
Date: 2013-12-03 19:33:16
Message-ID: CA+U5nML64Y7Nf0mpQirk-tOWrV+ZQ3y8DdRgk2mP+e3kGT16Mw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18 October 2013 19:03, Fabrízio de Royes Mello
<fabriziomello(at)gmail(dot)com> wrote:

> The attached patch is a continuation of Robert's work [1].

Reviewing v2...

> I made some changes:
> - use of Latches instead of pg_usleep, so we don't have to wakeup regularly.

OK

> - call HandleStartupProcInterrupts() before CheckForStandbyTrigger() because
> might change the trigger file's location

OK

> - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and
> XLOG_XACT_COMMIT_COMPACT checks

Why just those? Why not aborts and restore points also?

> - don't care about clockdrift because it's an admin problem.

Few minor points on things

* The code with comment "Clear any previous recovery delay time" is in
wrong place, move down or remove completely. Setting the delay to zero
doesn't prevent calling recoveryDelay(), so that logic looks wrong
anyway.

* The loop exit in recoveryDelay() is inelegant, should break if <= 0

* There's a spelling mistake in sample

* The patch has whitespace in one place

and one important point...

* The delay loop happens AFTER we check for a pause. Which means if
the user notices a problem on a commit, then hits pause button on the
standby, the pause will have no effect and the next commit will be
applied anyway. Maybe just one commit, but its an off by one error
that removes the benefit of the patch. So I think we need to test this
again after we finish delaying

if (xlogctl->recoveryPause)
recoveryPausesHere();

We need to explain in the docs that this is intended only for use in a
live streaming deployment. It will have little or no meaning in a
PITR.

I think recovery_time_delay should be called
<something>_apply_delay
to highlight the point that it is the apply of records that is
delayed, not the receipt. And hence the need to document that sync rep
is NOT slowed down by setting this value.

And to make the name consistent with other parameters, I suggest
min_standby_apply_delay

We also need to document caveats about the patch, in that it only
delays on timestamped WAL records and other records may be applied
sooner than the delay in some circumstances, so it is not a way to
avoid all cancellations.

We also need to document the behaviour of the patch is to apply all
data received as quickly as possible once triggered, so the specified
delay does not slow down promoting the server to a master. That might
also be seen as a negative behaviour, since promoting the master
effectively sets recovery_time_delay to zero.

I will handle the additional documentation, if you can update the
patch with the main review comments. Thanks.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2013-12-03 19:34:10 Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL
Previous Message Tom Lane 2013-12-03 19:31:17 Re: Extension Templates S03E11