Re: time-delayed standbys

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: time-delayed standbys
Date: 2011-06-15 05:58:43
Message-ID: BANLkTikFhVOhmf_XKERe=MkU3Ht_K7posQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 21, 2011 at 12:18 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Apr 20, 2011 at 11:15 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> I am a bit concerned about the reliability of this approach.  If there
>>> is some network lag, or some lag in processing from the master, we
>>> could easily get the idea that there is time skew between the machines
>>> when there really isn't.  And our perception of the time skew could
>>> easily bounce around from message to message, as the lag varies.  I
>>> think it would be tremendously ironic if the two machines were
>>> actually synchronized to the microsecond, but by trying to be clever
>>> about it we managed to make the lag-time accurate only to within
>>> several seconds.
>>
>> Well, if walreceiver concludes that there is no more than a few seconds'
>> difference between the clocks, it'd probably be OK to take the master
>> timestamps at face value.  The problem comes when the skew gets large
>> (compared to the configured time delay, I guess).
>
> I suppose.  Any bound on how much lag there can be before we start
> applying skew correction is going to be fairly arbitrary.

When the replication connection is terminated, the standby tries to read
WAL files from the archive. In this case, there is no walreceiver process,
so how does the standby calculate the clock difference?

> errmsg("parameter \"%s\" requires a temporal value", "recovery_time_delay"),

Shouldn't this be s/"a temporal"/"an integer"/ ?

Should time-delayed replication be disabled after we run "pg_ctl promote"?
Otherwise, failover might take a very long time when recovery_time_delay is
set to a high value.
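
To make the question concrete, here is a minimal sketch (Python, not code
from the patch) of the behaviour I have in mind. The function name
apply_wait and the "promoted" flag are hypothetical, and the sketch assumes
the master and standby clocks agree, which is exactly what the skew
discussion above is about.

```python
from datetime import datetime, timedelta

def apply_wait(record_time, now, delay, promoted):
    """Return how long the standby should wait before replaying a record.

    record_time: commit timestamp carried by the WAL record (master clock)
    now:         current time on the standby (assumed to be in sync)
    delay:       configured recovery_time_delay
    promoted:    True once "pg_ctl promote" has been requested
    """
    if promoted:
        # Proposed behaviour: promotion cancels the delay so failover is prompt.
        return timedelta(0)
    remaining = record_time + delay - now
    # Never return a negative wait; the record is already old enough.
    return max(remaining, timedelta(0))

delay = timedelta(hours=1)
rec = datetime(2011, 6, 15, 5, 0, 0)
now = datetime(2011, 6, 15, 5, 20, 0)
print(apply_wait(rec, now, delay, promoted=False))  # 0:40:00
print(apply_wait(rec, now, delay, promoted=True))   # 0:00:00
```

Without the "promoted" check, the first case would make failover wait the
remaining 40 minutes for this record alone.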

http://forge.mysql.com/worklog/task.php?id=344
According to the above page, one purpose of time-delayed replication is to
protect against user mistakes on the master. But when a user notices a wrong
operation on the master, what should he do next? The WAL records of the wrong
operation might have already arrived at the standby, so neither "promote" nor
"restart" cancels it. Instead, he should probably shut down the standby,
find the timestamp of the XID of the operation he'd like to cancel, set
recovery_target_time, and restart the standby. Should a procedure like this
be documented? Or should we implement a new "promote" mode which finishes
recovery as soon as "promote" is requested (i.e., without replaying all the
available WAL records)?
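
The two options can be compared with a toy model (Python, purely
illustrative). The WAL contents, XIDs and timestamps below are made up, and
replay_until and promote_immediately are hypothetical names. The model
assumes that a time target makes replay stop at the first commit later than
the target, and that the "immediate" mode keeps only what has already been
applied.

```python
from datetime import datetime

# Hypothetical WAL already received by the standby: (xid, commit time, what)
wal = [
    (100, datetime(2011, 6, 15, 5, 0), "INSERT"),
    (101, datetime(2011, 6, 15, 5, 10), "UPDATE"),
    (102, datetime(2011, 6, 15, 5, 20), "DROP TABLE (the mistake)"),
    (103, datetime(2011, 6, 15, 5, 30), "INSERT"),
]

def replay_until(records, target_time):
    """Option 1: restart with a time target; stop at the first later commit."""
    replayed = []
    for xid, commit_time, _ in records:
        if commit_time > target_time:
            break
        replayed.append(xid)
    return replayed

def promote_immediately(records, already_applied):
    """Option 2: finish recovery at once, keeping only what was applied."""
    return [xid for xid, _, _ in records[:already_applied]]

# Target chosen just before the commit time of the mistaken transaction.
print(replay_until(wal, datetime(2011, 6, 15, 5, 19, 59)))  # [100, 101]
# The delayed standby had applied only the first record when promoted.
print(promote_immediately(wal, already_applied=1))          # [100]
```

In this model, both options avoid XID 102. The first keeps more of the good
transactions but requires the user to find the right timestamp. The second
needs no investigation but discards everything still waiting on the delay.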

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
