LogStandbySnapshot (was another thread)

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: LogStandbySnapshot (was another thread)
Date: 2010-05-04 23:42:15
Message-ID: 1273016535.4535.3155.camel@ebony
Lists: pgsql-hackers

On Tue, 2010-05-04 at 13:23 -0400, Tom Lane wrote:

> * LogStandbySnapshot is merest fantasy: no guarantee that either the
> XIDs list or the locks list will be consistent with the point in WAL
> where it will get inserted. What's worse, locking things down enough
> to guarantee consistency would be horrid for performance, or maybe
> even deadlock-inducing. Could lose both ways: list might contain an
> XID whose commit/abort went to WAL before the snapshot did, or list
> might be missing an XID started just after snap was taken. The latter
> case could possibly be dealt with via nextXid filtering, but that
> doesn't fix the former case, and anyway we have both ends of the same
> problem for locks.

This was the only serious complaint on your list, so let's address it.

Clearly we don't want to lock everything down, for all the reasons you
say. Not locking creates a gap between when the data is derived and when
it is logged to WAL.

LogStandbySnapshot() occurs during online checkpoints on or after the
logical checkpoint location and before the physical checkpoint location.

We start recovery from a checkpoint, so we have a starting point in WAL
for our processing. The time sequence on the primary of these related
events is

Logical Checkpoint location
newxids/commits/locks "Before1"
AccessExclusiveLocks derived
newxids/commits/locks "Before2"
AccessExclusiveLocks WAL record inserted
newxids/commits/locks "After1"
RunningXact derived
newxids/commits/locks "After2"
RunningXact WAL record inserted

When we read them back from WAL, however, they appear in the order
below, and we cannot tell the difference between events at Before1 and
Before2, or between After1 and After2.

Logical Checkpoint location <= STANDBY_INITIALIZED
newxids/commits/locks "Before1"
newxids/commits/locks "Before2"
AccessExclusiveLocks WAL record
newxids/commits/locks "After1"
newxids/commits/locks "After2"
RunningXact WAL record <= STANDBY_SNAPSHOT_READY

We're looking for a consistent point. We don't know what the exact
time-synchronised point is on the primary, so we have to use an exact
point in WAL and work from there. We need to understand that the
serialization of events in the log can be slightly different from how
they occurred on the primary, but that doesn't change anything important.
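
To make the gap concrete, here is a toy Python model of the primary. It
is only a sketch, not PostgreSQL source: the record names ("assign",
"commit", "running_xacts") and the xid values are made up for
illustration. It derives the running-xid list without holding any lock,
lets other activity reach WAL, and only then inserts the list.

```python
# Toy model of the primary; record names and xids are illustrative only.
wal = []                # records in WAL insertion order
running = {100, 101}    # xids in progress when the list is derived


def start(xid):
    """A new xid appears and is first mentioned in WAL."""
    running.add(xid)
    wal.append(("assign", xid))


def commit(xid):
    """An xid finishes and its commit record goes to WAL."""
    running.discard(xid)
    wal.append(("commit", xid))


derived = sorted(running)                 # "RunningXact derived", no lock held
commit(100)                               # "After2": reaches WAL before the list
start(102)                                # "After2": started after the list was derived
wal.append(("running_xacts", derived))    # "RunningXact WAL record inserted"

earlier = wal[:-1]
# xids in the list whose commit is already in WAL ahead of the list
stale = [x for x in derived if ("commit", x) in earlier]
# xids that started before the list was inserted but are not in it
missing = [r[1] for r in earlier if r[0] == "assign" and r[1] not in derived]

print(stale, missing)
```

Both ends of the problem show up: xid 100 is in the list although its
commit precedes the list in WAL, and xid 102 is absent from it.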

So to get a set of xids + locks that are consistent at the moment the
RunningXact WAL record is read, we need to:

1. Begin processing incoming changes from the time we are
STANDBY_INITIALIZED, though forgive any errors for removals of missing
items until we hit STANDBY_SNAPSHOT_READY
   a) locks - we ignore missing locks in StandbyReleaseLocks()
   b) xids - we ignore missing xids in KnownAssignedXidsRemove()

2. Any transaction commits/aborts seen from the time we are
STANDBY_INITIALIZED through to STANDBY_SNAPSHOT_READY need to be saved,
so that we can remove them again from the snapshot state. Otherwise
entries might exist on the standby that would never be removed from the
snapshot. We do this with a simple test of whether the related xid has
already completed.
   a) locks - we ignore locks for already completed xids in
      StandbyAcquireAccessExclusiveLock()
   b) xids - we ignore already completed xids in
      ProcArrayApplyRecoveryInfo()
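
The two rules can be sketched as a toy replay loop in Python. This is an
illustration under stated assumptions, not the PostgreSQL code: the
ToyStandby class, the record names and the "completed" set (a stand-in
for whatever test tells the standby that an xid has already finished)
are all hypothetical.

```python
class ToyStandby:
    """Toy model of standby startup bookkeeping; not PostgreSQL source."""

    def __init__(self):
        self.state = "STANDBY_INITIALIZED"
        self.known_xids = set()   # stand-in for the known-assigned xids
        self.locks = {}           # xid -> set of locked relation names
        self.completed = set()    # xids seen to finish before the snapshot is ready

    def _add_lock(self, xid, rel):
        # Rule 2a: ignore locks that belong to already completed xids.
        if xid not in self.completed:
            self.locks.setdefault(xid, set()).add(rel)

    def replay(self, rec):
        kind = rec[0]
        if kind == "assign":
            self.known_xids.add(rec[1])
        elif kind == "lock":
            self._add_lock(rec[1], rec[2])
        elif kind in ("commit", "abort"):
            xid = rec[1]
            ready = self.state == "STANDBY_SNAPSHOT_READY"
            # Rule 1: a missing item is only forgiven before the snapshot is
            # ready (the strict check afterwards is this toy's assumption).
            if ready and xid not in self.known_xids:
                raise KeyError(f"xid {xid} not known")
            self.known_xids.discard(xid)
            self.locks.pop(xid, None)
            if not ready:
                self.completed.add(xid)   # remember it for rule 2
        elif kind == "locks_snapshot":
            for xid, rel in rec[1]:
                self._add_lock(xid, rel)
        elif kind == "running_xacts":
            # Rule 2b: ignore already completed xids from the snapshot.
            self.known_xids |= set(rec[1]) - self.completed
            self.state = "STANDBY_SNAPSHOT_READY"


# WAL as read back after the checkpoint; both snapshot records are stale
# for xid 100 and do not mention xid 102.
wal = [
    ("assign", 102),
    ("commit", 100),
    ("locks_snapshot", [(100, "t1"), (101, "t3")]),
    ("running_xacts", [100, 101]),
    ("commit", 101),
]

standby = ToyStandby()
for rec in wal[:-1]:
    standby.replay(rec)
xids_at_ready = set(standby.known_xids)
locks_at_ready = {xid: set(rels) for xid, rels in standby.locks.items()}
standby.replay(wal[-1])
```

At the point the snapshot becomes ready, xids_at_ready holds 101 and 102
and locks_at_ready holds only the lock of xid 101 on "t3", so neither
the stale xid 100 nor its lock survives, and the later commit of 101
removes cleanly.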

We currently do all of the above. So it looks correct to me.

--
Simon Riggs www.2ndQuadrant.com
