Re: GetOldestXmin going backwards is dangerous after all

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GetOldestXmin going backwards is dangerous after all
Date: 2013-02-02 00:24:02
Message-ID: 27553.1359764642@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Having said that, I agree that a fix in GetOldestXmin() would be nice
> if we could find one, but since the comment describes at least three
> different ways the value can move backwards, I'm not sure that there's
> really a practical solution there, especially if you want something we
> can back-patch.

Actually, wait a second. As you say, the comment describes three known
ways to make it go backwards. It strikes me that all three are fixable:

* if allDbs is FALSE and there are no transactions running in the current
* database, GetOldestXmin() returns latestCompletedXid. If a transaction
* begins after that, its xmin will include in-progress transactions in other
* databases that started earlier, so another call will return a lower value.

The reason this is a problem is that GetOldestXmin ignores XIDs of
processes that are connected to other DBs. It now seems to me that this
is a flat-out bug. It can ignore their xmins, but it should include
their XIDs, because the point of considering those XIDs is that they may
contribute to the xmins of snapshots computed in the future by processes
in our own DB. And snapshots never exclude any XIDs on the basis of
which DB they're in. (They can't really, since we can't know when the
snap is taken whether it might be used to examine shared catalogs.)
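
To make the distinction concrete, here is a minimal C sketch of the rule proposed above; it is a simplified model, not the actual PostgreSQL code. The names BackendSlot, xid_precedes, oldest_xmin_sketch and the start parameter are hypothetical, and locking, special XIDs and vacuum_defer_cleanup_age are left out. The point it illustrates is that a backend's XID is counted whatever database it is connected to, while its xmin is counted only for our own database (or when allDbs is set).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)

/* Hypothetical, simplified stand-in for one backend's shared state. */
typedef struct
{
    uint32_t db;    /* database the backend is connected to */
    Xid      xid;   /* XID of its running transaction, or INVALID_XID */
    Xid      xmin;  /* xmin it advertises, or INVALID_XID */
} BackendSlot;

/* Modular "a is older than b"; assumes both are normal XIDs. */
static int
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Sketch of the proposed behavior. "start" is the value to return when
 * nothing older is found (assumption: the real code derives it from
 * latestCompletedXid).
 */
static Xid
oldest_xmin_sketch(const BackendSlot *slots, size_t n,
                   uint32_t my_db, int all_dbs, Xid start)
{
    Xid result = start;

    assert(slots != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        const BackendSlot *s = &slots[i];

        /* XIDs count for every database: they can end up in the xmin
         * of a snapshot taken later in our own database. */
        if (s->xid != INVALID_XID && xid_precedes(s->xid, result))
            result = s->xid;

        /* xmins of backends in other databases may still be ignored. */
        if ((all_dbs || s->db == my_db) &&
            s->xmin != INVALID_XID && xid_precedes(s->xmin, result))
            result = s->xmin;
    }
    return result;
}
```

With one backend in another database running XID 100 with xmin 90, and nothing running in our own database, this sketch returns 100 rather than the starting value, so a transaction that begins locally afterwards cannot pull the result lower on account of that XID.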

* There are also replication-related effects: a walsender
* process can set its xmin based on transactions that are no longer running
* in the master but are still being replayed on the standby, thus possibly
* making the GetOldestXmin reading go backwards. In this case there is a
* possibility that we lose data that the standby would like to have, but
* there is little we can do about that --- data is only protected if the
* walsender runs continuously while queries are executed on the standby.
* (The Hot Standby code deals with such cases by failing standby queries
* that needed to access already-removed data, so there's no integrity bug.)

This is just bogus. Why don't we make it a requirement on walsenders
that they never move their advertised xmin backwards (or initially set
it to less than the prevailing global xmin)? There's no real benefit to
allowing them to try to move the global xmin backwards, because any data
that they might hope to protect that way could be gone already.
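
The requirement could look roughly like the following C sketch. It is an illustration under stated assumptions, not walsender code: clamp_walsender_xmin and its parameters are hypothetical names, and how the prevailing global xmin is obtained is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)

/* Modular "a is older than b"; assumes both are normal XIDs. */
static int
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Return the xmin a walsender should advertise, given:
 *   current     - the xmin it advertises now, or INVALID_XID if none yet
 *   requested   - the xmin the standby is asking for
 *   global_xmin - the prevailing global xmin (hypothetical input)
 * The result is never older than "current" or "global_xmin".
 */
static Xid
clamp_walsender_xmin(Xid current, Xid requested, Xid global_xmin)
{
    Xid lower_bound = global_xmin;

    assert(requested != INVALID_XID && global_xmin != INVALID_XID);

    /* Never move backwards from what is already advertised. */
    if (current != INVALID_XID && xid_precedes(lower_bound, current))
        lower_bound = current;

    /* A request older than the bound is raised to the bound; the data
     * it hoped to protect could be gone already. */
    return xid_precedes(requested, lower_bound) ? lower_bound : requested;
}
```

A first request for xmin 50 while the global xmin is 100 is raised to 100, and a walsender already advertising 120 keeps 120 when asked for 110.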

* The return value is also adjusted with vacuum_defer_cleanup_age, so
* increasing that setting on the fly is another easy way to make
* GetOldestXmin() move backwards, with no consequences for data integrity.

And as for that, it's been pretty clear for a while that allowing
vacuum_defer_cleanup_age to change on the fly was a bad idea we'd
eventually have to undo. The day of reckoning has arrived: it needs
to be PGC_POSTMASTER.

regards, tom lane
