Re: Freezing without write I/O

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Freezing without write I/O
Date: 2013-09-18 13:22:35
Message-ID: 20130918132235.GC21051@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-09-16 16:59:28 +0300, Heikki Linnakangas wrote:
> Here's a rebased version of the patch, including the above-mentioned fixes.
> Nothing else new.

* We need some higher-level description of the algorithm somewhere in the
source. I don't think I would have understood the concept from the patch
alone without having read the thread previously.
* Why do we need to do the PageUpdateNeedsFreezing() dance in
heap_page_prune? No xids should change during it.
* Why can we do a GetOldestXmin(allDbs = false) in
BeginXidLSNRangeSwitch()?
* Is there any concrete reasoning behind the current values for
XID_LSN_RANGE_INTERVAL and NUM_XID_LSN_RANGES or just gut feeling?
* The lsn ranges file can possibly become bigger than 512 bytes (the size
we assume to be written atomically) and you write it in place. If we
fail halfway through writing, we seem to be able to recover by using
the pageMatureLSN from the last checkpoint, but it seems better to
do the fsync(), rename(), fsync() dance either way.
* Should we preemptively freeze tuples on a page in lazy_scan_heap if we
have already dirtied the page? That would make future modifications
cheaper.
* lazy_scan_heap now blocks acquiring a cleanup lock on every buffer
that contains dead tuples. Shouldn't we use some kind of cutoff xid
there? That might block progress too heavily. Also the comment above
it still refers to the old logic.
* There's no way to force a full table vacuum anymore; that seems
problematic to me.
* I wonder if CheckPointVarsup() doesn't need to update
minRecoveryPoint. StartupVarsup() should be ok, because we should only
read one from the future during a basebackup?
* xidlsnranges_recently[_dirtied] are not obvious at first glance. Why
can't we just reset dirty before the WriteXidLSNRangesFile() call?
There's only one process doing the writeout. Is it just because the
checkpointing process could be killed?
* I think we should either not require consuming a multixactid or use a
function that doesn't need MultiXactIdSetOldestMember(). If the
transaction doing so lives for a long time, it will unnecessarily
prevent truncation of mxacts.
* switchFinishXmin and nextSwitchXid should probably be either volatile
or have a compiler barrier between accessing shared memory and
checking them. The compiler could very well optimize them away and
access shmem all the time, which could lead to weird results.
* I wonder whether the fact that we're doing the range switches after
acquiring an xid could be problematic if we're preventing xid
allocation due to the checks earlier in that function?
* I think heap_lock_tuple() needs to unset all-visible; otherwise we
won't vacuum that page again, which can lead to problems since we
don't do full-table vacuums anymore?
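
To make the fsync(), rename(), fsync() point about the lsn ranges file
concrete, here is a rough sketch in plain POSIX C. It is not PostgreSQL
code and not what the patch does: write_file_atomically() and
fsync_path() are made-up names, the ".tmp" suffix is an assumption, and
error handling is reduced to returning -1. The idea is only that a
reader sees either the complete old file or the complete new one.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fsync an existing file or directory by path. */
static int
fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Hypothetical helper: durably replace "dir/name" with buf[0..len).
 * Returns 0 on success, -1 on any failure (the old file is left alone).
 */
static int
write_file_atomically(const char *dir, const char *name,
                      const void *buf, size_t len)
{
    char        tmppath[1024];
    char        finalpath[1024];
    const char *p = buf;
    size_t      left = len;
    int         fd;

    snprintf(tmppath, sizeof(tmppath), "%s/%s.tmp", dir, name);
    snprintf(finalpath, sizeof(finalpath), "%s/%s", dir, name);

    /* 1. Write the new contents to a temporary file, never in place. */
    fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    while (left > 0)
    {
        ssize_t n = write(fd, p, left);

        if (n <= 0)
        {
            close(fd);
            unlink(tmppath);
            return -1;
        }
        p += n;
        left -= (size_t) n;
    }

    /* 2. First fsync: the new contents reach disk before the rename. */
    if (fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    if (close(fd) != 0)
    {
        unlink(tmppath);
        return -1;
    }

    /* 3. rename() atomically swaps the new file in for the old one. */
    if (rename(tmppath, finalpath) != 0)
    {
        unlink(tmppath);
        return -1;
    }

    /* 4. Second fsync: persist the directory entry change itself. */
    return fsync_path(dir);
}
```

With something along these lines the size of the file no longer matters
for atomicity, so the fallback to the pageMatureLSN from the last
checkpoint would only be needed if the file is missing altogether.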

So, I think that's enough for a first look. Will think about general
issues a bit more.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
