Re: CacheInvalidateRelcache in btree is a crummy idea

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CacheInvalidateRelcache in btree is a crummy idea
Date: 2014-02-05 14:51:46
Message-ID: 16938.1391611906@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> On 02/02/2014 11:45 PM, Tom Lane wrote:
>> So I'm thinking my commit d2896a9ed, which introduced this mechanism,
>> was poorly thought out and we should just remove the relcache invals
>> as per the attached patch. Letting _bt_getroot() update the cached
>> metapage at next use should be a lot cheaper than a full relcache
>> rebuild for the index.

> Looks good to me.

I did think of one possible objection to this idea. Although
_bt_getroot() checks that the page it arrives at is a usable fast root,
it could still fail if the page number is off the end of the relation;
that is, we have a cache entry that predates an index truncation.
Currently, the only way for a btree to get physically shorter is a
REINDEX, which implies a relfilenode change and hence a (transactional)
relcache flush, so this couldn't happen. And if we tried to allow
VACUUM to truncate an index without a full lock, we'd probably have
the same type of issue for any concurrent process that's following
a stale cross-page link, not just a link to the root. So I'm not
particularly impressed by this objection, but it could be made.
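
To make the failure mode concrete, here is a toy model in C. It is not
PostgreSQL source: every type and function name in it (ToyIndex, ToyCache,
toy_getroot, and so on) is hypothetical, and the explicit bounds check is an
assumption about what a guard could look like, not something the current code
does. The cached root is trusted only if it still lies inside the file and the
page there still looks like a root at the remembered level; otherwise the cache
is dropped and reloaded from the authoritative metapage data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef uint32_t BlockNo;

typedef struct ToyPage {
    bool     usable_as_root;   /* page can still serve as a (fast) root */
    uint32_t level;            /* tree level stored on the page */
} ToyPage;

typedef struct ToyIndex {
    ToyPage *pages;
    BlockNo  nblocks;          /* current physical length of the index */
    BlockNo  meta_root;        /* authoritative root, as in the metapage */
    uint32_t meta_level;
} ToyIndex;

typedef struct ToyCache {
    bool     valid;
    BlockNo  root;             /* possibly stale cached root block */
    uint32_t level;
} ToyCache;

/*
 * Return the root block, using the cache when it is still believable.
 * *used_cache reports whether the cached value was accepted.
 */
BlockNo toy_getroot(const ToyIndex *idx, ToyCache *cache, bool *used_cache)
{
    assert(idx->meta_root < idx->nblocks);

    if (cache->valid) {
        /* Hypothetical guard: a cached block past EOF must not be read. */
        if (cache->root < idx->nblocks) {
            const ToyPage *p = &idx->pages[cache->root];

            if (p->usable_as_root && p->level == cache->level) {
                *used_cache = true;
                return cache->root;
            }
        }
        cache->valid = false;  /* stale: fall through and reload */
    }

    cache->root = idx->meta_root;
    cache->level = idx->meta_level;
    cache->valid = true;
    *used_cache = false;
    return cache->root;
}
```

Without the bounds check, a cache entry that predates a truncation would lead
to an attempt to read a block that no longer exists, which is the case
described above.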

If we ever did need to make that work, a possible solution would be
to refactor things so that the metapage cache lives at the smgr level
not the relcache level, where it'd get blown away by a cross-backend
smgr inval (CacheInvalidateSmgr) --- which is nontransactional,
thereby fixing the basic problem with the way it's being done now.
There would still be some issues about locking, but at least the
cache per se wouldn't be a hazard anymore.
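
As a rough illustration of why a nontransactional invalidation helps, the
following toy C sketch attaches the cached root to a storage-level object and
discards it as soon as a truncation happens, without waiting for any
transaction to commit. This is only a model under stated assumptions: the
generation counter stands in for PostgreSQL's real cross-backend invalidation
messages, and all names (ToyStorage, ToyRootCache, toy_truncate,
toy_cached_root) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNo;

/* Hypothetical storage-level state for one index file. */
typedef struct ToyStorage {
    uint64_t inval_generation; /* bumped immediately on truncation */
    BlockNo  nblocks;          /* current physical length */
    BlockNo  meta_root;        /* authoritative root block */
} ToyStorage;

/* Hypothetical per-process cache that lives beside the storage handle. */
typedef struct ToyRootCache {
    bool     valid;
    uint64_t seen_generation;  /* generation at which root was cached */
    BlockNo  root;
} ToyRootCache;

/* Truncate and invalidate in one step; no commit is needed to see it. */
void toy_truncate(ToyStorage *st, BlockNo new_nblocks, BlockNo new_root)
{
    assert(new_root < new_nblocks);
    st->nblocks = new_nblocks;
    st->meta_root = new_root;
    st->inval_generation++;
}

/* Return the cached root, reloading it if an invalidation intervened. */
BlockNo toy_cached_root(const ToyStorage *st, ToyRootCache *cache)
{
    if (!cache->valid || cache->seen_generation != st->inval_generation) {
        cache->root = st->meta_root;
        cache->seen_generation = st->inval_generation;
        cache->valid = true;
    }
    return cache->root;
}
```

The sketch deliberately ignores locking, which is the remaining issue noted
above: a reader could still race with the truncation itself.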

Since I'm not aware of any plans to make on-the-fly btree truncation
work, I won't bother to implement that now, but I thought I'd mention
it for the archives.

regards, tom lane
