Re: [PATCH] introduce XLogLockBlockRangeForCleanup()

Lists: pgsql-hackers
From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [PATCH] introduce XLogLockBlockRangeForCleanup()
Date: 2014-06-13 08:40:12
Message-ID: 20140613084012.GA16567@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

nbtxlog.c:btree_xlog_vacuum() contains the following comment:

* XXX we don't actually need to read the block, we just need to
* confirm it is unpinned. If we had a special call into the
* buffer manager we could optimise this so that if the block is
* not in shared_buffers we confirm it as unpinned.

The attached patch introduces an XLogLockBlockRangeForCleanup() function
that addresses this, as well as a "special call into the buffer manager"
that makes it possible to lock the buffers without reading them.

The patch is by Simon Riggs, with some minor edits and testing by me.

I'm adding the patch to the CommitFest, and will follow up with some
performance numbers soon.

-- Abhijit

Attachment Content-Type Size
XLogLockBlockRangeForCleanup.v3.patch text/x-diff 14.2 KB

From: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] introduce XLogLockBlockRangeForCleanup()
Date: 2014-07-03 05:45:53
Message-ID: CACoZds3Q4dufqdFGJvMqihbprPx2rR_4xL3=YtMZiZ6Joc2uXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 13 June 2014 14:10, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:

> nbtxlog.c:btree_xlog_vacuum() contains the following comment:
>
> * XXX we don't actually need to read the block, we just need to
> * confirm it is unpinned. If we had a special call into the
> * buffer manager we could optimise this so that if the block is
> * not in shared_buffers we confirm it as unpinned.
>
> The attached patch introduces an XLogLockBlockRangeForCleanup() function
> that addresses this, as well as a "special call into the buffer manager"
> that makes it possible to lock the buffers without reading them.
>

In GetBufferWithoutRelcache(), I was wondering, rather than calling
PinBuffer(), if we do this :
LockBufHdr(buf);
PinBuffer_Locked(buf);
valid = ((buf->flags & BM_VALID) != 0);
then we can avoid having the new buffer access strategy BAS_DISCARD that is
introduced in this patch. And so the code changes in freelist.c would not
be necessary.

Will give further thought on the overall logic in
XLogLockBlockRangeForCleanup().

> will follow up with some performance numbers soon.
>

Yes, that would be nice.

-Amit


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] introduce XLogLockBlockRangeForCleanup()
Date: 2014-07-03 06:11:39
Message-ID: 20140703061139.GF10574@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 2014-07-03 11:15:53 +0530, amit(dot)khandekar(at)enterprisedb(dot)com wrote:
>
> In GetBufferWithoutRelcache(), I was wondering, rather than calling
> PinBuffer(), if we do this :
> LockBufHdr(buf);
> PinBuffer_Locked(buf);
> valid = ((buf->flags & BM_VALID) != 0);
> then we can avoid having the new buffer access strategy BAS_DISCARD
> that is introduced in this patch. And so the code changes in
> freelist.c would not be necessary.

Thanks for the suggestion. It sounds sensible at first glance. I'll
think about it a little more, then try it and see.

> > will follow up with some performance numbers soon.
>
> Yes, that would be nice.

I'm afraid I haven't had a chance to do this yet.

-- Abhijit


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: simon(at)2ndQuadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] introduce XLogLockBlockRangeForCleanup()
Date: 2014-07-03 10:51:59
Message-ID: 20140703105159.GI10574@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

FYI, I've attached a patch that does what you suggested. I haven't done
anything else (i.e. testing) with it yet.

-- Abhijit

Attachment Content-Type Size
xlog-lock-blockrange-2.diff text/x-diff 8.7 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] introduce XLogLockBlockRangeForCleanup()
Date: 2014-07-03 11:29:34
Message-ID: CA+U5nMLSHYFkwTaHFg+jO3gvOL3aUOkEeWZUwXBOhgM-KBEyyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3 July 2014 06:45, Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:

> In GetBufferWithoutRelcache(), I was wondering, rather than calling
> PinBuffer(), if we do this :
> LockBufHdr(buf);
> PinBuffer_Locked(buf);
> valid = ((buf->flags & BM_VALID) != 0);
> then we can avoid having the new buffer access strategy BAS_DISCARD that is
> introduced in this patch. And so the code changes in freelist.c would not be
> necessary.

That looks like a good idea, thanks.

I think we should say this though

LockBufHdr(buf);
valid = ((buf->flags & BM_VALID) != 0);
if (valid)
PinBuffer_Locked(buf);
else
UnlockBufHdr(buf);

since otherwise we would access the buffer flags without the spinlock
and we would leak a pin if the buffer was not valid

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] introduce XLogLockBlockRangeForCleanup()
Date: 2014-07-04 09:37:14
Message-ID: CACoZds3KL9MxRe5PPDruT2s+Ay==iGrWGLHJqPygPTtXciR_=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3 July 2014 16:59, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

>
> I think we should say this though
>
> LockBufHdr(buf);
> valid = ((buf->flags & BM_VALID) != 0);
> if (valid)
> PinBuffer_Locked(buf);
> else
> UnlockBufHdr(buf);
>
> since otherwise we would access the buffer flags without the spinlock
> and we would leak a pin if the buffer was not valid
>

Ah right. That is essential.


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] introduce XLogLockBlockRangeForCleanup()
Date: 2014-07-04 13:41:56
Message-ID: 20140704134156.GA29276@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Updated patch attached, thanks.

Amit: what's your conclusion from the review?

-- Abhijit

Attachment Content-Type Size
xlog-lock-blockrange-3.diff text/x-diff 8.8 KB

From: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] introduce XLogLockBlockRangeForCleanup()
Date: 2014-07-07 08:32:15
Message-ID: CACoZds2nbjNUBBJ4vo0fepS+NVBAhYLfK5NZNiUa3PBr2-bmWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 4 July 2014 19:11, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:

> Updated patch attached, thanks.
>
> Amit: what's your conclusion from the review?
>

Other than some minor comments as mentioned below, I don't have any more
issues, it looks all good.

XLogLockBlockRangeForCleanup() function header comments has the function
name spelled: XLogBlockRangeForCleanup

In GetBufferWithoutRelcache(), we can call BufferDescriptorGetBuffer(buf)
rather than BufferDescriptorGetBuffer(&BufferDescriptors[buf_id]).

> -- Abhijit
>


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] introduce XLogLockBlockRangeForCleanup()
Date: 2014-07-07 08:46:09
Message-ID: 20140707084608.GA10806@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 2014-07-07 14:02:15 +0530, amit(dot)khandekar(at)enterprisedb(dot)com wrote:
>
> Other than some minor comments as mentioned below, I don't have any
> more issues, it looks all good.

Thank you. (Updated patch attached.)

-- Abhijit

Attachment Content-Type Size
xlog-lock-blockrange-4.diff text/x-diff 8.8 KB

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>, Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] introduce XLogLockBlockRangeForCleanup()
Date: 2014-08-20 08:07:44
Message-ID: 53F45750.70409@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/07/2014 11:46 AM, Abhijit Menon-Sen wrote:
> At 2014-07-07 14:02:15 +0530, amit(dot)khandekar(at)enterprisedb(dot)com wrote:
>>
>> Other than some minor comments as mentioned below, I don't have any
>> more issues, it looks all good.
>
> Thank you. (Updated patch attached.)

I don't think the new GetBufferWithoutRelcache function is in line with
the existing ReadBuffer API. I think it would be better to add a new
ReadBufferMode - RBM_CACHE_ONLY? - that only returns the buffer if it's
already in cache, and InvalidBuffer otherwise.

- Heikki


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] introduce XLogLockBlockRangeForCleanup()
Date: 2015-06-23 13:09:10
Message-ID: 20150623130909.GA13652@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 2014-08-20 11:07:44 +0300, hlinnakangas(at)vmware(dot)com wrote:
>
> I don't think the new GetBufferWithoutRelcache function is in line
> with the existing ReadBuffer API. I think it would be better to add a
> new ReadBufferMode - RBM_CACHE_ONLY? - that only returns the buffer if
> it's already in cache, and InvalidBuffer otherwise.

Hi Heikki.

I liked the idea of having a new ReadBufferMode of RBM_CACHE_ONLY, but I
wasn't happy with the result when I tried to modify the code to suit. It
didn't make the code easier for me to follow.

(I'll say straight up that it can be done the way you said, and if the
consensus is that it would be an improvement, I'll do it that way. I'm
just not convinced about it, hence this mail.)

For example:

> static Buffer
> ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
> BlockNumber blockNum, ReadBufferMode mode,
> BufferAccessStrategy strategy, bool *hit)
> {
> volatile BufferDesc *bufHdr;
> Block bufBlock;
> bool found;
> bool isExtend;
> bool isLocalBuf = SmgrIsTemp(smgr);
>
> *hit = false;
>
> /* Make sure we will have room to remember the buffer pin */
> ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
>
> isExtend = (blockNum == P_NEW);

isLocalBuf and isExtend will both be false for the RBM_CACHE_ONLY case,
which is good for us. But that probably needs to be mentioned in a
comment here.

> TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
> smgr->smgr_rnode.node.spcNode,
> smgr->smgr_rnode.node.dbNode,
> smgr->smgr_rnode.node.relNode,
> smgr->smgr_rnode.backend,
> isExtend);

We know we're not going to be doing IO in the RBM_CACHE_ONLY case, but
that's probably OK? I don't understand this TRACE_* stuff. But we need
to either skip this, or also do the corresponding TRACE_*_DONE later.

> if (isLocalBuf)
> {
> bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
> if (found)
> pgBufferUsage.local_blks_hit++;
> else
> pgBufferUsage.local_blks_read++;
> }
> else
> {
> /*
> * lookup the buffer. IO_IN_PROGRESS is set if the requested block is
> * not currently in memory.
> */
> bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
> strategy, &found);
> if (found)
> pgBufferUsage.shared_blks_hit++;
> else
> pgBufferUsage.shared_blks_read++;
> }

The nicest option I could think of here was to copy the shared_buffers
lookup from BufferAlloc into a new BufferAlloc_shared function, and add
a new branch for RBM_CACHE_ONLY. That would look like:

if (isLocalBuf)

else if (mode == RBM_CACHE_ONLY)
{
/*
* We check to see if the buffer is already cached, and if it's
* not, we return InvalidBuffer because we know it's not pinned.
*/
bufHdr = BufferAlloc_shared(…, &found);
if (!found)
return InvalidBuffer;
LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
return BufferDescriptorGetBuffer(bufHdr);
}
else
{
/*
* lookup the buffer …
}

…or if we went with the rest of the code and didn't do the lock/return
immediately, we would fall through to this code:

> /* At this point we do NOT hold any locks. */
>
> /* if it was already in the buffer pool, we're done */
> if (found)
> {
> if (!isExtend)
> {
> /* Just need to update stats before we exit */
> *hit = true;
> VacuumPageHit++;
>
> if (VacuumCostActive)
> VacuumCostBalance += VacuumCostPageHit;
>
> TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
> smgr->smgr_rnode.node.spcNode,
> smgr->smgr_rnode.node.dbNode,
> smgr->smgr_rnode.node.relNode,
> smgr->smgr_rnode.backend,
> isExtend,
> found);
>
> /*
> * In RBM_ZERO_AND_LOCK mode the caller expects the page to be
> * locked on return.
> */
> if (!isLocalBuf)
> {
> if (mode == RBM_ZERO_AND_LOCK)
> LWLockAcquire(bufHdr->content_lock, LW_EXCLUSIVE);
> else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
> LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
> }
>
> return BufferDescriptorGetBuffer(bufHdr);
> }

Some of this isn't applicable to RBM_CACHE_ONLY, so we could either skip
it until we reach the RBM_ZERO_AND_CLEANUP_LOCK part, or add a new
branch before the «if (!isExtend)» one.

In summary, we're either adding one new branch that handles everything
related to RBM_CACHE_ONLY and side-stepping some things on the way that
happen to not apply (isLocalBuf, isExtend, TRACE_*), or we're adding a
bunch of tests to ReadBuffer_common.

But splitting BufferAlloc into BufferAlloc_shared and the rest of
BufferAlloc doesn't look especially nice either. If you look at the
shared_buffers lookup from BufferAlloc:

> /* see if the block is in the buffer pool already */
> LWLockAcquire(newPartitionLock, LW_SHARED);
> buf_id = BufTableLookup(&newTag, newHash);
> if (buf_id >= 0)
> {
> /*
> * Found it. Now, pin the buffer so no one can steal it from the
> * buffer pool, and check to see if the correct data has been loaded
> * into the buffer.
> */
> buf = GetBufferDescriptor(buf_id);
>
> valid = PinBuffer(buf, strategy);
>
> /* Can release the mapping lock as soon as we've pinned it */
> LWLockRelease(newPartitionLock);
>
> *foundPtr = TRUE;
>
> if (!valid)
> {
> /*
> * We can only get here if (a) someone else is still reading in
> * the page, or (b) a previous read attempt failed. We have to
> * wait for any active read attempt to finish, and then set up our
> * own read attempt if the page is still not BM_VALID.
> * StartBufferIO does it all.
> */
> if (StartBufferIO(buf, true))
> {
> /*
> * If we get here, previous attempts to read the buffer must
> * have failed ... but we shall bravely try again.
> */
> *foundPtr = FALSE;
> }
> }
>
> return buf;
> }
>
> /*
> * Didn't find it in the buffer pool. We'll have to initialize a new
> * buffer. Remember to unlock the mapping lock while doing the work.
> */
> LWLockRelease(newPartitionLock);
>
> /* Loop here in case we have to try another victim buffer */
> for (;;)
> {

There are two options here. I split out BufferAlloc_shared and either
make BufferAlloc call it, or not.

If I make BufferAlloc call it, the LWLockAcquire/Release could move to
the _shared function without confusion, but I'll either have to return
the PinBuffer BM_VALID test result via a validPtr, or repeat the test
in the caller before calling StartBufferIO. Both the INIT_BUFFERTAG and
BufTableHashCode() would also have to be in both functions, since they
are used for BufTableInsert() later in BufferAlloc.

On the other hand, if I don't make BufferAlloc call the _shared
function, then we end up with a new few-line special-case function plus
either a self-contained new branch in ReadBuffer_common, or a bunch of
RBM_CACHE_ONLY tests. That doesn't really seem an improvement over what
the original patch did, i.e. introduce a single special-case function to
handle a single special-case access.

I'm open to suggestions.

-- Abhijit

P.S. For reference, I've attached the patch that I posted earlier,
rebased to apply to master.

Attachment Content-Type Size
20150623-xlog-lock-blockrange-5.diff text/x-diff 9.4 KB