Re: crash-safe visibility map, take five

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: crash-safe visibility map, take five
Date: 2011-05-10 12:48:32
Message-ID: BANLkTi=b7jVmq6fA_EXLCYgzuyV1u9at4A@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 9, 2011 at 10:25 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Fri, May 6, 2011 at 5:47 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Mar 30, 2011 at 8:52 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> Another question:
>>>> To address the problem in
>>>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php
>>>> , should we just clear the vm before the log of insert/update/delete?
>>>> This may reduce the performance, is there another solution?
>>>
>>> Yeah, that's a straightforward way to fix it. I don't think the performance
>>> hit will be too bad. But we need to be careful not to hold locks while doing
>>> I/O, which might require some rearrangement of the code. We might want to do
>>> a similar dance that we do in vacuum, and call visibilitymap_pin first, then
>>> lock and update the heap page, and then set the VM bit while holding the
>>> lock on the heap page.
>>
>> Here's an attempt at implementing the necessary gymnastics.
>
> Is there a quick synopsis of why you have to do (sometimes) the
> pin->lock->unlock->pin->lock mechanic? How come you only can fail to
> get the pin at most once?

I thought I'd explained it fairly thoroughly in the comments, but
evidently not. Suggestions for improvement are welcome.

Here goes in more detail: Every time we insert, update, or delete a
tuple in a particular heap page, we must check whether the page is
marked all-visible. If it is, then we need to clear the page-level
bit marking it as all-visible, and also the corresponding bit in the
visibility map. On the other hand, if the page isn't marked
all-visible, then we needn't touch the visibility map at all. So,
there are either one or two buffers involved: the buffer containing
the heap page ("buffer") and possibly also a buffer containing the
visibility map page in which the bit for the heap page is to be found
("vmbuffer"). Before taking an exclusive content-lock on the heap
buffer, we check whether the page appears to be all-visible. If it
does, then we pin the visibility map page and then lock the buffer.
If not, we just lock the buffer. However, since we weren't holding
any lock, it's possible that between the time when we checked the
all-visible bit and the time when we obtained the exclusive
buffer-lock, the bit changed from clear to
set (because someone is concurrently running VACUUM on the table; or
on platforms with weak memory-ordering, someone was running VACUUM
"almost" concurrently). If that happens, we give up our buffer lock,
go pin the visibility map page, and reacquire the buffer lock.
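
The sequence described above can be sketched as a small single-process
model. This is an illustration in Python, not the patch's C code: the
names HeapPage, VMPage, lock_heap_page and the before_lock test hook are
all hypothetical, and a threading.Lock stands in for the buffer content
lock. It shows why the unlock-pin-relock step runs at most once: a pin,
once taken, is kept, so the second check under the lock cannot fail.

```python
import threading


class HeapPage:
    """Toy stand-in for a heap buffer: an all-visible flag plus a content lock."""

    def __init__(self, all_visible):
        self.all_visible = all_visible
        self.content_lock = threading.Lock()


class VMPage:
    """Toy stand-in for the visibility map page covering the heap page."""

    def __init__(self):
        self.pin_count = 0

    def pin(self, page):
        # Pinning may need disk I/O in the real system, so this model
        # insists that the heap content lock is not held while pinning.
        assert not page.content_lock.locked()
        self.pin_count += 1


def lock_heap_page(page, vm, before_lock=None):
    """Lock the heap page; return (vm_pinned, relocks). Lock is held on return."""
    vm_pinned = False
    relocks = 0

    # Unlocked peek: the answer may be stale by the time the lock is held.
    if page.all_visible:
        vm.pin(page)
        vm_pinned = True

    if before_lock is not None:
        before_lock(page)  # hypothetical hook: simulate a concurrent VACUUM

    page.content_lock.acquire()

    # Recheck under the lock. If the bit became set and we have no pin,
    # drop the lock, pin, and take the lock again. The pin is retained,
    # so this branch cannot be needed a second time.
    if page.all_visible and not vm_pinned:
        page.content_lock.release()
        vm.pin(page)
        vm_pinned = True
        relocks += 1
        page.content_lock.acquire()

    return vm_pinned, relocks
```

Calling lock_heap_page with a before_lock hook that sets all_visible
exercises the relock path; without the hook, the function takes either
the pin-first path or the no-pin path depending on the initial flag.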

At this point in the process, we know that *if* the page is marked
all-visible, *then* we have the appropriate visibility map page
pinned. There are three possible pathways: (1) If the buffer
initially appeared to be all-visible, we will have pinned the
visibility map page before acquiring the exclusive lock; (2) If the
buffer initially appeared NOT to be all-visible, but by the time we
obtained the exclusive lock it now appeared to be all-visible, then we
will have done the unfortunate unlock-pin-relock dance, and the
visibility map page will now be pinned; (3) if the buffer initially
appeared NOT to be all-visible, and by the time we obtained the
exclusive lock it STILL appeared NOT to be all-visible, then we don't
have the visibility map page pinned - but that's OK, because in this
case no operation on the visibility map needs to be performed.
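
The invariant behind the three cases can be checked exhaustively in a
few lines. This is a hedged sketch in Python with a hypothetical helper
name; it reduces each pathway to two observations of the all-visible
bit (before locking, and under the lock) and confirms that "all-visible
under the lock" always implies "visibility map page pinned".

```python
from itertools import product


def vm_pinned_after_locking(seen_before_lock, seen_under_lock):
    """Return whether the VM page is pinned once the heap lock is held."""
    pinned = seen_before_lock           # case (1): pinned up front
    if seen_under_lock and not pinned:  # case (2): unlock, pin, relock
        pinned = True
    return pinned                       # case (3): no pin, none needed


# Every combination of the two observations satisfies the invariant.
for before, under in product([False, True], repeat=2):
    pinned = vm_pinned_after_locking(before, under)
    assert (not under) or pinned
```

The one combination that pins without needing to (set before locking,
clear under the lock) is the harmless wasted pin.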

Now it is very possible that in case (1) or (2) the visibility map
bit, though we saw it set at some point, will actually have been
cleared in the meantime. In case (1), this could happen before we
obtain the exclusive lock; while in case (2), it could happen after we
give up the lock to go pin the visibility map page and before we
reacquire it. This will typically happen when a buffer has been
sitting around for a while in an all-visible state and suddenly two
different backends both try to update or delete tuples in that buffer
at almost exactly the same time. But it causes no great harm - both
backends will pin the visibility map page, whichever one gets the
exclusive lock on the heap page first will clear the bit, and when the
other backend gets the heap page afterwards, it will see that the bit
has already been cleared and do nothing further. We've wasted the
effort of pinning and unpinning the visibility map page when it wasn't
really necessary, but that's not the end of the world.
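
The two-backend race can be modelled the same way. In this Python
sketch the class and function names are hypothetical, a dict stands in
for the visibility map, and block number 7 is arbitrary. Each call
represents one backend that already holds the heap content lock and the
visibility map pin; the two calls run one after the other, as the lock
would force them to.

```python
class HeapPage:
    """Toy heap page that starts out marked all-visible."""

    def __init__(self):
        self.all_visible = True


def clear_all_visible(page, vm_bits, blkno):
    """Run with the heap lock held and the VM page pinned.

    Returns True if this caller cleared the bits, False if another
    backend had already done so.
    """
    if not page.all_visible:
        return False           # bit already cleared: nothing further to do
    page.all_visible = False   # page-level bit
    vm_bits[blkno] = False     # corresponding visibility map bit
    return True


page = HeapPage()
vm_bits = {7: True}

first = clear_all_visible(page, vm_bits, 7)   # backend that won the lock
second = clear_all_visible(page, vm_bits, 7)  # backend that got it afterwards
```

Here the second backend's pin was wasted effort, but the state it
leaves behind is the same as if it had never pinned.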

We could avoid all of this complexity - and the possibility of pinning
the visibility map page needlessly - by locking the heap buffer first
and then pinning the visibility map page if the heap page is
all-visible. However, that would involve holding the lock on the heap
buffer across a possible disk I/O to bring the visibility map page
into memory, which is something the existing code tries pretty hard to
avoid.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
