Re: Crash safe visibility map vs hint bits

From: "Jesper(at)Krogh(dot)cc" <jesper(at)krogh(dot)cc>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash safe visibility map vs hint bits
Date: 2010-12-04 08:22:00
Message-ID: CCA5101C-C4D4-409F-8D3C-89F8E850E810@krogh.cc
Lists: pgsql-hackers

On 4 Dec 2010, at 08:48, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> On 04.12.2010 09:14, Jesper(at)Krogh(dot)cc wrote:
>> There has been a lot of discussion about index-only scans and how to make the visibility map crash safe, followed by a good discussion about hint bits.
>>
>> The main concern seems to be the added WAL volume, and it makes me wonder if there is a middle way that looks more like hint bits.
>>
>> How about lazily WAL-logging the complete visibility map, say every X minutes or every N tuple updates, and making WAL recovery recheck the visibility of the pages touched by the WAL stream?
>
> If you WAL-log the visibility map changes after-the-fact, it doesn't solve the race condition we're struggling with: the visibility map change might hit the disk before the PD_ALL_VISIBLE change to the heap page. If you crash, you can end up with a situation where the PD_ALL_VISIBLE flag on the heap page is not set, but the bit in the visibility map is, which causes serious issues later on.
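
To make sure I understand the race described above, here is a toy sketch of it. Everything in it is an assumption made for illustration: the Disk class and the function names are hypothetical, the state is reduced to one heap page flag and one map bit, and the rule that an update only clears the map bit when the page flag is set is a simplification, not the real code path.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """Toy persistent state: one heap page flag and one visibility map bit."""
    pd_all_visible: bool = False  # flag stored on the heap page
    vm_bit: bool = False          # matching bit in the visibility map


def set_all_visible_then_crash(disk: Disk, vm_flushed: bool, heap_flushed: bool) -> Disk:
    """Both changes are made in memory; only the flushed ones survive the crash."""
    return Disk(
        pd_all_visible=disk.pd_all_visible or heap_flushed,
        vm_bit=disk.vm_bit or vm_flushed,
    )


def update_tuple(disk: Disk) -> None:
    """Assumed rule: the map bit is cleared only when the page flag says it is set."""
    if disk.pd_all_visible:
        disk.pd_all_visible = False
        disk.vm_bit = False


# The map change reached disk before the crash, the heap page change did not.
after_crash = set_all_visible_then_crash(Disk(), vm_flushed=True, heap_flushed=False)

# A later update sees no flag on the heap page, so it leaves the map bit alone.
update_tuple(after_crash)
stale_map_bit = after_crash.vm_bit and not after_crash.pd_all_visible
```

In this sketch the map keeps claiming the page is all-visible after the update, which is the state an index-only scan could not safely trust.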

My imagination is probably not as good, but suppose that at time A you WAL-log the complete map, and at A+1 you update a tuple so that the visibility bit is cleared, but the map bit change never reaches disk because of a crash. Then at WAL replay time you restore the map from time A, and if the tuple change at A+1 is present in the WAL stream, you also update the visibility map. Is this the situation where the heap tuple hit disk but the map is left in a broken state? Or is it a different, similar-looking situation?

The tuple change in the WAL stream will require the system to reinspect the page anyway, so there shouldn't be any additional disk I/O on replay due to this.
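
As a rough sketch of the replay idea, under my own assumptions: the map is a plain dict from page number to bit, the WAL stream is reduced to the list of heap pages it touches after time A, and replay conservatively clears the bit for each touched page instead of rechecking visibility. The function and variable names are hypothetical, not anything in the source tree.

```python
from typing import Iterable


def recover_visibility_map(snapshot: dict[int, bool], wal_pages: Iterable[int]) -> dict[int, bool]:
    """Restore the map logged at time A, then apply the WAL written after A."""
    vm = dict(snapshot)  # start from the lazily WAL-logged copy of the whole map
    for page in wal_pages:
        # The page is being replayed anyway, so clear its bit here.
        # Clearing is the conservative stand-in for "recheck visibility".
        vm[page] = False
    return vm


# Time A: pages 0 and 1 are all-visible, page 2 is not.
snapshot_at_a = {0: True, 1: True, 2: False}

# Time A+1: a tuple on page 1 is updated and the change is in the WAL stream,
# but the matching map change never reached disk before the crash.
wal_after_a = [1]

recovered = recover_visibility_map(snapshot_at_a, wal_after_a)
```

Bits that were set after time A are simply lost in this sketch, which only costs a later re-set and never makes the map claim more than the heap does.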

Jesper