Re: Visibility map thoughts

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map thoughts
Date: 2007-11-05 22:27:26
Message-ID: 472F98CE.1020600@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> Though 8.3 isn't out of the oven just yet, I've been thinking about the
>> dead space map a lot, and decided I have to start writing down those
>> thoughts.
>
> I think we should do this at the same time as pushing the FSM out of
> shared memory, and design a data structure that serves both needs.

I'm afraid the data structure required for FSM is very different.

For the visibility map, we need something that can be quickly addressed
by heap block number. For FSM, we need something that's addressable by
tuple size.
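To illustrate the addressing difference: a visibility map with one bit per heap page can be located by pure arithmetic on the heap block number. This is a minimal sketch under assumed parameters (8192-byte map pages, one bit per heap page); the names are illustrative, not actual PostgreSQL code.

```c
/*
 * Sketch: addressing a one-bit-per-page visibility map by heap block
 * number. All names and the page size are illustrative assumptions.
 */
#include <stdint.h>

#define MAP_PAGE_SIZE           8192
#define BITS_PER_BYTE           8
#define HEAPBLOCKS_PER_MAPPAGE  (MAP_PAGE_SIZE * BITS_PER_BYTE)

/* Which map page holds the bit for a given heap block? */
static inline uint32_t
map_block(uint32_t heap_blk)
{
    return heap_blk / HEAPBLOCKS_PER_MAPPAGE;
}

/* Byte offset of that bit within the map page. */
static inline uint32_t
map_byte(uint32_t heap_blk)
{
    return (heap_blk % HEAPBLOCKS_PER_MAPPAGE) / BITS_PER_BYTE;
}

/* Bit position within that byte. */
static inline uint8_t
map_bit(uint32_t heap_blk)
{
    return heap_blk % BITS_PER_BYTE;
}
```

An FSM lookup has no such direct mapping: the key is "give me any page with at least X bytes free", which is a search problem rather than an array index.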

Cache locality of having the FSM and the visibility map in the same
structure is also not very good. The visibility map is accessed by
reads, and by updates/inserts/deletes on pages that previously had a set
bit in the map, while the FSM is only needed for cold updates and deletes.

>> Where to store the visibility map?
>> ----------------------------------
>
>> a) In a fixed size shared memory struct. Not acceptable.
>
>> b) In the special area of every nth heap page. I played a bit with this
>> back in February 2006, because it's simple and requires few changes.
>
>> c) As a new relation kind, like toast tables.
>
>> d) As an auxiliary smgr relation, attached to the heap.
>
>> I'm leaning towards D at the moment. It requires some changes to the
>> buffer manager API and elsewhere, but not too many.
>
> I like B. The main objection I have to D is that it'll be far more
> invasive to the buffer manager (and a bunch of other code) than you
> admit; for starters RelFileNode won't work as-is.

One problem is that you have to atomically update the visibility map
when you update the heap. That means that you have to lock the
visibility map page and the heap page at the same time. If the
visibility map is in the heap, you need to take care that you don't
deadlock.
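The usual cure is a fixed lock ordering. A toy sketch (hypothetical lock names, plain pthread mutexes standing in for buffer locks, not PostgreSQL's actual locking code): every code path that must hold both pages locks the heap page first, then the map page, so no cycle can form.

```c
/*
 * Sketch of deadlock avoidance by fixed lock ordering: heap page
 * before visibility-map page, always. Lock names are hypothetical.
 */
#include <pthread.h>

static pthread_mutex_t heap_page_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t vm_page_lock   = PTHREAD_MUTEX_INITIALIZER;

/* Update a heap page and clear its visibility bit atomically. */
int
update_heap_and_clear_bit(void)
{
    pthread_mutex_lock(&heap_page_lock);  /* heap page first, always */
    pthread_mutex_lock(&vm_page_lock);    /* then the map page */

    /* ... modify heap tuple, clear the visibility bit ... */

    pthread_mutex_unlock(&vm_page_lock);
    pthread_mutex_unlock(&heap_page_lock);
    return 0;
}
```

If the map lives inside the heap as ordinary heap pages (option B), every such path must still agree on one ordering between the data page and the map page, which is the care being called for above.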

We can't directly use B for the FSM, because we also need an FSM for
indexes, so we'll need something else for indexes anyway.

B is also problematic if we ever get to do in-place upgrades without
dump/restore, because it requires changes to the format of the heap itself.

> What I'm envisioning is that we dedicate a quarter or half of every n'th
> heap page to visibility + free space map. Probably a byte per page is
> needed (this would give us 1 visibility bit and 7 bits for free space,
> or other tradeoffs if needed). So there would be 2K or 4K heap pages
> associated with each such special page. Or we could dial it down to
> even less, say 1K heap pages per special page, which would have the
> advantage of reducing update contention for those pages.

How would you search that for a page with X bytes of free space?
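To make the objection concrete, here is a rough sketch of the byte-per-page entry Tom describes (1 visibility bit, 7 bits of free space), with hypothetical names and an assumed encoding of free space in 1/127ths of a page. With this layout, finding a page with at least X free bytes degenerates to a linear scan of the entries.

```c
/*
 * Sketch of a byte-per-page combined entry: bit 7 = all-visible flag,
 * low 7 bits = free-space category. Encoding and names are assumptions
 * for illustration, not a proposed on-disk format.
 */
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 8192

static inline uint8_t
encode_entry(int all_visible, uint32_t free_bytes)
{
    uint8_t cat = (uint8_t) (free_bytes * 127 / PAGE_SIZE);  /* 0..127 */
    return (uint8_t) ((all_visible ? 0x80 : 0) | cat);
}

static inline uint32_t
entry_free_bytes(uint8_t entry)
{
    return (uint32_t) (entry & 0x7F) * PAGE_SIZE / 127;
}

/*
 * The search problem: with entries ordered by block number, a lookup
 * by free-space size is an O(n) scan over the map bytes.
 */
int
find_page_with_space(const uint8_t *map, size_t npages, uint32_t need)
{
    for (size_t i = 0; i < npages; i++)
        if (entry_free_bytes(map[i]) >= need)
            return (int) i;
    return -1;
}
```

A structure addressed by block number answers "is block N all-visible?" in O(1), but a free-space query wants the inverse mapping, which is what a dedicated FSM data structure (e.g. a tree over free-space categories) provides.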

> I don't think this really works. You are effectively assuming that no
> kind of crash-induced corruption can set a bit that you didn't intend
> to set.

Don't we already assume that for hint-bit updates?

> Stated in those terms it seems obviously bogus. What we will
> need is that every WAL-logged update operation also include the
> appropriate setting or clearing of the page's visibility bit; where
> necessary, add new information to the WAL trace to allow this.

I think we already emit a WAL record whenever we would set the bit in
the visibility map, so we can certainly do that. It was more an
interesting observation than anything else.

>> To further reduce contention, we can have a copy of the bit in the page
>> header of the heap page itself. That way we'd only need to access the
>> visibility map on the first update on a page that clears a bit.
>
> Seems exceedingly prone to corruption.

It does?

It seems fairly simple to keep it up to date: whenever we update the
visibility map, we're holding the heap page locked as well.

If you were thinking of hardware failure, I don't see how that bit would
be more susceptible to that, and the potential damage isn't any bigger
than from any other random bit-flip either.

We can double-check and correct both the bit in the page header and in
the visibility map whenever we perform a regular vacuum (or prune), if
they did get corrupt for some reason.

>> - We don't need to clear the bit on HOT updates, because by definition
>> none of the indexed columns changed.
>
> Huh? I don't think I believe that, and I definitely don't believe your
> argument for it.

HOT-updating a row doesn't change what an index-only scan would see. An
index-only scan only needs columns that are in the index, and since it's
a HOT update, none of those columns have changed, so it doesn't matter
which tuple we're returning them from.

Pages that have HOT-updated tuples, but haven't otherwise been modified
since the last vacuum, are also not interesting for VACUUM, because a
prune will do the job of getting rid of dead HOT-updated tuples just as
well.

Am I missing something?

Another thought: Should we support having an FSM and visibility map on
temp tables? Doesn't seem very useful, but why not, I guess.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
