Re: Visibility map thoughts

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map thoughts
Date: 2007-11-05 20:41:38
Message-ID: 27582.1194295298@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> Though 8.3 isn't out of the oven just yet, I've been thinking about the
> dead space map a lot, and decided I have to start writing down those
> thoughts.

I think we should do this at the same time as pushing the FSM out of
shared memory, and design a data structure that serves both needs.

> Where to store the visibility map?
> ----------------------------------

> a) In a fixed size shared memory struct. Not acceptable.

> b) In the special area of every nth heap page. I played a bit with this
> back in February 2006, because it's simple and requires few changes.

> c) As a new relation kind, like toast tables.

> d) As an auxiliary smgr relation, attached to the heap.

> I'm leaning towards D at the moment. It requires a little bit of changes
> to the buffer manager API and elsewhere, but not too much.

I like B. The main objection I have to D is that it'll be far more
invasive to the buffer manager (and a bunch of other code) than you
admit; for starters RelFileNode won't work as-is. Another problem with
D is that you can't readily make the number of blocks per visibility
page an exact power of 2, unless you are willing to waste nearly half of
each visibility page. That will complicate and slow down addressing
... ok, maybe not much, but some.
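
A minimal sketch of the power-of-2 point above, not taken from any PostgreSQL source. It assumes an 8192-byte block, a 24-byte page header, and one map entry per byte; the function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that is <= n. */
static inline uint32_t floor_pow2(uint32_t n)
{
    assert(n > 0);
    uint32_t p = 1;
    while (p <= n / 2)
        p *= 2;
    return p;
}

/*
 * Entries left unused when a dedicated map page is restricted to a
 * power-of-two entry count (assumption: one entry per usable byte).
 */
static inline uint32_t wasted_entries(uint32_t block_size, uint32_t header_size)
{
    assert(header_size < block_size);
    uint32_t usable = block_size - header_size;
    return usable - floor_pow2(usable);
}

/* Arbitrary entries per map page: locating the map page needs a divide. */
static inline uint32_t map_page_divmod(uint32_t heap_blkno, uint32_t entries_per_page)
{
    assert(entries_per_page > 0);
    return heap_blkno / entries_per_page;
}

/* Power-of-two entries per map page: the same lookup is a single shift. */
static inline uint32_t map_page_shift(uint32_t heap_blkno, uint32_t log2_entries)
{
    assert(log2_entries < 32);
    return heap_blkno >> log2_entries;
}
```

Under these assumed figures a map page has 8168 usable bytes, only 4096 of which fit a power-of-two layout, so 4072 bytes (just under half) would go unused.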

What I'm envisioning is that we dedicate a quarter or half of every n'th
heap page to visibility + free space map. Probably a byte per page is
needed (this would give us 1 visibility bit and 7 bits for free space,
or other tradeoffs if needed). With 8K blocks there would then be 2K or 4K
heap pages associated with each such special page. Or we could dial it down to
even less, say 1K heap pages per special page, which would have the
advantage of reducing update contention for those pages.
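
The byte-per-page layout described above could be sketched as follows. This is an illustration under stated assumptions, not an actual on-disk format: it assumes the special page is the first page of each group of 2K heap pages and holds the entries for the whole group, that the top bit of each entry is the visibility bit, and that the low 7 bits hold a free-space category. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2K heap pages per special page (a quarter of an 8K page). */
#define PAGES_PER_GROUP_LOG2 11u
#define PAGES_PER_GROUP      (1u << PAGES_PER_GROUP_LOG2)

#define ALL_VISIBLE_BIT 0x80u   /* 1 visibility bit */
#define FREE_SPACE_MASK 0x7Fu   /* 7 bits of free-space category */

/* Block number of the special page covering heap_blkno (a simple mask). */
static inline uint32_t special_page_for(uint32_t heap_blkno)
{
    return heap_blkno & ~(PAGES_PER_GROUP - 1u);
}

/* Index of heap_blkno's entry within that special page's map area. */
static inline uint32_t map_slot_for(uint32_t heap_blkno)
{
    return heap_blkno & (PAGES_PER_GROUP - 1u);
}

/* Pack one map entry from a visibility flag and a 7-bit category. */
static inline uint8_t pack_entry(int all_visible, uint8_t free_category)
{
    assert(free_category <= FREE_SPACE_MASK);
    return (uint8_t)((all_visible ? ALL_VISIBLE_BIT : 0u) | free_category);
}

static inline int entry_all_visible(uint8_t entry)
{
    return (entry & ALL_VISIBLE_BIT) != 0;
}

static inline uint8_t entry_free_category(uint8_t entry)
{
    return (uint8_t)(entry & FREE_SPACE_MASK);
}
```

With these assumptions, heap block 5000 is covered by the special page at block 4096 and uses slot 904 in its map area. Shrinking the group to 1K pages would only change PAGES_PER_GROUP_LOG2 to 10.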

> Setting a bit is just a hint. It's ok to lose it on crash. However,
> setting a bit mustn't hit the disk too early. What might otherwise
> happen is that the change that made all tuples on a page visible, like
> committing an inserting transaction, isn't replayed after crash, but the
> set bit is already on disk. In other words, setting a bit must set the
> LSN of the visibility map page.

I don't think this really works. You are effectively assuming that no
kind of crash-induced corruption can set a bit that you didn't intend
to set. Stated in those terms it seems obviously bogus. What we will
need is that every WAL-logged update operation also include the
appropriate setting or clearing of the page's visibility bit; where
necessary, add new information to the WAL trace to allow this.
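
As a toy illustration of that requirement (not PostgreSQL's actual WAL record format or redo code), suppose each logged update carries an explicit visibility action for the heap page it touches. Replay then rewrites the bit from the log, so the bit's state after recovery does not depend on whatever happened to reach disk. The types and function below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: what a logged update says about the page's visibility bit. */
typedef enum { VIS_UNCHANGED, VIS_SET, VIS_CLEAR } VisAction;

/* Hypothetical, heavily simplified stand-in for a WAL record. */
typedef struct
{
    uint32_t  heap_blkno;   /* heap page the update touched */
    VisAction vis_action;   /* what replay must do to that page's bit */
} ToyWalRecord;

/*
 * Replay the visibility part of one record against an in-memory bitmap
 * with one bit per heap page (bit i of byte i/8 belongs to page i).
 */
static inline void toy_redo_visibility(uint8_t *vis_bits, uint32_t nblocks,
                                       const ToyWalRecord *rec)
{
    assert(rec->heap_blkno < nblocks);
    uint8_t  mask = (uint8_t)(1u << (rec->heap_blkno % 8u));
    uint8_t *byte = &vis_bits[rec->heap_blkno / 8u];

    if (rec->vis_action == VIS_SET)
        *byte |= mask;
    else if (rec->vis_action == VIS_CLEAR)
        *byte &= (uint8_t)~mask;
    /* VIS_UNCHANGED: leave the bit as it is */
}
```

For example, if a stray set bit for page 9 survives a crash but the log holds an update with VIS_CLEAR for that page, replay clears it again.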

> To further reduce contention, we can have a copy of the bit in the page
> header of the heap page itself. That way we'd only need to access the
> visibility map on the first update on a page that clears a bit.

Seems exceedingly prone to corruption.

> - We don't need to clear the bit on HOT updates, because by definition
> none of the indexed columns changed.

Huh? I don't think I believe that, and I definitely don't believe your
argument for it.

regards, tom lane
