Re: the big picture for index-only scans

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Gokulakannan Somasundaram <gokul007(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: the big picture for index-only scans
Date: 2011-08-19 20:02:04
Message-ID: CA+Tgmobq9ZumpCtzP2oSFp30ggxQP0F_NzMUoLMtb49+Vr23AA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 19, 2011 at 2:51 PM, Gokulakannan Somasundaram
<gokul007(at)gmail(dot)com> wrote:
> On Sat, Aug 20, 2011 at 2:25 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>
>> On 19.08.2011 21:06, Gokulakannan Somasundaram wrote:
>>>
>>> If you are following the same design that Heikki put forward, then there is
>>> a problem with it in maintaining the bits in page and the bits in visibility
>>> map in sync, which we have already discussed.
>>
>> Are you referring to this:
>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php ? I
>> believe Robert's changes to make the visibility map crash-safe cover that.
>> Clearing the bit in the visibility map now happens within the same critical
>> section as clearing the flag on the heap page and writing the WAL record.
>>
> In that case, say 100 sessions are trying to update records that fall
> under the 8000*4 heap pages (I assume 2 bits per heap page in the
> visibility map; 8 * 1024 * 4 to be exact) covered by one page of the
> visibility map,

There are about 8000 visibility map bytes per page, so about 64000
bits, each covering one page. So a visibility map page covers about
512MB of heap.
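
As a back-of-the-envelope check on those numbers, here is a minimal sketch of the arithmetic. It assumes 8192-byte heap pages and takes the round figure of 8000 usable bytes per visibility map page from the paragraph above; the function names are illustrative only and do not come from the PostgreSQL source.

```python
# Rough coverage of a single visibility map page.
# Assumptions: 8192-byte heap pages, and about 8000 usable map bytes
# per visibility map page (the round figure used in this message).
HEAP_PAGE_BYTES = 8192
USABLE_MAP_BYTES = 8000


def heap_pages_covered(usable_bytes=USABLE_MAP_BYTES, bits_per_page=1):
    """Number of heap pages tracked by one visibility map page."""
    return usable_bytes * 8 // bits_per_page


def heap_bytes_covered(usable_bytes=USABLE_MAP_BYTES, bits_per_page=1):
    """Bytes of heap tracked by one visibility map page."""
    return heap_pages_covered(usable_bytes, bits_per_page) * HEAP_PAGE_BYTES


if __name__ == "__main__":
    print(heap_pages_covered())                 # one bit per heap page
    print(heap_bytes_covered() / 2**20, "MiB")  # heap covered by one map page
    print(heap_pages_covered(bits_per_page=2))  # the 2-bit case from the quoted question
```

With one bit per heap page this gives 64000 heap pages, or 500 MiB of heap, per map page, which is in line with the rough 512MB figure. With two bits per heap page the coverage halves.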

> won't it make the 99
> sessions wait for that visibility map while holding the exclusive lock on
> the 99 heap pages?

Hmm, you have a point. If 100 backends simultaneously write to 100
different pages, and all of those pages are all-visible, then it's
possible that they could end up fighting over the buffer content lock
on the visibility map page. But why would you expect that to matter?
In a heavily updated table, the proportion of visibility map bits that
are set figures to be quite low, since they're only set during VACUUM.
To have 100 backends simultaneously pick different pages to write,
each of which is all-visible, seems really unlucky. Even if it does
happen from time to time, I suspect the effects would be largely
masked by WALInsertLock contention. The visibility map content lock
is only taken very briefly, whereas the operations protected by
WALInsertLock are much more complex.
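
To put a rough number on "really unlucky", here is a toy model, not a measurement. It assumes each backend independently writes to a page whose visibility map bit is set with probability frac_set. Both the independence assumption and the parameter names are hypothetical simplifications introduced for illustration.

```python
# Toy model: how often do concurrent writers need the visibility map lock?
# Assumption: each backend independently hits an all-visible page with
# probability frac_set (the fraction of map bits currently set).


def prob_all_hit_all_visible(n_backends, frac_set):
    """Probability that every one of n_backends hits an all-visible page."""
    return frac_set ** n_backends


def expected_vm_lockers(n_backends, frac_set):
    """Expected number of backends that must clear a visibility map bit."""
    return n_backends * frac_set


if __name__ == "__main__":
    for frac in (0.05, 0.5, 0.9):
        print(frac,
              expected_vm_lockers(100, frac),
              prob_all_hit_all_visible(100, frac))
```

Under this model, a table with only a small fraction of bits set sends just a handful of the 100 writers to the map page at any moment, and the chance that all 100 need it at once is vanishingly small.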

This does, however, remind me of two other points:

1. Heikki's idea of trying to set visibility map bits more
aggressively is probably a good one, but it would be possible to
overdo it, because setting visibility map bits is not free. It has an
immediate cost - in that we have to write xlog - and a deferred cost -
in that it will impose overhead when those pages are re-dirtied. At
the moment, I think we're probably too far in the opposite direction -
i.e. we leave the visibility map bits unset for too long, leading to a
massive amount of deferred work that gets done all at once when VACUUM
finally runs. But we shouldn't overcorrect.

2. While we're tinkering with the visibility map, we should think
about whether it makes sense to carve out some more bits for such
purposes as we may in the future require. Even if we allowed each
heap page a byte in the visibility map instead of a single bit, the
visibility map would still be roughly 8000 times smaller than the
heap; and if there are any situations where the page-level locks
become choke points, this would mitigate that effect. There might
also be some advantage in that bytes can be atomically set, while bits
can't, although I can't immediately think how we'd leverage that.
Alternatively, we could widen the field to something less than a full
byte, like 2 or 4 bits, if that seems better.
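
The size trade-off between the candidate widths can be checked with simple arithmetic. This is a minimal sketch assuming 8192-byte heap pages and ignoring page header overhead in the map; the helper name is made up for illustration.

```python
# Heap-to-visibility-map size ratio for different per-page field widths.
# Assumption: 8192-byte heap pages; map page headers are ignored.
HEAP_PAGE_BITS = 8192 * 8


def heap_to_map_ratio(bits_per_heap_page):
    """How many times larger the heap is than its visibility map."""
    return HEAP_PAGE_BITS // bits_per_heap_page


if __name__ == "__main__":
    for width in (1, 2, 4, 8):
        print(f"{width} bit(s) per heap page: heap is "
              f"{heap_to_map_ratio(width)}x the size of the map")
```

Under these assumptions the ratio drops from 65536 with a single bit to 8192 with a full byte, so even the widest option leaves the map tiny compared with the heap.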

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
