Re: Visibility map thoughts

From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map thoughts
Date: 2007-11-05 13:50:12
Message-ID: 9362e74e0711050550o464a458foce2890909fbb2845@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This is my feedback regarding the Visibility map.

I just want to disagree on the fact that the DSM/Visibility map would not
focus on Vacuum. I think Index only scans should be thought of as a fringe
benefit of DSMs/ Visibility Maps. There are some basic assumptions in the
design, which says that
a) The inserts won't increase the size of the table. If it increases, it has
to lock one full page of Visibility map and this is not suitable for tables,
which are short-lived like partitioned tables
b) Even if the inserts don't increase the size of the table, it might make
DSM useless, if lot of inserts keep converting the all-visible ones to
uncertain ones. For that matter, even the Deletes and Updates are also going
to make lot of pages into uncertain ones.
c) Visibility map gets useless, when there is a long running batch query /
periodic background queries which run for longer times
d) More updates- more blocks of uncertainity - space usage by DSM and the
reference made to DSM is just an overhead
e) Lot of times, people may not need index-only scans. Again this gets to be
a overhead
f) If there are scheduled reboots, the DSM crashes and periodic slow-downs
in the queries during the time, the DSM gets re-constructed.

I am not opposing this, as it is a redundant feature for Thick indexes.
After all every one of us, want Postgres to be the fastest one in the world.

But because DSM has a inherent assumption that lot of tables will become
static and all the tuples would be visible to everyone. If there are such
tables, then definitely Thick index becomes a overhead in terms of space.
But DSM should not become overhead at any cost, as it is a memory resident
one at all times and also always gets into the lifecycle of a query. Only
way to achieve it is to make it a dual purpose one. It should help Vacuum,
freezing and visibility checks.

Thanks,
Gokul.

On 11/5/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>
> Though 8.3 isn't out of the oven just yet, I've been thinking about the
> dead space map a lot, and decided I have to start writing down those
> thoughts.
>
> Reducing VACUUM time is important, but the real big promise is the
> ability to do index-only-scans. Because that's the main focus of this
> exercise, I'm calling it the the Visibility Map from now on, because
> it's not about tracking dead space, but tuple visibility in general.
> Don't worry, reduced VACUUM times on read-mostly tables with hot spots
> will still fall out of it.
>
>
>
> What to store in the Visibility Map?
> ------------------------------------
>
> Each potential user of the visibility map needs slightly different
> information:
>
> a) VACUUM needs to know if a page has enough dead tuples that it's
> interesting to visit.
> b) VACUUM FREEZE needs to know if a page has any non-frozen tuples that
> can be frozen.
> c) Index-only-scans needs to know if all tuples on a page are visible to
> all transactions. HOT updates are OK, though.
>
> From this point on, this document concentrates on a map suited for
> index-only-scans.
>
> The basic structure is a bitmap with one bit per heap page. A set bit
> means "all tuples on heap page are visible to all transactions". A
> cleared bit means that we don't know if that's true or not.
>
> This kind of map is useful for VACUUM as well, though VACUUM could get
> away with less strict semantics, and we might not want to visit a page
> in VACUUM if there's only very few dead tuples on a page. What this
> implementation gives us is a conservative VACUUM that will always find
> all the same dead tuples as a regular VACUUM (except for HOT updated
> tuples which can be pruned without scanning indexes). VACUUM using the
> visibility map can't update stats, so we'd still want regular vacuums
> every now and then, or rely on ANALYZE to do that.
>
> It's not useful for VACUUM FREEZE, unless we're willing to freeze much
> more aggressively, and change the meaning of a set bit to "all tuples on
> heap page are frozen".
>
> Other kinds of visibility maps could be useful in some narrow scenarios.
> For example, if we had a map of pages that have no live tuples, we could
> skip those pages in sequential scans. We could also use more than one
> bit per page to store more information, but let's keep it simple for now.
>
>
> Where to store the visibility map?
> ----------------------------------
> a) In a fixed size shared memory struct. Not acceptable.
>
> b) In the special area of every nth heap page. I played a bit with this
> back in February 2006, because it's simple and requires few changes.
>
> c) As a new relation kind, like toast tables.
>
> d) As an auxiliary smgr relation, attached to the heap.
>
> I'm leaning towards D at the moment. It requires a little bit of changes
> to the buffer manager API and elsewhere, but not too much.
>
> B is not good because we might want to have other kinds of auxiliary
> files in the future (FSM in particular). And it is a bit hacky anyway.
> Let's do something more generic.
>
> C doesn't seem like the right approach, because a visibility map doesn't
> really have much in common with "real" relations.
>
> In any case, the bitmap is divided into pages, and the pages live in
> shared_buffers, and are managed by the buffer manager like any other page.
>
> Updating the visibility map
> ---------------------------
>
> A single page in the visibility map covers a wide range of heap pages
> ((8192 - page header) * 8 = ~65000 heap pages). It can easily become
> locking bottleneck, if every update on any page in that range needs to
> lock the visibility map page.
>
> Setting a bit is just a hint. It's ok to lose it on crash. However,
> setting a bit mustn't hit the disk too early. What might otherwise
> happen is that the change that made all tuples on a page visible, like
> committing an inserting transaction, isn't replayed after crash, but the
> set bit is already on disk. In other words, setting a bit must set the
> LSN of the visibility map page.
>
> Clearing a bit is the opposite. Clearing a bit mustn't be lost, but it's
> ok if it hits the disk before the operation that cleared it. Therefore
> we don't need to set the LSN, but it must be redone at WAL replay of
> heap insert/update/delete.
>
> Because a concurrent update on the same word might be lost if we didn't
> lock the page at all, we need to lock the visibility map page to set or
> clear a bit. Since these are quick operations, especially clearing a
> bit, perhaps we could use just the buffer header spinlock.
>
> To further reduce contention, we can have a copy of the bit in the page
> header of the heap page itself. That way we'd only need to access the
> visibility map on the first update on a page that clears a bit.
>
> When to update the visibility map?
> ----------------------------------
> - Clear bit on every update/insert/delete.
> - Set bit in vacuum or prune, if all tuples on page are now visible to
> all transactions.
> - If prune sees that all tuples are LIVE or INSERT_IN_PROGRESS, we can't
> set the bit yet, but as soon as the maximum xmin on the page <
> OldestXmin, we can. Call PageSetPrunable accordingly, to prune again
> after that happens.
> - We don't need to clear the bit on HOT updates, because by definition
> none of the indexed columns changed.
>
>
>
>
> Next we need to figure out how to actually do the index-only-scans,
> using the visibility map. And how to use it for VACUUMs; Itagaki's dsm
> patch addressed that, and assuming it does what we want, that part of
> the patch is still usable.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gokulakannan Somasundaram 2007-11-05 14:07:26 Re: Fwd: Clarification about HOT
Previous Message Tom Lane 2007-11-05 13:46:06 Re: Fwd: Clarification about HOT