Quick Links

Re: Visibility map thoughts

Lists:	pgsql-hackers

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Visibility map thoughts
Date:	2007-11-05 09:52:54
Message-ID:	472EE7F6.4070005@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Though 8.3 isn't out of the oven just yet, I've been thinking about the
dead space map a lot, and decided I have to start writing down those
thoughts.

Reducing VACUUM time is important, but the real big promise is the
ability to do index-only-scans. Because that's the main focus of this
exercise, I'm calling it the the Visibility Map from now on, because
it's not about tracking dead space, but tuple visibility in general.
Don't worry, reduced VACUUM times on read-mostly tables with hot spots
will still fall out of it.

What to store in the Visibility Map?
------------------------------------

Each potential user of the visibility map needs slightly different
information:

a) VACUUM needs to know if a page has enough dead tuples that it's
interesting to visit.
b) VACUUM FREEZE needs to know if a page has any non-frozen tuples that
can be frozen.
c) Index-only-scans needs to know if all tuples on a page are visible to
all transactions. HOT updates are OK, though.

From this point on, this document concentrates on a map suited for
index-only-scans.

The basic structure is a bitmap with one bit per heap page. A set bit
means "all tuples on heap page are visible to all transactions". A
cleared bit means that we don't know if that's true or not.

This kind of map is useful for VACUUM as well, though VACUUM could get
away with less strict semantics, and we might not want to visit a page
in VACUUM if there's only very few dead tuples on a page. What this
implementation gives us is a conservative VACUUM that will always find
all the same dead tuples as a regular VACUUM (except for HOT updated
tuples which can be pruned without scanning indexes). VACUUM using the
visibility map can't update stats, so we'd still want regular vacuums
every now and then, or rely on ANALYZE to do that.

It's not useful for VACUUM FREEZE, unless we're willing to freeze much
more aggressively, and change the meaning of a set bit to "all tuples on
heap page are frozen".

Other kinds of visibility maps could be useful in some narrow scenarios.
For example, if we had a map of pages that have no live tuples, we could
skip those pages in sequential scans. We could also use more than one
bit per page to store more information, but let's keep it simple for now.

Where to store the visibility map?
----------------------------------
a) In a fixed size shared memory struct. Not acceptable.

b) In the special area of every nth heap page. I played a bit with this
back in February 2006, because it's simple and requires few changes.

c) As a new relation kind, like toast tables.

d) As an auxiliary smgr relation, attached to the heap.

I'm leaning towards D at the moment. It requires a little bit of changes
to the buffer manager API and elsewhere, but not too much.

B is not good because we might want to have other kinds of auxiliary
files in the future (FSM in particular). And it is a bit hacky anyway.
Let's do something more generic.

C doesn't seem like the right approach, because a visibility map doesn't
really have much in common with "real" relations.

In any case, the bitmap is divided into pages, and the pages live in
shared_buffers, and are managed by the buffer manager like any other page.

Updating the visibility map
---------------------------

A single page in the visibility map covers a wide range of heap pages
((8192 - page header) * 8 = ~65000 heap pages). It can easily become
locking bottleneck, if every update on any page in that range needs to
lock the visibility map page.

Setting a bit is just a hint. It's ok to lose it on crash. However,
setting a bit mustn't hit the disk too early. What might otherwise
happen is that the change that made all tuples on a page visible, like
committing an inserting transaction, isn't replayed after crash, but the
set bit is already on disk. In other words, setting a bit must set the
LSN of the visibility map page.

Clearing a bit is the opposite. Clearing a bit mustn't be lost, but it's
ok if it hits the disk before the operation that cleared it. Therefore
we don't need to set the LSN, but it must be redone at WAL replay of
heap insert/update/delete.

Because a concurrent update on the same word might be lost if we didn't
lock the page at all, we need to lock the visibility map page to set or
clear a bit. Since these are quick operations, especially clearing a
bit, perhaps we could use just the buffer header spinlock.

To further reduce contention, we can have a copy of the bit in the page
header of the heap page itself. That way we'd only need to access the
visibility map on the first update on a page that clears a bit.

When to update the visibility map?
----------------------------------
- Clear bit on every update/insert/delete.
- Set bit in vacuum or prune, if all tuples on page are now visible to
all transactions.
- If prune sees that all tuples are LIVE or INSERT_IN_PROGRESS, we can't
set the bit yet, but as soon as the maximum xmin on the page <
OldestXmin, we can. Call PageSetPrunable accordingly, to prune again
after that happens.
- We don't need to clear the bit on HOT updates, because by definition
none of the indexed columns changed.

Next we need to figure out how to actually do the index-only-scans,
using the visibility map. And how to use it for VACUUMs; Itagaki's dsm
patch addressed that, and assuming it does what we want, that part of
the patch is still usable.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-05 13:50:12
Message-ID:	9362e74e0711050550o464a458foce2890909fbb2845@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

This is my feedback regarding the Visibility map.

I just want to disagree on the fact that the DSM/Visibility map would not
focus on Vacuum. I think Index only scans should be thought of as a fringe
benefit of DSMs/ Visibility Maps. There are some basic assumptions in the
design, which says that
a) The inserts won't increase the size of the table. If it increases, it has
to lock one full page of Visibility map and this is not suitable for tables,
which are short-lived like partitioned tables
b) Even if the inserts don't increase the size of the table, it might make
DSM useless, if lot of inserts keep converting the all-visible ones to
uncertain ones. For that matter, even the Deletes and Updates are also going
to make lot of pages into uncertain ones.
c) Visibility map gets useless, when there is a long running batch query /
periodic background queries which run for longer times
d) More updates- more blocks of uncertainity - space usage by DSM and the
reference made to DSM is just an overhead
e) Lot of times, people may not need index-only scans. Again this gets to be
a overhead
f) If there are scheduled reboots, the DSM crashes and periodic slow-downs
in the queries during the time, the DSM gets re-constructed.

I am not opposing this, as it is a redundant feature for Thick indexes.
After all every one of us, want Postgres to be the fastest one in the world.

But because DSM has a inherent assumption that lot of tables will become
static and all the tuples would be visible to everyone. If there are such
tables, then definitely Thick index becomes a overhead in terms of space.
But DSM should not become overhead at any cost, as it is a memory resident
one at all times and also always gets into the lifecycle of a query. Only
way to achieve it is to make it a dual purpose one. It should help Vacuum,
freezing and visibility checks.

Thanks,
Gokul.

On 11/5/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>
> Though 8.3 isn't out of the oven just yet, I've been thinking about the
> dead space map a lot, and decided I have to start writing down those
> thoughts.
>
> Reducing VACUUM time is important, but the real big promise is the
> ability to do index-only-scans. Because that's the main focus of this
> exercise, I'm calling it the the Visibility Map from now on, because
> it's not about tracking dead space, but tuple visibility in general.
> Don't worry, reduced VACUUM times on read-mostly tables with hot spots
> will still fall out of it.
>
>
>
> What to store in the Visibility Map?
> ------------------------------------
>
> Each potential user of the visibility map needs slightly different
> information:
>
> a) VACUUM needs to know if a page has enough dead tuples that it's
> interesting to visit.
> b) VACUUM FREEZE needs to know if a page has any non-frozen tuples that
> can be frozen.
> c) Index-only-scans needs to know if all tuples on a page are visible to
> all transactions. HOT updates are OK, though.
>
> From this point on, this document concentrates on a map suited for
> index-only-scans.
>
> The basic structure is a bitmap with one bit per heap page. A set bit
> means "all tuples on heap page are visible to all transactions". A
> cleared bit means that we don't know if that's true or not.
>
> This kind of map is useful for VACUUM as well, though VACUUM could get
> away with less strict semantics, and we might not want to visit a page
> in VACUUM if there's only very few dead tuples on a page. What this
> implementation gives us is a conservative VACUUM that will always find
> all the same dead tuples as a regular VACUUM (except for HOT updated
> tuples which can be pruned without scanning indexes). VACUUM using the
> visibility map can't update stats, so we'd still want regular vacuums
> every now and then, or rely on ANALYZE to do that.
>
> It's not useful for VACUUM FREEZE, unless we're willing to freeze much
> more aggressively, and change the meaning of a set bit to "all tuples on
> heap page are frozen".
>
> Other kinds of visibility maps could be useful in some narrow scenarios.
> For example, if we had a map of pages that have no live tuples, we could
> skip those pages in sequential scans. We could also use more than one
> bit per page to store more information, but let's keep it simple for now.
>
>
> Where to store the visibility map?
> ----------------------------------
> a) In a fixed size shared memory struct. Not acceptable.
>
> b) In the special area of every nth heap page. I played a bit with this
> back in February 2006, because it's simple and requires few changes.
>
> c) As a new relation kind, like toast tables.
>
> d) As an auxiliary smgr relation, attached to the heap.
>
> I'm leaning towards D at the moment. It requires a little bit of changes
> to the buffer manager API and elsewhere, but not too much.
>
> B is not good because we might want to have other kinds of auxiliary
> files in the future (FSM in particular). And it is a bit hacky anyway.
> Let's do something more generic.
>
> C doesn't seem like the right approach, because a visibility map doesn't
> really have much in common with "real" relations.
>
> In any case, the bitmap is divided into pages, and the pages live in
> shared_buffers, and are managed by the buffer manager like any other page.
>
> Updating the visibility map
> ---------------------------
>
> A single page in the visibility map covers a wide range of heap pages
> ((8192 - page header) * 8 = ~65000 heap pages). It can easily become
> locking bottleneck, if every update on any page in that range needs to
> lock the visibility map page.
>
> Setting a bit is just a hint. It's ok to lose it on crash. However,
> setting a bit mustn't hit the disk too early. What might otherwise
> happen is that the change that made all tuples on a page visible, like
> committing an inserting transaction, isn't replayed after crash, but the
> set bit is already on disk. In other words, setting a bit must set the
> LSN of the visibility map page.
>
> Clearing a bit is the opposite. Clearing a bit mustn't be lost, but it's
> ok if it hits the disk before the operation that cleared it. Therefore
> we don't need to set the LSN, but it must be redone at WAL replay of
> heap insert/update/delete.
>
> Because a concurrent update on the same word might be lost if we didn't
> lock the page at all, we need to lock the visibility map page to set or
> clear a bit. Since these are quick operations, especially clearing a
> bit, perhaps we could use just the buffer header spinlock.
>
> To further reduce contention, we can have a copy of the bit in the page
> header of the heap page itself. That way we'd only need to access the
> visibility map on the first update on a page that clears a bit.
>
> When to update the visibility map?
> ----------------------------------
> - Clear bit on every update/insert/delete.
> - Set bit in vacuum or prune, if all tuples on page are now visible to
> all transactions.
> - If prune sees that all tuples are LIVE or INSERT_IN_PROGRESS, we can't
> set the bit yet, but as soon as the maximum xmin on the page <
> OldestXmin, we can. Call PageSetPrunable accordingly, to prune again
> after that happens.
> - We don't need to clear the bit on HOT updates, because by definition
> none of the indexed columns changed.
>
>
>
>
> Next we need to figure out how to actually do the index-only-scans,
> using the visibility map. And how to use it for VACUUMs; Itagaki's dsm
> patch addressed that, and assuming it does what we want, that part of
> the patch is still usable.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

From:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To:	"Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
Cc:	"PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-05 14:38:17
Message-ID:	472F2AD9.6070200@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gokulakannan Somasundaram wrote:
> a) The inserts won't increase the size of the table. If it increases, it has
> to lock one full page of Visibility map and this is not suitable for tables,
> which are short-lived like partitioned tables

The overhead of locking a page is very small.

Actually, extending a heap only needs to touch the visibility map when
we need a new visibility map page, if we initialize all bits to zero.
Like we do already anyway.

> b) Even if the inserts don't increase the size of the table, it might make
> DSM useless, if lot of inserts keep converting the all-visible ones to
> uncertain ones. For that matter, even the Deletes and Updates are also going
> to make lot of pages into uncertain ones.

Sure. If you have a lot of (random) inserts/updates/deletes, it becomes
much less useful.

A small mitigating factor is that an insert/update/delete will fetch the
heap page to memory anyway. Therefore having to access it just after the
update is cheap. This helps inserts in particular, because after the
inserting transaction is < OldestXmin, we can set the bit again.

> c) Visibility map gets useless, when there is a long running batch query /
> periodic background queries which run for longer times

Yeah, long running transactions are a bitch in many ways.

> d) More updates- more blocks of uncertainity - space usage by DSM and the
> reference made to DSM is just an overhead
> e) Lot of times, people may not need index-only scans. Again this gets to be
> a overhead

The beauty of this approach is that the overhead is very small.

> f) If there are scheduled reboots, the DSM crashes and periodic slow-downs
> in the queries during the time, the DSM gets re-constructed.

That's rubbish.

> I am not opposing this, as it is a redundant feature for Thick indexes.
> After all every one of us, want Postgres to be the fastest one in the world.

And also the easiest to maintain, most space-efficient, most reliable
and so forth...

> But because DSM has a inherent assumption that lot of tables will become
> static and all the tuples would be visible to everyone. If there are such
> tables, then definitely Thick index becomes a overhead in terms of space.
> But DSM should not become overhead at any cost, as it is a memory resident
> one at all times and also always gets into the lifecycle of a query. Only
> way to achieve it is to make it a dual purpose one. It should help Vacuum,
> freezing and visibility checks.

I don't understand this paragraph.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-05 16:13:07
Message-ID:	9362e74e0711050813o7eb2a353q7a7ea6cc9b86b608@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/5/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>
> Gokulakannan Somasundaram wrote:
> > a) The inserts won't increase the size of the table. If it increases, it
> has
> > to lock one full page of Visibility map and this is not suitable for
> tables,
> > which are short-lived like partitioned tables
>
> The overhead of locking a page is very small.
>
> Actually, extending a heap only needs to touch the visibility map when
> we need a new visibility map page, if we initialize all bits to zero.
> Like we do already anyway.

As you have pointed out 1 page in the visibility map points to 65535 pages
in the heap. So even if we are locking the visibility map for a small time,
it will affect all those scans, which will need to access these pages.

> b) Even if the inserts don't increase the size of the table, it might make
> > DSM useless, if lot of inserts keep converting the all-visible ones to
> > uncertain ones. For that matter, even the Deletes and Updates are also
> going
> > to make lot of pages into uncertain ones.
>
> Sure. If you have a lot of (random) inserts/updates/deletes, it becomes
> much less useful.
>
> A small mitigating factor is that an insert/update/delete will fetch the
> heap page to memory anyway. Therefore having to access it just after the
> update is cheap. This helps inserts in particular, because after the
> inserting transaction is < OldestXmin, we can set the bit again.

But we can set this bit only after a Vacuum process. The tuples might not be
there, till the Vacuum process pitches in and marks this.

> c) Visibility map gets useless, when there is a long running batch query /
> > periodic background queries which run for longer times
>
> Yeah, long running transactions are a bitch in many ways.

But these are determined by business conditions and we need to provide
efficient solutions to deal with it.

> d) More updates- more blocks of uncertainity - space usage by DSM and the
> > reference made to DSM is just an overhead
> > e) Lot of times, people may not need index-only scans. Again this gets
> to be
> > a overhead
>
> The beauty of this approach is that the overhead is very small.
>
> > f) If there are scheduled reboots, the DSM crashes and periodic
> slow-downs
> > in the queries during the time, the DSM gets re-constructed.
>
> That's rubbish.

I think DSM is not WAL-Logged. So when it gets reconstructed every time
for a big table, isn't it a overhead?

> I am not opposing this, as it is a redundant feature for Thick indexes.
> > After all every one of us, want Postgres to be the fastest one in the
> world.
>
> And also the easiest to maintain, most space-efficient, most reliable
> and so forth...

Provided it supports Vacuuming & Freezing..

> But because DSM has a inherent assumption that lot of tables will become
> > static and all the tuples would be visible to everyone. If there are
> such
> > tables, then definitely Thick index becomes a overhead in terms of
> space.
> > But DSM should not become overhead at any cost, as it is a memory
> resident
> > one at all times and also always gets into the lifecycle of a query.
> Only
> > way to achieve it is to make it a dual purpose one. It should help
> Vacuum,
> > freezing and visibility checks.
>
> I don't understand this paragraph.

Because updates, inserts and deletes reduce the utility of Visibility map,
it seems to be designed for more static tables, which don't experience much
of these operations.

--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-05 18:25:00
Message-ID:	1194287100.4315.120.camel@ebony.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2007-11-05 at 09:52 +0000, Heikki Linnakangas wrote:

> Reducing VACUUM time is important, but the real big promise is the
> ability to do index-only-scans.

Have you thought about how index-only scans work work? Seems like we
need a rough plan for that before we go and build the visibility map,
your other notes for which sound very good.

I'm thinking that looking in the visibility map will have a cost also,
so how will we know whether to bother looking? I'm assuming that we
won't want to do that lookup in all cases, since it could easily just
add pathlength and contention in the normal OLTP case. Presumably there
would be a test in the planner to see if an index-only plan was
possible?

I'm racking my brain trying to think of a query that will benefit from
index-only scans without specifically creating covered indexes. Apart
from count(*) queries and RI lookups. I can't see RI lookups being much
cheaper with this technique, do you see something there?

ISTM that it would make most sense to do it during BitmapIndex scans.
Specifically, as an intermediate step between BitmapIndex scan and
BitmapHeap scan. That would be fairly likely to be a win in most cases
because the bitmaps should compare easily and the amortised cost per row
is likely to be very small, even if no heap lookups are avoided.

Anyway, just a few initial thoughts.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-05 19:46:11
Message-ID:	1194291971.22428.123.camel@dogma.ljc.laika.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2007-11-05 at 09:52 +0000, Heikki Linnakangas wrote:
> Reducing VACUUM time is important, but the real big promise is the
> ability to do index-only-scans. Because that's the main focus of this
> exercise, I'm calling it the the Visibility Map from now on, because
> it's not about tracking dead space, but tuple visibility in general.
> Don't worry, reduced VACUUM times on read-mostly tables with hot spots
> will still fall out of it.

I like "Visibility map" because it's a positive name, and that prevents
confusion over double-negatives (for the same reason it's called
"synchronous_commit" even though the new feature is the ability to be
asynchronous). With Dead Space Map, I wouldn't immediately know whether
a 1 means "always visible" or "might be invisible".

However, "DSM" is a much less overloaded acronym than "VM" ;)

Regarding the focus, it will depend a lot on the user. Some people will
care more about VACUUM and some will care more about index-only scans.

> It's not useful for VACUUM FREEZE, unless we're willing to freeze much
> more aggressively, and change the meaning of a set bit to "all tuples on
> heap page are frozen".
>

This means that a regular VACUUM will no longer be enough to ensure
safety from transaction id wraparound.

I don't think this will be hard to fix, but it's an extra detail that
would need to be decided. The most apparent options appear to be:

1) Do as you say above. What are some of the cost trade-offs here? It
seems that frequent VACUUM FREEZE runs would keep the visibility map
mostly full, but will also cause more writing. I suppose the worst case
is that every tuple write needs results in two data page writes, one
normal write and another to freeze it later, which sounds bad. Maybe
there's a way to try to freeze the tuples on a page before it's written
out?

The idea of "frozen" and "always visible" seem very close in concept to
me.

2) Change to autovacuum to FREEZE on the forced autovacuum to prevent
wraparound.

3) Use multiple bits per visibility map

4) Have multiple types of visibility maps

The more I think about the visibility map, the more I think it will be a
huge win for PostgreSQL. It's especially nice that it works so well with
HOT. Thanks for working on it!

Regards,
Jeff Davis

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-05 20:41:38
Message-ID:	27582.1194295298@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> Though 8.3 isn't out of the oven just yet, I've been thinking about the
> dead space map a lot, and decided I have to start writing down those
> thoughts.

I think we should do this at the same time as pushing the FSM out of
shared memory, and design a data structure that serves both needs.

> Where to store the visibility map?
> ----------------------------------

> a) In a fixed size shared memory struct. Not acceptable.

> b) In the special area of every nth heap page. I played a bit with this
> back in February 2006, because it's simple and requires few changes.

> c) As a new relation kind, like toast tables.

> d) As an auxiliary smgr relation, attached to the heap.

> I'm leaning towards D at the moment. It requires a little bit of changes
> to the buffer manager API and elsewhere, but not too much.

I like B. The main objection I have to D is that it'll be far more
invasive to the buffer manager (and a bunch of other code) than you
admit; for starters RelFileNode won't work as-is. Another problem with
D is that you can't readily make the number of blocks per visibility
page an exact power of 2, unless you are willing to waste near half of
each visibility page. That will complicate and slow down addressing
... ok, maybe not much, but some.

What I'm envisioning is that we dedicate a quarter or half of every n'th
heap page to visibility + free space map. Probably a byte per page is
needed (this would give us 1 visibility bit and 7 bits for free space,
or other tradeoffs if needed). So there would be 2K or 4K heap pages
associated with each such special page. Or we could dial it down to
even less, say 1K heap pages per special page, which would have the
advantage of reducing update contention for those pages.

> Setting a bit is just a hint. It's ok to lose it on crash. However,
> setting a bit mustn't hit the disk too early. What might otherwise
> happen is that the change that made all tuples on a page visible, like
> committing an inserting transaction, isn't replayed after crash, but the
> set bit is already on disk. In other words, setting a bit must set the
> LSN of the visibility map page.

I don't think this really works. You are effectively assuming that no
kind of crash-induced corruption can set a bit that you didn't intend
to set. Stated in those terms it seems obviously bogus. What we will
need is that every WAL-logged update operation also include the
appropriate setting or clearing of the page's visibility bit; where
necessary, add new information to the WAL trace to allow this.

> To further reduce contention, we can have a copy of the bit in the page
> header of the heap page itself. That way we'd only need to access the
> visibility map on the first update on a page that clears a bit.

Seems exceedingly prone to corruption.

> - We don't need to clear the bit on HOT updates, because by definition
> none of the indexed columns changed.

Huh? I don't think I believe that, and I definitely don't believe your
argument for it.

regards, tom lane

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-05 22:27:26
Message-ID:	472F98CE.1020600@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> Though 8.3 isn't out of the oven just yet, I've been thinking about the
>> dead space map a lot, and decided I have to start writing down those
>> thoughts.
>
> I think we should do this at the same time as pushing the FSM out of
> shared memory, and design a data structure that serves both needs.

I'm afraid the data structure required for FSM is very different.

For the visibility map, we need something that can be quickly addressed
by heap block number. For FSM, we need something that's addressable by
tuple size.

Cache locality of having the FSM and the visibility map in the same
structure is also not very good. Visibility map is accessed by reads,
and updates/inserts/deletes on pages that previously had a set bit in
the map, while the FSM is only needed for cold updates and deletes.

>> Where to store the visibility map?
>> ----------------------------------
>
>> a) In a fixed size shared memory struct. Not acceptable.
>
>> b) In the special area of every nth heap page. I played a bit with this
>> back in February 2006, because it's simple and requires few changes.
>
>> c) As a new relation kind, like toast tables.
>
>> d) As an auxiliary smgr relation, attached to the heap.
>
>> I'm leaning towards D at the moment. It requires a little bit of changes
>> to the buffer manager API and elsewhere, but not too much.
>
> I like B. The main objection I have to D is that it'll be far more
> invasive to the buffer manager (and a bunch of other code) than you
> admit; for starters RelFileNode won't work as-is.

One problem is that you have to atomically update the visibility map
when you update the heap. That means that you have to lock the
visibility map page and the heap page at the same time. If the
visibility map is in the heap, you need to take care that you don't
deadlock.

We can't directly use B for the FSM, because we also need a FSM for
indexes, so we'll need something else for indexes anyway.

B is also problematic if we ever get to do upgrades without
dump/restore, because it requires changes to the format of the heap itself.

> What I'm envisioning is that we dedicate a quarter or half of every n'th
> heap page to visibility + free space map. Probably a byte per page is
> needed (this would give us 1 visibility bit and 7 bits for free space,
> or other tradeoffs if needed). So there would be 2K or 4K heap pages
> associated with each such special page. Or we could dial it down to
> even less, say 1K heap pages per special page, which would have the
> advantage of reducing update contention for those pages.

How would you search that for a page with X bytes of free space?

> I don't think this really works. You are effectively assuming that no
> kind of crash-induced corruption can set a bit that you didn't intend
> to set.

Don't we already assume that for hint-bit updates?

> Stated in those terms it seems obviously bogus. What we will
> need is that every WAL-logged update operation also include the
> appropriate setting or clearing of the page's visibility bit; where
> necessary, add new information to the WAL trace to allow this.

I think we already emit a WAL record whenever we would set the bit in
the visibility map, so we can certainly do that. It was more an
interesting observation than anything else.

>> To further reduce contention, we can have a copy of the bit in the page
>> header of the heap page itself. That way we'd only need to access the
>> visibility map on the first update on a page that clears a bit.
>
> Seems exceedingly prone to corruption.

It does?

It seems fairly simple to keep it up-to-date. Whenever we update the
visibility map, we're holding the heap page locked as well.

If you were thinking of hardware failure, I don't see how that bit would
be more susceptible to that, and the potential damage isn't any bigger
than from any other random bit-flip either.

We can double-check and correct both the bit in the page header and in
the visibility map whenever we perform a regular vacuum (or prune), if
they did get corrupt for some reason.

>> - We don't need to clear the bit on HOT updates, because by definition
>> none of the indexed columns changed.
>
> Huh? I don't think I believe that, and I definitely don't believe your
> argument for it.

HOT-updating a row doesn't change what an index-only scan would see. An
index-only scan only needs columns that are in the index, and since it's
a HOT update, none of those columns have changed, so it doesn't matter
which tuple we're returning them from.

Pages that have HOT updated tuples, but haven't otherwise been modified
since last vacuum are also not interesting for VACUUM, because a prune
will do the job of getting rid of dead HOT-updated tuples just as well.

Am I missing something?

Another thought: Should we support having a FSM and visibility map on
temp tables? Doesn't seem very useful, but why not, I guess.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-05 22:45:33
Message-ID:	472F9D0D.4050106@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jeff Davis wrote:
> On Mon, 2007-11-05 at 09:52 +0000, Heikki Linnakangas wrote:
>> It's not useful for VACUUM FREEZE, unless we're willing to freeze much
>> more aggressively, and change the meaning of a set bit to "all tuples on
>> heap page are frozen".
>
> This means that a regular VACUUM will no longer be enough to ensure
> safety from transaction id wraparound.

Good point. So we'd still need regular VACUUMs every now and then.

(Gosh, we really need a name for the sort of vacuum. I was about to say
"we'd still need regular regular VACUUMs" :-))

> I don't think this will be hard to fix, but it's an extra detail that
> would need to be decided. The most apparent options appear to be:
>
> 1) Do as you say above. What are some of the cost trade-offs here? It
> seems that frequent VACUUM FREEZE runs would keep the visibility map
> mostly full, but will also cause more writing. I suppose the worst case
> is that every tuple write needs results in two data page writes, one
> normal write and another to freeze it later, which sounds bad. Maybe
> there's a way to try to freeze the tuples on a page before it's written
> out?

It would also create more WAL traffic, because freezing tuples needs to
be WAL-logged.

> 2) Change to autovacuum to FREEZE on the forced autovacuum to prevent
> wraparound.

Doesn't necessarily need to be a VACUUM FREEZE. Just a "regular VACUUM"
instead of using the visibility map.

> 3) Use multiple bits per visibility map

That would work.

> 4) Have multiple types of visibility maps

I'd rather not do that. It starts to get more complex and more expensive
to update.

5) Have a more fine-grain equivalent of relfrozenxid. For example one
frozenxid per visibility map page, so that whenever you update the
visibility map, you also update the frozenxid. To advance the
relfrozenxid in pg_class, you scan the visibility map and set
relfrozenxid to the smallest frozenxid. Unlike relfrozenxid, it could be
set to FrozenXid if the group of pages are totally frozen.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 06:50:48
Message-ID:	877ikvx3xz.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:

> One problem is that you have to atomically update the visibility map when
> you update the heap. That means that you have to lock the visibility map
> page and the heap page at the same time. If the visibility map is in the
> heap, you need to take care that you don't deadlock.

Well that's still a problem if it's in another filenode.

On the other hand if you allocate a whole byte to every page you could set it
atomically and not have to lock the page. Effectively having the lock on the
original page protect the info byte. Whereas setting a single bit requires
protecting against someone setting one of the other bits corresponding to
another page entirely.

>>> - We don't need to clear the bit on HOT updates, because by definition none
>>> of the indexed columns changed.
>>
>> Huh? I don't think I believe that, and I definitely don't believe your
>> argument for it.
>
> HOT-updating a row doesn't change what an index-only scan would see. An
> index-only scan only needs columns that are in the index, and since it's a HOT
> update, none of those columns have changed, so it doesn't matter which tuple
> we're returning them from.
>
> Pages that have HOT updated tuples, but haven't otherwise been modified since
> last vacuum are also not interesting for VACUUM, because a prune will do the
> job of getting rid of dead HOT-updated tuples just as well.
>
> Am I missing something?

I think the point is that for index-only scans you really just want to know
the visibility of the whole chain. The whole chain is either known-visible or
not. A HOT update doesn't change that, it just changes which version along the
chain is visible and the values of the non-indexed columns in that version.

Some thought has to be given to the transition stages when you create or drop
an index but I don't see a problem there. If you have a "broken" hot chain it
doesn't change the visibility rules for anyone who does use an old index (and
nobody who could see the broken end of the chain should be using the new index
anyways).

The problem with this is that it's different from what a DSM would need. We
could skip vacuuming such HOT updated dead tuples, assuming a page prune will
get it the next time we access the page I suppose. Or we could use a separate
bit for more aggressive vacuum information.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

From:	"Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To:	"Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 08:09:44
Message-ID:	9362e74e0711060009q7919caf1s42f4ab1bc80cfc09@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Just one more thought on the same. This implementation also assumes
that there won't be any update chains across pages, which is the
current stage.

Heikki,
Is it planned as a optional feature? (I support the optional
feature model)

Thanks,
Gokul.

On Nov 6, 2007 12:20 PM, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>
> > One problem is that you have to atomically update the visibility map when
> > you update the heap. That means that you have to lock the visibility map
> > page and the heap page at the same time. If the visibility map is in the
> > heap, you need to take care that you don't deadlock.
>
> Well that's still a problem if it's in another filenode.
>
> On the other hand if you allocate a whole byte to every page you could set it
> atomically and not have to lock the page. Effectively having the lock on the
> original page protect the info byte. Whereas setting a single bit requires
> protecting against someone setting one of the other bits corresponding to
> another page entirely.
>
> >>> - We don't need to clear the bit on HOT updates, because by definition none
> >>> of the indexed columns changed.
> >>
> >> Huh? I don't think I believe that, and I definitely don't believe your
> >> argument for it.
> >
> > HOT-updating a row doesn't change what an index-only scan would see. An
> > index-only scan only needs columns that are in the index, and since it's a HOT
> > update, none of those columns have changed, so it doesn't matter which tuple
> > we're returning them from.
> >
> > Pages that have HOT updated tuples, but haven't otherwise been modified since
> > last vacuum are also not interesting for VACUUM, because a prune will do the
> > job of getting rid of dead HOT-updated tuples just as well.
> >
> > Am I missing something?
>
> I think the point is that for index-only scans you really just want to know
> the visibility of the whole chain. The whole chain is either known-visible or
> not. A HOT update doesn't change that, it just changes which version along the
> chain is visible and the values of the non-indexed columns in that version.
>
> Some thought has to be given to the transition stages when you create or drop
> an index but I don't see a problem there. If you have a "broken" hot chain it
> doesn't change the visibility rules for anyone who does use an old index (and
> nobody who could see the broken end of the chain should be using the new index
> anyways).
>
> The problem with this is that it's different from what a DSM would need. We
> could skip vacuuming such HOT updated dead tuples, assuming a page prune will
> get it the next time we access the page I suppose. Or we could use a separate
> bit for more aggressive vacuum information.
>
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Ask me about EnterpriseDB's Slony Replication support!
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Gokulakannan Somasundaram <gokul007(at)gmail(dot)com>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 11:03:11
Message-ID:	473049EF.3090000@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gokulakannan Somasundaram wrote:
> Just one more thought on the same. This implementation also assumes
> that there won't be any update chains across pages, which is the
> current stage.

No, it doesn't assume that.

> Heikki,
> Is it planned as a optional feature? (I support the optional
> feature model)

I'm not planning it. Clearly you are :-).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 13:01:16
Message-ID:	4730659C.2060607@mark.mielke.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Mon, 2007-11-05 at 09:52 +0000, Heikki Linnakangas wrote:
> I'm racking my brain trying to think of a query that will benefit from
> index-only scans without specifically creating covered indexes. Apart
> from count(*) queries and RI lookups. I can't see RI lookups being much
> cheaper with this technique, do you see something there
I'm not sure what RI lookup is. Sorry. :-)

My list would be:
- EXISTS / NOT EXISTS
- COUNT(*)
- Tables that are heavily updated - any case where the index entry often
maps to a non-visible tuple.

Beyond that, yeah, I cannot think of other benefits.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 13:14:49
Message-ID:	473068C9.8080401@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Mark Mielke wrote:
> I'm not sure what RI lookup is. Sorry. :-)
>
>

RI = Referential Integrity. i.e. Foreign Keys.

cheers

andrew

From:	"Marko Kreen" <markokr(at)gmail(dot)com>
To:	"Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>
Cc:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 13:17:05
Message-ID:	e51f66da0711060517p229164a7uf59140d1d5d90645@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/6/07, Mark Mielke <mark(at)mark(dot)mielke(dot)cc> wrote:
> Simon Riggs wrote:
> > On Mon, 2007-11-05 at 09:52 +0000, Heikki Linnakangas wrote:
> > I'm racking my brain trying to think of a query that will benefit from
> > index-only scans without specifically creating covered indexes. Apart
> > from count(*) queries and RI lookups. I can't see RI lookups being much
> > cheaper with this technique, do you see something there
> I'm not sure what RI lookup is. Sorry. :-)
>
> My list would be:
> - EXISTS / NOT EXISTS
> - COUNT(*)
> - Tables that are heavily updated - any case where the index entry often
> maps to a non-visible tuple.
>
> Beyond that, yeah, I cannot think of other benefits.

Joins?

--
marko

From:	"Marko Kreen" <markokr(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	"Jeff Davis" <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 13:19:16
Message-ID:	e51f66da0711060519q2158b3a1pbf6120b283b7b4bb@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/6/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
> Jeff Davis wrote:
> > On Mon, 2007-11-05 at 09:52 +0000, Heikki Linnakangas wrote:
> >> It's not useful for VACUUM FREEZE, unless we're willing to freeze much
> >> more aggressively, and change the meaning of a set bit to "all tuples on
> >> heap page are frozen".
> >
> > This means that a regular VACUUM will no longer be enough to ensure
> > safety from transaction id wraparound.
>
> Good point. So we'd still need regular VACUUMs every now and then.
>
> (Gosh, we really need a name for the sort of vacuum. I was about to say
> "we'd still need regular regular VACUUMs" :-))

As the new VACUUM variant will be somewhat unsafe, it should
not replace "regular" VACUUM but get separate name.

VACUUM FAST maybe? Informally "fastvacuum". Something with
"lazy" or "partial" would also be possibility.

--
marko

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Marko Kreen <markokr(at)gmail(dot)com>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 13:24:59
Message-ID:	47306B2B.2050806@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Marko Kreen wrote:
> On 11/6/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>> (Gosh, we really need a name for the sort of vacuum. I was about to say
>> "we'd still need regular regular VACUUMs" :-))
>
> As the new VACUUM variant will be somewhat unsafe, it should
> not replace "regular" VACUUM but get separate name.

What do you mean by unsafe? It is supposed to reclaim all dead tuples a
normal vacuum would, except for HOT updated tuples that can be pruned
without scanning indexes. It doesn't advance the relfrozenxid or update
stats, though.

> VACUUM FAST maybe? Informally "fastvacuum". Something with
> "lazy" or "partial" would also be possibility.

We already call the regular vacuum "lazy" in the source code, as opposed
to VACUUM FULL. Partial is also bit misleading; while it doesn't scan
the whole table, it should find all dead tuples.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 13:29:03
Message-ID:	47306C1F.4020302@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Mark Mielke wrote:
> Simon Riggs wrote:
>> On Mon, 2007-11-05 at 09:52 +0000, Heikki Linnakangas wrote:
>> I'm racking my brain trying to think of a query that will benefit from
>> index-only scans without specifically creating covered indexes. Apart
>> from count(*) queries and RI lookups. I can't see RI lookups being much
>> cheaper with this technique, do you see something there
> I'm not sure what RI lookup is. Sorry. :-)

Referential Integrity. For example, if you insert a row to table Child,
that has a foreign key reference to table Parent, a RI trigger is fired
that checks the there's a row in Parent table for that key.

Unfortunately that lookup is done with "FOR SHARE", index-only scan
won't help because we have to go and lock the heap tuple anyway :(.

> My list would be:
> - EXISTS / NOT EXISTS
> - COUNT(*)

Yeah, those are good candidates.

> - Tables that are heavily updated - any case where the index entry often
> maps to a non-visible tuple.

Heavily updated tuples won't benefit from the visibility map, because
the bits in the map will be clear all the time due to the updates.

> Beyond that, yeah, I cannot think of other benefits.

Many-to-many relationships is one example:

CREATE TABLE aa (id INTEGER PRIMARY KEY);
CREATE TABLE bb (id INTEGER PRIMARY KEY);
CREATE TABLE aa_bb (aid INTEGER REFERENCES aa (id), bid INTEGER
REFERENCES bb (id));

The relationship table will usually have indexes in both directions:

CREATE INDEX i_aa_bb_1 ON aa_bb (aid, bid);
CREATE INDEX i_aa_bb_2 ON aa_bb (bid, aid);

And of course people will start adding columns to indexes, to make use
of index-only-scans, once we have the capability.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Marko Kreen" <markokr(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	"Jeff Davis" <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 13:40:22
Message-ID:	e51f66da0711060540w3777cf22o885dd69d9d84b521@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/6/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
> Marko Kreen wrote:
> > On 11/6/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
> >> (Gosh, we really need a name for the sort of vacuum. I was about to say
> >> "we'd still need regular regular VACUUMs" :-))
> >
> > As the new VACUUM variant will be somewhat unsafe, it should
> > not replace "regular" VACUUM but get separate name.
>
> What do you mean by unsafe? It is supposed to reclaim all dead tuples a
> normal vacuum would, except for HOT updated tuples that can be pruned
> without scanning indexes. It doesn't advance the relfrozenxid or update
> stats, though.

I got the impression from this paragraph:

"This kind of map is useful for VACUUM as well, though VACUUM could get
away with less strict semantics, and we might not want to visit a page
in VACUUM if there's only very few dead tuples on a page."

That seems to hint that the current VACUUM needs still be
run occasionally.

> > VACUUM FAST maybe? Informally "fastvacuum". Something with
> > "lazy" or "partial" would also be possibility.
>
> We already call the regular vacuum "lazy" in the source code, as opposed
> to VACUUM FULL. Partial is also bit misleading; while it doesn't scan
> the whole table, it should find all dead tuples.

As the VACUUM FULL is unusable in real life, maybe we should
move current VACUUM functionality under VACUUM FULL ;)

--
marko

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 13:52:58
Message-ID:	473071BA.30407@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>
>> One problem is that you have to atomically update the visibility map when
>> you update the heap. That means that you have to lock the visibility map
>> page and the heap page at the same time. If the visibility map is in the
>> heap, you need to take care that you don't deadlock.
>
> Well that's still a problem if it's in another filenode.
>
> On the other hand if you allocate a whole byte to every page you could set it
> atomically and not have to lock the page. Effectively having the lock on the
> original page protect the info byte. Whereas setting a single bit requires
> protecting against someone setting one of the other bits corresponding to
> another page entirely.

I don't buy that. I believe at least on some architectures you'd get a
word-long load+modify+store, and scribble the neighboring bytes.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 14:08:37
Message-ID:	1194358117.4268.44.camel@ebony.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2007-11-06 at 13:29 +0000, Heikki Linnakangas wrote:

> > My list would be:
> > - EXISTS / NOT EXISTS
> > - COUNT(*)
>
> Yeah, those are good candidates.

Yeah.

> Many-to-many relationships is one example:

OK, thats a very good one.

> And of course people will start adding columns to indexes, to make use
> of index-only-scans, once we have the capability.

Not too keen on that. Very difficult to judge whether its worth the
benefit for creating lots of extra columns in indexes. Specifically,
this isn't going to speed up any existing application without additional
design work.

But seems like we have reasonable reason for them without that.

Do we know how much faster things might go if we do that?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 14:18:21
Message-ID:	473077AD.30101@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> I'm thinking that looking in the visibility map will have a cost also,
> so how will we know whether to bother looking? I'm assuming that we
> won't want to do that lookup in all cases, since it could easily just
> add pathlength and contention in the normal OLTP case.

Yeah, I believe that's true. There's a non-zero cost to return tuples
from the index as well. Now that we do b-tree scans page at a time, if
we're going to return data from the index, we need to copy all the
matching data to backend-private memory when we step to a page.

> Presumably there
> would be a test in the planner to see if an index-only plan was
> possible?

Yes, you can check if all the columns you need are in the index. You
won't know for sure how much of the visibility map is set until
execution time, however.

I'm thinking that we're going to have an entry in pg_stats on the
fraction of set bits in the map, and factor that in the cost estimate of
index scans.

> ISTM that it would make most sense to do it during BitmapIndex scans.
> Specifically, as an intermediate step between BitmapIndex scan and
> BitmapHeap scan. That would be fairly likely to be a win in most cases
> because the bitmaps should compare easily and the amortised cost per row
> is likely to be very small, even if no heap lookups are avoided.

That only helps with COUNT(*), because until the BitmapHeap scan all you
have is a bitmap of tuples. You can't return any values from just a
bitmap. (Seems easy to do, so it's still probably worth doing just for
COUNT(*), though)

A major design question is how where and how the executor for index-only
scans should work. The required planner changes will follow from that.
The simplest solution is to just modify nodeIndexScan.c a little bit to
check the visibility map before fetching the heap tuple. If the bit in
the visibility map is set, skip the heap fetch and return the required
values from the index tuple instead. That would give most of the benefit
already.

The indexam API needs to be modified as well, because there's currently
no API to return index tuples from an index.

There's some more fancy stuff we could do if we separated the index scan
and heap fetch to two different executor nodes. You could for example do
an index-only scan and join, and check visibility only after the join,
avoiding unnecessarily accessing the heap for tuples that don't satisfy
the join condition. In fact we could do that without the visibility map
as well.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 14:21:00
Message-ID:	4730784C.7020004@mark.mielke.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Tue, 2007-11-06 at 13:29 +0000, Heikki Linnakangas wrote:
>
>> And of course people will start adding columns to indexes, to make use
>> of index-only-scans, once we have the capability.
>>
> Not too keen on that. Very difficult to judge whether its worth the
> benefit for creating lots of extra columns in indexes. Specifically,
> this isn't going to speed up any existing application without additional
> design work.
>
> But seems like we have reasonable reason for them without that.
>
> Do we know how much faster things might go if we do that?
>
Effectively - you get a materialized view with limitations (no joins or
calculations), with rows in B-Tree order. Update speed would suffer, but
I would expect nearly all random access queries to improve, and the
fewer the columns included in the index, the less data that needs to be
scanned to find the data you want.

I have some data that might benefit. For example, on one system I
synchronize data from ACCPAC on MSSQL into PGSQL, then use only a subset
of the columns in the ACCPAC tables in my PGSQL queries.

I say might, because the ACCPAC data is so sprawled out that my
"materialized view" does significant calculation calculating aggregates
and fields with conditional values. The ACCPAC query based entirely on a
view takes over 1 second to run. The query on the "materialized view"
row takes 0.01 seconds. Quite a difference. :-)

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>

From:	"Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	"Gregory Stark" <stark(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 14:42:50
Message-ID:	9362e74e0711060642w6db04f61pd8b56efc69a8aa11@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Nov 6, 2007 4:33 PM, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
> Gokulakannan Somasundaram wrote:
> > Just one more thought on the same. This implementation also assumes
> > that there won't be any update chains across pages, which is the
> > current stage.
>
> No, it doesn't assume that.

Say, if there is a tuple, which is visible to everyone, the Vacuum
process might have marked that block as visible. But the new update
which has happened, which is linked to the index tuple through the
head of the update chain might not be visible to everyone. How do you
think this design takes care of it?

>
> > Heikki,
> > Is it planned as a optional feature? (I support the optional
> > feature model)
>
> I'm not planning it. Clearly you are :-).

If you are planning it as a non-optional feature, you do realize that
there might be tables which don't need this index only scan and still
incur the extra overhead of maintaining the Visibility map.

--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Gokulakannan Somasundaram <gokul007(at)gmail(dot)com>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 14:54:22
Message-ID:	4730801E.50804@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gokulakannan Somasundaram wrote:
> On Nov 6, 2007 4:33 PM, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>> Gokulakannan Somasundaram wrote:
>>> Just one more thought on the same. This implementation also assumes
>>> that there won't be any update chains across pages, which is the
>>> current stage.
>> No, it doesn't assume that.
>
> Say, if there is a tuple, which is visible to everyone, the Vacuum
> process might have marked that block as visible. But the new update
> which has happened, which is linked to the index tuple through the
> head of the update chain might not be visible to everyone. How do you
> think this design takes care of it?

Oh, I see. If it's a HOT update, the index-only-scan would still see the
right thing, because the old and the new row version have the same value
in the indexed columns.

Now if you also delete the heap-only tuple on the new page, that's
different from the page where the root tuple is, that's an interesting
situation. I suppose you'd have to clear the bit in the page holding the
root tuple instead of the page where the tuple itself is.

This is pretty academical, of course, because we don't know how exactly
the cross-page HOT update chains would work.

>>> Heikki,
>>> Is it planned as a optional feature? (I support the optional
>>> feature model)
>> I'm not planning it. Clearly you are :-).
>
> If you are planning it as a non-optional feature, you do realize that
> there might be tables which don't need this index only scan and still
> incur the extra overhead of maintaining the Visibility map.

Oh you were asking if I'm planning the visibility map as an optional
feature. I thought you meant if I'm planning the cross-page HOT update
chains.

I'm thinking the visibility map would always be there, assuming that
updating it is cheap enough. If we have the bit in the heap page header
as well, I believe it would be practically zero-cost.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	"Gregory Stark" <stark(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 15:44:00
Message-ID:	9362e74e0711060744t7155ba68laf979f69ed9bd1f1@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

By the way, please have a look at how i have converted a index scan
into a index only scan in the thick index patch. Currently it doesn't
convert those queries which doesn't have where clause. I hope you
would be able refine it further.
For example, if there is a query like select count(1) from table. Then
we can scan through all the index pages and the visibility map to get
the count. Currently it goes for Full table scan. there should be
something like full index scan, which should take care of it. Thick
index implementation will only go for index-only scans, if there is a
query with where clause (select count(1) from table where idxcol
between n1 and n2)

Thanks,
Gokul.

On Nov 6, 2007 8:24 PM, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
> Gokulakannan Somasundaram wrote:
> > On Nov 6, 2007 4:33 PM, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
> >> Gokulakannan Somasundaram wrote:
> >>> Just one more thought on the same. This implementation also assumes
> >>> that there won't be any update chains across pages, which is the
> >>> current stage.
> >> No, it doesn't assume that.
> >
> > Say, if there is a tuple, which is visible to everyone, the Vacuum
> > process might have marked that block as visible. But the new update
> > which has happened, which is linked to the index tuple through the
> > head of the update chain might not be visible to everyone. How do you
> > think this design takes care of it?
>
> Oh, I see. If it's a HOT update, the index-only-scan would still see the
> right thing, because the old and the new row version have the same value
> in the indexed columns.
>
> Now if you also delete the heap-only tuple on the new page, that's
> different from the page where the root tuple is, that's an interesting
> situation. I suppose you'd have to clear the bit in the page holding the
> root tuple instead of the page where the tuple itself is.
>
> This is pretty academical, of course, because we don't know how exactly
> the cross-page HOT update chains would work.
>
> >>> Heikki,
> >>> Is it planned as a optional feature? (I support the optional
> >>> feature model)
> >> I'm not planning it. Clearly you are :-).
> >
> > If you are planning it as a non-optional feature, you do realize that
> > there might be tables which don't need this index only scan and still
> > incur the extra overhead of maintaining the Visibility map.
>
> Oh you were asking if I'm planning the visibility map as an optional
> feature. I thought you meant if I'm planning the cross-page HOT update
> chains.
>
> I'm thinking the visibility map would always be there, assuming that
> updating it is cheap enough. If we have the bit in the heap page header
> as well, I believe it would be practically zero-cost.
>
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>

--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 17:19:14
Message-ID:	1194369554.22428.177.camel@dogma.ljc.laika.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2007-11-05 at 22:45 +0000, Heikki Linnakangas wrote:
> > 1) Do as you say above. What are some of the cost trade-offs here? It
> > seems that frequent VACUUM FREEZE runs would keep the visibility map
> > mostly full, but will also cause more writing. I suppose the worst case
> > is that every tuple write needs results in two data page writes, one
> > normal write and another to freeze it later, which sounds bad. Maybe
> > there's a way to try to freeze the tuples on a page before it's written
> > out?
>
> It would also create more WAL traffic, because freezing tuples needs to
> be WAL-logged.

The thought crossed my mind, but I couldn't think of any reason that
would need to be logged. Of course you're right, and the comments
explain it well.

> 5) Have a more fine-grain equivalent of relfrozenxid. For example one
> frozenxid per visibility map page, so that whenever you update the
> visibility map, you also update the frozenxid. To advance the
> relfrozenxid in pg_class, you scan the visibility map and set
> relfrozenxid to the smallest frozenxid. Unlike relfrozenxid, it could be
> set to FrozenXid if the group of pages are totally frozen.
>

Wouldn't that still require WAL traffic? Otherwise how can you guarantee
that the FrozenXid hits disk before TruncateCLOG truncates the old xmin
away?

Regards,
Jeff Davis

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 17:50:16
Message-ID:	1194371416.22428.183.camel@dogma.ljc.laika.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2007-11-06 at 08:01 -0500, Mark Mielke wrote:
> Simon Riggs wrote:
> > On Mon, 2007-11-05 at 09:52 +0000, Heikki Linnakangas wrote:
> > I'm racking my brain trying to think of a query that will benefit from
> > index-only scans without specifically creating covered indexes. Apart
> > from count(*) queries and RI lookups. I can't see RI lookups being much
> > cheaper with this technique, do you see something there
> I'm not sure what RI lookup is. Sorry. :-)
>
> My list would be:
> - EXISTS / NOT EXISTS
> - COUNT(*)
> - Tables that are heavily updated - any case where the index entry often
> maps to a non-visible tuple.
>
> Beyond that, yeah, I cannot think of other benefits.
>

What about range queries or sorts?

The correlation of an index to itself is 100% :)

Regards,
Jeff Davis

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
Cc:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 20:04:29
Message-ID:	87r6j316pe.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com> writes:

> For example, if there is a query like select count(1) from table. Then
> we can scan through all the index pages and the visibility map to get
> the count. Currently it goes for Full table scan. there should be
> something like full index scan, which should take care of it.

Huh, that does lead me to something we hadn't mentioned previously. If we kept
count of how many tuples were on each known-visible page then we could do a
full tables scan and still skip heap page fetches if we don't need any columns
(ie, for select count(*)).

I guess it's not terribly useful for anything other than select count(*)
though. But it would be nice. It would not suffer from the concurrency issues
that caching a single count would have because only vacuum (and page pruning)
would every update the count. Normal updates/delete/inserts would only have to
clear the bit and only if the page was marked as having the bit set.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-06 20:13:19
Message-ID:	87mytr16ao.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:

> I don't buy that. I believe at least on some architectures you'd get a
> word-long load+modify+store, and scribble the neighboring bytes.

Hm, I mis-remembered this bit of advice from the glibc info doc. I remembered
thinking it was strange when I read it but I guess my memory exaggerated how
strange it was:

.> In practice, you can assume that `int' is atomic. You can also assume
.> that pointer types are atomic; that is very convenient. Both of these
.> assumptions are true on all of the machines that the GNU C library supports
.> and on all POSIX systems we know of.

I suppose if we could keep count of tuples and a count of free space and use a
whole word. Map files would be 1M per 2G heap file (on an 8kb blocksize and
4-byte words). More complicated than necessary but I'm just thinking out loud.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-07 10:45:22
Message-ID:	47319742.2060202@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jeff Davis wrote:
> On Mon, 2007-11-05 at 22:45 +0000, Heikki Linnakangas wrote:
>>> 1) Do as you say above. What are some of the cost trade-offs here? It
>>> seems that frequent VACUUM FREEZE runs would keep the visibility map
>>> mostly full, but will also cause more writing. I suppose the worst case
>>> is that every tuple write needs results in two data page writes, one
>>> normal write and another to freeze it later, which sounds bad. Maybe
>>> there's a way to try to freeze the tuples on a page before it's written
>>> out?
>> It would also create more WAL traffic, because freezing tuples needs to
>> be WAL-logged.
>
> The thought crossed my mind, but I couldn't think of any reason that
> would need to be logged. Of course you're right, and the comments
> explain it well.
>
>> 5) Have a more fine-grain equivalent of relfrozenxid. For example one
>> frozenxid per visibility map page, so that whenever you update the
>> visibility map, you also update the frozenxid. To advance the
>> relfrozenxid in pg_class, you scan the visibility map and set
>> relfrozenxid to the smallest frozenxid. Unlike relfrozenxid, it could be
>> set to FrozenXid if the group of pages are totally frozen.
>>
>
> Wouldn't that still require WAL traffic? Otherwise how can you guarantee
> that the FrozenXid hits disk before TruncateCLOG truncates the old xmin
> away?

Updating the fine-grain frozenxid would still need to be WAL-logged. But
it would be lot less frequent than aggressively freezing tuples.
Compared to the idea of having a separate bitmap or two bits per tuple
in one data structure, you wouldn't necessarily have to freeze tuples to
advance it, you could just observe what the smallest xid on a group of
pages is. Like regular lazy vacuum does right now for relfrozenxid, just
more fine-grained.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	"PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-07 18:20:12
Message-ID:	4731AD7C.EE98.0025.0@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>>> On Tue, Nov 6, 2007 at 8:18 AM, in message <473077AD(dot)30101(at)enterprisedb(dot)com>,
Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:

> The indexam API needs to be modified as well, because there's currently
> no API to return index tuples from an index.

I know this is tangential, but expanding the types of selection
criteria which can be applied to index entries (beyond equality and
range tests) might fall out of this, yes?

-Kevin

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-07 21:13:34
Message-ID:	47322A7E.8060000@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kevin Grittner wrote:
>>>> On Tue, Nov 6, 2007 at 8:18 AM, in message <473077AD(dot)30101(at)enterprisedb(dot)com>,
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>
>> The indexam API needs to be modified as well, because there's currently
>> no API to return index tuples from an index.
>
> I know this is tangential, but expanding the types of selection
> criteria which can be applied to index entries (beyond equality and
> range tests) might fall out of this, yes?

What else so you have in mind?

With index-only-scans, it would sometimes make sense to do a full index
scan. I wasn't thinking of anything else on that front myself.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-07 21:42:53
Message-ID:	4731DCFC.EE98.0025.0@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>>> On Wed, Nov 7, 2007 at 3:13 PM, in message
<47322A7E(dot)8060000(at)enterprisedb(dot)com>, Heikki Linnakangas
<heikki(at)enterprisedb(dot)com> wrote:
> Kevin Grittner wrote:
>>>>> On Tue, Nov 6, 2007 at 8:18 AM, in message <473077AD(dot)30101(at)enterprisedb(dot)com>,
>> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>>
>>> The indexam API needs to be modified as well, because there's currently
>>> no API to return index tuples from an index.
>>
>> I know this is tangential, but expanding the types of selection
>> criteria which can be applied to index entries (beyond equality and
>> range tests) might fall out of this, yes?
>
> What else so you have in mind?
>
> With index-only-scans, it would sometimes make sense to do a full index
> scan. I wasn't thinking of anything else on that front myself.

I know this issue on this thread has come up at least one or two
other times lately:

http://archives.postgresql.org/pgsql-performance/2007-08/msg00113.php

I know it's a largely independent issue, but your comment about the
API not giving access to the index tuples echoed comments regarding
what it would take to allow optimizations in this area.

-Kevin

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Visibility map thoughts
Date:	2007-11-08 09:21:50
Message-ID:	4732D52E.6010602@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kevin Grittner wrote:
> I know this issue on this thread has come up at least one or two
> other times lately:
>
> http://archives.postgresql.org/pgsql-performance/2007-08/msg00113.php
>
> I know it's a largely independent issue, but your comment about the
> API not giving access to the index tuples echoed comments regarding
> what it would take to allow optimizations in this area.

If we do separate the index scan and heap fetch to two executor nodes,
it would handle that case as well, I think. It's not directly related to
index-only scans, but returning data from an index is required for that
as well.

It wouldn't be a bitmap index scan anymore, BTW.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com