Re: vacuum, performance, and MVCC

Lists: pgsql-hackers
From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-24 17:42:00
Message-ID: 200606241742.k5OHg0b07414@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

bruce wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > I think at some point we have to admit that _polling_ the tables, which
> > > is what autovacuum does, just isn't going to work well, no matter how
> > > much it is tweeked, and another approach should be considered for
> > > certain workload cases.
> >
> > Autovacuum polls in its current, first-generation implementation;
> > what I said upthread was it needs to be smarter than that. I am not
> > sure how you get from that to the conclusion that the very next step
> > is to abandon the vacuuming approach altogether.
>
> I am not ready to abandon autovacuum, but as I stated later the UPDATE
> with no key change case is common enought that it could be handled
> better without involving autovacuum and its limitations.
>
> As I remember, most databases have problem with DELETE/INSERT cycles,
> but we seem to be hit by UPDATE performance more than most, and more
> than is wise.

In an attempt to get some concrete on these ideas... ;-)

I think the major complexity in doing an in-place UPDATE when no key
columns change is allowing rollback on transaction abort (or backend
crash), and properly handling visibility for transactions in progress.

If the old and new rows are on the same heap page (perhaps a necessary
limitation), you can easily update the heap item id to point to the new
heap row. All indexes will then point to the new row, and sequential
scans will still see both rows (which is good). That leave the rollback
issue (undoing the item id change), and having index scans for current
backends still see the old row.

OK, I have an idea. Right now, an UPDATE where the old and new rows are
on the same page have two index entries. What if we made only one index
entry for both? We already have UPDATE chaining, where the old row
points to the new one. If no key columns change, we set a bit in the
heap that the chaining points to the old and new row (both on the same
page), so an index scan uses one index entry to see the old and new row,
and once the old row is no longer visible, the page index id can be
updated to point to the new row and the old row can be marked free and
perhaps used for a future UPDATE. (UPDATE already tries to do keep
updates on the same heap page.)

FYI, the reason heap cleanup is possible once you go with a single index
entry for old and new rows is because there is no index cleanup (and
single-row index cleanup is very expensive).

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-24 19:01:01
Message-ID: Pine.OSF.4.61.0606242148480.44498@kosh.hut.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 24 Jun 2006, Bruce Momjian wrote:

> OK, I have an idea. Right now, an UPDATE where the old and new rows are
> on the same page have two index entries. What if we made only one index
> entry for both? We already have UPDATE chaining, where the old row
> points to the new one. If no key columns change, we set a bit in the
> heap that the chaining points to the old and new row (both on the same
> page), so an index scan uses one index entry to see the old and new row,
> and once the old row is no longer visible, the page index id can be
> updated to point to the new row and the old row can be marked free and
> perhaps used for a future UPDATE. (UPDATE already tries to do keep
> updates on the same heap page.)

In fact, that's what I originally thought Mark was suggesting. A couple of
points:

Why the limitation of old and new row being on the same page?

This only works if none of the updated columns are indexed. That's a bit
annoying. It would be nice to be able to create new index tuples in those
indexes that contain one of the changed columns, but not in others.

What happens if you create a new index that contains one of
the changed columns?

- Heikki


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-24 22:41:43
Message-ID: 200606242241.k5OMfhm26768@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> On Sat, 24 Jun 2006, Bruce Momjian wrote:
>
> > OK, I have an idea. Right now, an UPDATE where the old and new rows are
> > on the same page have two index entries. What if we made only one index
> > entry for both? We already have UPDATE chaining, where the old row
> > points to the new one. If no key columns change, we set a bit in the
> > heap that the chaining points to the old and new row (both on the same
> > page), so an index scan uses one index entry to see the old and new row,
> > and once the old row is no longer visible, the page index id can be
> > updated to point to the new row and the old row can be marked free and
> > perhaps used for a future UPDATE. (UPDATE already tries to do keep
> > updates on the same heap page.)
>
> In fact, that's what I originally thought Mark was suggesting. A couple of
> points:
>
> Why the limitation of old and new row being on the same page?

Because having them be on the same page is the only way you can update
the page item pointer so when you recycle the row, you the indexes are
now pointing to the new version. Pages look like:

[marker][item1][item2][item3]...[tuple1][tuple2][tuple3]

and indexes only point to items, not to tuples. This allows tuples to
be compacted on the page without affecting the indexes.

If tuple1 is updated to tuple2, once tuple1 is no longer visible to any
backends, you can modify item1 to point to tuple2, and you can mark the
space used by tuple1 as reusable:

[marker][item1(tuple2)][item2][item3]...[free][tuple2][tuple3]

> This only works if none of the updated columns are indexed. That's a bit
> annoying. It would be nice to be able to create new index tuples in those

The hope is that a commonly updated tuple will eventually be on a page
where there is sufficient free space for updated version to stay on
there. For an active server, there might be several updated versions of
rows on the same page.

> indexes that contain one of the changed columns, but not in others.

If you can't expire the old row because one of the indexed columns was
modified, I see no reason to try to reduce the additional index entries.

> What happens if you create a new index that contains one of
> the changed columns?

Uh, I had not thought of that. You could easily create two index
entries for the old and new rows, but then the heap bit saying there is
only one index row would be inaccurate for the new index. I suppose you
could create new rows in all indexes and clear the heap bit.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-25 18:13:48
Message-ID: Pine.OSF.4.61.0606250954090.124217@kosh.hut.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 24 Jun 2006, Bruce Momjian wrote:

> Because having them be on the same page is the only way you can update
> the page item pointer so when you recycle the row, you the indexes are
> now pointing to the new version. Pages look like:
>
> [marker][item1][item2][item3]...[tuple1][tuple2][tuple3]
>
> and indexes only point to items, not to tuples. This allows tuples to
> be compacted on the page without affecting the indexes.
>
> If tuple1 is updated to tuple2, once tuple1 is no longer visible to any
> backends, you can modify item1 to point to tuple2, and you can mark the
> space used by tuple1 as reusable:
>
> [marker][item1(tuple2)][item2][item3]...[free][tuple2][tuple3]

Ok, now I think I get it. So the limitation of old and new tuple being on
the same page is required to make it possible to remove the old tuple
without touching the indexes?

If you move the new tuple (logically, by modifying item pointers) on
vacuum, isn't there a risk that a concurrent seqscan misses it?

> If you can't expire the old row because one of the indexed columns was
> modified, I see no reason to try to reduce the additional index entries.

It won't enable early expiration, but it means less work to do on update.
If there's a lot of indexes, not having to add so many index tuples can be
a significant saving.

To summarise, we have two issues related to frequent updates:
1. Index and heap bloat, requiring frequent vacuum.
2. Updates are more expensive than on other DBMSs, because we have to add
a new index tuple in every index, even if none of the indexed columns are
modified.

Tom suggested that we just improve vacuum and autovacuum, and someone
brought up the dead space map idea again. Those are all worthwhile things
to do and help with vacuuming after deletes as well as updates, but they
only address issue 1. Mark's suggestion (assuming that it would've
worked) as well as yours address both, but only for updates.

- Heikki


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-25 18:52:44
Message-ID: 200606251852.k5PIqih25183@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> On Sat, 24 Jun 2006, Bruce Momjian wrote:
>
> > Because having them be on the same page is the only way you can update
> > the page item pointer so when you recycle the row, you the indexes are
> > now pointing to the new version. Pages look like:
> >
> > [marker][item1][item2][item3]...[tuple1][tuple2][tuple3]
> >
> > and indexes only point to items, not to tuples. This allows tuples to
> > be compacted on the page without affecting the indexes.
> >
> > If tuple1 is updated to tuple2, once tuple1 is no longer visible to any
> > backends, you can modify item1 to point to tuple2, and you can mark the
> > space used by tuple1 as reusable:
> >
> > [marker][item1(tuple2)][item2][item3]...[free][tuple2][tuple3]
>
> Ok, now I think I get it. So the limitation of old and new tuple being on
> the same page is required to make it possible to remove the old tuple
> without touching the indexes?

Yes, modifying the heap page item pointer is required for reuse.

> If you move the new tuple (logically, by modifying item pointers) on
> vacuum, isn't there a risk that a concurrent seqscan misses it?

Well, you lock the page while updating the item pointer. Because the
versions are on the same page, a single page lock should be fine.

> > If you can't expire the old row because one of the indexed columns was
> > modified, I see no reason to try to reduce the additional index entries.
>
> It won't enable early expiration, but it means less work to do on update.
> If there's a lot of indexes, not having to add so many index tuples can be
> a significant saving.

Already added to TODO.

* Reuse index tuples that point to heap tuples that are not visible to
anyone?

> To summarise, we have two issues related to frequent updates:
> 1. Index and heap bloat, requiring frequent vacuum.
> 2. Updates are more expensive than on other DBMSs, because we have to add
> a new index tuple in every index, even if none of the indexed columns are
> modified.
>
> Tom suggested that we just improve vacuum and autovacuum, and someone
> brought up the dead space map idea again. Those are all worthwhile things
> to do and help with vacuuming after deletes as well as updates, but they
> only address issue 1. Mark's suggestion (assuming that it would've
> worked) as well as yours address both, but only for updates.

Agreed.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-26 18:14:38
Message-ID: 20060626181438.GN93655@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jun 25, 2006 at 09:13:48PM +0300, Heikki Linnakangas wrote:
> >If you can't expire the old row because one of the indexed columns was
> >modified, I see no reason to try to reduce the additional index entries.
>
> It won't enable early expiration, but it means less work to do on update.
> If there's a lot of indexes, not having to add so many index tuples can be
> a significant saving.

While catching up on this thread, the following idea came to me that I
think would allow for not updating an index on an UPDATE if it's key
doesn't change. If I understand Bruce's SITC proposal correctly, this
would differ in that SITC requires that no index keys change.

My idea is that if an UPDATE places the new tuple on the same page as
the old tuple, it will not create new index entries for any indexes
where the key doesn't change. This means that when fetching tuples from
the index, ctid would have to be followed until you found the version
you wanted OR you found the first ctid that pointed to a different page
(because that tuple will have it's own index entry) OR you found a tuple
with a different value for the key of the index you're using (because
it'd be invalid, and there'd be a different index entry for it). I
believe that the behavior of the index hint bits would also have to
change somewhat, as each index entry would now essentially be pointing
at all the tuples in the ctid chain that exist on a page, not just
single tuple.

In the case of an UPDATE that needs to put the new tuple on a different
page, our current behavior would be used. This means that the hint bits
would still be useful in limiting the number of heap pages you hit. I
also believe this means that we wouldn't suffer any additional overhead
from our current code when there isn't much free space on pages.

Since SITC allows for in-page space reuse without vacuuming only when no
index keys change, it's most useful for very heavily updated tables such
as session handlers or queue tables, because those tables typically have
very few indexes, so it's pretty unlikely that an index key will change.
For more general-purpose tables that have more indexes but still see a
fair number of updates to a subset of rows, not having to update every
index would likely be a win. I also don't see any reason why both
options couldn't be used together.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-26 18:32:59
Message-ID: 200606261832.k5QIWxZ12469@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


It is certainly possible to do what you are suggesting, that is have two
index entries point to same chain head, and have the index access
routines figure out if the index qualifications still hold, but that
seems like a lot of overhead.

Also, once there is only one visible row in the chain, removing old
index entries seems quite complex because you have to have vacuum keep
the qualifications of each row to figure out which index tuple is the
valid one (seems messy).

---------------------------------------------------------------------------

Jim C. Nasby wrote:
> On Sun, Jun 25, 2006 at 09:13:48PM +0300, Heikki Linnakangas wrote:
> > >If you can't expire the old row because one of the indexed columns was
> > >modified, I see no reason to try to reduce the additional index entries.
> >
> > It won't enable early expiration, but it means less work to do on update.
> > If there's a lot of indexes, not having to add so many index tuples can be
> > a significant saving.
>
> While catching up on this thread, the following idea came to me that I
> think would allow for not updating an index on an UPDATE if it's key
> doesn't change. If I understand Bruce's SITC proposal correctly, this
> would differ in that SITC requires that no index keys change.
>
> My idea is that if an UPDATE places the new tuple on the same page as
> the old tuple, it will not create new index entries for any indexes
> where the key doesn't change. This means that when fetching tuples from
> the index, ctid would have to be followed until you found the version
> you wanted OR you found the first ctid that pointed to a different page
> (because that tuple will have it's own index entry) OR you found a tuple
> with a different value for the key of the index you're using (because
> it'd be invalid, and there'd be a different index entry for it). I
> believe that the behavior of the index hint bits would also have to
> change somewhat, as each index entry would now essentially be pointing
> at all the tuples in the ctid chain that exist on a page, not just
> single tuple.
>
> In the case of an UPDATE that needs to put the new tuple on a different
> page, our current behavior would be used. This means that the hint bits
> would still be useful in limiting the number of heap pages you hit. I
> also believe this means that we wouldn't suffer any additional overhead
> from our current code when there isn't much free space on pages.
>
> Since SITC allows for in-page space reuse without vacuuming only when no
> index keys change, it's most useful for very heavily updated tables such
> as session handlers or queue tables, because those tables typically have
> very few indexes, so it's pretty unlikely that an index key will change.
> For more general-purpose tables that have more indexes but still see a
> fair number of updates to a subset of rows, not having to update every
> index would likely be a win. I also don't see any reason why both
> options couldn't be used together.
> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
>

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-27 03:03:19
Message-ID: 20060627030319.GC44573@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
>
> It is certainly possible to do what you are suggesting, that is have two
> index entries point to same chain head, and have the index access
> routines figure out if the index qualifications still hold, but that
> seems like a lot of overhead.
>
> Also, once there is only one visible row in the chain, removing old
> index entries seems quite complex because you have to have vacuum keep
> the qualifications of each row to figure out which index tuple is the
> valid one (seems messy).

Perhaps my point got lost... in the case where no index keys change
during an update, SITC seems superior in every way to my proposal. My
idea (let's call it Index Tuple Page Consolidation, ITPC) would be
beneficial to UPDATEs that modify one or more index keys but still put
the tuple on the same page. Where SITC would be most useful for tables
that have a very heavy update rate and very few indexes, ITPC would
benefit tables that have more indexes on them; where presumably it's
much more likely for UPDATEs to change at least one index key (which
means SITC goes out the window, if I understand it correctly). If I'm
missing something and SITC can in fact deal with some index keys
changing during an UPDATE, then I see no reason for ITPC.

> ---------------------------------------------------------------------------
>
> Jim C. Nasby wrote:
> > On Sun, Jun 25, 2006 at 09:13:48PM +0300, Heikki Linnakangas wrote:
> > > >If you can't expire the old row because one of the indexed columns was
> > > >modified, I see no reason to try to reduce the additional index entries.
> > >
> > > It won't enable early expiration, but it means less work to do on update.
> > > If there's a lot of indexes, not having to add so many index tuples can be
> > > a significant saving.
> >
> > While catching up on this thread, the following idea came to me that I
> > think would allow for not updating an index on an UPDATE if it's key
> > doesn't change. If I understand Bruce's SITC proposal correctly, this
> > would differ in that SITC requires that no index keys change.
> >
> > My idea is that if an UPDATE places the new tuple on the same page as
> > the old tuple, it will not create new index entries for any indexes
> > where the key doesn't change. This means that when fetching tuples from
> > the index, ctid would have to be followed until you found the version
> > you wanted OR you found the first ctid that pointed to a different page
> > (because that tuple will have it's own index entry) OR you found a tuple
> > with a different value for the key of the index you're using (because
> > it'd be invalid, and there'd be a different index entry for it). I
> > believe that the behavior of the index hint bits would also have to
> > change somewhat, as each index entry would now essentially be pointing
> > at all the tuples in the ctid chain that exist on a page, not just
> > single tuple.
> >
> > In the case of an UPDATE that needs to put the new tuple on a different
> > page, our current behavior would be used. This means that the hint bits
> > would still be useful in limiting the number of heap pages you hit. I
> > also believe this means that we wouldn't suffer any additional overhead
> > from our current code when there isn't much free space on pages.
> >
> > Since SITC allows for in-page space reuse without vacuuming only when no
> > index keys change, it's most useful for very heavily updated tables such
> > as session handlers or queue tables, because those tables typically have
> > very few indexes, so it's pretty unlikely that an index key will change.
> > For more general-purpose tables that have more indexes but still see a
> > fair number of updates to a subset of rows, not having to update every
> > index would likely be a win. I also don't see any reason why both
> > options couldn't be used together.
> > --
> > Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> > Pervasive Software http://pervasive.com work: 512-231-6117
> > vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
> >
>
> --
> Bruce Momjian bruce(at)momjian(dot)us
> EnterpriseDB http://www.enterprisedb.com
>
> + If your life is a hard drive, Christ can be your backup. +
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-27 03:08:24
Message-ID: 200606270308.k5R38Ow16620@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim C. Nasby wrote:
> On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> >
> > It is certainly possible to do what you are suggesting, that is have two
> > index entries point to same chain head, and have the index access
> > routines figure out if the index qualifications still hold, but that
> > seems like a lot of overhead.
> >
> > Also, once there is only one visible row in the chain, removing old
> > index entries seems quite complex because you have to have vacuum keep
> > the qualifications of each row to figure out which index tuple is the
> > valid one (seems messy).
>
> Perhaps my point got lost... in the case where no index keys change
> during an update, SITC seems superior in every way to my proposal. My
> idea (let's call it Index Tuple Page Consolidation, ITPC) would be
> beneficial to UPDATEs that modify one or more index keys but still put
> the tuple on the same page. Where SITC would be most useful for tables
> that have a very heavy update rate and very few indexes, ITPC would
> benefit tables that have more indexes on them; where presumably it's
> much more likely for UPDATEs to change at least one index key (which
> means SITC goes out the window, if I understand it correctly). If I'm
> missing something and SITC can in fact deal with some index keys
> changing during an UPDATE, then I see no reason for ITPC.

I understood what you had said. The question is whether we want to get
that complex with this feature, and if there are enough use cases
(UPDATE with index keys changing) to warrant it.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-27 07:38:11
Message-ID: 1151393893.3516.7.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ühel kenal päeval, E, 2006-06-26 kell 23:08, kirjutas Bruce Momjian:
> Jim C. Nasby wrote:
> > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> > >
> > > It is certainly possible to do what you are suggesting, that is have two
> > > index entries point to same chain head, and have the index access
> > > routines figure out if the index qualifications still hold, but that
> > > seems like a lot of overhead.

I think Jim meant not 2 pointing to the same head, but 2 pointing into
the same chain. Say we have table with (id serial, ts timestamp) where
ts changes at each update and id does not.

So after 3 updates on one page we have one CITC/ITPC head with pointers
from both indexes and two follow-up tuples with pointers from only ts
index.

The problem with this setup is, that we can't reuse any of those
follow-up tuples without index cleanup.

> > > Also, once there is only one visible row in the chain, removing old
> > > index entries seems quite complex because you have to have vacuum keep
> > > the qualifications of each row to figure out which index tuple is the
> > > valid one (seems messy).
> >
> > Perhaps my point got lost... in the case where no index keys change
> > during an update, SITC seems superior in every way to my proposal. My
> > idea (let's call it Index Tuple Page Consolidation, ITPC) would be
> > beneficial to UPDATEs that modify one or more index keys but still put
> > the tuple on the same page. Where SITC would be most useful for tables
> > that have a very heavy update rate and very few indexes, ITPC would
> > benefit tables that have more indexes on them; where presumably it's
> > much more likely for UPDATEs to change at least one index key (which
> > means SITC goes out the window, if I understand it correctly). If I'm
> > missing something and SITC can in fact deal with some index keys
> > changing during an UPDATE, then I see no reason for ITPC.
>
> I understood what you had said. The question is whether we want to get
> that complex with this feature, and if there are enough use cases
> (UPDATE with index keys changing) to warrant it.

I'd like to think that most heavily-updated tables avoid that, but there
may be still cases where this is needed.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-27 08:40:05
Message-ID: 1151397605.3516.13.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ühel kenal päeval, T, 2006-06-27 kell 10:38, kirjutas Hannu Krosing:
> Ühel kenal päeval, E, 2006-06-26 kell 23:08, kirjutas Bruce Momjian:
> > Jim C. Nasby wrote:
> > > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> > > >
> > > > It is certainly possible to do what you are suggesting, that is have two
> > > > index entries point to same chain head, and have the index access
> > > > routines figure out if the index qualifications still hold, but that
> > > > seems like a lot of overhead.
>
> I think Jim meant not 2 pointing to the same head, but 2 pointing into
> the same chain. Say we have table with (id serial, ts timestamp) where
> ts changes at each update and id does not.
>
> So after 3 updates on one page we have one CITC/ITPC head with pointers
> from both indexes and two follow-up tuples with pointers from only ts
> index.
>
> The problem with this setup is, that we can't reuse any of those
> follow-up tuples without index cleanup.

But we still have to think about similar cases (index entries pointing
inside CITC chains), unless we plan to disallow adding indexes to
tables.

Perhaps that case has to simply disable heap tuple reuse until some
event. what would that event be?

Or maybe we should have some bitmap of dirty tuple ids inside each page,
that is tuple ids that have index pointers to them. and then avoid using
these ?

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com


From: PFC <lists(at)peufeu(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-27 08:42:54
Message-ID: op.tbsqhsxacigqcu@apollo13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> My idea is that if an UPDATE places the new tuple on the same page as
> the old tuple, it will not create new index entries for any indexes
> where the key doesn't change.

Basically the idea behind preventing index bloat by updates is to have
one index tuple point to several actual tuples having the same value.

So : Index entry -> list of tuples having the same value -> actual tuples
(-> represents an indirection)

I proposed to put the list of tuples in the index ; you propose to put it
in data pages.

I think both solutions have pros and cons :

* List of tuples in the index :
+ reduces index size, makes cacheability in RAM more likely
+ speeds up index scans
- complexity
- slows down modifications to the index (a bit)

* List of tuples in the page
+ simpler to implement
+ reduces index size, but less so than previous solution
- useless if UPDATE puts the new tuple on a different page

I guess the best solution would be a mix of both.

Also, I insist (again) that there is a lot to gain by using a bit of
compression on the data pages, even if it's very simple compression like
storing the new version of a row as a difference from the previous version
(ie. only store the columns that changed).
I think DB2 stores the latest version entirely, and stores the previous
versions as a delta. This is more efficient.

In the case of tables containing TEXT values, these could also get
TOASTed. When an update does not modify the TOASTed columns, it would be
nice to simply be able to keep the reference to the TOASTed data instead
of decompressing it and recompressing it. Or is it already the case ?


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-27 16:16:27
Message-ID: 200606271616.k5RGGRi26939@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hannu Krosing wrote:
> ?hel kenal p?eval, T, 2006-06-27 kell 10:38, kirjutas Hannu Krosing:
> > ?hel kenal p?eval, E, 2006-06-26 kell 23:08, kirjutas Bruce Momjian:
> > > Jim C. Nasby wrote:
> > > > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> > > > >
> > > > > It is certainly possible to do what you are suggesting, that is have two
> > > > > index entries point to same chain head, and have the index access
> > > > > routines figure out if the index qualifications still hold, but that
> > > > > seems like a lot of overhead.
> >
> > I think Jim meant not 2 pointing to the same head, but 2 pointing into
> > the same chain. Say we have table with (id serial, ts timestamp) where
> > ts changes at each update and id does not.
> >
> > So after 3 updates on one page we have one CITC/ITPC head with pointers
> > from both indexes and two follow-up tuples with pointers from only ts
> > index.
> >
> > The problem with this setup is, that we can't reuse any of those
> > follow-up tuples without index cleanup.
>
> But we still have to think about similar cases (index entries pointing
> inside CITC chains), unless we plan to disallow adding indexes to
> tables.

CREATE INDEX has to undo any chains where the new indexed columns change
in the chain, and add index entries to remove the chain.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PFC <lists(at)peufeu(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-27 16:18:23
Message-ID: 200606271618.k5RGINR27070@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

PFC wrote:
>
> > My idea is that if an UPDATE places the new tuple on the same page as
> > the old tuple, it will not create new index entries for any indexes
> > where the key doesn't change.
>
> Basically the idea behind preventing index bloat by updates is to have
> one index tuple point to several actual tuples having the same value.
>

The idea is not to avoid index bloat, but to allow heap reuse, and having
one index entry for multiple versions of an UPDATEd row is merely an
implementation detail.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-27 16:18:52
Message-ID: 20060627161852.GH44573@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 26, 2006 at 11:08:24PM -0400, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> > >
> > > It is certainly possible to do what you are suggesting, that is have two
> > > index entries point to same chain head, and have the index access
> > > routines figure out if the index qualifications still hold, but that
> > > seems like a lot of overhead.
> > >
> > > Also, once there is only one visible row in the chain, removing old
> > > index entries seems quite complex because you have to have vacuum keep
> > > the qualifications of each row to figure out which index tuple is the
> > > valid one (seems messy).
> >
> > Perhaps my point got lost... in the case where no index keys change
> > during an update, SITC seems superior in every way to my proposal. My
> > idea (let's call it Index Tuple Page Consolidation, ITPC) would be
> > beneficial to UPDATEs that modify one or more index keys but still put
> > the tuple on the same page. Where SITC would be most useful for tables
> > that have a very heavy update rate and very few indexes, ITPC would
> > benefit tables that have more indexes on them; where presumably it's
> > much more likely for UPDATEs to change at least one index key (which
> > means SITC goes out the window, if I understand it correctly). If I'm
> > missing something and SITC can in fact deal with some index keys
> > changing during an UPDATE, then I see no reason for ITPC.
>
> I understood what you had said. The question is whether we want to get
> that complex with this feature, and if there are enough use cases
> (UPDATE with index keys changing) to warrant it.

Ideas on how to test a table to see how many tuples would fit this
criteria?

Or we could just shelve ITPC as a possibility in the future, after we
see how much the limitations of SITC hurt.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-27 16:23:29
Message-ID: 200606271623.k5RGNT427311@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim C. Nasby wrote:
> > > Perhaps my point got lost... in the case where no index keys change
> > > during an update, SITC seems superior in every way to my proposal. My
> > > idea (let's call it Index Tuple Page Consolidation, ITPC) would be
> > > beneficial to UPDATEs that modify one or more index keys but still put
> > > the tuple on the same page. Where SITC would be most useful for tables
> > > that have a very heavy update rate and very few indexes, ITPC would
> > > benefit tables that have more indexes on them; where presumably it's
> > > much more likely for UPDATEs to change at least one index key (which
> > > means SITC goes out the window, if I understand it correctly). If I'm
> > > missing something and SITC can in fact deal with some index keys
> > > changing during an UPDATE, then I see no reason for ITPC.
> >
> > I understood what you had said. The question is whether we want to get
> > that complex with this feature, and if there are enough use cases
> > (UPDATE with index keys changing) to warrant it.
>
> Ideas on how to test a table to see how many tuples would fit this
> criteria?
>
> Or we could just shelve ITPC as a possibility in the future, after we
> see how much the limitations of SITC hurt.

Probably. I am not sure even SITC is a win given the complexity it will
add, but I think it is worth trying. Getting into more complex cases
where chains change indexed values seems like something we could try
later if we have to.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: PFC <lists(at)peufeu(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-27 16:29:39
Message-ID: 20060627162939.GI44573@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 27, 2006 at 10:42:54AM +0200, PFC wrote:
> Also, I insist (again) that there is a lot to gain by using a bit of
> compression on the data pages, even if it's very simple compression like
> storing the new version of a row as a difference from the previous version
> (ie. only store the columns that changed).
> I think DB2 stores the latest version entirely, and stores the
> previous versions as a delta. This is more efficient.

This would only help on tables that:

have many columns[1]
are frequently updated
the updates normally touch few columns

[1] I'm assuming that un-changed toasted fields keep the same pointer

I'm doubtful that that case is common enough to warrant the amount of
work that would be involved in doing this. I think it might be useful to
consider ways to make vertical partitioning easier, since that's the
common means to reduce the impact of these scenarios.

> In the case of tables containing TEXT values, these could also get
> TOASTed. When an update does not modify the TOASTed columns, it would be
> nice to simply be able to keep the reference to the TOASTed data instead
> of decompressing it and recompressing it. Or is it already the case ?

Hopefully it is, but I'm not sure... something that would be good is a
means to force fields to be toasted sooner than when the tuple is bigger
than 2k, because that'd be a very easy way to get gains from vertical
partitioning.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PFC <lists(at)peufeu(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-27 18:28:43
Message-ID: 87k672wizo.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Bruce Momjian <bruce(at)momjian(dot)us> writes:

> PFC wrote:
> >
> > > My idea is that if an UPDATE places the new tuple on the same page as
> > > the old tuple, it will not create new index entries for any indexes
> > > where the key doesn't change.
> >
> > Basically the idea behind preventing index bloat by updates is to have
> > one index tuple point to several actual tuples having the same value.
> >
>
> The idea is not to avoid index bloat, but to allow heap reuse, and having
> one index entry for multiple versions of an UPDATEd row is merely an
> implementation detail.

It sort of sounds like you're describing a whole new index type that stores
only the page, not the precise record of any tuple it indexes. If your table
has only such indexes then you never need to worry about updating indexes if
your new tuple version goes on the same page as the old one.

It's an interesting thought experiment. It might trade off a lot of work in
index maintenance as well as saving space in the index for a lot of additional
work performing index scans. There can easily be enough tuples on a page to
make scanning the entire page pretty costly.

--
greg


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: PFC <lists(at)peufeu(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-27 18:37:32
Message-ID: 200606271837.k5RIbW522076@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
>
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>
> > PFC wrote:
> > >
> > > > My idea is that if an UPDATE places the new tuple on the same page as
> > > > the old tuple, it will not create new index entries for any indexes
> > > > where the key doesn't change.
> > >
> > > Basically the idea behind preventing index bloat by updates is to have
> > > one index tuple point to several actual tuples having the same value.
> > >
> >
> > The idea is not to avoid index bloat, but to allow heap reuse, and having
> > one index entry for multiple versions of an UPDATEd row is merely an
> > implementation detail.
>
> It sort of sounds like you're describing a whole new index type that stores
> only the page, not the precise record of any tuple it indexes. If your table

Background, indexes point to page item pointers, not to actual offsets
in the page. This is how vacuum can move around tuples without modifying the
indexes. The index points to a page item pointer that is a chain of
tuples with the same indexed columns.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-29 07:30:30
Message-ID: 1151566230.2897.21.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ühel kenal päeval, T, 2006-06-27 kell 12:16, kirjutas Bruce Momjian:
> Hannu Krosing wrote:
> > ?hel kenal p?eval, T, 2006-06-27 kell 10:38, kirjutas Hannu Krosing:
> > > ?hel kenal p?eval, E, 2006-06-26 kell 23:08, kirjutas Bruce Momjian:
> > > > Jim C. Nasby wrote:
> > > > > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> > > > > >
> > > > > > It is certainly possible to do what you are suggesting, that is have two
> > > > > > index entries point to same chain head, and have the index access
> > > > > > routines figure out if the index qualifications still hold, but that
> > > > > > seems like a lot of overhead.
> > >
> > > I think Jim meant not 2 pointing to the same head, but 2 pointing into
> > > the same chain. Say we have table with (id serial, ts timestamp) where
> > > ts changes at each update and id does not.
> > >
> > > So after 3 updates on one page we have one CITC/ITPC head with pointers
> > > from both indexes and two follow-up tuples with pointers from only ts
> > > index.
> > >
> > > The problem with this setup is, that we can't reuse any of those
> > > follow-up tuples without index cleanup.
> >
> > But we still have to think about similar cases (index entries pointing
> > inside CITC chains), unless we plan to disallow adding indexes to
> > tables.
>
> CREATE INDEX has to undo any chains where the new indexed columns change
> in the chain, and add index entries to remove the chain.

Yes, that would be the most straightforward solution.

It could be better in some cases, if we could avoid adding entries to
other indexes. Maybe we can just reset some flags, so that some SITC ops
like finding tuples by the CITC index pointer still work while adding
new entries wont.

But it will be tricky to make this work for bitmap index scans.

So yes, index build is a slop operation anyway, so making it even a
little slower is probably not a big problem. And most CITC chains will
have only one visible row at a time, this will probably not be a big
issue.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com