Re: Including Snapshot Info with Indexes

From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Including Snapshot Info with Indexes
Date: 2007-10-08 10:01:27
Message-ID: 9362e74e0710080301n2583ab33hb228d0fb78d812e9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On 10/8/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>
> Gokulakannan Somasundaram wrote:
> > Currently The index implementation in Postgresql does not store the
> > Snapshot information in the Index. If we add the snapshot information
> into
> > the indexing structure, we will have the following advantages.
>
> This idea has been discussed to death many times before. Please search
> the archives.
>
> > a) There can be index only scans like Oracle
>
> IMO, the most promising approach to achieving index-only-scans at the
> moment is the Dead Space Map, as discussed in the 8.3 dev cycle.

Index only scans means that in order to get certain results, we may not
goto the table at all. For example, if you have an index on columns a and b,
and if there is a query like "select b from table where a between a1 and
a2", then the explain plan need not goto the table. I can't understand how
dead space map will provide such a functionality. In short each index will
act like an Index Organized Table, if the all the columns of the query are
present in the index.

> b) Unique indexes will become less costly, as older index tuples can be
> > found out.
>
> Doesn't seem like a big benefit, considering that in most cases there
> won't be any tuples in the index with a duplicate key. A common
> exception to that is (non-HOT) updating a row. But in that case, the
> page containing the old tuple is already in cache, so the lookup of the
> visibility from the heap is cheap.

Its not a big benefit. agreed.

> c) Even the index scans will get faster, since some of the index tuples
> > won't translate into HeapScans.
>
> That's the same as doing an index-only-scan, right?

No here if you have an index on a(say). If there is a query like select *
form table where a between a1 and a2, currently the scan goes to the table
to verify the visibility. Of course if the tuple satisfies vacuum, then it
is marked in the index, which is an optimization. This is not index-only
scan. This is a normal index scan, which can skip certain random I/Os.

> d) Deletes and Updates will become slightly costly, as they have to update
> > these indexes.
>
> I think you're grossly underestimating the cost of that. For example, on
> a table with 3 indexes. a delete currently requires one index lookup +
> one heap lookup. With visibility in the indexes, that would require 3
> index lookups + one heap lookup. That's 4 vs. 2 page accesses, not
> taking into account the non-leaf b-tree pages. The real impact will
> depend on what's in cache, but the cost can be very high.

That's true. But i am not asking to replace the current index
implementation, but to provide an extra option while indexing. Say if a
particular database setup doesn't do much deletes and updates(imagine tables
with partitioning, where the partitions/tables are dropped instead of
deletes. They can have an option to "create index .. with snapshot"

Imagine the Index Vacuum also will do lesser Random I/Os

Also, the full visibility information would need 12 bytes of space per
> tuple. An index tuple on an int4 key currently takes 12 bytes, so that
> would double the index size. Storage size has a big impact on
> performance. More bytes means more I/O, less data fits in cache, and
> more WAL traffic.

I am thinking of certain optimizations here. we have a bit unused in
indextuple structure. If a particular tuple is not deleted, then we can
signify that using that bit and save 6 bytes of saving the xmax and cmax. We
are trading of this space efficiency in place of Random I/Os, which is not a
bad trade-off , i suppose. Again this is going to optional for the user. If
users have an option to create Bitmap index/ Binary index, why can't they
have this option as well?

There's non-trivial implementation issues involved as well. You'd need a
> way to reliably find all the index pointers for a given heap tuple
> (search the archives for "retail vacuum" for the issues involved in
> that. Broken user defined functions are a problem for example). And
> you'd need to keep them all locked at the same time to modify them all
> atomically, which is prone to deadlocks.

I think Vacuum need not goto the table, as the visibility information is
present in the index itself. I don't know whether i have given the correct
answer here.

Expecting your reply..

Thanks,
Gokul.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gokulakannan Somasundaram 2007-10-08 10:06:48 Re: Including Snapshot Info with Indexes
Previous Message Heikki Linnakangas 2007-10-08 09:18:52 Re: Improving the Performance of Full Table Updates

Browse pgsql-patches by date

  From Date Subject
Next Message Gokulakannan Somasundaram 2007-10-08 10:06:48 Re: Including Snapshot Info with Indexes
Previous Message Heikki Linnakangas 2007-10-08 09:17:00 Re: Including Snapshot Info with Indexes