Re: Covering Indexes

Lists: pgsql-hackers
From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Covering Indexes
Date: 2012-06-28 12:16:45
Message-ID: 7138506E-2A5D-4AA6-A8CD-DC9FB4D7344E@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hackers,

Very interesting design document for SQLite 4:

http://www.sqlite.org/src4/doc/trunk/www/design.wiki

I'm particularly intrigued by "covering indexes". For example:

CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

This allows the following query to do an index-only scan:

SELECT c, d FROM table1 WHERE a=? AND b=?;

Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.

And I wonder if it would work well with expressions, too?

David


From: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Covering Indexes
Date: 2012-06-28 12:56:36
Message-ID: 4FEC5484.1090908@officenet.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/28/2012 02:16 PM, David E. Wheeler wrote:
> Hackers,
>
> Very interesting design document for SQLite 4:
>
> http://www.sqlite.org/src4/doc/trunk/www/design.wiki
>
> I'm particularly intrigued by "covering indexes". For example:
>
> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
>
> This allows the following query to do an index-only scan:
>
> SELECT c, d FROM table1 WHERE a=? AND b=?;
>
> Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.
>
> And I wonder if it would work well with expressions, too?
>
> David

This is analogous to SQL Server's "include" :

|CREATE NONCLUSTERED INDEX my_idx|
|ON my_table (status)|
|INCLUDE (someColumn, otherColumn)|

Which is useful, but bloats the index.

--
Andreas Joseph Krogh<andreak(at)officenet(dot)no> - mob: +47 909 56 963
Senior Software Developer / CEO - OfficeNet AS - http://www.officenet.no
Public key: http://home.officenet.no/~andreak/public_key.asc


From: Rob Wultsch <wultsch(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-06-28 13:02:23
Message-ID: CAGdn2ugWfeuJO-QVzsYc=NOWDdh0WpTunOWLOZ-0+9QZjh15=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler <david(at)justatheory(dot)com> wrote:
> Hackers,
>
> Very interesting design document for SQLite 4:
>
>  http://www.sqlite.org/src4/doc/trunk/www/design.wiki
>
> I'm particularly intrigued by "covering indexes". For example:
>
>    CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
>
> This allows the following query to do an index-only scan:
>
>    SELECT c, d FROM table1 WHERE a=? AND b=?;
>
> Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.
>
> And I wonder if it would work well with expressions, too?
>
> David

IRC MS SQL also allow unindexed columns in the index.

--
Rob Wultsch
wultsch(at)gmail(dot)com


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-06-28 15:37:55
Message-ID: CAMkU=1yDHXMw_8SKA-O=tLbiMbMKmwYedE=pBhM7KDoB+cp+eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler <david(at)justatheory(dot)com> wrote:
> Hackers,
>
> Very interesting design document for SQLite 4:
>
>  http://www.sqlite.org/src4/doc/trunk/www/design.wiki
>
> I'm particularly intrigued by "covering indexes". For example:
>
>    CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

I don't see the virtue of this in this case. Since the index is not
unique, why not just put the index on (a,b,c,d) and be done with it?
Is there some advantage to be had in inventing a way to store c and d
in the index without having them usable for indexing?

Cheers,

Jeff


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-06-28 15:50:30
Message-ID: 4FEC7D46.40306@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/28/2012 11:37 AM, Jeff Janes wrote:
> On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler<david(at)justatheory(dot)com> wrote:
>> Hackers,
>>
>> Very interesting design document for SQLite 4:
>>
>> http://www.sqlite.org/src4/doc/trunk/www/design.wiki
>>
>> I'm particularly intrigued by "covering indexes". For example:
>>
>> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
> I don't see the virtue of this in this case. Since the index is not
> unique, why not just put the index on (a,b,c,d) and be done with it?
> Is there some advantage to be had in inventing a way to store c and d
> in the index without having them usable for indexing?
>

Presumably the comparison function will be faster the fewer attributes
it needs to compare.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-06-28 16:07:58
Message-ID: 14261.1340899678@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 06/28/2012 11:37 AM, Jeff Janes wrote:
>> On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler<david(at)justatheory(dot)com> wrote:
>>> I'm particularly intrigued by "covering indexes". For example:
>>> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

>> I don't see the virtue of this in this case. Since the index is not
>> unique, why not just put the index on (a,b,c,d) and be done with it?

> Presumably the comparison function will be faster the fewer attributes
> it needs to compare.

Well, no, because queries will only be comparing the columns used in
the query. Insertions would likely actually be faster with the extra
columns considered significant, since we've known for a long time that
btree doesn't especially like large numbers of identical index entries.

When this came up a couple weeks ago, the argument that was made for it
was that you could attach non-significant columns to an index that *is*
unique. That might or might not be a wide enough use-case to justify
adding such a horrid kludge.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, David E(dot) Wheeler <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-06-28 16:12:14
Message-ID: 1340899866-sup-4764@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Excerpts from Tom Lane's message of jue jun 28 12:07:58 -0400 2012:

> When this came up a couple weeks ago, the argument that was made for it
> was that you could attach non-significant columns to an index that *is*
> unique. That might or might not be a wide enough use-case to justify
> adding such a horrid kludge.

The other question is whether such an index would prevent an update from
being HOT when the non-indexed values are touched. That could be a
significant difference.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-06-28 16:23:52
Message-ID: CAC_2qU-GbSKjY3B2Bsrh9jY6JctPBy3pFYw-cQ+rF8TNB+J7xg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 28, 2012 at 12:12 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:

> The other question is whether such an index would prevent an update from
> being HOT when the non-indexed values are touched.  That could be a
> significant difference.

I don't see Index-Only-Scans being something that will be used in
"high churn" tables.

So as long as the value of these "covering/included" fields is tied to
index-only scans, maybe it isn't a problem?

Of course, we have have a hard time convincing people that the "index
only" scans they want can't be "index only" because heap pages aren't
"all visible"...

a.

--
Aidan Van Dyk                                             Create like a god,
aidan(at)highrise(dot)ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-06-28 16:32:43
Message-ID: 14811.1340901163@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> The other question is whether such an index would prevent an update from
> being HOT when the non-indexed values are touched.

Surely it would *have* to, whether the columns are significant or not
for uniqueness purposes. Else an index-only scan gives the wrong value
after the update.

regards, tom lane


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-06-28 16:53:13
Message-ID: CAMkU=1yHznedk9O4bqZ55RY5Hx8d=cFwmOk2O5rk-eMd7QUhZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 28, 2012 at 9:12 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
>
> Excerpts from Tom Lane's message of jue jun 28 12:07:58 -0400 2012:
>
>> When this came up a couple weeks ago, the argument that was made for it
>> was that you could attach non-significant columns to an index that *is*
>> unique.  That might or might not be a wide enough use-case to justify
>> adding such a horrid kludge.
>
> The other question is whether such an index would prevent an update from
> being HOT when the non-indexed values are touched.

That seems like an easy question to answer. How could it not disable
HOT and still work correctly?

> That could be a
> significant difference.

True, adding the covering column would not always be a win. But
surely it more likely to be a win when it can be done without adding
yet another index that also needs to be maintained.

Cheers,

Jeff


From: Csaba Nagy <ncslists(at)googlemail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-06-29 06:10:03
Message-ID: 1340950203.2969.107.camel@clnt-sysecm-cnagy
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

> On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler <david(at)justatheory(dot)com> wrote:
> I don't see the virtue of this in this case. Since the index is not
> unique, why not just put the index on (a,b,c,d) and be done with it?
> Is there some advantage to be had in inventing a way to store c and d
> in the index without having them usable for indexing?

Why not restrict it to UNIQUE indexes ?

For not unique indexes it has no advantages (it could be in fact indexed
on all columns anyway as an implementation detail).

For the unique case the problem of many identical entries mentioned by
Tom is not relevant, so the additional data will only bloat the index
but not otherwise affect the index performance.

Would this get close enough to index-covered table ? _That_ would be
interesting - I have a really big table (table/index size: 370G/320G,
~8*10^9 rows) which is basically using double space because it's primary
key is covering all columns of the table.

Cheers,
Csaba.


From: Eric McKeeth <eldin00(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-06-29 08:34:23
Message-ID: CAF=gRfw7RBGA5HWsLC7DYeLQpDmZKd4fta4p9XrgX4gid6mdGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 28, 2012 at 7:02 AM, Rob Wultsch <wultsch(at)gmail(dot)com> wrote:
> On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler <david(at)justatheory(dot)com> wrote:
>> Hackers,
>>
>> Very interesting design document for SQLite 4:
>>
>>  http://www.sqlite.org/src4/doc/trunk/www/design.wiki
>>
>> I'm particularly intrigued by "covering indexes". For example:
>>
>>    CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
>>
>> This allows the following query to do an index-only scan:
>>
>>    SELECT c, d FROM table1 WHERE a=? AND b=?;
>>
>> Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.
>>
>> And I wonder if it would work well with expressions, too?
>>
>> David
>
> IRC MS SQL also allow unindexed columns in the index.
>
> --
> Rob Wultsch
> wultsch(at)gmail(dot)com

MS SQL Server does support including non-key columns in indexes,
allowing index only scans for queries returning these columns. Their
syntax is different from that proposed in the linked SQLite document.
To the best of my experience, the advantages in SQL Server to such an
index (as opposed to just adding the columns to the index normally)
are as follows:

1- You can include extra columns in a unique index which do not
participate in the uniqueness determination.
2- The non-key columns can be of types which normally cannot be
included in a b-tree index due to lack of proper sorting operators.
3- The resulting index is smaller, because the non-key columns are
only contained in leaf nodes, not in internal nodes.
4- The insert/update overhead is lower.

Of course, an implementation in a different database system, such as
Postgres, may or may not have the same set of benefits. Number 4 in
particular seems to be dependent on the details of the b-tree
implementation. In my mind numbers 1 and 2 are the most compelling
arguments in favor of this feature. Whether the it's worth the effort
of coding the feature would depend on how well the above benefits (or
any benefits I missed) hold true, and how useful such an index
actually proved for index only scans in Postgres.


From: Thomas Munro <munro(at)ip9(dot)org>
To: Rob Wultsch <wultsch(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-06-30 10:12:09
Message-ID: CADLWmXV-oDoD8+A+0n+xv3uE8n86jTTxfQtfMsm8-XXXQTC_JQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 28 June 2012 14:02, Rob Wultsch <wultsch(at)gmail(dot)com> wrote:
> On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler <david(at)justatheory(dot)com> wrote:
>> I'm particularly intrigued by "covering indexes". For example:
>>
>>    CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
>
> IRC MS SQL also allow unindexed columns in the index.

For what it's worth, DB2 also has this feature, written roughly the
same way as MS SQL Server: CREATE INDEX cover1 ON table1(a, b) INCLUDE
(c, d).

http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0000919.html

Oracle doesn't seem to have this feature (and the SQL standard doesn't
mention indexes at all).


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Csaba Nagy <ncslists(at)googlemail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-06 13:41:01
Message-ID: 20120706134101.GA6235@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jun 29, 2012 at 08:10:03AM +0200, Csaba Nagy wrote:
> Hi all,
>
> > On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler <david(at)justatheory(dot)com> wrote:
> > I don't see the virtue of this in this case. Since the index is not
> > unique, why not just put the index on (a,b,c,d) and be done with it?
> > Is there some advantage to be had in inventing a way to store c and d
> > in the index without having them usable for indexing?
>
> Why not restrict it to UNIQUE indexes ?
>
> For not unique indexes it has no advantages (it could be in fact indexed
> on all columns anyway as an implementation detail).
>
> For the unique case the problem of many identical entries mentioned by
> Tom is not relevant, so the additional data will only bloat the index
> but not otherwise affect the index performance.
>
> Would this get close enough to index-covered table ? _That_ would be
> interesting - I have a really big table (table/index size: 370G/320G,
> ~8*10^9 rows) which is basically using double space because it's primary
> key is covering all columns of the table.

I was wondering if there was some way to specify an expression index
that did a unique index check on some columns but included more columns
not part of the unique index.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Cédric Villemain <cedric(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Csaba Nagy <ncslists(at)googlemail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Subject: Re: Covering Indexes
Date: 2012-07-06 14:06:56
Message-ID: 201207061607.15834.cedric@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Le vendredi 6 juillet 2012 15:41:01, Bruce Momjian a écrit :
> On Fri, Jun 29, 2012 at 08:10:03AM +0200, Csaba Nagy wrote:
> > Hi all,
> >
> > > On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler
> > > <david(at)justatheory(dot)com> wrote: I don't see the virtue of this in this
> > > case. Since the index is not unique, why not just put the index on
> > > (a,b,c,d) and be done with it? Is there some advantage to be had in
> > > inventing a way to store c and d in the index without having them
> > > usable for indexing?
> >
> > Why not restrict it to UNIQUE indexes ?
> >
> > For not unique indexes it has no advantages (it could be in fact indexed
> > on all columns anyway as an implementation detail).
> >
> > For the unique case the problem of many identical entries mentioned by
> > Tom is not relevant, so the additional data will only bloat the index
> > but not otherwise affect the index performance.
> >
> > Would this get close enough to index-covered table ? _That_ would be
> > interesting - I have a really big table (table/index size: 370G/320G,
> > ~8*10^9 rows) which is basically using double space because it's primary
> > key is covering all columns of the table.
>
> I was wondering if there was some way to specify an expression index
> that did a unique index check on some columns but included more columns
> not part of the unique index.

I haven't tryed it, but I suppose that Exclusion Constraint should allow that.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Csaba Nagy <ncslists(at)googlemail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-06 14:54:43
Message-ID: 27788.1341586483@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Csaba Nagy <ncslists(at)googlemail(dot)com> writes:
> Why not restrict it to UNIQUE indexes ?

What benefit would such a restriction provide? AFAICS it doesn't make
implementation any easier.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-17 15:18:19
Message-ID: CA+U5nMKu5UNfjxc4c7LE3EKzu0-EwNFgezj0nrTK108Yw-OK4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 28 June 2012 13:16, David E. Wheeler <david(at)justatheory(dot)com> wrote:

> Very interesting design document for SQLite 4:
>
> http://www.sqlite.org/src4/doc/trunk/www/design.wiki
>
> I'm particularly intrigued by "covering indexes". For example:
>
> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
>
> This allows the following query to do an index-only scan:
>
> SELECT c, d FROM table1 WHERE a=? AND b=?;
>
> Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.

Just to be clear, the ability to have covered indexes is already in
9.2, I updated the release notes to explain that a few months back.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-17 15:21:51
Message-ID: F7D227DF-230D-40CE-BD3D-D3C4401AC705@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 17, 2012, at 5:18 PM, Simon Riggs wrote:

>> Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.
>
> Just to be clear, the ability to have covered indexes is already in
> 9.2, I updated the release notes to explain that a few months back.

You mean this?

> Allow queries to retrieve data only from indexes, avoiding heap access (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane)
>
> This is often called "index-only scans" or "covering indexes". This is possible for heap pages with exclusively all-visible tuples, as reported by the visibility map. The visibility map was made crash-safe as a necessary part of implementing this feature.

That’s not how SQLite is using the term “covering index.” What they mean is the ability to have additional, unindexed columns in an index, so that they can *also* be returned in the event of an index-only scan.

Best,

David


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-17 15:32:28
Message-ID: CA+U5nMJz33ZsvqPzK-AUoindxkQ6eLiP1HgQ53byoDLpwfDWUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17 July 2012 16:21, David E. Wheeler <david(at)justatheory(dot)com> wrote:
> On Jul 17, 2012, at 5:18 PM, Simon Riggs wrote:
>
>>> Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.
>>
>> Just to be clear, the ability to have covered indexes is already in
>> 9.2, I updated the release notes to explain that a few months back.
>
> You mean this?
>
>> Allow queries to retrieve data only from indexes, avoiding heap access (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane)
>>
>> This is often called "index-only scans" or "covering indexes". This is possible for heap pages with exclusively all-visible tuples, as reported by the visibility map. The visibility map was made crash-safe as a necessary part of implementing this feature.
>
> That’s not how SQLite is using the term “covering index.” What they mean is the ability to have additional, unindexed columns in an index, so that they can *also* be returned in the event of an index-only scan.

CREATE INDEX ON foo (a, b, c, d);

allows

SELECT c, d FROM foo WHERE a = ? AND b = ?

to use an index only scan.

The phrase "unindexed" seems misleading since the data is clearly in
the index from the description on the URL you gave. And since the
index is non-unique, I don't see any gap between Postgres and
SQLliite4.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-17 15:54:58
Message-ID: 6F6BA247-3BAD-4906-B95B-751B604BA64E@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote:

> CREATE INDEX ON foo (a, b, c, d);
>
> allows
>
> SELECT c, d FROM foo WHERE a = ? AND b = ?
>
> to use an index only scan.
>
> The phrase "unindexed" seems misleading since the data is clearly in
> the index from the description on the URL you gave. And since the
> index is non-unique, I don't see any gap between Postgres and
> SQLliite4.

Yeah, but that index is unnecessarily big if one will never use c or d in the search. The nice thing about covering indexes as described for SQLite 4 and implemented in MSSQL is that you can specify additional columns that just come along for the ride, but are not part of the indexed data:

CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

Yes, you can do that by also indexing c and d as of 9.2, but it might be nice to be able to include them in the index as additional row data without actually indexing them.

Best,

David


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-17 16:08:59
Message-ID: CA+U5nMLmJSc3ixqnzOwpoAxpjuy0Fop8cxiE6LE1Lbp=j3okhA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17 July 2012 16:54, David E. Wheeler <david(at)justatheory(dot)com> wrote:
> On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote:
>
>> CREATE INDEX ON foo (a, b, c, d);
>>
>> allows
>>
>> SELECT c, d FROM foo WHERE a = ? AND b = ?
>>
>> to use an index only scan.
>>
>> The phrase "unindexed" seems misleading since the data is clearly in
>> the index from the description on the URL you gave. And since the
>> index is non-unique, I don't see any gap between Postgres and
>> SQLliite4.
>
> Yeah, but that index is unnecessarily big if one will never use c or d in the search. The nice thing about covering indexes as described for SQLite 4 and implemented in MSSQL is that you can specify additional columns that just come along for the ride, but are not part of the indexed data:
>
> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
>
> Yes, you can do that by also indexing c and d as of 9.2, but it might be nice to be able to include them in the index as additional row data without actually indexing them.

Can you explain what you mean by "without actually indexing them"?
ISTM that it is a non-feature if the index is already non-unique, and
the difference is simply down to the amount of snake oil applied to
the descriptive text on the release notes.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Vik Reykja <vikreykja(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-17 16:19:43
Message-ID: CALDgxVv7GmdRtow1OSpOory7SmvvzQRFVJyjT3aCEQvMAK720A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 17, 2012 at 6:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 17 July 2012 16:54, David E. Wheeler <david(at)justatheory(dot)com> wrote:
> > Yeah, but that index is unnecessarily big if one will never use c or d
> in the search. The nice thing about covering indexes as described for
> SQLite 4 and implemented in MSSQL is that you can specify additional
> columns that just come along for the ride, but are not part of the indexed
> data:
> >
> > CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
> >
> > Yes, you can do that by also indexing c and d as of 9.2, but it might be
> nice to be able to include them in the index as additional row data without
> actually indexing them.
>
> Can you explain what you mean by "without actually indexing them"?
> ISTM that it is a non-feature if the index is already non-unique, and
> the difference is simply down to the amount of snake oil applied to
> the descriptive text on the release notes.
>

It would be useful in non-unique indexes to store data without ordering
operators (like xml).


From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'David E(dot) Wheeler'" <david(at)justatheory(dot)com>, "'Simon Riggs'" <simon(at)2ndquadrant(dot)com>
Cc: "'Pg Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-17 16:41:47
Message-ID: 01e801cd643b$0f3a27d0$2dae7770$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of David E. Wheeler
> Sent: Tuesday, July 17, 2012 11:55 AM
> To: Simon Riggs
> Cc: Pg Hackers
> Subject: Re: [HACKERS] Covering Indexes
>
> On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote:
>
> > CREATE INDEX ON foo (a, b, c, d);
> >
> > allows
> >
> > SELECT c, d FROM foo WHERE a = ? AND b = ?
> >
> > to use an index only scan.
> >
> > The phrase "unindexed" seems misleading since the data is clearly in
> > the index from the description on the URL you gave. And since the
> > index is non-unique, I don't see any gap between Postgres and
> > SQLliite4.
>
> Yeah, but that index is unnecessarily big if one will never use c or d in
the
> search. The nice thing about covering indexes as described for SQLite 4
and
> implemented in MSSQL is that you can specify additional columns that just
> come along for the ride, but are not part of the indexed data:
>
> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
>
> Yes, you can do that by also indexing c and d as of 9.2, but it might be
nice to
> be able to include them in the index as additional row data without
actually
> indexing them.
>
> Best,
>
> David

Concretely, I would presume that the contents of a covering index could then
look like the following (a,b,c,d):

(2,1,2,A)
(2,1,5,A) <-- the 5 is out of natural order but exists in the "covering"
part
(2,1,3,A)

Whereas PostgreSQL would be forced to have the index ordered as such:

(2,1,2,A)
(2,1,3,A)
(2,1,5,A)

Either way the data in "c" and "d" are IN THE INDEX otherwise in neither
case could the data values be returned while strictly querying the index.

So the question that needs to be asked is what kind of performance increase
can be had during DML (insert/update) statements and whether those gains are
worth pursuing. Since these other engines appear to allow both cases you
should be able to get at least a partial idea of the performance gains
between "index (a,b,c,d)" and "index (a,b) covering (c,d)".

Vik's concurrent point regarding "non-indexable" values makes some sense but
the use case there seems specialized as I suspect that in the general case
values that are non-indexable (if there truly are any) are generally those
that would be too large to warrant sticking into an index in the first
place. But, XML values do ring true in my mind (particularly frequently
used fragments that are generally quite small). But again whether that is a
reasonable use case for a covering index I do not know. It feels like
trying to solve the remaining 10% when it took a long while to even muster
up enough support and resources to solve the 90%.

David J.


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-17 17:00:37
Message-ID: CA+U5nM+e+HTXkJGBB4jTcC18_uVeyWkRZLO=oayXtvK2icW-Qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17 July 2012 17:41, David Johnston <polobo(at)yahoo(dot)com> wrote:

> Concretely, I would presume that the contents of a covering index could then
> look like the following (a,b,c,d):
>
> (2,1,2,A)
> (2,1,5,A) <-- the 5 is out of natural order but exists in the "covering"
> part
> (2,1,3,A)
>
> Whereas PostgreSQL would be forced to have the index ordered as such:
>
> (2,1,2,A)
> (2,1,3,A)
> (2,1,5,A)
>
> Either way the data in "c" and "d" are IN THE INDEX otherwise in neither
> case could the data values be returned while strictly querying the index.
>
> So the question that needs to be asked is what kind of performance increase
> can be had during DML (insert/update) statements and whether those gains are
> worth pursuing. Since these other engines appear to allow both cases you
> should be able to get at least a partial idea of the performance gains
> between "index (a,b,c,d)" and "index (a,b) covering (c,d)".

There is a use case, already discussed, whereby that is useful for
create unique index on foo (a,b) covering (c,d)

but there really isn't any functional difference between
create index on foo (a,b) covering (c,d)

and
create index on foo (a,b,c,d)

There is a potential performance impact. But as Tom says, that might
even be negative if it is actually measurable.

> Vik's concurrent point regarding "non-indexable" values makes some sense but
> the use case there seems specialized as I suspect that in the general case
> values that are non-indexable (if there truly are any) are generally those
> that would be too large to warrant sticking into an index in the first
> place.

I think it would be easy enough to add noop operators for sorts if you
wanted to do that.

> But, XML values do ring true in my mind (particularly frequently
> used fragments that are generally quite small). But again whether that is a
> reasonable use case for a covering index I do not know. It feels like
> trying to solve the remaining 10% when it took a long while to even muster
> up enough support and resources to solve the 90%.

The main thing is that we definitely already do have covering indexes
and we will be announcing we have that soon. The fact we have chosen
to implement that without adding new syntax strikes me as a selling
point as well, so all client tools still work.

So the feature we are talking about here needs to be called something
else, otherwise we will be confusing people. "Unsorted trailing index
columns"...

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: "'David E(dot) Wheeler'" <david(at)justatheory(dot)com>, 'Simon Riggs' <simon(at)2ndquadrant(dot)com>, 'Pg Hackers' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-17 17:01:34
Message-ID: 50059A6E.1030503@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 07/17/2012 12:41 PM, David Johnston wrote:
>>
>> So the question that needs to be asked is what kind of performance increase
>> can be had during DML (insert/update) statements and whether those gains are
>> worth pursuing. Since these other engines appear to allow both cases you
>> should be able to get at least a partial idea of the performance gains
>> between "index (a,b,c,d)" and "index (a,b) covering (c,d)".

Tom's recent answer to me on this point (as I understood it) was that he
would expect performance to degrade, not improve, since the btree code
is known not to perform well when there are many non-unique values.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-17 17:13:07
Message-ID: 12966.1342545187@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)justatheory(dot)com> <CA+U5nMJz33ZsvqPzK-AUoindxkQ6eLiP1HgQ53byoDLpwfDWUA(at)mail(dot)gmail(dot)com> writes:
> On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote:
>> The phrase "unindexed" seems misleading since the data is clearly in
>> the index from the description on the URL you gave. And since the
>> index is non-unique, I don't see any gap between Postgres and
>> SQLliite4.

> Yeah, but that index is unnecessarily big if one will never use c or d
> in the search.

The data would still have to be stored in the leaf entries, at least.
Yeah, you could possibly omit the "unindexed columns" from upper tree
levels, but with typical btree fanout ratios in the hundreds, the
overall space savings would be negligible. The idea of different index
tuple descriptors on different tree levels doesn't appeal to me, either.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: David Johnston <polobo(at)yahoo(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-26 16:13:47
Message-ID: 20120726161347.GK21271@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 17, 2012 at 06:00:37PM +0100, Simon Riggs wrote:
> > Either way the data in "c" and "d" are IN THE INDEX otherwise in neither
> > case could the data values be returned while strictly querying the index.
> >
> > So the question that needs to be asked is what kind of performance increase
> > can be had during DML (insert/update) statements and whether those gains are
> > worth pursuing. Since these other engines appear to allow both cases you
> > should be able to get at least a partial idea of the performance gains
> > between "index (a,b,c,d)" and "index (a,b) covering (c,d)".
>
> There is a use case, already discussed, whereby that is useful for
> create unique index on foo (a,b) covering (c,d)
>
> but there really isn't any functional difference between
> create index on foo (a,b) covering (c,d)
>
> and
> create index on foo (a,b,c,d)
>
> There is a potential performance impact. But as Tom says, that might
> even be negative if it is actually measurable.

So, do we want a TODO item about adding columns to a unique index that
will not be used for uniqueness checks?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, David Johnston <polobo(at)yahoo(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-26 16:25:06
Message-ID: CAHyXU0zsYmPccH_-ntmNFhXc2NkvJRdJfq=TfiQ1TkbtPbzhqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 26, 2012 at 11:13 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Tue, Jul 17, 2012 at 06:00:37PM +0100, Simon Riggs wrote:
>> > Either way the data in "c" and "d" are IN THE INDEX otherwise in neither
>> > case could the data values be returned while strictly querying the index.
>> >
>> > So the question that needs to be asked is what kind of performance increase
>> > can be had during DML (insert/update) statements and whether those gains are
>> > worth pursuing. Since these other engines appear to allow both cases you
>> > should be able to get at least a partial idea of the performance gains
>> > between "index (a,b,c,d)" and "index (a,b) covering (c,d)".
>>
>> There is a use case, already discussed, whereby that is useful for
>> create unique index on foo (a,b) covering (c,d)
>>
>> but there really isn't any functional difference between
>> create index on foo (a,b) covering (c,d)
>>
>> and
>> create index on foo (a,b,c,d)
>>
>> There is a potential performance impact. But as Tom says, that might
>> even be negative if it is actually measurable.
>
> So, do we want a TODO item about adding columns to a unique index that
> will not be used for uniqueness checks?

I think so. The case where you want a wide multiple column primary
key to be extended to cover that one extra commonly grabbed value is
not super common but entirely plausible. With the existing
infrastructure to get the advantages of index covering you'd have to
duplicate the entire index for the extra field which really sucks:
aside from the huge waste of time and space, you force the planner to
play the 'which index do i use?' game.

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, David Johnston <polobo(at)yahoo(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-26 17:17:55
Message-ID: CA+TgmoayDW03EKXExion72mgLrTTRGLRR3zo2XbAR=EL6_5Gpw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 26, 2012 at 12:25 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Thu, Jul 26, 2012 at 11:13 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> On Tue, Jul 17, 2012 at 06:00:37PM +0100, Simon Riggs wrote:
>>> > Either way the data in "c" and "d" are IN THE INDEX otherwise in neither
>>> > case could the data values be returned while strictly querying the index.
>>> >
>>> > So the question that needs to be asked is what kind of performance increase
>>> > can be had during DML (insert/update) statements and whether those gains are
>>> > worth pursuing. Since these other engines appear to allow both cases you
>>> > should be able to get at least a partial idea of the performance gains
>>> > between "index (a,b,c,d)" and "index (a,b) covering (c,d)".
>>>
>>> There is a use case, already discussed, whereby that is useful for
>>> create unique index on foo (a,b) covering (c,d)
>>>
>>> but there really isn't any functional difference between
>>> create index on foo (a,b) covering (c,d)
>>>
>>> and
>>> create index on foo (a,b,c,d)
>>>
>>> There is a potential performance impact. But as Tom says, that might
>>> even be negative if it is actually measurable.
>>
>> So, do we want a TODO item about adding columns to a unique index that
>> will not be used for uniqueness checks?
>
> I think so. The case where you want a wide multiple column primary
> key to be extended to cover that one extra commonly grabbed value is
> not super common but entirely plausible. With the existing
> infrastructure to get the advantages of index covering you'd have to
> duplicate the entire index for the extra field which really sucks:
> aside from the huge waste of time and space, you force the planner to
> play the 'which index do i use?' game.

I think it is going to take several years before we really understand
how index-only scans play out in the real world, and what factors
limit their usefulness. This one has come up a few times because it's
sort of an obvious thing to want to do and we don't have it, but I
think that there's room for some skepticism about how well it will
actually work, for reasons that have already been mentioned, and of
course also because indexing more columns potentially means defeating
HOT, which I suspect will defeat many otherwise-promising applications
of index-only scans.

That having been said, it would be unwise to speculate too much in
advance of the data, and we're not going to get any data until someone
tries implementing it, so +1 from me for putting something on the TODO
list.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, David Johnston <polobo(at)yahoo(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-26 17:37:37
Message-ID: CAHyXU0yi3rgOrqnYG_J_UA1hgCDCocvZMSwP796p3vsB5jH7gw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 26, 2012 at 12:17 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I think so. The case where you want a wide multiple column primary
>> key to be extended to cover that one extra commonly grabbed value is
>> not super common but entirely plausible. With the existing
>> infrastructure to get the advantages of index covering you'd have to
>> duplicate the entire index for the extra field which really sucks:
>> aside from the huge waste of time and space, you force the planner to
>> play the 'which index do i use?' game.
>
> I think it is going to take several years before we really understand
> how index-only scans play out in the real world, and what factors
> limit their usefulness. This one has come up a few times because it's
> sort of an obvious thing to want to do and we don't have it, but I
> think that there's room for some skepticism about how well it will
> actually work, for reasons that have already been mentioned, and of
> course also because indexing more columns potentially means defeating
> HOT, which I suspect will defeat many otherwise-promising applications
> of index-only scans.

Sure. many will still get to use them though: I'm doing tons of
OLAP/BI lately: wide keys, minimal to no updating, minimal to no RI,
andvery large tables (often clustered and partitioned), and extreme
performance requirements. Covering indexes to me is basically a drop
in feature and COVERING seems to make a lot of sense on paper. (I
absolutely can't wait to get 9.2 on some of our bigger servers here).

merlin


From: Jeff Davis <davis(dot)jeffrey(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, David Johnston <polobo(at)yahoo(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-27 16:59:37
Message-ID: 1343408377.18860.3.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2012-07-26 at 12:13 -0400, Bruce Momjian wrote:
> So, do we want a TODO item about adding columns to a unique index that
> will not be used for uniqueness checks?

-1 from me, at least in its current form.

At it's heart, this is about separating the constraint from the index
that enforces it -- you'd like the columns to be available for querying
(for index only scans or otherwise), but not to take part in the
constraint.

And when you look at it from that perspective, this proposal is and
extremely limited form. You can't, for example, decide that an existing
index can be used for a new unique constraint. That's a lot more
plausible than the use cases mentioned in this thread as far as I can
see, but this proposal can't do that.

I tried proposing a more general use case when developing exclusion
constraints:

http://archives.postgresql.org/message-id/1253466074.6983.22.camel@jdavis

(allow user to specify multiple constraints enforced by one existing
index). But, at least at the time, my proposal didn't pass the
usefulness test:

http://archives.postgresql.org/pgsql-hackers/2009-09/msg01355.php

even though my proposal was strictly more powerful than this one is.

Also, this proposal extends the weird differences between CREATE UNIQUE
INDEX and a the declaration of a UNIQUE constraint. For example, if you
want DEFERRABLE you need to declare the constraint, but if you want to
use an expression (rather than a simple column reference) you need to
create the index. This problem does not exist with exclusion
constraints.

In my opinion, new innovations in unique constraints would be better
served as part of exclusion constraints, and we should keep unique
constraints simple. If we make an improvement to UNIQUE, then we will
want to do similar things for exclusion constraints anyway, so it just
seems duplicative.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, David Johnston <polobo(at)yahoo(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-27 17:24:55
Message-ID: 1343409895.18860.4.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2012-07-26 at 12:13 -0400, Bruce Momjian wrote:
> So, do we want a TODO item about adding columns to a unique index that
> will not be used for uniqueness checks?

-1 from me, at least in its current form.

At it's heart, this is about separating the constraint from the index
that enforces it -- you'd like the columns to be available for querying
(for index only scans or otherwise), but not to take part in the
constraint.

And when you look at it from that perspective, this proposal is and
extremely limited form. You can't, for example, decide that an existing
index can be used for a new unique constraint. That's a lot more
plausible than the use cases mentioned in this thread as far as I can
see, but this proposal can't do that.

I tried proposing a more general use case when developing exclusion
constraints:

http://archives.postgresql.org/message-id/1253466074.6983.22.camel@jdavis

(allow user to specify multiple constraints enforced by one existing
index). But, at least at the time, my proposal didn't pass the
usefulness test:

http://archives.postgresql.org/pgsql-hackers/2009-09/msg01355.php

even though my proposal was strictly more powerful than this one is.

Also, this proposal extends the weird differences between CREATE UNIQUE
INDEX and a the declaration of a UNIQUE constraint. For example, if you
want DEFERRABLE you need to declare the constraint, but if you want to
use an expression (rather than a simple column reference) you need to
create the index. This problem does not exist with exclusion
constraints.

In my opinion, new innovations in unique constraints would be better
served as part of exclusion constraints, and we should keep unique
constraints simple. If we make an improvement to UNIQUE, then we will
want to do similar things for exclusion constraints anyway, so it just
seems duplicative.

Regards,
Jeff Davis


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, David Johnston <polobo(at)yahoo(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-27 20:27:22
Message-ID: CAHyXU0zN-uVXhfAbfbQKSd7vHYw-uJykE+RPxdWREovHQxnriQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 27, 2012 at 12:24 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Thu, 2012-07-26 at 12:13 -0400, Bruce Momjian wrote:
>> So, do we want a TODO item about adding columns to a unique index that
>> will not be used for uniqueness checks?
>
> -1 from me, at least in its current form.
>
> At it's heart, this is about separating the constraint from the index
> that enforces it -- you'd like the columns to be available for querying
> (for index only scans or otherwise), but not to take part in the
> constraint.
>
> And when you look at it from that perspective, this proposal is and
> extremely limited form. You can't, for example, decide that an existing
> index can be used for a new unique constraint. That's a lot more
> plausible than the use cases mentioned in this thread as far as I can
> see, but this proposal can't do that.
>
> I tried proposing a more general use case when developing exclusion
> constraints:
>
> http://archives.postgresql.org/message-id/1253466074.6983.22.camel@jdavis
>
> (allow user to specify multiple constraints enforced by one existing
> index). But, at least at the time, my proposal didn't pass the
> usefulness test:
>
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg01355.php
>
> even though my proposal was strictly more powerful than this one is.
>
> Also, this proposal extends the weird differences between CREATE UNIQUE
> INDEX and a the declaration of a UNIQUE constraint. For example, if you
> want DEFERRABLE you need to declare the constraint, but if you want to
> use an expression (rather than a simple column reference) you need to
> create the index. This problem does not exist with exclusion
> constraints.
>
> In my opinion, new innovations in unique constraints would be better
> served as part of exclusion constraints, and we should keep unique
> constraints simple. If we make an improvement to UNIQUE, then we will
> want to do similar things for exclusion constraints anyway, so it just
> seems duplicative.

Well, you're right. The exclusion constraint syntax is amazingly
general (if somewhat arcane) and it would be neat to be extended like
that. It already decouples you from physical assumptions on the
index. For example, it creates a two field index for a double field
btree equality exclusion and does a runtime, not equality based
uniqueness check. The covering index/uniqueness use case adds
legitimacy to the INDEX clause of exclusion constraints IMNSHO.

One point of concern though is that (following a bit of testing)
alter table foo add exclude using btree (id with =);
...is always strictly slower for inserts than
alter table foo add primary key(id);

This is probably because it doesn't use the low level btree based
uniqueness check (the index is not declared UNIQUE) -- shouldn't it do
that if it can?

merlin


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, David Johnston <polobo(at)yahoo(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-28 19:19:35
Message-ID: 1343503175.18860.36.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-07-27 at 15:27 -0500, Merlin Moncure wrote:
> The covering index/uniqueness use case adds
> legitimacy to the INDEX clause of exclusion constraints IMNSHO.

Yes, I think it would be worth revisiting the idea.

> One point of concern though is that (following a bit of testing)
> alter table foo add exclude using btree (id with =);
> ...is always strictly slower for inserts than
> alter table foo add primary key(id);

Yes, in my worst-case tests there is about a 2X difference for building
the constraint and about a 30-50% slowdown during INSERT (I thought I
remembered the difference being lower, but I didn't dig up my old test).
That's for an in-memory case, I would expect disk I/O to make the
difference less apparent.

> This is probably because it doesn't use the low level btree based
> uniqueness check (the index is not declared UNIQUE) -- shouldn't it do
> that if it can?

We could probably detect that the operator being used is the btree
equality operator, set the unique property of the btree, and avoid the
normal exclusion constraint check. I'm sure there are some details to
work out, but if we start collecting more use cases where people want
the flexibility of exclusion constraints and the speed of UNIQUE, we
should look into it.

Regards,
Jeff Davis


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, David Johnston <polobo(at)yahoo(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-07-28 21:46:56
Message-ID: CAMkU=1xveZbrGRveEafYzCuaAorCX_oGA-hU=1hgxXCP7+5ovA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 27, 2012 at 1:27 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>
> One point of concern though is that (following a bit of testing)
> alter table foo add exclude using btree (id with =);
> ...is always strictly slower for inserts than
> alter table foo add primary key(id);
>
> This is probably because it doesn't use the low level btree based
> uniqueness check (the index is not declared UNIQUE) -- shouldn't it do
> that if it can?

If it did that, then than would make it faster in precisely those
cases were I wouldn't use it in the first place--where there is a less
esoteric alternative that does exactly the same thing. While that is
not something without value, it would seem better (although
potentially more difficult of course) to just make it faster in
general, instead.

I didn't look into the creation, but rather into inserts. During
inserts, it looks like it is doing a look up into the btree twice,
presumably once to maintain it, and once to check for uniqueness. If
there was some way to cache the look-up between those, I think it
would go a long way towards eliminating the performance difference.
Could that be done without losing the generality?

And, does it matter? I would think covering indexes would be deployed
to best effect when your data is not cached in RAM, in which case the
IO cost common to both paths probably overwhelms any extra CPU cost.

Cheers,

Jeff


From: Florian Weimer <fw(at)deneb(dot)enyo(dot)de>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering Indexes
Date: 2012-08-25 07:53:30
Message-ID: 877gsns8zp.fsf@mid.deneb.enyo.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Jeff Janes:

> I don't see the virtue of this in this case. Since the index is not
> unique, why not just put the index on (a,b,c,d) and be done with it?

AFAICT, SQLite 4 encodes keys in a way that is not easily reversed
(although the encoding is injective, so it's reversible in principle).
Therefore, covered columns need to be listed separately, so they are
not subject to the key encoding.