Re: Multicolumn index doc out of date?

Lists: pgsql-docs
From: Michael Fuhr <mike(at)fuhr(dot)org>
To: pgsql-docs(at)postgresql(dot)org
Subject: Multicolumn index doc out of date?
Date: 2005-09-12 14:26:47
Message-ID: 20050912142647.GA34685@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs

The 8.1 Release Notes have the following item:

* Allow non-consecutive index columns to be used in a multi-column index

So isn't the following paragraph from "Multicolumn Indexes" out of
date?

The query planner can use a multicolumn index for queries that
involve the leftmost column in the index definition plus any
number of columns listed to the right of it, without a gap. For
example, an index on (a, b, c) can be used in queries involving
all of a, b, and c, or in queries involving both a and b, or in
queries involving only a, but not in other combinations. (In a
query involving a and c the planner could choose to use the index
for a, while treating c like an ordinary unindexed column.)

http://developer.postgresql.org/docs/postgres/indexes-multicolumn.html

--
Michael Fuhr


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: pgsql-docs(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Multicolumn index doc out of date?
Date: 2005-09-12 19:31:04
Message-ID: 23966.1126553464@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs

Michael Fuhr <mike(at)fuhr(dot)org> writes:
> So isn't the following paragraph from "Multicolumn Indexes" out of
> date?

> The query planner can use a multicolumn index for queries that
> involve the leftmost column in the index definition plus any
> number of columns listed to the right of it, without a gap. For
> example, an index on (a, b, c) can be used in queries involving
> all of a, b, and c, or in queries involving both a and b, or in
> queries involving only a, but not in other combinations. (In a
> query involving a and c the planner could choose to use the index
> for a, while treating c like an ordinary unindexed column.)

Yeah, I had missed that part of the manual while doing the multicolumn
rules change. I've replaced it with this:

: A multicolumn B-tree index can be used with query conditions that
: involve any subset of the index's columns, but the index is most
: efficient when there are constraints on the leading (leftmost)
: columns. The exact rule is that equality constraints on leading columns,
: plus any inequality constraints on the first column that does not have
: an equality constraint, will be used to limit the portion of the index
: that is scanned. Constraints on columns to the right of these columns
: are checked in the index, so they save visits to the table proper, but
: they do not reduce the portion of the index that has to be scanned. For
: example, given an index on (a, b, c) and a query condition WHERE a = 5
: AND b >= 42 AND c < 77, the index would have to be scanned from the
: first entry with a = 5 and b = 42 up through the last entry with a =
: 5. Index entries with c >= 77 would be skipped, but they'd still have to
: be scanned through. This index could in principle be used for queries
: that have constraints on b and/or c with no constraint on a --- but
: the entire index would have to be scanned, so in most cases the planner
: would prefer a sequential table scan over using the index.
:
: A multicolumn GiST index can only be used when there is a query
: condition on its leading column. As with B-trees, conditions on
: additional columns restrict the entries returned by the index, but do
: not in themselves aid the index search.

I believe the above is accurate about btree, but I'm not so sure about
GiST --- Teodor, any comments?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Michael Fuhr <mike(at)fuhr(dot)org>, pgsql-docs(at)postgresql(dot)org
Subject: Re: Multicolumn index doc out of date?
Date: 2005-10-21 00:35:01
Message-ID: 15864.1129854901@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs

[ getting back to this documentation issue finally ]

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> I disagree with last affirmation: inner pages of index contains fair union of
> keys and enough helpful to select. Mailware ( http://www.pgsql.ru/db/mw )
> sucsessfully use combined GiST index (date, tsvector) for searching.

> GiST's split algorithm is good for unique leading keys, not so bad for small
> number of non-unique values and bad for all equals leading key. But "bad" means
> that itsn't optimal as picksplit for other keys may be. If there is several keys
> which can be moved on left or right page without changing union of first key for
> each page then GiST try put its on page (left or right) with smallest penalty
> calculated by other keys. This algorithm is very similar to defining page to put
> tuple with normal processing (without page split).

> With unique leading key GiST's split is fully similar to BTree - it looks only
> at leading key, but gistchoose isn't. Gistchoose (gistutil.c:622) chooses child
> with smallest penalty and it looks to other keys if several leading keys has the
> same penalty. In a GiST tree different keys may have the same penalty value with
> new key.

OK, how about this text then?

A multicolumn GiST index can only be used when there is a query condition
on its leading column. Conditions on additional columns restrict the
entries returned by the index, but the condition on the first column is the
most important one for determining how much of the index needs to be
scanned. A GiST index will be relatively ineffective if its first column
has only a few distinct values, even if there are many distinct values in
additional columns.

regards, tom lane


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Fuhr <mike(at)fuhr(dot)org>, pgsql-docs(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: Multicolumn index doc out of date?
Date: 2005-10-21 09:57:12
Message-ID: 4358BB78.5050303@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs

Tom Lane wrote:
> [ getting back to this documentation issue finally ]
>
> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>
>>I disagree with last affirmation: inner pages of index contains fair union of
>>keys and enough helpful to select. Mailware ( http://www.pgsql.ru/db/mw )
>>sucsessfully use combined GiST index (date, tsvector) for searching.
>
>
>>GiST's split algorithm is good for unique leading keys, not so bad for small
>>number of non-unique values and bad for all equals leading key. But "bad" means
>>that itsn't optimal as picksplit for other keys may be. If there is several keys
>>which can be moved on left or right page without changing union of first key for
>>each page then GiST try put its on page (left or right) with smallest penalty
>>calculated by other keys. This algorithm is very similar to defining page to put
>>tuple with normal processing (without page split).
>
>
>>With unique leading key GiST's split is fully similar to BTree - it looks only
>>at leading key, but gistchoose isn't. Gistchoose (gistutil.c:622) chooses child
>>with smallest penalty and it looks to other keys if several leading keys has the
>>same penalty. In a GiST tree different keys may have the same penalty value with
>>new key.
>
>
> OK, how about this text then?
>
> A multicolumn GiST index can only be used when there is a query condition
> on its leading column. Conditions on additional columns restrict the
> entries returned by the index, but the condition on the first column is the
> most important one for determining how much of the index needs to be
> scanned. A GiST index will be relatively ineffective if its first column
> has only a few distinct values, even if there are many distinct values in
> additional columns.

Ok, I think.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Michael Fuhr <mike(at)fuhr(dot)org>, pgsql-docs(at)postgresql(dot)org
Subject: Re: Multicolumn index doc out of date?
Date: 2005-10-21 14:40:03
Message-ID: Pine.GSO.4.63.0510211839000.23078@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs

btw, I think it's worth to add link to src/backend/access/gist/README
file, which we updated recently.

Oleg

On Thu, 20 Oct 2005, Tom Lane wrote:

> [ getting back to this documentation issue finally ]
>
> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> I disagree with last affirmation: inner pages of index contains fair union of
>> keys and enough helpful to select. Mailware ( http://www.pgsql.ru/db/mw )
>> sucsessfully use combined GiST index (date, tsvector) for searching.
>
>> GiST's split algorithm is good for unique leading keys, not so bad for small
>> number of non-unique values and bad for all equals leading key. But "bad" means
>> that itsn't optimal as picksplit for other keys may be. If there is several keys
>> which can be moved on left or right page without changing union of first key for
>> each page then GiST try put its on page (left or right) with smallest penalty
>> calculated by other keys. This algorithm is very similar to defining page to put
>> tuple with normal processing (without page split).
>
>> With unique leading key GiST's split is fully similar to BTree - it looks only
>> at leading key, but gistchoose isn't. Gistchoose (gistutil.c:622) chooses child
>> with smallest penalty and it looks to other keys if several leading keys has the
>> same penalty. In a GiST tree different keys may have the same penalty value with
>> new key.
>
> OK, how about this text then?
>
> A multicolumn GiST index can only be used when there is a query condition
> on its leading column. Conditions on additional columns restrict the
> entries returned by the index, but the condition on the first column is the
> most important one for determining how much of the index needs to be
> scanned. A GiST index will be relatively ineffective if its first column
> has only a few distinct values, even if there are many distinct values in
> additional columns.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83