Re: Query optimization and indexes

Lists: pgsql-general
From: felix(at)crowfix(dot)com
To: pgsql-general(at)postgresql(dot)org
Subject: Query optimization and indexes
Date: 2006-08-18 23:19:28
Message-ID: 20060818231928.GA19900@crowfix.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Suppose I have an index on 5 columns (A, B, C, D, E).

If my WHERE clause is not in that order, will the optimizer reorder
them as necessary and possible?

WHERE A=1 AND C=3 AND B=2 AND E=5 AND D=4

Obviously it can't reorder them in all cases:

WHERE A=1 AND (C=3 OR B=2) AND (E=5 OR D=4)

If I don't specify columns in the WHERE clause, how much can it use
the index? I think it is smart enough to use beginning columns:

WHERE A=1 AND B=2

How about skipping leading columns?

WHERE B=2

How about skipping intermediate columns?

WHERE A=1 AND C=3

Or both, which is probably the same?

WHERE B=2 AND D=4?

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman & rocket surgeon / felix(at)crowfix(dot)com
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: felix(at)crowfix(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Query optimization and indexes
Date: 2006-08-19 00:15:47
Message-ID: 26398.1155946547@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

felix(at)crowfix(dot)com writes:
> Suppose I have an index on 5 columns (A, B, C, D, E).
> If my WHERE clause is not in that order, will the optimizer reorder
> them as necessary and possible?

Yes, the optimizer understands about commutativity/associativity of
AND and OR ;-)

> If I don't specify columns in the WHERE clause, how much can it use
> the index?

Before (if memory serves) 8.1, the planner would only consider leading
index columns as potential indexscan qualifiers. So given

where a = 5 and c = 4;

only the a = 5 clause would be used with the index. As of 8.1 it will
consider using nonconsecutive index columns, but if you think for a bit
about the storage order of a btree, you'll realize that you really need
leading columns to keep down the amount of the index that gets scanned.
A lot of the time, such a plan will be rejected as apparently more
expensive than a seqscan.

(This is for btrees, I don't recall the state of play for GIST indexes
exactly.)

regards, tom lane


From: Gregory Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: felix(at)crowfix(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: Query optimization and indexes
Date: 2006-08-19 10:38:45
Message-ID: 8764gprowa.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Before (if memory serves) 8.1, the planner would only consider leading
> index columns as potential indexscan qualifiers. So given
>
> where a = 5 and c = 4;
>
> only the a = 5 clause would be used with the index. As of 8.1 it will
> consider using nonconsecutive index columns

Really? Is this the "skip scan" plan people were pining for? I missed when
that happened.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <gsstark(at)mit(dot)edu>
Cc: felix(at)crowfix(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: Query optimization and indexes
Date: 2006-08-19 13:08:04
Message-ID: 16848.1155992884@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Gregory Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> only the a = 5 clause would be used with the index. As of 8.1 it will
>> consider using nonconsecutive index columns

> Really? Is this the "skip scan" plan people were pining for?

No, there's no skip scan, it just applies all the indexable-column
checks within the index instead of making a trip to the heap.
For instance consider
WHERE a >= 4 AND a < 7 AND c > 5
with index entries

A B C

3 9 8
3 9 9
4 0 0 <- search starts here
4 0 1 reject
...
4 0 5 reject
4 0 6 accept (visit heap)
4 0 9 accept
4 1 0 reject
...
6 9 8 accept
6 9 9 accept
7 0 0 <- search ends when we reach here
7 0 1

If the condition on C is very selective then we might find ourselves
scanning over a lot of rejected entries within the possible bounds
for A. The problem is to guess whether re-descending the search tree
will win compared to just slogging forward, and if so to generate a
suitable search key for each intermediate descent.

regards, tom lane