Partial index does not make query faster

Lists: pgsql-general
From: Ruben Blanco <rubenblan(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Partial index does not make query faster
Date: 2012-01-18 12:57:20
Message-ID: CAP4Xq2G_gvhh1D+E7yGKPSJuNT3yfniisdMu+8Qp+jFG3=x0dQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi, folks:

I'm trying to reduce execution time on a query using a partial index,
but Postgres doesn't make a significant improvement, even when the
partial index is 30 times smaller than the index used currently. Query
plan returns a slightly higher cost (cost=0.00..327952.12) for the
partial index than the one used instead (cost=0.00..327446.61).

The table is a partitioned table, holding telephone calls for one
month. The partial index holds the calls for just one day.

The table:

\d calls_201109
...
Indexes:
"calls_201109_index_1" UNIQUE, btree (company, call_date,
caller_cli, receiver_cli, call_time, caller_cli_whs, outgoing_call)
"calls_201109_index_2" btree (company, call_date, caller_cli)
"calls_201109_index_partial" btree (company, call_date,
caller_cli) WHERE call_date = '2011-09-01'::date

Using partial index "calls_201109_index_partial":

REINDEX TABLE calls_201109;
ANALYZE calls_201109;

EXPLAIN ANALYZE
SELECT *
FROM calls_201109
WHERE company = 1
AND call_date = '20110901'
AND outgoing_call!='I'
;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using calls_201109_index_2 on calls_201109
(cost=0.00..327952.12 rows=225604 width=866) (actual
time=0.061..456.512 rows=225784 loops=1)
Index Cond: ((company = 1) AND (call_date = '2011-09-01'::date))
Filter: (outgoing_call <> 'I'::bpchar)
Total runtime: 643.349 ms

Size of the (partial) index used:

SELECT pg_size_pretty(pg_total_relation_size('calls_201109_index_partial'));
pg_size_pretty
----------------
11 MB

Without using partial index ("calls_201109_index_2" is used instead):

DROP INDEX calls_201109_index_partial;
REINDEX TABLE calls_201109;
ANALYZE calls_201109;

EXPLAIN ANALYZE
SELECT *
FROM calls_201109
WHERE company = 1
AND call_date = '20110901'
AND outgoing_call!='I'
;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using calls_201109_index_2 on calls_201109
(cost=0.00..327446.61 rows=225015 width=865) (actual
time=0.103..468.209 rows=225784 loops=1)
Index Cond: ((company = 1) AND (call_date = '2011-09-01'::date))
Filter: (outgoing_call <> 'I'::bpchar)
Total runtime: 656.103 ms

Size of the index used:

SELECT pg_size_pretty(pg_total_relation_size('calls_201109_index_2'));
pg_size_pretty
----------------
330 MB

Any idea on how to make partial index effective?

Thanks in advance.
Ruben.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ruben Blanco <rubenblan(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Partial index does not make query faster
Date: 2012-01-18 17:36:07
Message-ID: 24539.1326908167@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Ruben Blanco <rubenblan(at)gmail(dot)com> writes:
> I'm trying to reduce execution time on a query using a partial index,
> but Postgres doesn't make a significant improvement, even when the
> partial index is 30 times smaller than the index used currently.

That doesn't really matter that much. The part of the index a given
query will actually access is about the same size either way, ie same
number of leaf tuples of the same size. If you're lucky, you might save
one level of btree descent to reach the leaf pages; but considering that
btree fanout for integer-size keys is several hundred to one, you need a
size ratio of several hundred even to be assured of that.

The planner does have a small correction to favor smaller indexes over
larger, but it's so small that it's often lost in the noise. In this
case I think it's probably getting swamped by rounding off the estimate
of the number of leaf pages accessed to the nearest number of pages.
So it doesn't see the partial index as being any cheaper to use than the
full index.

> Indexes:
> "calls_201109_index_2" btree (company, call_date, caller_cli)
> "calls_201109_index_partial" btree (company, call_date, caller_cli) WHERE call_date = '2011-09-01'::date

In this case you could have made the partial index smaller in a useful
way (ie, reducing the number of leaf pages touched) by omitting the
call_date column, which is quite redundant given the WHERE clause.
I experimented a bit with that, but found that there actually is a
planner bug in that case --- it misestimates the number of index tuples
to be read because of failing to account for the partial index predicate
in one place. I'll see about fixing that, but in the meantime I don't
think you are really going to get any win with the above line of
thought, for a couple reasons:

First, is there some reason why 2011-09-01 is such a special date that
it deserves its own index, or are you just showing us a fragment of a
grand plan to manually partition the index through creating a large set
of partial indexes? That sort of approach is almost certainly going to
be a dead loss once you consider the extra overhead of updating the
indexes and the extra planning costs. Basically, while partitioning a
table can be useful when it comes time to drop one partition, it doesn't
save anything for routine queries except in very special cases; and
since there's no equivalent administrative need at the index level,
partitioning an index is not going to be a win.

Second, even if you can omit the call_date column, that's not going to
make this index much smaller, perhaps even not at all smaller depending
on alignment considerations. You need to reduce the size of an index
entry by probably a factor of 2 before it's worth the extra complexity.

regards, tom lane