Partial Index [WHERE col IN ('A','B')] Planner Anomalies

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Partial Index [WHERE col IN ('A','B')] Planner Anomalies
Date: 2011-04-26 18:56:10
Message-ID: 020401cc0443$9c276850$d47638f0$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I have the following query:

EXPLAIN SELECT * FROM taskinstance WHERE ti_status IN ('ACTIVE',
'DISPATCHED', 'FAILURE');

With two indices:

ti_active. partial index over (ti_status) WHERE ti_status IN
('ACTIVE','DISPATCHED','FAILURE')

ti_sortedstatus. full index over (ti_status, ti_creationtimestamp)

If I explain the above query multiple times (without any data changes) I get
either of the two query plans / the Bitmap Heap one more often. If I
analyze the table I immediately get the "Index Scan" plan first but
subsequent explains revert to switching between the two However, if I run
any other query but the three-criteria query then the flipping behavior
observed stops and the "Bitmap Heap" plan becomes the dominant plan. The

QUERY PLAN

Bitmap Heap Scan on taskinstance (cost=115.87..135.83 rows=2336 width=723)

Recheck Cond: (((ti_status)::text = ANY
('{ACTIVE,DISPATCHED,FAILURE}'::text[])) AND ((ti_status)::text = ANY
('{ACTIVE,DISPATCHED,FAILURE}'::text[])))

-> BitmapAnd (cost=115.87..115.87 rows=5 width=0)

-> Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx
(cost=0.00..47.94 rows=2336 width=0)

-> Bitmap Index Scan on ti_sortedstatus_idx (cost=0.00..66.52
rows=2336 width=0)

Index Cond: ((ti_status)::text = ANY
('{ACTIVE,DISPATCHED,FAILURE}'::text[]))

QUERY PLAN

Index Scan using ti_active_dispatch_failed_tasks_idx on taskinstance
(cost=0.00..402.03 rows=2348 width=718)

In a total table size of 1.2M records the partial index covers around 2300

Now, if I keep the IN construct but only list one of the options (any one) I
get the following:

QUERY PLAN

Index Scan using ti_active_dispatch_failed_tasks_idx on taskinstance
(cost=0.00..466.44 rows=2868 width=725)

Index Cond: ((ti_status)::text = 'FAILURE'::text) --works for 'ACTIVE' and
'DISPATCHED' as well

If I include two of them I get the following results:

Active, Dispatched ------------

QUERY PLAN

Bitmap Heap Scan on taskinstance (cost=8.52..12.53 rows=1 width=718)

Recheck Cond: ((ti_status)::text = ANY ('{ACTIVE,DISPATCHED}'::text[]))

-> Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx
(cost=0.00..8.52 rows=1 width=0) <<<<<<<<<<<<<<<<

Index Cond: ((ti_status)::text = ANY
('{ACTIVE,DISPATCHED}'::text[]))

Active, Failure ---------------

QUERY PLAN

Bitmap Heap Scan on taskinstance (cost=111.51..123.49 rows=2020 width=718)

Recheck Cond: (((ti_status)::text = ANY
('{ACTIVE,DISPATCHED,FAILURE}'::text[])) AND ((ti_status)::text = ANY
('{ACTIVE,FAILURE}'::text[])))

-> BitmapAnd (cost=111.51..111.51 rows=3 width=0)

-> Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx
(cost=0.00..46.36 rows=2020 width=0) <<<<<<<<<<<<<<<<<

-> Bitmap Index Scan on ti_sortedstatus_idx (cost=0.00..63.89
rows=2020 width=0) <<<<<<<<<<<<<<<<<<<<

Index Cond: ((ti_status)::text = ANY
('{ACTIVE,FAILURE}'::text[]))

Dispatched, Failure --------------

QUERY PLAN

Bitmap Heap Scan on taskinstance (cost=115.77..135.72 rows=2348 width=718)

Recheck Cond: (((ti_status)::text = ANY
('{ACTIVE,DISPATCHED,FAILURE}'::text[])) AND ((ti_status)::text = ANY
('{DISPATCHED,FAILURE}'::text[])))

-> BitmapAnd (cost=115.77..115.77 rows=5 width=0)

-> Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx
(cost=0.00..48.00 rows=2348 width=0) <<<<<<<<<<<<<<<<<<<

-> Bitmap Index Scan on ti_sortedstatus_idx (cost=0.00..66.35
rows=2348 width=0) <<<<<<<<<<<<<<<<<<<

Index Cond: ((ti_status)::text = ANY
('{DISPATCHED,FAILURE}'::text[]))

In the double and single criteria cases I get the same plan each time -
regardless of whether I run ANALYZE.

With the double-conditions the inclusion of "FAILURE" seems to confuse the
planner but the single condition plan for FAILURE works as expected.

The documentation is unclear on how the use of "IN ()" affects the query
planner. At a basic level IN becomes (= OR =) which are both simple
conditionals/logic elements so it should work properly. Also, if the
planner recognizes that the partial index is sufficient to cover the WHERE
clause why would it want to bring in a full index that is, by definition,
redundant?

I guess if you are not evaluating the WHERE clause the two-constraint
versions likely would work oddly (and they do for ACTIVE/DISPATCHED if I
reverse the order of the two items) but the three-constraint version
occasionally works and the one-constraints versions always work even though
they too do not match the original index where clause.

Nothing of a critical nature here but figured I'd report what I am seeing.
I can probably drop some constraints( and some data ) if you want to see the
"taskinstance" table; and I will happily provide whatever configuration info
you ask for.

Running 9.0.3 Windows 7 64-bit

Thanks,

David Johnston

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Greg Sabino Mullane 2011-04-26 19:06:58 Re: Switching Database Engines
Previous Message Tomas Vondra 2011-04-26 18:40:04 Re: Help - corruption issue?