Partial Index [WHERE col IN ('A','B')] Planner Anomalies

Lists: pgsql-general
From: "David Johnston" <polobo(at)yahoo(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Partial Index [WHERE col IN ('A','B')] Planner Anomalies
Date: 2011-04-26 18:56:10
Message-ID: 020401cc0443$9c276850$d47638f0$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi,

I have the following query:

EXPLAIN SELECT * FROM taskinstance WHERE ti_status IN ('ACTIVE',
'DISPATCHED', 'FAILURE');

With two indices:

ti_active. partial index over (ti_status) WHERE ti_status IN
('ACTIVE','DISPATCHED','FAILURE')

ti_sortedstatus. full index over (ti_status, ti_creationtimestamp)

If I explain the above query multiple times (without any data changes) I get
either of the two query plans / the Bitmap Heap one more often. If I
analyze the table I immediately get the "Index Scan" plan first but
subsequent explains revert to switching between the two However, if I run
any other query but the three-criteria query then the flipping behavior
observed stops and the "Bitmap Heap" plan becomes the dominant plan. The

QUERY PLAN

Bitmap Heap Scan on taskinstance (cost=115.87..135.83 rows=2336 width=723)

Recheck Cond: (((ti_status)::text = ANY
('{ACTIVE,DISPATCHED,FAILURE}'::text[])) AND ((ti_status)::text = ANY
('{ACTIVE,DISPATCHED,FAILURE}'::text[])))

-> BitmapAnd (cost=115.87..115.87 rows=5 width=0)

-> Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx
(cost=0.00..47.94 rows=2336 width=0)

-> Bitmap Index Scan on ti_sortedstatus_idx (cost=0.00..66.52
rows=2336 width=0)

Index Cond: ((ti_status)::text = ANY
('{ACTIVE,DISPATCHED,FAILURE}'::text[]))

QUERY PLAN

Index Scan using ti_active_dispatch_failed_tasks_idx on taskinstance
(cost=0.00..402.03 rows=2348 width=718)

In a total table size of 1.2M records the partial index covers around 2300

Now, if I keep the IN construct but only list one of the options (any one) I
get the following:

QUERY PLAN

Index Scan using ti_active_dispatch_failed_tasks_idx on taskinstance
(cost=0.00..466.44 rows=2868 width=725)

Index Cond: ((ti_status)::text = 'FAILURE'::text) --works for 'ACTIVE' and
'DISPATCHED' as well

If I include two of them I get the following results:

Active, Dispatched ------------

QUERY PLAN

Bitmap Heap Scan on taskinstance (cost=8.52..12.53 rows=1 width=718)

Recheck Cond: ((ti_status)::text = ANY ('{ACTIVE,DISPATCHED}'::text[]))

-> Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx
(cost=0.00..8.52 rows=1 width=0) <<<<<<<<<<<<<<<<

Index Cond: ((ti_status)::text = ANY
('{ACTIVE,DISPATCHED}'::text[]))

Active, Failure ---------------

QUERY PLAN

Bitmap Heap Scan on taskinstance (cost=111.51..123.49 rows=2020 width=718)

Recheck Cond: (((ti_status)::text = ANY
('{ACTIVE,DISPATCHED,FAILURE}'::text[])) AND ((ti_status)::text = ANY
('{ACTIVE,FAILURE}'::text[])))

-> BitmapAnd (cost=111.51..111.51 rows=3 width=0)

-> Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx
(cost=0.00..46.36 rows=2020 width=0) <<<<<<<<<<<<<<<<<

-> Bitmap Index Scan on ti_sortedstatus_idx (cost=0.00..63.89
rows=2020 width=0) <<<<<<<<<<<<<<<<<<<<

Index Cond: ((ti_status)::text = ANY
('{ACTIVE,FAILURE}'::text[]))

Dispatched, Failure --------------

QUERY PLAN

Bitmap Heap Scan on taskinstance (cost=115.77..135.72 rows=2348 width=718)

Recheck Cond: (((ti_status)::text = ANY
('{ACTIVE,DISPATCHED,FAILURE}'::text[])) AND ((ti_status)::text = ANY
('{DISPATCHED,FAILURE}'::text[])))

-> BitmapAnd (cost=115.77..115.77 rows=5 width=0)

-> Bitmap Index Scan on ti_active_dispatch_failed_tasks_idx
(cost=0.00..48.00 rows=2348 width=0) <<<<<<<<<<<<<<<<<<<

-> Bitmap Index Scan on ti_sortedstatus_idx (cost=0.00..66.35
rows=2348 width=0) <<<<<<<<<<<<<<<<<<<

Index Cond: ((ti_status)::text = ANY
('{DISPATCHED,FAILURE}'::text[]))

In the double and single criteria cases I get the same plan each time -
regardless of whether I run ANALYZE.

With the double-conditions the inclusion of "FAILURE" seems to confuse the
planner but the single condition plan for FAILURE works as expected.

The documentation is unclear on how the use of "IN ()" affects the query
planner. At a basic level IN becomes (= OR =) which are both simple
conditionals/logic elements so it should work properly. Also, if the
planner recognizes that the partial index is sufficient to cover the WHERE
clause why would it want to bring in a full index that is, by definition,
redundant?

I guess if you are not evaluating the WHERE clause the two-constraint
versions likely would work oddly (and they do for ACTIVE/DISPATCHED if I
reverse the order of the two items) but the three-constraint version
occasionally works and the one-constraints versions always work even though
they too do not match the original index where clause.

Nothing of a critical nature here but figured I'd report what I am seeing.
I can probably drop some constraints( and some data ) if you want to see the
"taskinstance" table; and I will happily provide whatever configuration info
you ask for.

Running 9.0.3 Windows 7 64-bit

Thanks,

David Johnston


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David Johnston" <polobo(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Partial Index [WHERE col IN ('A','B')] Planner Anomalies
Date: 2011-04-26 22:00:29
Message-ID: 23048.1303855229@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"David Johnston" <polobo(at)yahoo(dot)com> writes:
> If I explain the above query multiple times (without any data changes) I get
> either of the two query plans / the Bitmap Heap one more often. If I
> analyze the table I immediately get the "Index Scan" plan first but
> subsequent explains revert to switching between the two However, if I run
> any other query but the three-criteria query then the flipping behavior
> observed stops and the "Bitmap Heap" plan becomes the dominant plan. The

Hmmm ... I'm seeing some very odd behavior here too. I suspect there is
some use-of-uninitialized-memory in there. After several false starts
I have a script that reliably reproduces the change of behavior in
(debug builds of) both HEAD and 9.0:

drop table if exists taskinstance;

create table taskinstance (ti_status varchar, ti_creationtimestamp timestamptz);
create index ti_sortedstatus on taskinstance (ti_status, ti_creationtimestamp);
create index ti_active on taskinstance (ti_status) WHERE ti_status IN ('ACTIVE', 'DISPATCHED', 'FAILURE');

vacuum taskinstance;

\c -

EXPLAIN SELECT * FROM taskinstance WHERE ti_status IN ('ACTIVE', 'DISPATCHED', 'FAILURE');
EXPLAIN SELECT * FROM taskinstance WHERE ti_status IN ('ACTIVE', 'DISPATCHED', 'FAILURE');
EXPLAIN SELECT * FROM taskinstance WHERE ti_status IN ('ACTIVE', 'DISPATCHED', 'FAILURE');

The first two EXPLAINs give a sane plan, the third one not so much.
Poking at it now.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David Johnston" <polobo(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Partial Index [WHERE col IN ('A','B')] Planner Anomalies
Date: 2011-04-27 18:09:35
Message-ID: 8043.1303927775@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I wrote:
> "David Johnston" <polobo(at)yahoo(dot)com> writes:
>> If I explain the above query multiple times (without any data changes) I get
>> either of the two query plans / the Bitmap Heap one more often. If I
>> analyze the table I immediately get the "Index Scan" plan first but
>> subsequent explains revert to switching between the two However, if I run
>> any other query but the three-criteria query then the flipping behavior
>> observed stops and the "Bitmap Heap" plan becomes the dominant plan. The

> Hmmm ... I'm seeing some very odd behavior here too.

The instability of the plans is a bug, and at least in the example I was
looking at, the inefficient BitmapAnd plans are a bug too. I've applied
a patch that fixes it for me. There might be related cases that are
still broken though.

regards, tom lane