Re: Bug in predicate indexes?

Lists: pgsql-hackers
From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Bug in predicate indexes?
Date: 2005-11-18 03:57:43
Message-ID: 437D5137.6020903@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

It appears that predicate indexes won't uses indexes on int8 columns
unless they are casted:

test=# select count(*) from test_key ;
count
--------
120617
(1 row)

test=# alter table test_key ALTER column id set statistics 25;
ALTER TABLE
test=# analyze test_key ;
test=# create index foo on test_key (id) where id >= 50000;
CREATE INDEX
test=# analyze;
ANALYZE
test=# explain analyze select * from test_key where id > 50000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on test_key (cost=0.00..2276.71 rows=69768 width=16) (actual
time=23.186..289.406 rows=70617 loops=1)
Filter: (id > 50000)
Total runtime: 516.917 ms
(3 rows)

test=# create index foo on test_key (id) where id >= 50000::bigint;
CREATE INDEX
test=# analyze;
ANALYZE
test=# explain analyze select * from test_key where id > 50000::bigint;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Index Scan using foo on test_key (cost=0.00..1602.88 rows=70806
width=16) (actual time=0.080..264.001 rows=70617 loops=1)
Index Cond: (id > 50000::bigint)
Total runtime: 496.160 ms
(3 rows)

test=# explain analyze select * from test_key where id > 50000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Index Scan using foo on test_key (cost=0.00..1602.88 rows=70806
width=16) (actual time=0.116..262.952 rows=70617 loops=1)
Index Cond: (id > 50000)
Total runtime: 488.943 ms
(3 rows)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in predicate indexes?
Date: 2005-11-18 14:58:24
Message-ID: 6055.1132325904@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> It appears that predicate indexes won't uses indexes on int8 columns
> unless they are casted:

This is a known (although perhaps not well documented) limitation of the
predicate testing logic. You do not need a cast in the query, though,
only in the index's WHERE condition.

regards, tom lane


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in predicate indexes?
Date: 2005-11-18 21:07:41
Message-ID: 437E429D.3010209@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>This is a known (although perhaps not well documented) limitation of the
>predicate testing logic. You do not need a cast in the query, though,
>only in the index's WHERE condition.
>
>
O.k. cool just wanted to make sure I wasn't the only one :).

Joshua D. Drake

> regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: Don't 'kill -9' the postmaster
>
>

--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in predicate indexes?
Date: 2005-11-22 01:05:07
Message-ID: 20051122010507.GU19279@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 18, 2005 at 09:58:24AM -0500, Tom Lane wrote:
> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> > It appears that predicate indexes won't uses indexes on int8 columns
> > unless they are casted:
>
> This is a known (although perhaps not well documented) limitation of the
> predicate testing logic. You do not need a cast in the query, though,
> only in the index's WHERE condition.

I'm working on a docs patch for this (attached, but un-tested); is
bigint the only datatype this applies to or are there others?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

Attachment Content-Type Size
patch text/plain 855 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in predicate indexes?
Date: 2005-11-22 01:40:38
Message-ID: 6783.1132623638@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> On Fri, Nov 18, 2005 at 09:58:24AM -0500, Tom Lane wrote:
>> This is a known (although perhaps not well documented) limitation of the
>> predicate testing logic. You do not need a cast in the query, though,
>> only in the index's WHERE condition.

> I'm working on a docs patch for this (attached, but un-tested); is
> bigint the only datatype this applies to or are there others?

You should find out what the problem is before you start writing
documentation about it ;-). This has nothing whatever to do with
bigint.

The problem is here in predtest.c:

* Try to find a btree opclass containing the needed operators.
*
* We must find a btree opclass that contains both operators, else the
* implication can't be determined. Also, the pred_op has to be of
* default subtype (implying left and right input datatypes are the same);
* otherwise it's unsafe to put the pred_const on the left side of the
* test. Also, the opclass must contain a suitable test operator matching
* the clause_const's type (which we take to mean that it has the same
* subtype as the original clause_operator).

What the code is trying to do is prove that "X op C1" implies "X op C2"
where the constants aren't necessarily the same and the operators are
drawn from the same btree opclass, but might themselves be different.
Some examples:

X = 4 implies X > 3, because 4 > 3
X <= 7 implies X < 3, because 7 < 3
X > 7 doesn't imply X < 14

To make the proof, we need to compare C1 to C2, which means we have to
find the required operator for that in the same btree opclass as the two
given operators. If X, C1, and C2 are all of different datatypes then
this is not going to work in the current design of operator classes,
because *all* the operators in a given opclass have the same lefthand-side
datatype. (I've been thinking about ways to relax that in future PG
versions, but don't yet have a proposal to make.) The current code
requires X and C2 to be of the same type, which means that the needed
operator for "C2 op C1" will be in the same opclass in which we can find
the operator for X op C1.

The bottom line is that if you want the predicate prover to be at all
smart about a comparison in the index WHERE clause, the comparison can't
be cross-type. Otherwise, the only way it will match it is with an
exact match to the query's WHERE clause. Example: this will still work

query: WHERE bigintcol = 42
index: WHERE bigintcol = 42

but not this:

query: WHERE bigintcol = 42
index: WHERE bigintcol >= 4

The last case needs "bigintcol >= 4::bigint" in the index predicate in
order to be provable from a related-but-not-identical query condition.

This applies to anyplace where we have cross-type comparisons, which
in a quick look in pg_operator seems to be

<(integer,bigint)
<(bigint,integer)
<(smallint,integer)
<(integer,smallint)
<(real,double precision)
<(double precision,real)
<(smallint,bigint)
<(bigint,smallint)
<(date,timestamp without time zone)
<(date,timestamp with time zone)
<(timestamp without time zone,date)
<(timestamp with time zone,date)
<(timestamp without time zone,timestamp with time zone)
<(timestamp with time zone,timestamp without time zone)

I'm not sure this is worth documenting given that it's likely to change
by 8.2 anyway.

regards, tom lane


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in predicate indexes?
Date: 2005-11-22 03:31:53
Message-ID: 43829129.3030003@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> <(integer,bigint)
> <(bigint,integer)
> <(smallint,integer)
> <(integer,smallint)
> <(real,double precision)
> <(double precision,real)
> <(smallint,bigint)
> <(bigint,smallint)
> <(date,timestamp without time zone)
> <(date,timestamp with time zone)
> <(timestamp without time zone,date)
> <(timestamp with time zone,date)
> <(timestamp without time zone,timestamp with time zone)
> <(timestamp with time zone,timestamp without time zone)
>
>I'm not sure this is worth documenting given that it's likely to change
>by 8.2 anyway.
>
>
I think that we should be explicit as possible. This is a limitation
that effects every release
that is currently supported by the community. Which is as least 7.4,
8.0, 8.1 and possibly
7.3 (did we ever come to conclusion on that?).

The limitation should be addressed and patched to each documentation set.

Sincerely,

Joshua D. Drake

> regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>
>

--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in predicate indexes?
Date: 2005-11-22 20:25:38
Message-ID: 20051122202538.GB99429@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 21, 2005 at 08:40:38PM -0500, Tom Lane wrote:
> You should find out what the problem is before you start writing
> documentation about it ;-). This has nothing whatever to do with
> bigint.

Damn, there's 5 minutes of my life that I won't get back! ;P

<snip>
> What the code is trying to do is prove that "X op C1" implies "X op C2"
> where the constants aren't necessarily the same and the operators are
> drawn from the same btree opclass, but might themselves be different.
> Some examples:
>
> X = 4 implies X > 3, because 4 > 3
> X <= 7 implies X < 3, because 7 < 3
Erm... shouldn't that be because 3 < 7 ? :)
> X > 7 doesn't imply X < 14
<snip>
> The bottom line is that if you want the predicate prover to be at all
> smart about a comparison in the index WHERE clause, the comparison can't
> be cross-type. Otherwise, the only way it will match it is with an
> exact match to the query's WHERE clause. Example: this will still work
>
> query: WHERE bigintcol = 42
> index: WHERE bigintcol = 42
>
> but not this:
>
> query: WHERE bigintcol = 42
> index: WHERE bigintcol >= 4
>
> The last case needs "bigintcol >= 4::bigint" in the index predicate in
> order to be provable from a related-but-not-identical query condition.

I assume part of this is due to how we cast bare numbers?

> This applies to anyplace where we have cross-type comparisons, which
> in a quick look in pg_operator seems to be
>
> <(integer,bigint)
> <(bigint,integer)
> <(smallint,integer)
> <(integer,smallint)
> <(real,double precision)
> <(double precision,real)
> <(smallint,bigint)
> <(bigint,smallint)
> <(date,timestamp without time zone)
> <(date,timestamp with time zone)
> <(timestamp without time zone,date)
> <(timestamp with time zone,date)
> <(timestamp without time zone,timestamp with time zone)
> <(timestamp with time zone,timestamp without time zone)

I think it's more than that, but my query might be off...
decibel=# select count(*) from (select distinct l.typname,r.typname from
pg_opclass c join pg_operator o on (c.opcintype=o.oprleft) join pg_type
l on (o.oprleft=l.oid) join pg_type r on (o.oprright=r.oid)) a;
88

(that's 8.0.3, btw)

> I'm not sure this is worth documenting given that it's likely to change
> by 8.2 anyway.

I agree with Josh that this should be documented backwards... assuming
that my count of 88 is correct, I think it's best to just specify that
it's recommended to always explicitely cast any constants in a
predicate.

Let me know if I'm on the wrong track with any of this, otherwise I'll
work on a set of patches.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461