New thoughts about indexing cross-type comparisons

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: New thoughts about indexing cross-type comparisons
Date: 2003-09-16 16:30:37
Message-ID: 18030.1063729837@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

We've spent much effort trying to solve the "int8col = 42 doesn't use
an index" class of problems. AFAIR, all the recent tries have focused
on trying to get the parser to choose an index-compatible operator
initially. (In this example, that would mean promoting 42 to int8 so
that int8 = int8 would be chosen at parse time.) While I have not
completely given up hope on that approach, it's clearly fraught with
potential for unwanted side-effects.

The other place we could try to fix it is to improve either the planner
or the index AMs themselves to cope with cross-type comparisons for
themselves. I have shied away from that thought because it seemed
unreasonably difficult, but with the parser-side solution looking harder
and harder, maybe it's time to re-evaluate.

There actually was code in the planner at one time to substitute
index-compatible operators for non-index-compatible operators.
The algorithm was basically "if I have indexcolumn OP somevalue, where
OP isn't a member of the index opclass but also somevalue is of a
different datatype than indexcolumn, then look to see if somevalue can
be binary-coerced to the datatype of indexcolumn. If so, look for an
operator of the same name as OP and accepting the indexcolumn datatype
on both sides. If found, and it's a member of the index opclass, then
use that operator instead of the original."

This algorithm was wrong on both practical and theoretical levels;
in the first place it's not very helpful to only be able to handle
binary-compatible transformations, and in the second place there isn't
any good guarantee that it's not changing the semantics when it replaces
the operator. For instance int4 < and oid < do not act the same.
Depending on equality of operator names was a bad idea even then, and
would be quite unworkable now in the world of schema search paths.

However, those objections really stem from the planner not having enough
information to know when the transformation was safe to make. What if
we gave it that information?

After some thought I think the most practical approach is to make a new
system catalog for "secondary members of index opclasses". A secondary
member is not one of the operators that the index can handle directly,
but it can be transformed into one of the primary members. The new
catalog would be called, say, pg_amsecop, and would have columns like
opclass operatorid replacementop ltransform rtransform
where the first two are the primary key. The idea is when we have a
WHERE expression "indexcol OP somevalue", and we can't find OP in the
index's operator class (ie, there's no pg_amop entry for that operator
and opclass), we next look to see if the opclass/operator combination
appears in pg_amsecop. If so, then we are allowed to replace the
"indexcol OP somevalue" expression by "ltransform(indexcol)
REPLACEMENTOP rtransform(somevalue)". ltransform and rtransform are
pg_proc OIDs of cast functions, or zero where no run-time cast is
needed. replacementop is the OID of the substitute operator, which
presumably is one of the primary members of the index opclass. Now we
have an indexable expression. The presence of the entry in pg_amsecop
represents the opclass designer's promise to us that this is a valid,
semantics-preserving transformation.

This design might be overly general --- for example, I doubt there can
be any need to apply a cast function to the indexable column. We could
eliminate the ltransform column and probably also halve the number of
entries in the table if we insist that the indexable column be on the
left before we start looking (in other words, "somevalue OP indexcol"
must be commuted before we look for the operator in pg_amsecop, not after).

This is not a 100% solution to our problems. I don't think we could use
it to solve the problem for int2 columns ("int2col = 42") because it'd
be unsafe to promise that an int4-to-int2 cast could be inserted into
an expression without changing the behavior. So we'd still want to look
at having small integer constants be initially typed as int2, which
leaves us with a number of unsolved issues, as noted here:
http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php
So maybe the "secondary operator" idea will help, or maybe it won't do
much for us.

Comments? Does this spur any better ideas?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Smith 2003-09-16 17:11:38 Re: New thoughts about indexing cross-type comparisons
Previous Message Mendola Gaetano 2003-09-16 15:19:30 Re: best place for xstrdup