Quick Links

Re: ILIKE vs indices

Lists:	pgsql-hackers

From:	James Cloos <cloos(at)jhcloos(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	ILIKE vs indices
Date:	2012-12-28 23:04:52
Message-ID:	m34nj5eq1e.fsf@carbon.jhcloos.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

While tuning an application, I found the posts from 2003 recomending the
use of LOWER() and LIKE in place of ILIKE to take advantage of indices.

For this app, given the limitations of the upper-layer protocol it must
support, that change replaced about 30 minutes of repeated seq scans with
about 1 minute of repeated index scans! On a query-set often repeated
several times per day. (Probably more times per day now.)

Is there any contraindication to recasting:

foo ILIKE 'bar'

into:

LOWER(foo) LIKE LOWER('bar')

and documenting that an index has to be on LOWER(column) to benefit ILIKE?

Perhaps the parser could read the former as the latter?

-JimC
--
James Cloos <cloos(at)jhcloos(dot)com> OpenPGP: 1024D/ED7DAEA6

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	James Cloos <cloos(at)jhcloos(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ILIKE vs indices
Date:	2012-12-28 23:41:29
Message-ID:	22960.1356738089@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

James Cloos <cloos(at)jhcloos(dot)com> writes:
> Is there any contraindication to recasting:
> foo ILIKE 'bar'
> into:
> LOWER(foo) LIKE LOWER('bar')

In some locales those are not equivalent, I believe, or at least
shouldn't be. (What the current code actually does is a separate
question.)

> Perhaps the parser could read the former as the latter?

Not unless the equivalence can be shown to be exact, which I doubt.
In any case it's not obvious why LOWER rather than UPPER.

regards, tom lane

From:	James Cloos <cloos(at)jhcloos(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: ILIKE vs indices
Date:	2012-12-29 10:23:31
Message-ID:	m3vcblcg1v.fsf@carbon.jhcloos.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>>>>> "TL" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

JC>> Is there any contraindication to recasting:
JC>> foo ILIKE 'bar'
JC>> into:
JC>> LOWER(foo) LIKE LOWER('bar')

TL> In some locales those are not equivalent, I believe, or at least
TL> shouldn't be. (What the current code actually does is a separate
TL> question.)

I see. After determining indexing based on th existance of an initial
fixed string, exluding anything matching isalpha(), it uses tolower(3)
and friends to do the actual match.

So my proposal wouldn't change what matches, but might make fixing any
bugs in what *should* match more difficult?

TL> In any case it's not obvious why LOWER rather than UPPER.

I suggested lower() because that matched all of the suggestions I found.
And as it turns out matches the current behaviour, too.

The footnote about adding explicit mention to the docs was expressly
because it is not otherwise obvious whether indices should use lower()
or upper().

I'll ask on one of the unicode lists whether there are any locales where
a case-insensitive match should be different than a case-preserving match
of tolower() vs tolower().

-JimC
--
James Cloos <cloos(at)jhcloos(dot)com> OpenPGP: 1024D/ED7DAEA6

From:	Greg Stark <stark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	James Cloos <cloos(at)jhcloos(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ILIKE vs indices
Date:	2012-12-29 14:05:51
Message-ID:	CAM-w4HNnap-a3O1+1QgiCA1YaiAy4sqqH-Efj695=HY4HvM1iQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 28, 2012 at 11:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> James Cloos <cloos(at)jhcloos(dot)com> writes:
>> Is there any contraindication to recasting:
>> foo ILIKE 'bar'
>> into:
>> LOWER(foo) LIKE LOWER('bar')
>
> In some locales those are not equivalent, I believe, or at least
> shouldn't be. (What the current code actually does is a separate
> question.)

What it actually does is actually *precisely* the above.

I can't quite wrap my head around the idea of "LIKE" and collations
having any meaningful interaction anyways. I certainly can't come up
with anything better than "lower() like lower()" (or "upper() like
upper()").

It would be nice to document what ILIKE actually means. Right now it's
kind of mysterious. And if we can't come up with anything better than
"lower() like lower()" then why not go ahead and document it and take
advantage of it.

--
greg

From:	Greg Stark <stark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	James Cloos <cloos(at)jhcloos(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ILIKE vs indices
Date:	2012-12-29 14:13:26
Message-ID:	CAM-w4HPPJFPFQLS5V2nK+76_REo-53mHbo9OEwuE0oM=Eyj6aQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Dec 29, 2012 at 2:05 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
> I can't quite wrap my head around the idea of "LIKE" and collations
> having any meaningful interaction anyways. I certainly can't come up
> with anything better than "lower() like lower()" (or "upper() like
> upper()").

Hm. Maybe I spoke too fast. Perhaps we should just call strcasecmp()
character by character, or even call strcasecmp() on any substring of
the pattern that doesn't contain _ or % ? The latter would be pretty
hopeless to ever use a btree index though.

--
greg