Re: Stopgap solution for ILIKE in multibyte encodings

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Stopgap solution for ILIKE in multibyte encodings
Date: 2006-09-04 17:34:36
Message-ID: 26399.1157391276@sss.pgh.pa.us

I've gotten a little tired of reading reports that ILIKE doesn't work as
expected in UTF8. The problem is that iwchareq() in like.c is several
bricks shy of a load, as noticed e.g. here
http://archives.postgresql.org/pgsql-bugs/2005-10/msg00001.php

I looked a little bit at making iwchareq less broken, but it seems like
a mess because of the disconnect between pg_wchar and whatever the
system towlower() function might be expecting. And in any case it can
be expected that all this code will be thrown away someday, whenever
we bite the bullet and do our own locale handling --- so I'm disinclined
to spend a great deal of effort on it.

I propose that for ILIKE in multibyte encodings, we just pass the strings
through lower() and then use the normal LIKE code. This will be a bit
slower than what we do now, but as a wise man once said, code can be
arbitrarily fast if it needn't give the right answer. And we can't just
ignore the bug for still another release cycle.
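
To make the proposal concrete, here is a minimal sketch of "lower-case both
sides, then run the normal LIKE code". It is written in Python rather than
the C of like.c, and it is only an illustration under stated assumptions:
like_match() and ilike_multibyte() are hypothetical names, not PostgreSQL
functions; escape characters are ignored; and Python's str.lower() stands in
for lower(), whose real behaviour depends on the database locale.

```python
def like_match(text: str, pattern: str) -> bool:
    """Case-sensitive LIKE: '%' matches any run of characters, '_' exactly one.

    Hypothetical helper; escape handling is omitted to keep the sketch short.
    """
    t = p = 0
    star_p = star_t = -1  # last '%' seen in pattern, and the text position it covers
    while t < len(text):
        if p < len(pattern) and pattern[p] == "%":
            # Remember the wildcard and first try matching it against nothing.
            star_p, star_t = p, t
            p += 1
        elif p < len(pattern) and (pattern[p] == "_" or pattern[p] == text[t]):
            t += 1
            p += 1
        elif star_p != -1:
            # Backtrack: let the last '%' absorb one more character.
            star_t += 1
            t = star_t
            p = star_p + 1
        else:
            return False
    # Whatever is left of the pattern must consist only of '%'.
    return all(c == "%" for c in pattern[p:])


def ilike_multibyte(text: str, pattern: str) -> bool:
    """ILIKE as proposed: fold both strings to lower case, then use plain LIKE."""
    return like_match(text.lower(), pattern.lower())
```

Because the folding is done on whole strings before matching, the matcher
never has to compare individual multibyte characters case-insensitively,
which is the part iwchareq() gets wrong. The cost is the extra pass over
both strings on every call.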

Any objections?

regards, tom lane


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stopgap solution for ILIKE in multibyte encodings
Date: 2006-09-04 17:41:25
Message-ID: 1d4e0c10609041041o7a93e6dfx848dc1e1a91cb69a@mail.gmail.com

Tom,

On 9/4/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I propose that for ILIKE in multibyte encodings, we just pass the strings
> through lower() and then use the normal LIKE code. This will be a bit
> slower than what we do now, but as a wise man once said, code can be
> arbitrarily fast if it needn't give the right answer. And we can't just
> ignore the bug for still another release cycle.

Perhaps it's a stupid question, but what about indexes? Will an index on
lower(field) be used by the new code, or will we keep the current
behaviour of ILIKE?

Regards,

--
Guillaume


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stopgap solution for ILIKE in multibyte encodings
Date: 2006-09-04 18:38:57
Message-ID: 10386.1157395137@sss.pgh.pa.us

"Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com> writes:
> On 9/4/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I propose that for ILIKE in multibyte encodings, we just pass the strings
>> through lower() and then use the normal LIKE code.

> Perhaps it's a stupid question, but what about indexes? Will an index on
> lower(field) be used by the new code, or will we keep the current
> behaviour of ILIKE?

No, this is just an internal change in the function's implementation,
it won't have any effect like that. If you want indexing you'd still
need to write out "lower(col) like whatever".

regards, tom lane