LIKE and Locale

Lists: pgsql-hackers
From: pgsql(at)mohawksoft(dot)com
To: pgsql-hackers(at)postgresql(dot)org
Subject: LIKE and Locale
Date: 2004-03-31 19:49:12
Message-ID: 18042.24.91.171.78.1080762552.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm a little frustrated

select * from mytable where mystring = 'foo';

Uses an index

select * from mytable where mystring like 'foo';

Does not use an index.

I know Tom is not to excited about this, but I think it is a serious
problem. What really brings me to this is that I just installed 7.4.2. It
is my first real deployment of PostgreSQL in about a year and a half.
Unknown to me, the default for my latest DB was not type 'C' but
"en_US.iso885915" and thus no amount of work would have allowed a 'LIKE'
to use an index without surrounding the index and query with some
function, like lower(). This "upgrade" seriously broke a working
installation.

In the foggy recesses of my mind, I vaguely recalled locale issues with
various non-english languages. This shouldn't have been a problem as I
was, I thought, just using the default. Surprisingly, SHOW ALL, showed
differently. I recreate the database with --no-locale, then it works,
obviously.

Yea, this amounts to an RTFM issue, granted, but shouldn't various locales
be able to work with LIKE? Shouldn't "en_US.iso885915" work with "LIKE?"
Shouldn't database creation with anything but 'C' issue a warning?

The real issue here is that one has to know that the behavior of "LIKE" is
dependent on the locale to understand the problem. Yes it is briefly
mentioned in the FAQ, but it is not obvious as a common problem in the
UNIX world. As far as I can tell it is a PostgreSQL only issue that the
locale setting in the system seriously affects functionality.

It is further compounded by the fact that this setting can not be changed
without recreating the database. Given a non-trivally sized database, this
is no small issue.

(Don't get me wrong, these RTFM landmines are great for the consultant and
support industry, keep up the good work ;-))


From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: pgsql(at)mohawksoft(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: LIKE and Locale
Date: 2004-03-31 21:29:44
Message-ID: 20040331132235.U80037@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Wed, 31 Mar 2004 pgsql(at)mohawksoft(dot)com wrote:

> I'm a little frustrated
>
> select * from mytable where mystring = 'foo';
>
> Uses an index
>
> select * from mytable where mystring like 'foo';
>
> Does not use an index.
>
> I know Tom is not to excited about this, but I think it is a serious
> problem. What really brings me to this is that I just installed 7.4.2. It

I agree with Tom mostly. It'd be nice for cases to be better optimized in
general, but optimizing basically degenerate cases seems futile especially
when there's a generally better workaround (see below)

> is my first real deployment of PostgreSQL in about a year and a half.
> Unknown to me, the default for my latest DB was not type 'C' but
> "en_US.iso885915" and thus no amount of work would have allowed a 'LIKE'
> to use an index without surrounding the index and query with some

What about making an index with the <whatever>_pattern_ops opclass which
IIRC is supposed to allow index use on LIKE even for anchored searches
in non-C locales.


From: pgsql(at)mohawksoft(dot)com
To: "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: pgsql(at)mohawksoft(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: LIKE and Locale
Date: 2004-03-31 21:54:18
Message-ID: 17971.24.91.171.78.1080770058.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> On Wed, 31 Mar 2004 pgsql(at)mohawksoft(dot)com wrote:
>
>> I'm a little frustrated
>>
>> select * from mytable where mystring = 'foo';
>>
>> Uses an index
>>
>> select * from mytable where mystring like 'foo';
>>
>> Does not use an index.
>>
>> I know Tom is not to excited about this, but I think it is a serious
>> problem. What really brings me to this is that I just installed 7.4.2.
>> It
>
> I agree with Tom mostly. It'd be nice for cases to be better optimized in
> general, but optimizing basically degenerate cases seems futile especially
> when there's a generally better workaround (see below)

I'm not convinced that one optimization must de-optimize something else.
Also, I am suspicious of "work arounds" being suggested as norms.

>
>> is my first real deployment of PostgreSQL in about a year and a half.
>> Unknown to me, the default for my latest DB was not type 'C' but
>> "en_US.iso885915" and thus no amount of work would have allowed a 'LIKE'
>> to use an index without surrounding the index and query with some
>
> What about making an index with the <whatever>_pattern_ops opclass which
> IIRC is supposed to allow index use on LIKE even for anchored searches
> in non-C locales.

At issue, would this require a change of the SQL query? If it requires
changing the query, then PostgreSQL places too much of a burden on the
application writer when it comes to supporting multiple databases.


From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: pgsql(at)mohawksoft(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: LIKE and Locale
Date: 2004-03-31 22:29:57
Message-ID: 20040331141809.D81236@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Wed, 31 Mar 2004 pgsql(at)mohawksoft(dot)com wrote:

> > On Wed, 31 Mar 2004 pgsql(at)mohawksoft(dot)com wrote:
> >
> >> I'm a little frustrated
> >>
> >> select * from mytable where mystring = 'foo';
> >>
> >> Uses an index
> >>
> >> select * from mytable where mystring like 'foo';
> >>
> >> Does not use an index.
> >>
> >> I know Tom is not to excited about this, but I think it is a serious
> >> problem. What really brings me to this is that I just installed 7.4.2.
> >> It
> >
> > I agree with Tom mostly. It'd be nice for cases to be better optimized in
> > general, but optimizing basically degenerate cases seems futile especially
> > when there's a generally better workaround (see below)
>
> I'm not convinced that one optimization must de-optimize something else.

But, given limited developer resources, optimizing degenerate sql is
probably not the best use unless someone feels strongly enough about it to
do it themselves.

> Also, I am suspicious of "work arounds" being suggested as norms.

The workaround in this case is to make an index that works with LIKE even
in non "C" locales. I qualified it as a workaround because potentially you
might need two indexes on the field. However, given that it's not limited
to non-wildcard containing strings, it's also more generally useful.

> >> is my first real deployment of PostgreSQL in about a year and a half.
> >> Unknown to me, the default for my latest DB was not type 'C' but
> >> "en_US.iso885915" and thus no amount of work would have allowed a 'LIKE'
> >> to use an index without surrounding the index and query with some
> >
> > What about making an index with the <whatever>_pattern_ops opclass which
> > IIRC is supposed to allow index use on LIKE even for anchored searches
> > in non-C locales.
>
> At issue, would this require a change of the SQL query? If it requires
> changing the query, then PostgreSQL places too much of a burden on the
> application writer when it comes to supporting multiple databases.

No, it involves making an index using the built-in <whatever>_pattern_ops
operator class (which is mentioned in the operator class part of the index
documentation I think, but probably needs better mention)

Something like:
CREATE INDEX indblah on tab(col text_pattern_ops)