Controlling locale and impact on LIKE statements

Lists: pgsql-general
From: Martin Langhoff <martin(at)catalyst(dot)net(dot)nz>
To: pgsql-general(at)postgresql(dot)org
Subject: Controlling locale and impact on LIKE statements
Date: 2007-09-05 00:48:19
Message-ID: 46DDFCD3.1060408@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi!

I am having a bit of trouble with indexes, locales and LIKE queries.

Background
----------

Using Pg8.1/8.2 on a utf-8 database, my left-anchored LIKE clauses were
forcing a full table scan instead of using the index. After a bit of
digging, I found that Pg can only use the "normal" index for
left-anchored LIKE queries if locale is 'C'.

From http://www.postgresql.org/docs/8.1/static/indexes-types.html :
> The optimizer can also use a B-tree index for queries involving the
> pattern matching operators LIKE and ~ if the pattern is a constant and
> is anchored to the beginning of the string — for example, col LIKE
> 'foo%' or col ~ '^foo', but not col LIKE '%bar'. However, if your
> server does not use the C locale you will need to create the index
> with a special operator class to support indexing of pattern-matching
> queries.

What I think I need to do
-------------------------

As I have a Pg install where the locale is already en_US.UTF-8, and
the database already exists, is there a DB-scoped way of controlling
the locale? I think the index usage noted above is affected by
lc_ctype but I could be wrong.

I really don't want to go down the "rebuild your pgcluster" path as
outlined here
http://archives.postgresql.org/pgsql-hackers/2007-02/msg00992.php
;-)

Is there a better way? In this specific install I can create the
additional index...

However, this needs a general fix for Moodle, which
has an abstract DB schema handling, as we support MySQL, Pg, MSSQL,
Oracle. The whole thing of figuring out what the locale is and
whether to add magical additional indexes just for Pg makes me look
like a loony.

New PostgreSQL installs on modern linuxen like Ubuntu default to non-C
locales, which makes this more of an issue going forward.

See the discussion with Eloy (maintainer of the schema abstraction
layer) at http://moodle.org/mod/forum/discuss.php?d=78738#p350512
login as "guest" to avoid registration.

cheers,

martin
--
-----------------------------------------------------------------------
Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
NZ: +64(4)916-7224 MOB: +64(21)364-017 UK: 0845 868 5733 ext 7224
Make things as simple as possible, but no simpler - Einstein
-----------------------------------------------------------------------


From: Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com>
To: "Martin Langhoff" <martin(at)catalyst(dot)net(dot)nz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Controlling locale and impact on LIKE statements
Date: 2007-09-06 01:15:13
Message-ID: 92869e660709051815x79598b87hc3b37922db12b7d9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

2007/9/5, Martin Langhoff <martin(at)catalyst(dot)net(dot)nz>:
> Hi!
>
> I am having a bit of trouble with indexes, locales and LIKE queries.
>
> Background
> ----------
>
> Using Pg8.1/8.2 on a utf-8 database, my left-anchored LIKE clauses were
> forcing a full table scan instead of using the index. After a bit of
> digging, I found that Pg can only use the "normal" index for
> left-anchored LIKE queries if locale is 'C'.
>
> From http://www.postgresql.org/docs/8.1/static/indexes-types.html :
> > The optimizer can also use a B-tree index for queries involving the
> > pattern matching operators LIKE and ~ if the pattern is a constant and
> > is anchored to the beginning of the string — for example, col LIKE
> > 'foo%' or col ~ '^foo', but not col LIKE '%bar'. However, if your
> > server does not use the C locale you will need to create the index
> > with a special operator class to support indexing of pattern-matching
> > queries.
>
> What I think I need to do
> -------------------------
>
> As I have a Pg install where the locale is already en_US.UTF-8, and
> the database already exists, is there a DB-scoped way of controlling
> the locale? I think the index usage noted above is affected by
> lc_ctype but I could be wrong.
>
> I really don't want to go down the "rebuild your pgcluster" path as
> outlined here
> http://archives.postgresql.org/pgsql-hackers/2007-02/msg00992.php
> ;-)
>
> Is there a better way? In this specific install I can create the
> additional index...
>
> However, this needs a general fix for Moodle, which
> has an abstract DB schema handling, as we support MySQL, Pg, MSSQL,
> Oracle. The whole thing of figuring out what the locale is and
> whether to add magical additional indexes just for Pg makes me look
> like a loony.

no no.
just create ordinary btree indexes with text_pattern_ops, _always_,
disregarding the locale. it should not hurt.

create index i1 on t1 ( text1 text_pattern_ops );

>
> New PostgreSQL installs on modern linuxen like Ubuntu default to non-C
> locales, which makes this more of an issue going forward.
>
> See the discussion with Eloy (maintainer of the schema abstraction
> layer) at http://moodle.org/mod/forum/discuss.php?d=78738#p350512
> login as "guest" to avoid registration.
>
> cheers,
>
>
> martin
> --
> -----------------------------------------------------------------------
> Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
> WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
> NZ: +64(4)916-7224 MOB: +64(21)364-017 UK: 0845 868 5733 ext 7224
> Make things as simple as possible, but no simpler - Einstein
> -----------------------------------------------------------------------
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

--
Filip Rembiałkowski