Re: default locale considered harmful? (was Re: [GENERAL]

Lists: pgsql-generalpgsql-hackers
From: lec <englim(at)pc(dot)jaring(dot)my>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Using index for "like 'ABC%'" type query
Date: 2003-04-18 02:56:52
Message-ID: 3E9F6974.1020405@pc.jaring.my
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

For current version of postgresql, is it possible for a query like below
to utilize an index?

select item_name from item where item_code like 'ABC%';

Earlier versions of postgresql, eg, 6.5.3 can utilize the index.

Thanks,
Thomas.


From: Jeffrey Melloy <jmelloy(at)visualdistortion(dot)org>
To: lec <englim(at)pc(dot)jaring(dot)my>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Using index for "like 'ABC%'" type query
Date: 2003-04-18 03:16:18
Message-ID: 1EFE8419-714C-11D7-ACF3-000393C78AC0@visualdistortion.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

There is a module called "tsearch" in the contrib directory which works
quite well.
On Thursday, April 17, 2003, at 09:56 PM, lec wrote:

> For current version of postgresql, is it possible for a query like
> below to utilize an index?
>
> select item_name from item where item_code like 'ABC%';
>
> Earlier versions of postgresql, eg, 6.5.3 can utilize the index.
>
>
> Thanks,
> Thomas.
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Jeffrey Melloy <jmelloy(at)visualdistortion(dot)org>
Cc: lec <englim(at)pc(dot)jaring(dot)my>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Using index for "like 'ABC%'" type query
Date: 2003-04-18 03:18:48
Message-ID: 200304180318.h3I3Im813344@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


See the FAQ about index usage.

---------------------------------------------------------------------------

Jeffrey Melloy wrote:
> There is a module called "tsearch" in the contrib directory which works
> quite well.
> On Thursday, April 17, 2003, at 09:56 PM, lec wrote:
>
> > For current version of postgresql, is it possible for a query like
> > below to utilize an index?
> >
> > select item_name from item where item_code like 'ABC%';
> >
> > Earlier versions of postgresql, eg, 6.5.3 can utilize the index.
> >
> >
> > Thanks,
> > Thomas.
> >
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> > message can get through to the mailing list cleanly
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Using index for "like 'ABC%'" type query
Date: 2003-04-18 14:31:05
Message-ID: 20030418143104.GB26351@libertyrms.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Fri, Apr 18, 2003 at 10:56:52AM +0800, lec wrote:
> For current version of postgresql, is it possible for a query like below
> to utilize an index?
>
> select item_name from item where item_code like 'ABC%';

Yes. But locale is enabled by default now, and if you use anything
other than C (and other-than-C is increasingly the default on the
supportedplatforms), it won't work. Just make sure that your locale
is C when you do initdb.

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<andrew(at)libertyrms(dot)info> M2P 2A8
+1 416 646 3304 x110


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Sullivan <andrew(at)libertyrms(dot)info>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: default locale considered harmful? (was Re: Using index for "like 'ABC%'" type query)
Date: 2003-04-18 14:48:21
Message-ID: 3954.1050677301@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Andrew Sullivan <andrew(at)libertyrms(dot)info> writes:
>> [ can't LIKE use an index? ]

> Yes. But locale is enabled by default now, and if you use anything
> other than C (and other-than-C is increasingly the default on the
> supported platforms), it won't work. Just make sure that your locale
> is C when you do initdb.

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Sullivan <andrew(at)libertyrms(dot)info>, pgsql-general <pgsql-general(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] default locale considered harmful? (was Re:
Date: 2003-04-18 15:07:15
Message-ID: 200304181507.h3IF7F708695@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> Andrew Sullivan <andrew(at)libertyrms(dot)info> writes:
> >> [ can't LIKE use an index? ]
>
> > Yes. But locale is enabled by default now, and if you use anything
> > other than C (and other-than-C is increasingly the default on the
> > supported platforms), it won't work. Just make sure that your locale
> > is C when you do initdb.
>
> I recall someone floating a proposal that initdb should by default
> initialize the database in C locale, not whatever-it-finds-in-the-
> environment. To get a non-C locale you'd have to give an explicit
> command-line switch --- essentially, reversing the sense of the present
> "initdb --no-locale" option.
>
> I'm beginning to think that would be a good idea, given the increasing
> prevalence of en_US as a platform locale setting. Comments?

Agreed, or we could special-case en_US to be C locale.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Dennis Gearon <gearond(at)cvc(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Sullivan <andrew(at)libertyrms(dot)info>, pgsql-general <pgsql-general(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default locale considered harmful? (was Re: Using
Date: 2003-04-18 15:41:41
Message-ID: 3EA01CB5.2060402@cvc.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

I whole heartedly agree. It would make it easier to do multilangugages in one db, I think.

When I get done with my current project, about a year from now, I'm going to dig in deep to UTF-8 on postgress and see if I cn write a collation function in C to be used in ORDER BY, statements.

Tom Lane wrote:
> Andrew Sullivan <andrew(at)libertyrms(dot)info> writes:
>
>>>[ can't LIKE use an index? ]
>
>
>>Yes. But locale is enabled by default now, and if you use anything
>>other than C (and other-than-C is increasingly the default on the
>>supported platforms), it won't work. Just make sure that your locale
>>is C when you do initdb.
>
>
> I recall someone floating a proposal that initdb should by default
> initialize the database in C locale, not whatever-it-finds-in-the-
> environment. To get a non-C locale you'd have to give an explicit
> command-line switch --- essentially, reversing the sense of the present
> "initdb --no-locale" option.
>
> I'm beginning to think that would be a good idea, given the increasing
> prevalence of en_US as a platform locale setting. Comments?
>
> regards, tom lane
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>


From: Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Sullivan <andrew(at)libertyrms(dot)info>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default locale considered harmful? (was Re:
Date: 2003-04-18 16:06:53
Message-ID: 5.2.1.1.1.20030418235339.021fa690@mbox.jaring.my
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Yes. I'd prefer C locale by default.

Usage of other locales seems to incur many issues. Those who know about
locale and want it are usually prepared to turn it on and use it given some
documentation. Those who don't should get C locale.

Link.

At 10:48 AM 4/18/2003 -0400, Tom Lane wrote:
>I recall someone floating a proposal that initdb should by default
>initialize the database in C locale, not whatever-it-finds-in-the-
>environment. To get a non-C locale you'd have to give an explicit
>command-line switch --- essentially, reversing the sense of the present
>"initdb --no-locale" option.
>
>I'm beginning to think that would be a good idea, given the increasing
>prevalence of en_US as a platform locale setting. Comments?
>
> regards, tom lane
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org


From: Darko Prenosil <darko(dot)prenosil(at)finteh(dot)hr>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Using index for "like 'ABC%'" type query
Date: 2003-04-18 18:17:04
Message-ID: 200304181817.04081.darko.prenosil@finteh.hr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Friday 18 April 2003 02:56, lec wrote:
> For current version of postgresql, is it possible for a query like below
> to utilize an index?
>
> select item_name from item where item_code like 'ABC%';
>
> Earlier versions of postgresql, eg, 6.5.3 can utilize the index.
>
>

I think some other DB projects use something that is called
"BEGINS WITH" operator. Some time ago I tried to write such operator for
postgres. Unfortunatelly at that time I had no experience with postgres
internals, so the functions are written in PL/PSQL !!! They are working all
right, but the whole thing should be rewriten in C. If You are interested,
I'll send You the "code". For us using non "C" collations, I do not know for
other way. Of course if "C" collation satisfy Your needs, You do not need to
do such "hacks".

Regards !


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: andrew(at)libertyrms(dot)info, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default locale considered harmful?
Date: 2003-04-18 22:42:23
Message-ID: 20030419.074223.74750845.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

> Andrew Sullivan <andrew(at)libertyrms(dot)info> writes:
> >> [ can't LIKE use an index? ]
>
> > Yes. But locale is enabled by default now, and if you use anything
> > other than C (and other-than-C is increasingly the default on the
> > supported platforms), it won't work. Just make sure that your locale
> > is C when you do initdb.
>
> I recall someone floating a proposal that initdb should by default
> initialize the database in C locale, not whatever-it-finds-in-the-
> environment. To get a non-C locale you'd have to give an explicit
> command-line switch --- essentially, reversing the sense of the present
> "initdb --no-locale" option.
>
> I'm beginning to think that would be a good idea, given the increasing
> prevalence of en_US as a platform locale setting. Comments?

I agree with that initdb should by default initialize the database in
C locale. I have found on a local list too many users in Japan are
suffered by the locale problem and I'm getting tired of saying "you
should not forget to explicitly specify --no-locale siwtch".
--
Tatsuo Ishii


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-04-20 16:16:20
Message-ID: Pine.LNX.4.44.0304201725360.2891-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane writes:

> I recall someone floating a proposal that initdb should by default
> initialize the database in C locale, not whatever-it-finds-in-the-
> environment. To get a non-C locale you'd have to give an explicit
> command-line switch --- essentially, reversing the sense of the present
> "initdb --no-locale" option.

If you're concerned about speed, let's think about fixing the real
problems, not about disabling the feature altogether. A while ago I
proposed an easy solution that made LIKE use an index based on strxfrm
order instead. It was rejected on the grounds that it would prevent a
future enhancement of the LIKE mechanism to use the locale-enabled
collation order, but no one seems to be seriously interested in
implementing that. I still have the patch; we can reconsider it if you
like.

(Btw., LIKE using the locale-enabled collation sequence is hardly going to
work, because most locales compare strings backwards from the end to the
start in the second pass, so something like LIKE 'foo%' can easily give
inconsistent results, since you don't know what the end of the string
really is. It's better to think of pattern matching as
character-by-character matching.)

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default locale considered harmful? (was Re: Using index for "like 'ABC%'" type query)
Date: 2003-04-20 16:55:33
Message-ID: 20030420165533.GD30629@libertyrms.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Fri, Apr 18, 2003 at 10:48:21AM -0400, Tom Lane wrote:
> prevalence of en_US as a platform locale setting. Comments?

I tend to agree, especially since there's no real fix after you've
initdb'd.

--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<andrew(at)libertyrms(dot)info> M2P 2A8
+1 416 646 3304 x110


From: dalgoda(at)ix(dot)netcom(dot)com (Mike Castle)
To: pgsql-general(at)postgresql(dot)org
Subject: Re: default locale considered harmful? (was Re: Using index for "like 'ABC%'" type query)
Date: 2003-04-23 21:47:22
Message-ID: a8ejnxl3f.ln2@thune.mrc-home.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

In article <3954(dot)1050677301(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>I'm beginning to think that would be a good idea, given the increasing
>prevalence of en_US as a platform locale setting. Comments?

And who's brilliant idea was that anyway? Grrr. 'file' should NOT sort
next to 'FILE'.

mrc [more than slightly annoyed and off topic]
--
Mike Castle dalgoda(at)ix(dot)netcom(dot)com www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan. -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Mike Castle <dalgoda(at)ix(dot)netcom(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: default locale considered harmful? (was Re: Using
Date: 2003-04-24 16:38:39
Message-ID: Pine.LNX.4.44.0304241240200.1689-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Mike Castle writes:

> And who's brilliant idea was that anyway? Grrr. 'file' should NOT sort
> next to 'FILE'.

Any reputable dictionary in the world would disagree with you.

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Olleg Samojlov <olleg(at)telecom(dot)mipt(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: default locale considered harmful? (was Re: [GENERAL] Using
Date: 2003-04-25 12:04:06
Message-ID: b8b87s$18i0$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> I'm beginning to think that would be a good idea, given the increasing
> prevalence of en_US as a platform locale setting. Comments?

Americans will be agree with you. :-)

IMHO:

LC_CTYPE -\
LC_COLLATE - Ideally make client depending for international databases.
But it seems impossible to use indexes. May be add parameter to CREATE
INDEX, which locale (locales) use for sorting text data?

LC_NUMERIC client depending
LC_TIME client depending, default ISO
LC_MONETARY not usefull, lacking monetary type
LC_MESSAGES client depending for notice (client messages), server
depending for console or syslog messages, agree with default C

Summary: Exchange information with local services (stdout, syslog,
files, etc) must be according local locale. With client - according
client locale.

--
Olleg


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-05-31 04:09:04
Message-ID: 200305310409.h4V494124130@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


Has the single-byte LIKE penalty been eliminated, so we don't need to
consider using C as the default locale for initdb, right?

If fixed, how was it done?

---------------------------------------------------------------------------

Peter Eisentraut wrote:
> Tom Lane writes:
>
> > I recall someone floating a proposal that initdb should by default
> > initialize the database in C locale, not whatever-it-finds-in-the-
> > environment. To get a non-C locale you'd have to give an explicit
> > command-line switch --- essentially, reversing the sense of the present
> > "initdb --no-locale" option.
>
> If you're concerned about speed, let's think about fixing the real
> problems, not about disabling the feature altogether. A while ago I
> proposed an easy solution that made LIKE use an index based on strxfrm
> order instead. It was rejected on the grounds that it would prevent a
> future enhancement of the LIKE mechanism to use the locale-enabled
> collation order, but no one seems to be seriously interested in
> implementing that. I still have the patch; we can reconsider it if you
> like.
>
> (Btw., LIKE using the locale-enabled collation sequence is hardly going to
> work, because most locales compare strings backwards from the end to the
> start in the second pass, so something like LIKE 'foo%' can easily give
> inconsistent results, since you don't know what the end of the string
> really is. It's better to think of pattern matching as
> character-by-character matching.)
>
> --
> Peter Eisentraut peter_e(at)gmx(dot)net
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-05-31 04:32:15
Message-ID: 17973.1054355535@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Has the single-byte LIKE penalty been eliminated, so we don't need to
> consider using C as the default locale for initdb, right?

I'm still of the opinion that we should make C the default locale.
But I'm not sure where the consensus is, so I've not made the change.

> If fixed, how was it done?

Peter has provided a hack whereby one can create a LIKE-supporting index
in a non-C locale. But a *default* index in a non-C locale is still not
going to support LIKE ... and the hacked index will not support ordinary
comparison or ordering operators. So I think there's still a lot left
to be desired here.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-05-31 11:24:50
Message-ID: Pine.LNX.4.44.0305311321550.6382-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane writes:

> Peter has provided a hack whereby one can create a LIKE-supporting index
> in a non-C locale. But a *default* index in a non-C locale is still not
> going to support LIKE ... and the hacked index will not support ordinary
> comparison or ordering operators. So I think there's still a lot left
> to be desired here.

I don't understand why you call this a hack. Pattern matching and string
comparison simply work differently, so the proper solution is to use
different operator classes. After all, that's what operator classes exist
for. What is left to be desired?

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-05-31 15:25:07
Message-ID: 20497.1054394707@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> I don't understand why you call this a hack. Pattern matching and string
> comparison simply work differently, so the proper solution is to use
> different operator classes. After all, that's what operator classes exist
> for. What is left to be desired?

I think that a more general solution would be the ability to select a
locale (and hence a sort order) per-column, as the SQL spec envisions.
Then you'd just select C locale for columns you wanted to do pattern
matching for.

Admittedly, you'd still need the opclass-based approach for cases where
you wanted both pattern matching and a non-C-locale sort order ... but
I doubt that constitutes the majority of cases.

I guess my main concern is that we should not feel that this approach
takes the heat off us to support multiple locales. As a solution to the
narrow problem of LIKE performance, it's okay --- but it's not getting
us any nearer to a solution to the general locale problem.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-05-31 17:51:03
Message-ID: 200305311751.h4VHp3E02778@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Has the single-byte LIKE penalty been eliminated, so we don't need to
> > consider using C as the default locale for initdb, right?
>
> I'm still of the opinion that we should make C the default locale.
> But I'm not sure where the consensus is, so I've not made the change.
>
> > If fixed, how was it done?
>
> Peter has provided a hack whereby one can create a LIKE-supporting index
> in a non-C locale. But a *default* index in a non-C locale is still not
> going to support LIKE ... and the hacked index will not support ordinary
> comparison or ordering operators. So I think there's still a lot left
> to be desired here.

So, my understanding is that you would create something such as:

CREATE INDEX iix ON tab (LIKE col)

and that does LIKE lookups and knows how to do col LIKE 'abc%', but it
can't be used for >= or ORDER BY, but it can be used for equality tests?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-05-31 18:25:38
Message-ID: 21490.1054405538@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> So, my understanding is that you would create something such as:
> CREATE INDEX iix ON tab (LIKE col)
> and that does LIKE lookups and knows how to do col LIKE 'abc%', but it
> can't be used for >= or ORDER BY, but it can be used for equality tests?

Hm. Right at the moment, it wouldn't be used for equality tests unless
you spelled equality as "a ~=~ b". I wonder whether that's necessary
though; couldn't we dispense with that operator and use ordinary
equality as the BTEqual member of these opclasses? Are there any
locales that claim that not-physically-identical strings are equal?

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-05-31 22:18:39
Message-ID: 200305312218.h4VMIee21738@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > So, my understanding is that you would create something such as:
> > CREATE INDEX iix ON tab (LIKE col)
> > and that does LIKE lookups and knows how to do col LIKE 'abc%', but it
> > can't be used for >= or ORDER BY, but it can be used for equality tests?
>
> Hm. Right at the moment, it wouldn't be used for equality tests unless
> you spelled equality as "a ~=~ b". I wonder whether that's necessary
> though; couldn't we dispense with that operator and use ordinary
> equality as the BTEqual member of these opclasses? Are there any
> locales that claim that not-physically-identical strings are equal?

Let me see if I understand.

Our default indexes will be able to do =, >, <, ORDER BY, and the
special index will be able to do LIKE, ORDER BY, and maybe equals. Do I
have that correct?

Looking at CVS, I see the warning about non-C locales has been removed.
Should we instead mention the new LIKE index method?

# (Be sure to maintain the correspondence with locale_is_like_safe() in selfuncs.c.)
if test x`pg_getlocale COLLATE` != xC && test x`pg_getlocale COLLATE` != xPOSIX; then
echo "This locale setting will prevent the use of indexes for pattern matching"
echo "operations. If that is a concern, rerun $CMDNAME with the collation order"
echo "set to \"C\". For more information see the Administrator's Guide."
fi

Doing LIKE with single-byte encodings would be easy because it would be
only 256 compares to find the min/max char values, but that doesn't work
with multi-byte encodings, right?

This LIKE/encoding problem is a tricky one because it gives poor
performance with little warning to users.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-06-01 14:30:21
Message-ID: Pine.LNX.4.44.0305311803310.6382-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane writes:

> I think that a more general solution would be the ability to select a
> locale (and hence a sort order) per-column, as the SQL spec envisions.

It is a general solution, but not for this problem. The problem was to
make all locales equally suitable for certain optimizations, not to make
locales available in more places. I won't pretend to anyone that this
little change will bring us anywhere closer to a solution for that other
problem.

> Then you'd just select C locale for columns you wanted to do pattern
> matching for.

That's wrong, for a number of reasons:

First of all, I don't agree at all that cases where you want both pattern
matching and collation are rare; in fact, I rarely see a case where you
don't want both. Designing a system on that assumption is not sound,
because all operations should be equally possible in all situations.

Second, we will eventually want pattern matching operations to be locale
aware. Case-sensitive matching needs this, because case mappings depend
on the locale. The character class features of POSIX regexps also need
this. So you cannot make locales and well-performing pattern matching
mutually exclusive.

Third, keep in mind that datums with different locales cannot be combined
liberally. So systems built the way you propose become crippled in ways
that will be hard to understand and justify.

Finally, the locale of a datum should be a property that describes that
language of the stored data and that can be used for that specific purpose
without concerns and tradeoffs with the internal doings of the
optimization engine.

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-06-01 18:07:37
Message-ID: Pine.LNX.4.44.0306011722110.2610-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane writes:

> Are there any locales that claim that not-physically-identical strings
> are equal?

In Unicode there are plenty such combinations.

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-06-01 18:08:28
Message-ID: Pine.LNX.4.44.0306011724530.2610-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Bruce Momjian writes:

> Our default indexes will be able to do =, >, <, ORDER BY, and the
> special index will be able to do LIKE, ORDER BY, and maybe equals. Do I
> have that correct?

The default operator class supports comparisons (=, >, <, etc.) and ORDER
BY based on those operators. The other operator class supports pattern
matching operations (LIKE, SIMILAR, POSIX regexps).

> Looking at CVS, I see the warning about non-C locales has been removed.
> Should we instead mention the new LIKE index method?

I don't see a need. The old warning was mainly because once you
initdb'ed, you were basically stuck with your choice. Now we have plenty
of options to query and adjust things later.

> Doing LIKE with single-byte encodings would be easy because it would be
> only 256 compares to find the min/max char values, but that doesn't work
> with multi-byte encodings, right?

This has nothing to do with encodings.

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-06-02 02:32:12
Message-ID: 200306020232.h522WCG19699@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Peter Eisentraut wrote:
> Bruce Momjian writes:
>
> > Our default indexes will be able to do =, >, <, ORDER BY, and the
> > special index will be able to do LIKE, ORDER BY, and maybe equals. Do I
> > have that correct?
>
> The default operator class supports comparisons (=, >, <, etc.) and ORDER
> BY based on those operators. The other operator class supports pattern
> matching operations (LIKE, SIMILAR, POSIX regexps).
>
> > Looking at CVS, I see the warning about non-C locales has been removed.
> > Should we instead mention the new LIKE index method?
>
> I don't see a need. The old warning was mainly because once you
> initdb'ed, you were basically stuck with your choice. Now we have plenty
> of options to query and adjust things later.

How are people going to know to use these special LIKE indexes?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073