Lists: | pgsql-sql |
---|
From: | Michael A Nachbaur <mike(at)nachbaur(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | RFC: i18n2ascii(TEXT) stored procedure |
Date: | 2003-09-25 23:50:53 |
Message-ID: | 200309251650.53855.mike@nachbaur.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-sql |
I've created the following stored procedure to allow me to do
international-insensitive text searches, e.g. a search for "Resume" would
match the text "Résumé".
I wanted to know:
a) am I missing any characters that need to be converted? My first (and only
language) is English, so I'm in the dark when that is concerned;
b) is there a better and/or faster way of implementing this? I don't want
searches to bog down (at least too badly) as a result of this.
CREATE OR REPLACE FUNCTION i18n2ascii (TEXT) RETURNS TEXT AS '
my ($source) = @_;
$source =~
tr/áàâäéèêëíìîïóòôöúùûüÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜ/aaaaeeeeiiiioooouuuuAAAAEEEEIIIIOOOOUUUU/;
return $source;
' LANGUAGE 'plperl';
--
/* Michael A. Nachbaur <mike(at)nachbaur(dot)com>
* http://nachbaur.com/pgpkey.asc
*/
"Ah, " said Arthur, "this is obviously some strange usage
of the word safe that I wasn't previously aware of. "
From: | Michael A Nachbaur <mike(at)nachbaur(dot)com> |
---|---|
To: | Manuel Sugawara <masm(at)fciencias(dot)unam(dot)mx> |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: RFC: i18n2ascii(TEXT) stored procedure |
Date: | 2003-09-26 00:01:49 |
Message-ID: | 200309251701.49224.mike@nachbaur.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-sql |
On Thursday 25 September 2003 05:06 pm, Manuel Sugawara wrote:
> Michael A Nachbaur <mike(at)nachbaur(dot)com> writes:
> > b) is there a better and/or faster way of implementing this? I
> > don't want searches to bog down (at least too badly) as a result of
> > this.
>
> Use to_ascii(text),
[snip]
D'oh! I guess thats what I get for not RTFM. :-)
--
/* Michael A. Nachbaur <mike(at)nachbaur(dot)com>
* http://nachbaur.com/pgpkey.asc
*/
"Oh no, not again."
From: | Manuel Sugawara <masm(at)fciencias(dot)unam(dot)mx> |
---|---|
To: | Michael A Nachbaur <mike(at)nachbaur(dot)com> |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: RFC: i18n2ascii(TEXT) stored procedure |
Date: | 2003-09-26 00:06:07 |
Message-ID: | m34qz0pg5s.fsf@conexa.fciencias.unam.mx |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-sql |
Michael A Nachbaur <mike(at)nachbaur(dot)com> writes:
> b) is there a better and/or faster way of implementing this? I
> don't want searches to bog down (at least too badly) as a result of
> this.
Use to_ascii(text),
masm=# select to_ascii('áéíóú');
to_ascii
----------
aeiou
(1 row)
Regards,
Manuel.
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Michael A Nachbaur <mike(at)nachbaur(dot)com> |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: RFC: i18n2ascii(TEXT) stored procedure |
Date: | 2003-09-26 06:30:45 |
Message-ID: | Pine.LNX.4.44.0309260829110.30903-100000@peter.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-sql |
Michael A Nachbaur writes:
> a) am I missing any characters that need to be converted?
In Unicode, any character can be dynamically combined with any number of
accent characters, so an enumerated list will never do.
--
Peter Eisentraut peter_e(at)gmx(dot)net
From: | "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com> |
---|---|
To: | Michael A Nachbaur <mike(at)nachbaur(dot)com> |
Cc: | <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: RFC: i18n2ascii(TEXT) stored procedure |
Date: | 2003-09-26 14:10:59 |
Message-ID: | Pine.LNX.4.33.0309260809350.21038-100000@css120.ihs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-sql |
On Thu, 25 Sep 2003, Michael A Nachbaur wrote:
> I've created the following stored procedure to allow me to do
> international-insensitive text searches, e.g. a search for "Resume" would
> match the text "Résumé".
>
> I wanted to know:
>
> a) am I missing any characters that need to be converted? My first (and only
> language) is English, so I'm in the dark when that is concerned;
> b) is there a better and/or faster way of implementing this? I don't want
> searches to bog down (at least too badly) as a result of this.
>
> CREATE OR REPLACE FUNCTION i18n2ascii (TEXT) RETURNS TEXT AS '
> my ($source) = @_;
> $source =~
> tr/áàâäéèêëíìîïóòôöúùûüÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜ/aaaaeeeeiiiioooouuuuAAAAEEEEIIIIOOOOUUUU/;
> return $source;
> ' LANGUAGE 'plperl';
You could probably accomplish the same thing without using perl via the
built in function translate(). Look in the functions-string.html in the
7.3.x documentation.
Also, the regex version of substring() is quite powerful.