RFC: i18n2ascii(TEXT) stored procedure

Lists: pgsql-sql
From: Michael A Nachbaur <mike(at)nachbaur(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: RFC: i18n2ascii(TEXT) stored procedure
Date: 2003-09-25 23:50:53
Message-ID: 200309251650.53855.mike@nachbaur.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

I've created the following stored procedure to allow me to do
international-insensitive text searches, e.g. a search for "Resume" would
match the text "Résumé".

I wanted to know:

a) am I missing any characters that need to be converted? My first (and only
language) is English, so I'm in the dark when that is concerned;
b) is there a better and/or faster way of implementing this? I don't want
searches to bog down (at least too badly) as a result of this.

CREATE OR REPLACE FUNCTION i18n2ascii (TEXT) RETURNS TEXT AS '
my ($source) = @_;
$source =~
tr/áàâäéèêëíìîïóòôöúùûüÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜ/aaaaeeeeiiiioooouuuuAAAAEEEEIIIIOOOOUUUU/;
return $source;
' LANGUAGE 'plperl';

--
/* Michael A. Nachbaur <mike(at)nachbaur(dot)com>
* http://nachbaur.com/pgpkey.asc
*/

"Ah, " said Arthur, "this is obviously some strange usage
of the word safe that I wasn't previously aware of. "


From: Michael A Nachbaur <mike(at)nachbaur(dot)com>
To: Manuel Sugawara <masm(at)fciencias(dot)unam(dot)mx>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: RFC: i18n2ascii(TEXT) stored procedure
Date: 2003-09-26 00:01:49
Message-ID: 200309251701.49224.mike@nachbaur.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Thursday 25 September 2003 05:06 pm, Manuel Sugawara wrote:
> Michael A Nachbaur <mike(at)nachbaur(dot)com> writes:
> > b) is there a better and/or faster way of implementing this? I
> > don't want searches to bog down (at least too badly) as a result of
> > this.
>
> Use to_ascii(text),
[snip]

D'oh! I guess thats what I get for not RTFM. :-)

--
/* Michael A. Nachbaur <mike(at)nachbaur(dot)com>
* http://nachbaur.com/pgpkey.asc
*/

"Oh no, not again."


From: Manuel Sugawara <masm(at)fciencias(dot)unam(dot)mx>
To: Michael A Nachbaur <mike(at)nachbaur(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: RFC: i18n2ascii(TEXT) stored procedure
Date: 2003-09-26 00:06:07
Message-ID: m34qz0pg5s.fsf@conexa.fciencias.unam.mx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Michael A Nachbaur <mike(at)nachbaur(dot)com> writes:

> b) is there a better and/or faster way of implementing this? I
> don't want searches to bog down (at least too badly) as a result of
> this.

Use to_ascii(text),

masm=# select to_ascii('áéíóú');
to_ascii
----------
aeiou
(1 row)

Regards,
Manuel.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Michael A Nachbaur <mike(at)nachbaur(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: RFC: i18n2ascii(TEXT) stored procedure
Date: 2003-09-26 06:30:45
Message-ID: Pine.LNX.4.44.0309260829110.30903-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Michael A Nachbaur writes:

> a) am I missing any characters that need to be converted?

In Unicode, any character can be dynamically combined with any number of
accent characters, so an enumerated list will never do.

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Michael A Nachbaur <mike(at)nachbaur(dot)com>
Cc: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: RFC: i18n2ascii(TEXT) stored procedure
Date: 2003-09-26 14:10:59
Message-ID: Pine.LNX.4.33.0309260809350.21038-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Thu, 25 Sep 2003, Michael A Nachbaur wrote:

> I've created the following stored procedure to allow me to do
> international-insensitive text searches, e.g. a search for "Resume" would
> match the text "Résumé".
>
> I wanted to know:
>
> a) am I missing any characters that need to be converted? My first (and only
> language) is English, so I'm in the dark when that is concerned;
> b) is there a better and/or faster way of implementing this? I don't want
> searches to bog down (at least too badly) as a result of this.
>
> CREATE OR REPLACE FUNCTION i18n2ascii (TEXT) RETURNS TEXT AS '
> my ($source) = @_;
> $source =~
> tr/áàâäéèêëíìîïóòôöúùûüÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜ/aaaaeeeeiiiioooouuuuAAAAEEEEIIIIOOOOUUUU/;
> return $source;
> ' LANGUAGE 'plperl';

You could probably accomplish the same thing without using perl via the
built in function translate(). Look in the functions-string.html in the
7.3.x documentation.

Also, the regex version of substring() is quite powerful.