Re: KEEPONLYALNUM for pg_trgm is not documented

Lists: pgsql-hackers
From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: KEEPONLYALNUM for pg_trgm is not documented
Date: 2011-03-11 08:52:25
Message-ID: AANLkTimHNoANNyajzkb-RJOQb+-zQ+KTmyTQoYhj5HBf@mail.gmail.com

contrib/pg_trgm in 9.1 becomes a more attractive feature thanks to index
support for LIKE operators, but only alphabetic and numeric characters are
indexed by default, although KEEPONLYALNUM in the source code can be changed
to keep all characters in n-gram words.
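
To make the limitation concrete, here is a minimal Python sketch (not pg_trgm's actual C code) of trigram extraction with and without an alphanumeric-only filter. The function name `extract_trigrams`, the `keep_only_alnum` flag, the two-space prefix and one-space suffix padding, and the restriction to ASCII letters and digits are all assumptions made for illustration; the real behavior may depend on locale and build options.

```python
def extract_trigrams(text, keep_only_alnum=True):
    """Return the set of padded trigrams for every word in `text`.

    Hypothetical helper for illustration only; it does not reproduce
    pg_trgm's implementation.
    """
    if keep_only_alnum:
        # Assumption: only ASCII letters and digits count as word characters;
        # everything else is treated as a word separator.
        cleaned = "".join(
            ch if (ch.isascii() and ch.isalnum()) else " " for ch in text
        )
    else:
        # Keep every non-whitespace character as part of a word.
        cleaned = text

    trigrams = set()
    for word in cleaned.lower().split():
        # Assumed padding: two spaces before the word, one space after it.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            trigrams.add(padded[i:i + 3])
    return trigrams


if __name__ == "__main__":
    print(sorted(extract_trigrams("foo_bar%")))
    print(sorted(extract_trigrams("foo_bar%", keep_only_alnum=False)))
    print(sorted(extract_trigrams("日本語")))
    print(sorted(extract_trigrams("日本語", keep_only_alnum=False)))
```

Under these assumptions, a purely Japanese string produces no trigrams at all when the filter is on, which is why the setting matters for non-English text.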

However, neither the limitation nor KEEPONLYALNUM is documented on the page:
http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html

Would an additional documentation patch be acceptable? The issue is likely to
become a FAQ for non-English users. I heard that pg_trgm will be one of the
*killer features* of 9.1 in Japan, where N-gram based text search is preferred.

--
Itagaki Takahiro


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: KEEPONLYALNUM for pg_trgm is not documented
Date: 2011-03-11 08:59:22
Message-ID: AANLkTin45m1eGAmOsTwMrNq1fR42EfFbd7+wXFhQnWR1@mail.gmail.com

On Fri, Mar 11, 2011 at 5:52 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> contrib/pg_trgm in 9.1 becomes a more attractive feature thanks to index
> support for LIKE operators, but only alphabetic and numeric characters are
> indexed by default, although KEEPONLYALNUM in the source code can be changed
> to keep all characters in n-gram words.
>
> However, neither the limitation nor KEEPONLYALNUM is documented on the page:
>  http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html
>
> Would an additional documentation patch be acceptable? The issue is likely to
> become a FAQ for non-English users. I heard that pg_trgm will be one of the
> *killer features* of 9.1 in Japan, where N-gram based text search is preferred.

+10

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: KEEPONLYALNUM for pg_trgm is not documented
Date: 2011-03-11 14:04:56
Message-ID: AANLkTinuWRcMO83o+wS5gA_rkjyX8o50k4LGqqVKk70Z@mail.gmail.com

On Fri, Mar 11, 2011 at 3:59 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Fri, Mar 11, 2011 at 5:52 PM, Itagaki Takahiro
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>> contrib/pg_trgm in 9.1 becomes a more attractive feature thanks to index
>> support for LIKE operators, but only alphabetic and numeric characters are
>> indexed by default, although KEEPONLYALNUM in the source code can be changed
>> to keep all characters in n-gram words.
>>
>> However, neither the limitation nor KEEPONLYALNUM is documented on the page:
>>  http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html
>>
>> Would an additional documentation patch be acceptable? The issue is likely to
>> become a FAQ for non-English users. I heard that pg_trgm will be one of the
>> *killer features* of 9.1 in Japan, where N-gram based text search is preferred.
>
> +10

It's certainly not too late for doc patches.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: KEEPONLYALNUM for pg_trgm is not documented
Date: 2011-03-11 15:46:48
Message-ID: 19399.1299858408@sss.pgh.pa.us

Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com> writes:
> contrib/pg_trgm in 9.1 becomes a more attractive feature thanks to index
> support for LIKE operators, but only alphabetic and numeric characters are
> indexed by default, although KEEPONLYALNUM in the source code can be changed
> to keep all characters in n-gram words.

> However, neither the limitation nor KEEPONLYALNUM is documented on the page:
> http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html

> Would an additional documentation patch be acceptable? The issue is likely to
> become a FAQ for non-English users. I heard that pg_trgm will be one of the
> *killer features* of 9.1 in Japan, where N-gram based text search is preferred.

I'm not sure it's really a great idea to encourage people to use custom
builds with modified versions of that symbol. And those not using
custom builds will just be frustrated. If we think this is an important
feature then we ought to work out a better way to expose the
functionality.

(Personally I wonder how useful pg_trgm is at all in multibyte
encodings. Its idea of a trigram is 3 bytes, not 3 characters...)
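
The distinction between a 3-byte and a 3-character trigram can be sketched in Python as follows. This only illustrates the concern raised above; it does not establish how pg_trgm actually slices its input, and the helper names `char_trigrams` and `byte_trigrams` are hypothetical.

```python
def char_trigrams(s):
    """Slide a 3-character window over the string."""
    return [s[i:i + 3] for i in range(len(s) - 2)]


def byte_trigrams(s, encoding="utf-8"):
    """Slide a 3-byte window over the encoded string."""
    raw = s.encode(encoding)
    return [raw[i:i + 3] for i in range(len(raw) - 2)]


def is_valid_text(chunk, encoding="utf-8"):
    """Report whether a byte chunk decodes cleanly in the given encoding."""
    try:
        chunk.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    word = "日本語"  # 3 characters, 9 bytes in UTF-8
    print(char_trigrams(word))
    for chunk in byte_trigrams(word):
        # Windows that straddle a character boundary are not valid text.
        print(chunk, is_valid_text(chunk))
```

With byte windows, most trigrams of a multibyte string cut through the middle of a character, so they carry little meaning on their own.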

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: KEEPONLYALNUM for pg_trgm is not documented
Date: 2011-09-05 17:25:40
Message-ID: 201109051725.p85HPeG20771@momjian.us

Robert Haas wrote:
> On Fri, Mar 11, 2011 at 3:59 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > On Fri, Mar 11, 2011 at 5:52 PM, Itagaki Takahiro
> > <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> >> contrib/pg_trgm in 9.1 becomes a more attractive feature thanks to index
> >> support for LIKE operators, but only alphabetic and numeric characters are
> >> indexed by default, although KEEPONLYALNUM in the source code can be changed
> >> to keep all characters in n-gram words.
> >>
> >> However, neither the limitation nor KEEPONLYALNUM is documented on the page:
> >>  http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html
> >>
> >> Would an additional documentation patch be acceptable? The issue is likely to
> >> become a FAQ for non-English users. I heard that pg_trgm will be one of the
> >> *killer features* of 9.1 in Japan, where N-gram based text search is preferred.
> >
> > +10
>
> It's certainly not too late for doc patches.

I have applied the attached documentation patch to 9.0, 9.1, and current
to mention that only ASCII alphanumeric characters are processed by
contrib/pg_trgm.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
/rtmp/diff text/x-diff 774 bytes