Re: Prefix support for synonym dictionary

Lists: pgsql-hackers
From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Prefix support for synonym dictionary
Date: 2009-07-14 15:25:20
Message-ID: Pine.LNX.4.64.0907141922530.8065@sn.sai.msu.ru

Hi there,

Attached is our patch for CVS HEAD, which adds prefix support to the
synonym dictionary.

Quick example:

> cat $SHAREDIR/tsearch_data/synonym_sample.syn
postgres pgsql
postgresql pgsql
postgre pgsql
gogle googl
indices index*

=# create text search dictionary syn( template=synonym,synonyms='synonym_sample');
=# select ts_lexize('syn','indices');
 ts_lexize
-----------
 {index}
(1 row)

=# create text search configuration tst ( copy=simple);
=# alter text search configuration tst alter mapping for asciiword with syn;
=# select to_tsquery('tst','indices');
 to_tsquery
------------
 'index':*
(1 row)

=# select 'indexes are very useful'::tsvector @@ to_tsquery('tst','indices');
 ?column?
----------
 t
(1 row)
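
Editorial sketch: the last query returns t because 'index':* is a prefix term, so any lexeme that begins with "index" satisfies it, including 'indexes'. The C fragment below only illustrates that matching rule. The helper name term_matches is hypothetical and is not taken from the patch or from the PostgreSQL sources, and it assumes plain byte comparison is good enough for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code: returns true when 'lexeme'
 * satisfies a query term. With is_prefix set (the ':*' marker), the
 * lexeme only has to start with 'term'; otherwise it must be equal.
 */
static bool
term_matches(const char *term, bool is_prefix, const char *lexeme)
{
    assert(term != NULL && lexeme != NULL);

    if (is_prefix)
        return strncmp(lexeme, term, strlen(term)) == 0;

    return strcmp(lexeme, term) == 0;
}
```

Under these assumptions, term_matches("index", true, "indexes") holds, while the same call with is_prefix set to false does not, which mirrors the difference between 'index':* and a plain 'index' term.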

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Attachment          Content-Type               Size
synonym_prefix.gz   application/octet-stream   2.2 KB

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prefix support for synonym dictionary
Date: 2009-08-02 19:05:24
Message-ID: 1249239924.4765.6926.camel@jdavis

Hi,

The patch looks good.

Comments:

1. The docs should be clarified a little. For instance, it should have a
link back to the definition of a prefix search (12.3.2). I included my
doc suggestions as an attachment.

2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
fragile) way. After calling findwrd(), the "end" pointer is pointing at
either the end of the string, or the *; depending on whether the string
ends in * and whether flags is NULL. I only mention this because I had
to take a more careful look to see what was happening. Perhaps add a
comment to make it more clear?

3. The patch looks for the special byte '*'. I think that's fine,
because we depend on the files being in UTF-8 encoding, where it's the
same byte. However, I thought it was worth mentioning in case we want to
support other encodings for text search files later.

Regards,
Jeff Davis

Attachment                  Content-Type   Size
prefix-synonym-review.diff  text/x-patch   2.1 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prefix support for synonym dictionary
Date: 2009-08-05 16:34:03
Message-ID: 603c8f070908050934y72ace1c2sf72b9f10a0c1c652@mail.gmail.com

On Sun, Aug 2, 2009 at 3:05 PM, Jeff Davis<pgsql(at)j-davis(dot)com> wrote:
> The patch looks good.
>
> Comments:
>
> 1. The docs should be clarified a little. For instance, it should have a
> link back to the definition of a prefix search (12.3.2). I included my
> doc suggestions as an attachment.
>
> 2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
> fragile) way. After calling findwrd(), the "end" pointer is pointing at
> either the end of the string, or the *; depending on whether the string
> ends in * and whether flags is NULL. I only mention this because I had
> to take a more careful look to see what was happening. Perhaps add a
> comment to make it more clear?
>
> 3. The patch looks for the special byte '*'. I think that's fine,
> because we depend on the files being in UTF-8 encoding, where it's the
> same byte. However, I thought it was worth mentioning in case we want to
> support other encodings for text search files later.

Oleg,

Are you planning to update this patch this week? If not I will set it
to "Returned with Feedback".

Thanks,

...Robert


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prefix support for synonym dictionary
Date: 2009-08-05 18:17:09
Message-ID: 1249496229.3653.1426.camel@monkey-cat.sm.truviso.com

On Wed, 2009-08-05 at 12:34 -0400, Robert Haas wrote:
> Oleg,
>
> Are you planning to update this patch this week? If not I will set it
> to "Returned with Feedback".

My only comments were related to docs and comments, and I supplied a
patch as a suggested fix for the docs. Also, the patch is very small.

I'd hate to hold it up over such a minor issue, and it seems like a
useful feature. If Oleg is unavailable, would you mind just having a
second reviewer look at the patch to see if they agree with my
suggestions, and then mark it "ready for committer review"?

Regards,
Jeff Davis


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prefix support for synonym dictionary
Date: 2009-08-06 15:58:49
Message-ID: 4A7AFDB9.4050905@sigaev.ru

> 1. The docs should be clarified a little. For instance, it should have a
> link back to the definition of a prefix search (12.3.2). I included my
> doc suggestions as an attachment.
Thank you, merged

> 2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
> fragile) way. After calling findwrd(), the "end" pointer is pointing at
> either the end of the string, or the *; depending on whether the string
> ends in * and whether flags is NULL. I only mention this because I had
> to take a more careful look to see what was happening. Perhaps add a
> comment to make it more clear?
Added comments:
/*
 * Finds the next whitespace-delimited word within the 'in' string.
 * Returns a pointer to the first character of the word, and a pointer
 * to the next byte after the last character in the word (in *end).
 * Character '*' at the end of word will not be treated as word
 * character if flags is not null.
 */
static char *
findwrd(char *in, char **end, uint16 *flags)
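
Editorial sketch: the contract described in the comment above can be illustrated with a simplified stand-alone function. This is not the code from the patch. The names find_word_sketch and SKETCH_PREFIX are hypothetical, and the sketch assumes single-byte whitespace detection through isspace(), whereas the real function has to cope with multibyte server encodings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PREFIX 0x01      /* hypothetical flag value for this sketch */

/*
 * Returns the start of the next whitespace-delimited word in 'in', or
 * NULL if only whitespace remains. *end is set to the byte after the
 * word. If 'flags' is not NULL and the word ends in '*', the asterisk
 * is excluded from the word and SKETCH_PREFIX is reported in *flags.
 */
static char *
find_word_sketch(char *in, char **end, uint16_t *flags)
{
    char *start;

    assert(in != NULL && end != NULL);

    /* Skip leading whitespace. */
    while (*in && isspace((unsigned char) *in))
        in++;

    /* Nothing left on the line. */
    if (*in == '\0')
    {
        *end = NULL;
        return NULL;
    }

    start = in;

    /* Advance to the first whitespace byte or the end of the string. */
    while (*in && !isspace((unsigned char) *in))
        in++;

    if (flags != NULL)
        *flags = 0;

    /* A trailing '*' is a marker, not a word character, when flags is given. */
    if (flags != NULL && in[-1] == '*')
    {
        *flags = SKETCH_PREFIX;
        *end = in - 1;
    }
    else
        *end = in;

    return start;
}
```

For the sample line "indices index*", a first call with flags set to NULL yields the seven-byte word "indices". A second call starting at the returned end pointer, this time with a flags pointer, yields the five-byte word "index" with the prefix flag set and *end pointing at the asterisk. This is the case-dependent position of "end" that the review comment asked to have documented.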

> 3. The patch looks for the special byte '*'. I think that's fine,
> because we depend on the files being in UTF-8 encoding, where it's the
> same byte. However, I thought it was worth mentioning in case we want to
> support other encodings for text search files later.

tsearch_readline() converts the file's UTF8 encoding into the server encoding.
pgsql supports only encodings that are supersets of ASCII, so it's safe to use
the asterisk with any encoding.
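
Editorial sketch: for UTF-8 specifically, the safety of a byte-level check for '*' follows from the encoding itself, because every byte of a multibyte sequence has its high bit set and therefore can never equal 0x2A. The fragment below demonstrates only that UTF-8 property. The helper name count_asterisk_bytes is hypothetical, and the claim about other server encodings rests on the statement above rather than on this code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Counts bytes equal to '*' (0x2A) in a NUL-terminated string. In
 * UTF-8, bytes belonging to multibyte characters are all >= 0x80, so
 * only a genuine asterisk can be counted.
 */
static size_t
count_asterisk_bytes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);

    for (; *s != '\0'; s++)
    {
        if ((unsigned char) *s == 0x2A)
            n++;
    }

    return n;
}
```

Passing a Cyrillic word followed by a single '*', written out as UTF-8 bytes, gives a count of exactly one: none of the bytes that make up the letters can be mistaken for the marker.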

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment             Content-Type        Size
synonym_prefix-0.2.gz  application/x-tar   2.4 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prefix support for synonym dictionary
Date: 2009-08-06 16:19:51
Message-ID: 603c8f070908060919o5151741aq3af69fff573ea83e@mail.gmail.com

2009/8/6 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> 1. The docs should be clarified a little. For instance, it should have a
>> link back to the definition of a prefix search (12.3.2). I included my
>> doc suggestions as an attachment.
>
> Thank you, merged
>
>> 2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
>> fragile) way. After calling findwrd(), the "end" pointer is pointing at
>> either the end of the string, or the *; depending on whether the string
>> ends in * and whether flags is NULL. I only mention this because I had
>> to take a more careful look to see what was happening. Perhaps add a
>> comment to make it more clear?
>
> Added comments:
> /*
>  * Finds the next whitespace-delimited word within the 'in' string.
>  * Returns a pointer to the first character of the word, and a pointer
>  * to the next byte after the last character in the word (in *end).
>  * Character '*' at the end of word will not be treated as word
>  * character if flags is not null.
>  */
> static char *
> findwrd(char *in, char **end, uint16 *flags)
>
>
>
>> 3. The patch looks for the special byte '*'. I think that's fine,
>> because we depend on the files being in UTF-8 encoding, where it's the
>> same byte. However, I thought it was worth mentioning in case we want to
>> support other encodings for text search files later.
>
> tsearch_readline() converts the file's UTF8 encoding into the server encoding.
> pgsql supports only encodings that are supersets of ASCII, so it's safe to use
> the asterisk with any encoding.

Jeff,

Based on these comments, do you want to go ahead and mark this "Ready
for Committer"?

https://commitfest.postgresql.org/action/patch_view?id=133

...Robert


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prefix support for synonym dictionary
Date: 2009-08-06 16:53:42
Message-ID: 1249577622.9256.740.camel@jdavis

On Thu, 2009-08-06 at 12:19 -0400, Robert Haas wrote:
> Based on these comments, do you want to go ahead and mark this "Ready
> for Committer"?

Done, thanks Teodor.

However, on the commitfest page, the patches got updated in the wrong
places: "prefix support" and "filtering dictionary support" are pointing
at each others' patches.

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prefix support for synonym dictionary
Date: 2009-08-06 17:55:50
Message-ID: 603c8f070908061055q63426202ja0d3312c9a263097@mail.gmail.com

On Thu, Aug 6, 2009 at 12:53 PM, Jeff Davis<pgsql(at)j-davis(dot)com> wrote:
> On Thu, 2009-08-06 at 12:19 -0400, Robert Haas wrote:
>> Based on these comments, do you want to go ahead and mark this "Ready
>> for Committer"?
>
> Done, thanks Teodor.
>
> However, on the commitfest page, the patches got updated in the wrong
> places: "prefix support" and "filtering dictionary support" are pointing
> at each others' patches.

Fixed.

...Robert