Lists: | pgsql-hackers |
---|
From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Prefix support for synonym dictionary |
Date: | 2009-07-14 15:25:20 |
Message-ID: | Pine.LNX.4.64.0907141922530.8065@sn.sai.msu.ru |
Hi there,
attached is our patch for CVS HEAD, which adds prefix support to the synonym
dictionary.
Quick example:
> cat $SHAREDIR/tsearch_data/synonym_sample.syn
postgres pgsql
postgresql pgsql
postgre pgsql
gogle googl
indices index*
=# create text search dictionary syn( template=synonym,synonyms='synonym_sample');
=# select ts_lexize('syn','indices');
ts_lexize
-----------
{index}
(1 row)
=# create text search configuration tst ( copy=simple);
=# alter text search configuration tst alter mapping for asciiword with syn;
=# select to_tsquery('tst','indices');
to_tsquery
------------
'index':*
(1 row)
=# select 'indexes are very useful'::tsvector @@ to_tsquery('tst','indices');
?column?
----------
t
(1 row)
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Attachment | Content-Type | Size |
---|---|---|
synonym_prefix.gz | application/octet-stream | 2.2 KB |
From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Prefix support for synonym dictionary |
Date: | 2009-08-02 19:05:24 |
Message-ID: | 1249239924.4765.6926.camel@jdavis |
Hi,
The patch looks good.
Comments:
1. The docs should be clarified a little. For instance, it should have a
link back to the definition of a prefix search (12.3.2). I included my
doc suggestions as an attachment.
2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
fragile) way. After calling findwrd(), the "end" pointer is pointing at
either the end of the string, or the *; depending on whether the string
ends in * and whether flags is NULL. I only mention this because I had
to take a more careful look to see what was happening. Perhaps add a
comment to make it more clear?
3. The patch looks for the special byte '*'. I think that's fine,
because we depend on the files being in UTF-8 encoding, where it's the
same byte. However, I thought it was worth mentioning in case we want to
support other encodings for text search files later.
Regards,
Jeff Davis
Attachment | Content-Type | Size |
---|---|---|
prefix-synonym-review.diff | text/x-patch | 2.1 KB |
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Prefix support for synonym dictionary |
Date: | 2009-08-05 16:34:03 |
Message-ID: | 603c8f070908050934y72ace1c2sf72b9f10a0c1c652@mail.gmail.com |
On Sun, Aug 2, 2009 at 3:05 PM, Jeff Davis<pgsql(at)j-davis(dot)com> wrote:
> The patch looks good.
>
> Comments:
>
> 1. The docs should be clarified a little. For instance, it should have a
> link back to the definition of a prefix search (12.3.2). I included my
> doc suggestions as an attachment.
>
> 2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
> fragile) way. After calling findwrd(), the "end" pointer is pointing at
> either the end of the string, or the *; depending on whether the string
> ends in * and whether flags is NULL. I only mention this because I had
> to take a more careful look to see what was happening. Perhaps add a
> comment to make it more clear?
>
> 3. The patch looks for the special byte '*'. I think that's fine,
> because we depend on the files being in UTF-8 encoding, where it's the
> same byte. However, I thought it was worth mentioning in case we want to
> support other encodings for text search files later.
Oleg,
Are you planning to update this patch this week? If not I will set it
to "Returned with Feedback".
Thanks,
...Robert
From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Prefix support for synonym dictionary |
Date: | 2009-08-05 18:17:09 |
Message-ID: | 1249496229.3653.1426.camel@monkey-cat.sm.truviso.com |
On Wed, 2009-08-05 at 12:34 -0400, Robert Haas wrote:
> Oleg,
>
> Are you planning to update this patch this week? If not I will set it
> to "Returned with Feedback".
My only comments were related to docs and comments, and I supplied a
patch as a suggested fix for the docs. Also, the patch is very small.
I'd hate to hold it up over such a minor issue, and it seems like a
useful feature. If Oleg is unavailable, would you mind having a second
reviewer look at the patch to see whether they agree with my suggestions,
and then mark it "ready for committer review"?
Regards,
Jeff Davis
From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Prefix support for synonym dictionary |
Date: | 2009-08-06 15:58:49 |
Message-ID: | 4A7AFDB9.4050905@sigaev.ru |
> 1. The docs should be clarified a little. For instance, it should have a
> link back to the definition of a prefix search (12.3.2). I included my
> doc suggestions as an attachment.
Thank you, merged
> 2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
> fragile) way. After calling findwrd(), the "end" pointer is pointing at
> either the end of the string, or the *; depending on whether the string
> ends in * and whether flags is NULL. I only mention this because I had
> to take a more careful look to see what was happening. Perhaps add a
> comment to make it more clear?
Added comments:
/*
* Finds the next whitespace-delimited word within the 'in' string.
* Returns a pointer to the first character of the word, and a pointer
* to the next byte after the last character in the word (in *end).
* Character '*' at the end of word will not be treated as word
* character if flags is not null.
*/
static char *
findwrd(char *in, char **end, uint16 *flags)
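For readers without the attachment, the contract described in that comment can be sketched roughly as follows. This is not the code from the patch: find_word_sketch and SKETCH_PREFIX are hypothetical names, and the sketch steps one byte at a time with isspace() instead of handling multibyte characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value, standing in for whatever the patch stores in *flags. */
#define SKETCH_PREFIX 0x0001

/*
 * Returns the start of the next whitespace-delimited word in 'in', or NULL
 * if only whitespace remains.  *end is set to the byte after the word.
 * If flags is not NULL and the word ends in '*', the '*' is excluded from
 * the word (*end points at it) and SKETCH_PREFIX is stored in *flags.
 */
static char *
find_word_sketch(char *in, char **end, uint16_t *flags)
{
    char *start;
    char *lastchar;

    assert(in != NULL && end != NULL);

    /* Skip leading whitespace. */
    while (*in && isspace((unsigned char) *in))
        in++;

    if (*in == '\0')
    {
        *end = NULL;
        return NULL;
    }

    start = lastchar = in;

    /* Advance to the end of the word, remembering its last byte. */
    while (*in && !isspace((unsigned char) *in))
    {
        lastchar = in;
        in++;
    }

    if (flags != NULL && *lastchar == '*')
    {
        *flags = SKETCH_PREFIX;
        *end = lastchar;        /* the word stops before the asterisk */
    }
    else
    {
        if (flags != NULL)
            *flags = 0;
        *end = in;              /* the asterisk, if any, stays in the word */
    }

    return start;
}
```

With the line "indices index*", a first call with flags set to NULL yields the seven-byte word "indices"; a second call starting at the returned end pointer, with a non-NULL flags, yields "index" with *end left pointing at the asterisk.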
> 3. The patch looks for the special byte '*'. I think that's fine,
> because we depend on the files being in UTF-8 encoding, where it's the
> same byte. However, I thought it was worth mentioning in case we want to
> support other encodings for text search files later.
tsearch_readline() converts the file's UTF-8 contents into the server encoding,
and pgsql supports only server encodings that are supersets of ASCII. So it's
safe to test for the asterisk in any encoding.
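
To illustrate why a plain byte comparison is enough, here is a small sketch for the UTF-8 case only. It assumes nothing about the patch itself: has_prefix_marker and all_bytes_multibyte are hypothetical helpers. In UTF-8 every byte of a multibyte character is 0x80 or higher, so the byte 0x2A can only ever be a literal asterisk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the word end with the '*' prefix marker? */
static bool
has_prefix_marker(const char *word)
{
    size_t len = strlen(word);

    return len > 0 && word[len - 1] == '*';
}

/*
 * Hypothetical helper: true if every byte of s is >= 0x80, i.e. s consists
 * solely of UTF-8 multibyte characters.  No such byte can equal '*' (0x2A).
 */
static bool
all_bytes_multibyte(const char *s)
{
    for (; *s != '\0'; s++)
    {
        if ((unsigned char) *s < 0x80)
            return false;
    }
    return true;
}
```

For example, the Cyrillic word "\xd0\xb8\xd0\xbd\xd0\xb4\xd0\xb5\xd0\xba\xd1\x81" (UTF-8 bytes) passes all_bytes_multibyte(), and has_prefix_marker() reports true only once a literal '*' is appended to it.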
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
Attachment | Content-Type | Size |
---|---|---|
synonym_prefix-0.2.gz | application/x-tar | 2.4 KB |
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Prefix support for synonym dictionary |
Date: | 2009-08-06 16:19:51 |
Message-ID: | 603c8f070908060919o5151741aq3af69fff573ea83e@mail.gmail.com |
2009/8/6 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> 1. The docs should be clarified a little. For instance, it should have a
>> link back to the definition of a prefix search (12.3.2). I included my
>> doc suggestions as an attachment.
>
> Thank you, merged
>
>> 2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
>> fragile) way. After calling findwrd(), the "end" pointer is pointing at
>> either the end of the string, or the *; depending on whether the string
>> ends in * and whether flags is NULL. I only mention this because I had
>> to take a more careful look to see what was happening. Perhaps add a
>> comment to make it more clear?
>
> Added comments:
> /*
> * Finds the next whitespace-delimited word within the 'in' string.
> * Returns a pointer to the first character of the word, and a pointer
> * to the next byte after the last character in the word (in *end).
> * Character '*' at the end of word will not be treated as word
> * character if flags is not null.
> */
> static char *
> findwrd(char *in, char **end, uint16 *flags)
>
>
>
>> 3. The patch looks for the special byte '*'. I think that's fine,
>> because we depend on the files being in UTF-8 encoding, where it's the
>> same byte. However, I thought it was worth mentioning in case we want to
>> support other encodings for text search files later.
>
> tsearch_readline() converts the file's UTF-8 contents into the server encoding,
> and pgsql supports only server encodings that are supersets of ASCII. So it's
> safe to test for the asterisk in any encoding.
Jeff,
Based on these comments, do you want to go ahead and mark this "Ready
for Committer"?
https://commitfest.postgresql.org/action/patch_view?id=133
...Robert
From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Prefix support for synonym dictionary |
Date: | 2009-08-06 16:53:42 |
Message-ID: | 1249577622.9256.740.camel@jdavis |
On Thu, 2009-08-06 at 12:19 -0400, Robert Haas wrote:
> Based on these comments, do you want to go ahead and mark this "Ready
> for Committer"?
Done, thanks Teodor.
However, on the commitfest page, the patches got updated in the wrong
places: "prefix support" and "filtering dictionary support" are pointing
at each others' patches.
Regards,
Jeff Davis
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Prefix support for synonym dictionary |
Date: | 2009-08-06 17:55:50 |
Message-ID: | 603c8f070908061055q63426202ja0d3312c9a263097@mail.gmail.com |
On Thu, Aug 6, 2009 at 12:53 PM, Jeff Davis<pgsql(at)j-davis(dot)com> wrote:
> On Thu, 2009-08-06 at 12:19 -0400, Robert Haas wrote:
>> Based on these comments, do you want to go ahead and mark this "Ready
>> for Committer"?
>
> Done, thanks Teodor.
>
> However, on the commitfest page, the patches got updated in the wrong
> places: "prefix support" and "filtering dictionary support" are pointing
> at each others' patches.
Fixed.
...Robert