improvements for dict_xsyn extended synonym dictionary

Lists: pgsql-hackers
From: karpov(at)sao(dot)ru (Sergey V(dot) Karpov)
To: pgsql-hackers(at)postgresql(dot)org
Subject: improvements for dict_xsyn extended synonym dictionary
Date: 2009-07-14 19:35:28
Message-ID: 877hyboy67.fsf@sao.ru


Greetings,

attached is a simple patch that extends the functionality of dict_xsyn
extended synonym dictionary (from contrib) by adding the following
configuration option:

- "mode" option controls the dictionary's mode of operation. It can be one of:

  - "simple": the dictionary accepts the original word and returns all of
    its synonyms as an ORed list.

  - "symmetric": the dictionary accepts the original word or any of its
    synonyms, and returns all the others as an ORed list.

  - "map": the dictionary accepts any synonym and returns the original word
    in its place. It also accepts and returns the original word itself,
    even if keeporig is false.

The default for this option is "simple", to keep compatibility with the
original version.
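
The three modes described above can be modelled in a few lines of Python. This is only an illustrative sketch of the described behaviour, not the C code from the patch: the names build_index and lexize are hypothetical, and the handling of keeporig in "symmetric" mode (returning the input token itself when it is true) is an assumption, since the description does not spell it out.

```python
from typing import Dict, List, Optional


def build_index(rules: List[List[str]]) -> Dict[str, List[str]]:
    """Map every word of a rule (original or synonym) to the whole rule.

    Each rule is a list whose first element is the original word and whose
    remaining elements are its synonyms, like one line of a .syn file.
    """
    index: Dict[str, List[str]] = {}
    for rule in rules:
        for word in rule:
            index.setdefault(word, rule)
    return index


def lexize(index: Dict[str, List[str]], token: str,
           mode: str = "simple", keeporig: bool = True) -> Optional[List[str]]:
    """Return the lexemes for token, or None if the token is not accepted."""
    rule = index.get(token)
    if rule is None:
        return None
    orig, syns = rule[0], rule[1:]

    if mode == "simple":
        # Only the original word is accepted.
        if token != orig:
            return None
        return ([orig] if keeporig else []) + syns

    if mode == "symmetric":
        # Any word of the rule is accepted; all the others are returned.
        # Assumption: keeporig adds the input token itself to the result.
        others = [w for w in rule if w != token]
        return ([token] if keeporig else []) + others

    if mode == "map":
        # Everything maps to the original word, regardless of keeporig.
        return [orig]

    raise ValueError(f"unknown mode: {mode!r}")


index = build_index([["word", "syn1", "syn2", "syn3"]])
print(lexize(index, "syn1", mode="symmetric", keeporig=False))
```

With the single rule "word syn1 syn2 syn3", this model gives the same results as the psql session in the quick example.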

Quick example:

> cat $SHAREDIR/tsearch_data/my_rules.syn
word syn1 syn2 syn3

mydb=# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=false, MODE='simple');
ALTER TEXT SEARCH DICTIONARY

mydb=# SELECT ts_lexize('xsyn', 'word');
ts_lexize
-----------------------
{syn1,syn2,syn3}

mydb=# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=true, MODE='simple');
ALTER TEXT SEARCH DICTIONARY

mydb=# SELECT ts_lexize('xsyn', 'word');
ts_lexize
-----------------------
{word,syn1,syn2,syn3}

mydb=# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=false, MODE='symmetric');
ALTER TEXT SEARCH DICTIONARY

mydb=# SELECT ts_lexize('xsyn', 'syn1');
ts_lexize
-----------------------
{word,syn2,syn3}

mydb=# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=false, MODE='map');
ALTER TEXT SEARCH DICTIONARY

mydb=# SELECT ts_lexize('xsyn', 'syn1');
ts_lexize
-----------------------
{word}

Thanks for your attention.

Sergey Karpov.

Attachment Content-Type Size
dict_xsyn_extended.diff.gz application/octet-stream 2.8 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Sergey V(dot) Karpov" <karpov(at)sao(dot)ru>
Subject: Re: improvements for dict_xsyn extended synonym dictionary - RRR
Date: 2009-07-26 02:01:29
Message-ID: 200907260401.30243.andres@anarazel.de

Hi Sergey,

On Tuesday 14 July 2009 21:35:28 Sergey V. Karpov wrote:
> attached is a simple patch that extends the functionality of dict_xsyn
> extended synonym dictionary (from contrib) by adding the following
> configuration option:
>
> - "mode" option controls the current dictionary mode of operation. Can be
> one of:
>
> - in "simple" mode it accepts the original word and returns all synonyms
> as ORed list.
>
> - when mode is "symmetric", the dictionary accepts the original word or
> any of its synonyms, and returns all others as ORed list.
>
> - in "map" mode it accepts any synonym and returns the original word
> instead of it. Also, it accepts and returns the original word
> itself, even if keeporig is false.
Some points:
- Patch looks generally sound
- lacks a bit of a motivational statement, even though one can imagine uses
- Imho mode=MAP should error out if keeporig is false
- I personally find the names for the different modes a bit nondescriptive.
One possibility would be to introduce parameters like:
- matchorig
- matchsynonym
- keeporig
- keepsynonym
That sounds much easier to grasp to me.

Comments?

Andres


From: karpov(at)sao(dot)ru (Sergey V(dot) Karpov)
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: improvements for dict_xsyn extended synonym dictionary - RRR
Date: 2009-07-27 10:01:46
Message-ID: 87skgifnqt.fsf@sao.ru

Andres Freund <andres(at)anarazel(dot)de> writes:

Hi Andres,

Thank you for reviewing my patch.

> Some points:
> - Patch looks generally sound
> - lacks a bit of a motivational statement, even though one can imagine uses

The patch has initially been motivated by the request in pgsql-general
(http://archives.postgresql.org/pgsql-general/2009-02/msg00102.php).

> - Imho mode=MAP should error out if keeporig is false
> - I personally find the names for the different modes a bit nondescriptive.
> One possibility would be to introduce parameters like:
> - matchorig
> - matchsynonym
> - keeporig
> - keepsynonym
> That sounds much easier to grasp to me.

Yes, I agree. That way the user has complete (and more straightforward)
control over the dictionary behaviour.

Here is the revised patch version, with the following options:

* matchorig controls whether the original word is accepted by the
dictionary. Default is true.

* keeporig controls whether the original word is included (if true)
in results, or only its synonyms (if false). Default is true.

* matchsynonyms controls whether any of the synonyms is accepted by
the dictionary (if true). Default is false.

* keepsynonyms controls whether synonyms are returned by the
dictionary (if true). Default is true.

The defaults are chosen to keep the behaviour compatible with the original version.
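
The way the four options combine can be sketched in Python as follows. This is one reading of the option descriptions above, not the patch's C implementation: XsynOptions and lexize_opts are hypothetical names, and the sketch assumes that the "match" options only decide whether a token is accepted, while the "keep" options only decide what is returned (so an accepted synonym gets back all synonyms of its rule, itself included, when keepsynonyms is true).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class XsynOptions:
    # Defaults as listed in the message above.
    matchorig: bool = True
    keeporig: bool = True
    matchsynonyms: bool = False
    keepsynonyms: bool = True


def lexize_opts(rule: List[str], token: str,
                opts: XsynOptions) -> Optional[List[str]]:
    """Apply one rule (original word followed by its synonyms) to a token.

    Returns None when the token is not accepted under the given options.
    """
    orig, syns = rule[0], rule[1:]

    # "match" options: is this token accepted at all?
    accepted = (opts.matchorig and token == orig) or \
               (opts.matchsynonyms and token in syns)
    if not accepted:
        return None

    # "keep" options: which parts of the rule are returned?
    result: List[str] = []
    if opts.keeporig:
        result.append(orig)
    if opts.keepsynonyms:
        result.extend(syns)
    return result


RULE = ["word", "syn1", "syn2", "syn3"]

# Roughly the old "map" behaviour: accept synonyms, return only the original.
map_like = XsynOptions(matchsynonyms=True, keepsynonyms=False)
print(lexize_opts(RULE, "syn1", map_like))
```

Under this reading, the old "simple" mode corresponds to the defaults, and a "map"-like setup is matchsynonyms=true with keepsynonyms=false.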

Thanks,
Sergey

Attachment Content-Type Size
dict_xsyn.diff text/x-patch 12.4 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Sergey V(dot) Karpov" <karpov(at)sao(dot)ru>
Subject: Re: improvements for dict_xsyn extended synonym dictionary - RRR
Date: 2009-07-29 22:59:49
Message-ID: 200907300059.50365.andres@anarazel.de

Hi Sergey,

Sorry that the second round took almost as long as the first one...

On Monday 27 July 2009 12:01:46 Sergey V. Karpov wrote:
> > - Imho mode=MAP should error out if keeporig is false
> > - I personally find the names for the different modes a bit
> > nondescriptive. One possibility would be to introduce parameters like:
> > - matchorig
> > - matchsynonym
> > - keeporig
> > - keepsynonym
> > That sounds much easier to grasp to me.
> Yes, I agree. That way the user has complete (and more
> straightforward) control over the dictionary behaviour.
>
> Here is the revised patch version, with the following options:
>
> * matchorig controls whether the original word is accepted by the
> dictionary. Default is true.
>
> * keeporig controls whether the original word is included (if true)
> in results, or only its synonyms (if false). Default is true.
>
> * matchsynonyms controls whether any of the synonyms is accepted by
> the dictionary (if true). Default is false.
>
> * keepsynonyms controls whether synonyms are returned by the
> dictionary (if true). Default is true.
>
> Defaults are set to keep default behaviour compatible with original
> version.
Looks nice. The only small gripe I have is that the patch adds trailing
whitespaces at a lot of places...

Other than that, I see no need for further changes...

Andres


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, "Sergey V(dot) Karpov" <karpov(at)sao(dot)ru>
Subject: Re: improvements for dict_xsyn extended synonym dictionary - RRR
Date: 2009-07-30 00:42:53
Message-ID: 603c8f070907291742g4bb6a1dbi64aedce2e53a0828@mail.gmail.com

On Wed, Jul 29, 2009 at 6:59 PM, Andres Freund<andres(at)anarazel(dot)de> wrote:
> Looks nice. The only small gripe I have is that the patch adds trailing
> whitespaces at a lot of places...
>
> Other than that, I see no need for further changes...

I have fixed this for Sergey in the attached version using "git apply
--whitespace=fix". (For those who may be using git to develop
patches, I highly recommend "git diff --check" to catch these types of
issues before submitting.)
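
For patches produced without git, a similar check can be approximated with a short script. The following Python sketch is a simplified stand-in for git's whitespace check, not a reimplementation of it: the function name trailing_whitespace_in_diff is hypothetical, and it only looks for trailing spaces or tabs on lines added by a unified diff.

```python
from typing import List, Tuple


def trailing_whitespace_in_diff(diff_text: str) -> List[Tuple[int, str]]:
    """Return (line number in the diff, line) for added lines that end in
    a space or a tab.

    Simplification: any line starting with "+++" is treated as a file
    header and skipped, so an added line whose content begins with "++"
    would be missed.
    """
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        if line.startswith("+++"):
            continue  # file header, not an added line
        if line.startswith("+") and line != line.rstrip(" \t"):
            hits.append((lineno, line))
    return hits


sample = (
    "--- a/dict_xsyn.c\n"
    "+++ b/dict_xsyn.c\n"
    "@@ -1 +1,2 @@\n"
    " int x;\n"
    "+int y; \n"
    "+int z;\n"
)
for lineno, line in trailing_whitespace_in_diff(sample):
    print(f"diff line {lineno}: trailing whitespace: {line!r}")
```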

I will mark this "Ready for Committer".

...Robert

Attachment Content-Type Size
dict_xsyn.patch text/x-diff 13.8 KB

From: karpov(at)sao(dot)ru (Sergey V(dot) Karpov)
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: improvements for dict_xsyn extended synonym dictionary - RRR
Date: 2009-07-30 07:46:57
Message-ID: 8763dad34e.fsf@sao.ru

Andres Freund <andres(at)anarazel(dot)de> writes:

Hi Andres,

> Looks nice. The only small gripe I have is that the patch adds trailing
> whitespaces at a lot of places...
>
> Other than that, I see no need for further changes...

My fault. Please check the patch version attached - I've tried to fix
all those.

Thanks,
Sergey

Attachment Content-Type Size
dict_xsyn.nowhite.diff text/x-patch 12.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: karpov(at)sao(dot)ru (Sergey V(dot) Karpov), Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: improvements for dict_xsyn extended synonym dictionary - RRR
Date: 2009-07-30 20:03:26
Message-ID: 2688.1248984206@sss.pgh.pa.us

karpov(at)sao(dot)ru (Sergey V. Karpov) writes:
> Andres Freund <andres(at)anarazel(dot)de> writes:
>> Looks nice. The only small gripe I have is that the patch adds trailing
>> whitespaces at a lot of places...

> My fault. Please check the patch version attached - I've tried to fix
> all those.

I did some minor cleanup on this patch:
* make the two parsing loops less confusingly different
* remove unused 'pos' field of Syn
* avoid some unnecessary pallocs
* improve the comments and docs a bit

I think it's "ready for committer" too, but the committer I have in mind
is Teodor --- he's the ultimate expert on tsearch stuff. Teodor, have
you got time to look this over and commit it?

regards, tom lane

Attachment Content-Type Size
dict_xsyn_3.patch.gz application/octet-stream 2.8 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: karpov(at)sao(dot)ru (Sergey V(dot) Karpov), Teodor Sigaev <teodor(at)sigaev(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: improvements for dict_xsyn extended synonym dictionary - RRR
Date: 2009-08-05 18:08:05
Message-ID: 26349.1249495685@sss.pgh.pa.us

I wrote:
> karpov(at)sao(dot)ru (Sergey V. Karpov) writes:
>> Andres Freund <andres(at)anarazel(dot)de> writes:
>>> Looks nice. The only small gripe I have is that the patch adds trailing
>>> whitespaces at a lot of places...

>> My fault. Please check the patch version attached - I've tried to fix
>> all those.

> I did some minor cleanup on this patch:

I've committed this version.

regards, tom lane