Re: tsearch stop words

Lists: pgsql-hackers
From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: tsearch stop words
Date: 2002-09-02 04:57:47
Message-ID: GNELIHDDFBOCMGBFGEFOOEPNCDAA.chriskl@familyhealth.com.au

How do I get a list of what tsearch considers a stop word?

e.g. 'and', 'or', 'the', 'up', 'down', etc. There seem to be heaps of
them!

Chris


From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: "Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch stop words
Date: 2002-09-02 08:49:19
Message-ID: GNELIHDDFBOCMGBFGEFOGEABCEAA.chriskl@familyhealth.com.au

Ummm...I totally don't understand the format of that array at all!

static ESWNODE engstoptree[] = {
{'m',L,9,126},
{'d',L,4,71},
{'b',L,2,40},
{'a',F,0,14},
{'c',0,0,62},
{'f',L,2,79},
{'e',0,0,75},
{'h',0,1,90},
{'i',F,0,108},
{'t',L,4,177},
{'o',L,2,135},
{'n',0,0,131},
{'s',0,0,156},

How do I figure out the actual word it's matching?

Chris

> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg(at)sai(dot)msu(dot)su]
> Sent: Monday, 2 September 2002 5:33 PM
> To: Christopher Kings-Lynne
> Cc: Hackers
> Subject: Re: [HACKERS] tsearch stop words
>
>
> Christopher,
>
> The current implementation is ugly; we still haven't moved functionality
> from OpenFTS to tsearch. Look at the makedict subdirectory to create your
> custom dictionary. The default list is in engstoptree[], defined
> in dic/porter_english.dct


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch stop words
Date: 2002-09-02 09:32:38
Message-ID: Pine.GSO.4.44.0209021229270.24590-100000@ra.sai.msu.su

Christopher,

The current implementation is ugly; we still haven't moved functionality
from OpenFTS to tsearch. Look at the makedict subdirectory to create your
custom dictionary. The default list is in engstoptree[], defined
in dic/porter_english.dct

On Mon, 2 Sep 2002, Christopher Kings-Lynne wrote:

> How do I get a list of what tsearch considers a stop word?
>
> eg. 'and', 'or', 'the', 'up', 'down', etc. There seem to be heaps of
> them...!
>
> Chris

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch stop words
Date: 2002-09-02 16:31:14
Message-ID: Pine.GSO.4.44.0209021607130.18164-101000@ra.sai.msu.su

On Mon, 2 Sep 2002, Christopher Kings-Lynne wrote:

> Ummm...I totally don't understand the format of that array at all!

The stop words are stored in a suffix tree :) As I said, it's the default
list of stop words, and it's certainly far from perfect. Please find attached
a file with the stop words we used (I just found it on my notebook). But
again, it's better to build your own dictionary specific to your domain,
using the makedict script.
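
To illustrate the general idea of keeping stop words in a tree and reading
them back out, here is a minimal sketch in C. It uses a plain character trie
with child/sibling pointers; it is not the ESWNODE array layout from
dic/porter_english.dct, whose fields are not explained in this thread. The
names TrieNode, trie_new_node, trie_insert, trie_contains, trie_walk and
trie_free are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node layout for illustration only; NOT the ESWNODE format. */
typedef struct TrieNode {
    char ch;                  /* character leading into this node */
    int is_word;              /* nonzero if a stop word ends here */
    struct TrieNode *child;   /* first node for the next character */
    struct TrieNode *sibling; /* next alternative at the same depth */
} TrieNode;

/* Allocate a zeroed node; the root is a dummy node created with '\0'. */
static TrieNode *trie_new_node(char ch)
{
    TrieNode *n = calloc(1, sizeof *n);
    if (n == NULL) {
        perror("calloc");
        exit(EXIT_FAILURE);
    }
    n->ch = ch;
    return n;
}

/* Find the child of cur that carries character ch, or NULL. */
static TrieNode *trie_find_child(const TrieNode *cur, char ch)
{
    TrieNode *c = cur->child;
    while (c != NULL && c->ch != ch)
        c = c->sibling;
    return c;
}

/* Add a word, creating nodes as needed; inserting twice is harmless. */
static void trie_insert(TrieNode *root, const char *word)
{
    assert(root != NULL && word != NULL);
    TrieNode *cur = root;
    for (; *word != '\0'; word++) {
        TrieNode *c = trie_find_child(cur, *word);
        if (c == NULL) {
            c = trie_new_node(*word);
            c->sibling = cur->child; /* push onto the front of the list */
            cur->child = c;
        }
        cur = c;
    }
    cur->is_word = 1;
}

/* Return 1 if word was inserted as a complete word, else 0. */
static int trie_contains(const TrieNode *root, const char *word)
{
    const TrieNode *cur = root;
    for (; *word != '\0'; word++) {
        cur = trie_find_child(cur, *word);
        if (cur == NULL)
            return 0;
    }
    return cur->is_word;
}

/*
 * Depth-first walk that rebuilds each stored word in buf (capacity cap).
 * Words are written to out, one per line, when out is not NULL.
 * Returns the number of words found; words too long for buf are skipped.
 */
static int trie_walk(const TrieNode *node, char *buf, size_t depth,
                     size_t cap, FILE *out)
{
    int count = 0;
    for (const TrieNode *c = node->child; c != NULL; c = c->sibling) {
        if (depth + 1 >= cap)
            continue;
        buf[depth] = c->ch;
        if (c->is_word) {
            buf[depth + 1] = '\0';
            if (out != NULL)
                fprintf(out, "%s\n", buf);
            count++;
        }
        count += trie_walk(c, buf, depth + 1, cap, out);
    }
    return count;
}

/* Release a node together with everything below and beside it. */
static void trie_free(TrieNode *node)
{
    while (node != NULL) {
        TrieNode *next = node->sibling;
        trie_free(node->child);
        free(node);
        node = next;
    }
}
```

A caller would create the root with trie_new_node('\0'), insert each word,
and then call trie_walk(root, buf, 0, sizeof buf, stdout) to list them. The
point of the sketch is only that a word is never stored whole: it is
recovered by following characters from the root down to a node flagged as
the end of a word.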

OpenFTS is much more flexible in this respect, and we hope to implement
most features of OpenFTS in tsearch, so that OpenFTS would be just
a Perl wrapper.

Oleg


Attachment                 Content-Type               Size
english.stem_stopword.gz   application/octet-stream   316 bytes