Re: Include Lists for Text Search

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Include Lists for Text Search
Date: 2007-09-10 14:04:13
Message-ID: Pine.LNX.4.64.0709101758520.2767@sn.sai.msu.ru
Lists: pgsql-hackers pgsql-patches

On Mon, 10 Sep 2007, Simon Riggs wrote:

> On Mon, 2007-09-10 at 16:35 +0400, Oleg Bartunov wrote:
>> On Mon, 10 Sep 2007, Simon Riggs wrote:
>>
>>> On Mon, 2007-09-10 at 16:10 +0400, Oleg Bartunov wrote:
>>>> On Mon, 10 Sep 2007, Simon Riggs wrote:
>>>>
>>>>> It seems possible to write your own functions to support various
>>>>> possibilities with text search.
>>>>>
>>>>> One of the more common thoughts is to have a list of words that you
>>>>> would like to include, i.e. the opposite of a stop word list.
>>>>>
>>>>> There are clear indications that indexing too many words is a problem
>>>>> for both GIN and GIST. If people already know what they'll be looking
>>>>> for and what they will never be looking for, it seems easier to supply
>>>>> that list up front, rather than hide it behind lots of hand-crafted
>>>>> code.
>>>>>
>>>>> Can we include that functionality now?
>>>>
>>>> This could be realized very easily using dict_strict, which returns
>>>> only known words, and mapping contains only this dictionary. So,
>>>> feel free to write it and submit.
>>>
>>> So there isn't one yet, but you think it will be easy to write and that
>>> we should call it dict_strict?
>>
>> we have dict_synonym already and if your list is not big you'll be happy.
>
> So I need to do something like
>
> CREATE TEXT SEARCH DICTIONARY my_diction (
> template = snowball,
> synonym = include_only_these_words
> );
>
> which will then look for a file called include_only_these_words.syn?
>
> I would prefer to be able to do something like this
>
> CREATE TEXT SEARCH DICTIONARY my_diction (
> template = snowball,
> include = justthese
> );
> ...which makes more sense to anyone reading it
> and I also want to make the comparison case insensitive.
>
> Would it be better to
> 1. include a new dictionary file (dict_strict, as you suggest)
> 2. a) allow case sensitivity as another option in dictionaries
> b) allow "include" as another word for "stoplist", but with the
> meaning reversed?
>
> e.g.
>
> CREATE TEXT SEARCH DICTIONARY my_diction (
> template = snowball,
> include = justthese,
> case_sensitive = true
> );

No, you need to write a new template that works efficiently with
big lists and supports case-insensitive comparison.

CREATE TEXT SEARCH TEMPLATE biglist (
.....
);

CREATE TEXT SEARCH DICTIONARY my_diction (
TEMPLATE = biglist,
DictFile = words,
case_sensitive = true
);
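
For a short list, the dict_synonym route mentioned earlier in the thread
could look roughly like the sketch below. It uses the existing synonym
template, not the proposed biglist template. The names include_words and
include_only are made up for illustration, the file include_words.syn is
assumed to live in $SHAREDIR/tsearch_data and to map each wanted word to
itself, and the token type names are assumed to be those of the released
8.3 default parser.

```sql
-- include_words.syn (hypothetical file), one "word replacement" pair per line:
--   postgres postgres
--   index index

-- Dictionary that recognizes only the words listed in the file.
CREATE TEXT SEARCH DICTIONARY include_words (
    TEMPLATE = synonym,
    SYNONYMS = include_words
);

-- Start from the built-in "simple" configuration.
CREATE TEXT SEARCH CONFIGURATION include_only (COPY = pg_catalog.simple);

-- Map word tokens to this dictionary alone. A token that no dictionary
-- in the list recognizes is discarded, so unlisted words are not indexed.
ALTER TEXT SEARCH CONFIGURATION include_only
    ALTER MAPPING FOR asciiword, word
    WITH include_words;

-- Try it out.
SELECT to_tsvector('include_only', 'Postgres index tuning');
```

Other token types (numbers, URLs and so on) keep the mappings copied from
"simple" in this sketch and would need their own ALTER MAPPING or
DROP MAPPING if they should be excluded too.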

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
