Re: lexemes in prefix search going through dictionary modifications

From: Florian Pflug <fgp(at)phlo(dot)org>
To: sushant354(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: lexemes in prefix search going through dictionary modifications
Date: 2011-10-25 17:27:07
Message-ID: 5A1A958A-6F52-4112-A28C-540B6AFBA34A@phlo.org
Lists: pgsql-hackers

On Oct25, 2011, at 18:47 , Sushant Sinha wrote:
> On Tue, 2011-10-25 at 18:05 +0200, Florian Pflug wrote:
>> On Oct25, 2011, at 17:26 , Sushant Sinha wrote:
>>> I am currently using the prefix search feature in text search. I find
>>> that the prefix characters are treated the same as a normal lexeme and
>>> passed through stemming and stopword dictionaries. This seems like a bug
>>> to me.
>>
>> Hm, I don't think so. If they don't pass through stopword dictionaries,
>> then queries containing stopwords will fail to find any rows - which is
>> probably not what one would expect.
>
> I think what you are saying is a feature is really a bug. I am fairly sure
> that when someone says to_tsquery('english', 's:*') one is looking for
> an entry that has a *non-stopword* word that starts with 's'. And
> especially so in a text search configuration that eliminates stop words.

But the whole idea of removing stopwords from the query is that users
*don't* need to be aware of the precise list of stopwords. The way I see
it, stopwords are simply an optimization that helps reduce the size of
your fulltext index.

Assume, for example, that the postgres mailing list archive search used
tsearch (which I think it does, but I'm not sure). It'd then probably make
sense to add "postgres" to the list of stopwords, because it's bound to
appear in nearly every mail. But would you want searches which include
'postgres*' to turn up empty? Certainly not.
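To make the stopword argument concrete, here is a toy Python model. It is only a sketch and not how PostgreSQL implements tsearch. Documents are indexed with stopwords removed, and query terms (including prefix terms, marked with a trailing '*') go through the same normalize() step. The stopword list and all function names are hypothetical, made up for illustration.

```python
# Toy model of the stopword argument; NOT PostgreSQL's implementation.
# The stopword list and all function names are hypothetical.
STOPWORDS = {"the", "a", "postgres"}  # "postgres" added as in the example


def normalize(word):
    """Return the lexeme for a word, or None if it is a stopword."""
    w = word.lower()
    return None if w in STOPWORDS else w  # no stemming in this sketch


def index(doc):
    """Build the set of lexemes stored for a document (stopwords dropped)."""
    return {lex for lex in map(normalize, doc.split()) if lex is not None}


def matches(lexemes, query_terms, normalize_prefixes=True):
    """AND the query terms together; a trailing '*' marks a prefix term."""
    for term in query_terms:
        is_prefix = term.endswith("*")
        text = term[:-1] if is_prefix else term
        if is_prefix and not normalize_prefixes:
            lexeme = text.lower()  # raw prefix: bypasses the stopword list
        else:
            lexeme = normalize(text)
            if lexeme is None:
                continue  # stopword removed from the query: no constraint
        if is_prefix:
            if not any(lex.startswith(lexeme) for lex in lexemes):
                return False
        elif lexeme not in lexemes:
            return False
    return True


doc = index("Postgres index bloat")
print(matches(doc, ["postgres*", "index"]))                            # True
print(matches(doc, ["postgres*", "index"], normalize_prefixes=False))  # False
```

Because "postgres" never made it into the index, a raw 'postgres*' prefix can only match other words that happen to start with it. A mail that just says "Postgres" is therefore missed. The toy does not try to model what PostgreSQL does with a query consisting only of stopwords.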

> Does it even make sense to stem, abbreviate, synonym for a few letters?
> It will be so unpredictable.

That depends on the language. In German (my native tongue), one can
concatenate nouns to form new nouns. It's thus not entirely unreasonable
that one would want the prefix to be stemmed to its singular form before
being matched.

Also, suppose you're using a dictionary which corrects common typos. Who
says you wouldn't want that to be applied to prefix queries?
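Both points (stemming a German plural prefix, and fixing typos in it) can be sketched with a toy dictionary chain in Python. This is not PostgreSQL's implementation. The two lookup tables stand in for what a real configuration would do with something like a stemmer or an ispell dictionary, and all names here are hypothetical.

```python
# Toy dictionary chain; NOT PostgreSQL's implementation. A real setup would
# use something like a snowball stemmer or an ispell dictionary instead of
# these hypothetical lookup tables.
TYPO_FIXES = {"bucher": "bücher"}                # hypothetical typo fixer
SINGULAR = {"bücher": "buch", "häuser": "haus"}  # hypothetical plural -> singular


def normalize(word):
    """Run a word through the chain: fix typos first, then reduce to singular."""
    w = word.lower()
    w = TYPO_FIXES.get(w, w)
    return SINGULAR.get(w, w)


def prefix_match(prefix, indexed_words, normalize_prefix=True):
    """Return indexed lexemes starting with the (optionally normalized) prefix."""
    lexemes = {normalize(w) for w in indexed_words}
    p = normalize(prefix) if normalize_prefix else prefix.lower()
    return sorted(lex for lex in lexemes if lex.startswith(p))


words = ["Buchhandlung", "Bücher", "Hausboot"]
print(prefix_match("Bücher", words))                          # ['buch', 'buchhandlung']
print(prefix_match("Bücher", words, normalize_prefix=False))  # []
print(prefix_match("Bucher", words))                          # ['buch', 'buchhandlung']
```

With the prefix normalized, both the plural "Bücher" and the misspelled "Bucher" find the compound "Buchhandlung". Passed through unmodified, "Bücher" finds nothing, because the index only holds the singular "buch".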

best regards,
Florian Pflug
