Inconsistency with stemming/stop words in Tsearch2

From: Yishai Lerner <yish(at)alum(dot)mit(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Inconsistency with stemming/stop words in Tsearch2
Date: 2008-07-14 22:32:24
Message-ID: E18D9633-0692-4D9D-A11D-0C0303FAC508@alum.mit.edu
Lists: pgsql-general

Hi, I'm having an issue with Tsearch2: stop-word lexemes are sometimes
dropped and sometimes kept. I would expect to_tsquery to behave
consistently for the three variations "what", "what's" and "whats"
(using 'en_stem'), and to ignore all three, since each one reduces to
the stop word "what". However, that is not the case:
to_tsquery('whats') returns the stop word 'what' as a result. Even
more confusing, the lexize results below are inconsistent with the
to_tsquery results below. This seems like a bug to me.

goodrec_2=# select lexize('en_stem', 'what''s');
lexize
--------
{what}

goodrec_2=# select lexize('en_stem', 'whats');
lexize
--------
{what}

goodrec_2=# select lexize('en_stem', 'what');
lexize
--------
{}

goodrec_2=# select to_tsquery('what''s');
NOTICE: query contains only stopword(s) or doesn't contain lexeme(s), ignored
to_tsquery

goodrec_2=# select to_tsquery('whats');
to_tsquery
------------
'what'

goodrec_2=# select to_tsquery('what');
NOTICE: query contains only stopword(s) or doesn't contain lexeme(s), ignored
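
One possible reading of the results above, offered as a guess rather than a confirmed diagnosis: if the dictionary checks its stop list before stemming, and if to_tsquery runs the text through a parser that splits "what's" at the apostrophe while lexize hands the raw string straight to the dictionary, then all six outputs would follow. The Python sketch below is only a toy model of that assumed pipeline. Every name in it (STOP_WORDS, toy_stem, toy_lexize, toy_parse, toy_to_tsquery) is hypothetical and none of it is Tsearch2 code.

```python
import re

# Hypothetical stand-in for the stop list; NOT the real English stop file.
STOP_WORDS = {"what", "s"}


def toy_stem(word):
    """Crude placeholder for a real stemmer: strip a trailing 's or s."""
    if word.endswith("'s"):
        return word[:-2]
    if word.endswith("s") and len(word) > 1:
        return word[:-1]
    return word


def toy_lexize(token):
    """Assumed dictionary order: stop-list check on the raw token, then stem."""
    if token in STOP_WORDS:
        return []
    return [toy_stem(token)]


def toy_parse(text):
    """Assumed parser behaviour: split on anything that is not a letter."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def toy_to_tsquery(text):
    """Parse first, then pass each token through the dictionary."""
    lexemes = []
    for token in toy_parse(text):
        lexemes.extend(toy_lexize(token))
    return lexemes


if __name__ == "__main__":
    for word in ("what's", "whats", "what"):
        print(word, toy_lexize(word), toy_to_tsquery(word))
```

Under these assumptions "whats" slips past the stop list because it is compared before it is stemmed to "what", while "what's" only disappears in to_tsquery because the parser has already broken it into "what" and "s".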
