Re: Behaviour of to_tsquery(stopwords only)

Lists: pgsql-hackers
From: Richard Huxton <dev(at)archonet(dot)com>
To: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Behaviour of to_tsquery(stopwords only)
Date: 2008-03-06 10:19:00
Message-ID: 47CFC514.6030800@archonet.com

I'm not sure what value a tsquery has if it's composed from stopwords
only, but it seems to be neither null nor equal to itself.

That strikes me as ... unintuitive, although I'm happy to be re-educated
on this.

I think it's because CompareTSQ (tsquery_op.c, line 142) doesn't have a
case to handle query sizes of zero. A zero-size query seems to be what is
returned from tsearch/to_tsany.c, lines ~ 345-350.

SELECT
    qid, words, query,
    (query IS NULL) AS isnull,
    (query = to_tsquery(words)) AS issame
FROM
    util.queries
ORDER BY qid DESC LIMIT 5;

NOTICE: text-search query contains only stop words or doesn't contain
lexemes, ignored
NOTICE: text-search query contains only stop words or doesn't contain
lexemes, ignored
 qid  |  words   |   query    | isnull | issame
------+----------+------------+--------+--------
 1000 | to       |            | f      | f
  999 | or       |            | f      | f
  998 | requests | 'request'  | f      | t
  997 | site     | 'site'     | f      | t
  996 | document | 'document' | f      | t
(5 rows)
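
A minimal C sketch of the kind of zero-size case I have in mind. This is
not the real CompareTSQ and not PostgreSQL's TSQuery layout: DemoQuery and
demo_query_cmp are hypothetical names made up for illustration, and the
items are flattened to a plain string to keep it short.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a parsed query.
 * This is NOT PostgreSQL's actual TSQuery structure. */
typedef struct {
    int size;           /* number of items; 0 if only stopwords were given */
    const char *items;  /* flattened item text; unused when size == 0 */
} DemoQuery;

/* Three-way comparison that treats two empty queries as equal. */
int demo_query_cmp(const DemoQuery *a, const DemoQuery *b)
{
    assert(a != NULL && b != NULL);

    /* Queries of different size can never be equal. */
    if (a->size != b->size)
        return (a->size < b->size) ? -1 : 1;

    /* Both empty: equal, and there are no items to inspect. */
    if (a->size == 0)
        return 0;

    /* Same non-zero size: fall back to comparing the contents. */
    return strcmp(a->items, b->items);
}
```

With an explicit branch like this, two empty queries compare as equal
instead of falling through to code that expects at least one item.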

--
Richard Huxton
Archonet Ltd


From: Richard Huxton <dev(at)archonet(dot)com>
To: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Behaviour of to_tsquery(stopwords only)
Date: 2008-03-06 11:08:05
Message-ID: 47CFD095.7010405@archonet.com

Further tsquery comparison fun:

=> SELECT q.qid, q.query, count(*)
   FROM doc.documents d, util.queries q
   WHERE d.words @@ q.query AND (q.query::text = $$'tender'$$)
   GROUP BY q.qid, q.query;
 qid |  query   | count
-----+----------+-------
 195 | 'tender' |   374
 248 | 'tender' |   374
 257 | 'tender' |   374
 332 | 'tender' |   374
 401 | 'tender' |   374
 409 | 'tender' |   374
 519 | 'tender' |   374
 557 | 'tender' |   374
 736 | 'tender' |   374
 749 | 'tender' |   374
 869 | 'tender' |   374
 879 | 'tender' |   374
 926 | 'tender' |   374
(13 rows)

=> SELECT q.query, count(*)
   FROM doc.documents d, util.queries q
   WHERE d.words @@ q.query AND (q.query::text = $$'tender'$$)
   GROUP BY q.query;
  query   | count
----------+-------
 'tender' |  1870
 'tender' |  1496
 'tender' |  1496
(3 rows)

It seems that the tsquery remembers the shape of the original query, even
though it has been trimmed.

=> SELECT q.query, min(qid), max(qid), count(*)
   FROM doc.documents d, util.queries q
   WHERE d.words @@ q.query AND (q.query::text = $$'tender'$$)
   GROUP BY q.query;
  query   | min | max | count
----------+-----+-----+-------
 'tender' | 736 | 926 |  1870 (5 rows aggregated)
 'tender' | 401 | 557 |  1496 (4 rows aggregated)
 'tender' | 195 | 332 |  1496 (4 rows aggregated)
(3 rows)

=> SELECT * FROM util.queries
   WHERE qid IN (195, 248, 257, 332, 401, 409, 519, 557, 736, 749, 869, 879, 926)
   ORDER BY qid;
 qid |        words        |  query
-----+---------------------+----------
 195 | can & of & tenders  | 'tender' (3 clauses)
 248 | tender & the & this | 'tender' (3 clauses)
 257 | have & tender & for | 'tender' (3 clauses)
 332 | for & tenders & of  | 'tender' (3 clauses)
 401 | tender & with       | 'tender' (2 clauses)
 409 | tenders & to        | 'tender' (2 clauses)
 519 | tender & to         | 'tender' (2 clauses)
 557 | tenders & be        | 'tender' (2 clauses)
 736 | tenderer            | 'tender' (1 clause)
 749 | tender              | 'tender' (1 clause)
 869 | tender              | 'tender' (1 clause)
 879 | tender              | 'tender' (1 clause)
 926 | tender              | 'tender' (1 clause)
(13 rows)
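
A minimal C sketch of the suspected cause, assuming the stored value still
carries something derived from the untrimmed input. The DemoStored struct
and both comparison functions are hypothetical, made up for illustration;
this is a guess at the mechanism, not PostgreSQL's actual representation
or code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a stored query (NOT PostgreSQL's real layout):
 * 'orig_clauses' is the clause count as typed, 'text' is what is left
 * after stopword removal. */
typedef struct {
    int orig_clauses;
    const char *text;
} DemoStored;

/* Comparison that still looks at the pre-trimming shape. */
int cmp_with_shape(const DemoStored *a, const DemoStored *b)
{
    assert(a != NULL && b != NULL);
    if (a->orig_clauses != b->orig_clauses)
        return (a->orig_clauses < b->orig_clauses) ? -1 : 1;
    return strcmp(a->text, b->text);
}

/* Comparison on the trimmed content only. */
int cmp_trimmed(const DemoStored *a, const DemoStored *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a->text, b->text);
}
```

Under that assumption, qid 195 (3 clauses), qid 401 (2 clauses) and qid 736
(1 clause) differ under cmp_with_shape but are equal under cmp_trimmed,
which would explain three GROUP BY buckets that all print as 'tender'.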

So - is this a bug, feature, "feature"?

--
Richard Huxton
Archonet Ltd


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Behaviour of to_tsquery(stopwords only)
Date: 2008-03-06 17:21:46
Message-ID: 47D0282A.6030305@sigaev.ru

> => SELECT * FROM util.queries
>    WHERE qid IN (195, 248, 257, 332, 401, 409, 519, 557, 736, 749, 869, 879, 926)
>    ORDER BY qid;
>  qid |        words        |  query
> -----+---------------------+----------
>  195 | can & of & tenders  | 'tender' (3 clauses)
>  248 | tender & the & this | 'tender' (3 clauses)
>  257 | have & tender & for | 'tender' (3 clauses)
>  332 | for & tenders & of  | 'tender' (3 clauses)
>  401 | tender & with       | 'tender' (2 clauses)
>  409 | tenders & to        | 'tender' (2 clauses)
>  519 | tender & to         | 'tender' (2 clauses)
>  557 | tenders & be        | 'tender' (2 clauses)
>  736 | tenderer            | 'tender' (1 clause)
>  749 | tender              | 'tender' (1 clause)
>  869 | tender              | 'tender' (1 clause)
>  879 | tender              | 'tender' (1 clause)
>  926 | tender              | 'tender' (1 clause)
> (13 rows)
>
> So - is this a bug, feature, "feature"?

It's definitely a bug:
select count(*), query from queries group by query;
 count |  query
-------+----------
     3 | 'tender'
     4 | 'tender'
     4 | 'tender'
(3 rows)

Will fix it soon.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Richard Huxton <dev(at)archonet(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Behaviour of to_tsquery(stopwords only)
Date: 2008-03-06 18:10:30
Message-ID: 47D03396.1080000@archonet.com

Teodor Sigaev wrote:
>>
>> So - is this a bug, feature, "feature"?
>
> It's definitely a bug:
> select count(*), query from queries group by query;
>  count |  query
> -------+----------
>      3 | 'tender'
>      4 | 'tender'
>      4 | 'tender'
> (3 rows)
>
> Will fix it soon.

Ah, smashing.

--
Richard Huxton
Archonet Ltd


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Behaviour of to_tsquery(stopwords only)
Date: 2008-03-07 15:56:37
Message-ID: 47D165B5.9050101@sigaev.ru

Fixed for CVS HEAD and 8.3, will fix for previous versions too.

Richard Huxton wrote:
> Teodor Sigaev wrote:
>>>
>>> So - is this a bug, feature, "feature"?
>>
>> It's definitely a bug:
>> select count(*), query from queries group by query;
>>  count |  query
>> -------+----------
>>      3 | 'tender'
>>      4 | 'tender'
>>      4 | 'tender'
>> (3 rows)
>>
>> Will fix it soon.
>
> Ah, smashing.
>

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/