Lists: | pgsql-hackers |
---|
From: | Sushant Sinha <sushant354(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | bug in ts_rank_cd |
Date: | 2010-12-21 13:38:35 |
Message-ID: | 1292938715.2327.9.camel@yoffice |
MY PREV EMAIL HAD A PROBLEM. Please reply to this one
======================================================
There is a bug in ts_rank_cd. It does not correctly give rank when the
query lexeme is the first one in the tsvector.
Example:
select ts_rank_cd(to_tsvector('english', 'abc sdd'),
                  plainto_tsquery('english', 'abc'));
 ts_rank_cd
------------
          0

select ts_rank_cd(to_tsvector('english', 'bcg abc sdd'),
                  plainto_tsquery('english', 'abc'));
 ts_rank_cd
------------
        0.1
The problem is that the Cover finding algorithm ignores the lexeme at
the 0th position. I have attached a patch that fixes it. After the
patch, the result is correct.
select ts_rank_cd(to_tsvector('english', 'abc sdd'),
                  plainto_tsquery('english', 'abc'));
 ts_rank_cd
------------
        0.1
Attachment | Content-Type | Size |
---|---|---|
tsrankbugfix.patch | text/x-patch | 415 bytes |
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Sushant Sinha <sushant354(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: bug in ts_rank_cd |
Date: | 2010-12-22 04:03:55 |
Message-ID: | 23981.1292990635@sss.pgh.pa.us |
Sushant Sinha <sushant354(at)gmail(dot)com> writes:
> There is a bug in ts_rank_cd. It does not correctly give rank when the
> query lexeme is the first one in the tsvector.
Hmm ... I cannot reproduce the behavior you're complaining of.
You say
> select ts_rank_cd(to_tsvector('english', 'abc sdd'),
> plainto_tsquery('english', 'abc'));
> ts_rank_cd
> ------------
> 0
but I get
regression=# select ts_rank_cd(to_tsvector('english', 'abc sdd'),
regression(#                   plainto_tsquery('english', 'abc'));
 ts_rank_cd
------------
        0.1
(1 row)
> The problem is that the Cover finding algorithm ignores the lexeme at
> the 0th position,
As far as I can tell, there is no "0th position" --- tsvector counts
positions from one. The only way to see pos == 0 in the input to
Cover() is if the tsvector has been stripped of position information.
ts_rank_cd is documented to return 0 in that situation. Your patch
would have the effect of causing it to return some nonzero, but quite
bogus, ranking.
regards, tom lane
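
The stripped-tsvector case described above can be checked with a short query. This is a minimal sketch, assuming a stock PostgreSQL build with the 'english' text search configuration; the use of strip() here is an illustration and does not appear in the original report.

```sql
-- With position information intact, the first lexeme is ranked normally
-- (this is the query from the report).
select ts_rank_cd(to_tsvector('english', 'abc sdd'),
                  plainto_tsquery('english', 'abc'));

-- strip() discards positions and weights, so ts_rank_cd has no
-- positional data to build covers from and is documented to return 0.
select ts_rank_cd(strip(to_tsvector('english', 'abc sdd')),
                  plainto_tsquery('english', 'abc'));
```

On an unmodified server the first query should give the 0.1 shown above, and the second should give 0, since only the stripped input lacks positions.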
From: | Sushant Sinha <sushant354(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: bug in ts_rank_cd |
Date: | 2010-12-22 12:44:43 |
Message-ID: | 1293021883.1985.2.camel@yoffice |
Sorry for sounding the false alarm. I was not running the vanilla
postgres and that is why I was seeing that problem. Should have checked
with the vanilla one.
-Sushant
On Tue, 2010-12-21 at 23:03 -0500, Tom Lane wrote:
> Sushant Sinha <sushant354(at)gmail(dot)com> writes:
> > There is a bug in ts_rank_cd. It does not correctly give rank when the
> > query lexeme is the first one in the tsvector.
>
> Hmm ... I cannot reproduce the behavior you're complaining of.
> You say
>
> > select ts_rank_cd(to_tsvector('english', 'abc sdd'),
> > plainto_tsquery('english', 'abc'));
> > ts_rank_cd
> > ------------
> > 0
>
> but I get
>
> regression=# select ts_rank_cd(to_tsvector('english', 'abc sdd'),
> regression(# plainto_tsquery('english', 'abc'));
> ts_rank_cd
> ------------
> 0.1
> (1 row)
>
> > The problem is that the Cover finding algorithm ignores the lexeme at
> > the 0th position,
>
> As far as I can tell, there is no "0th position" --- tsvector counts
> positions from one. The only way to see pos == 0 in the input to
> Cover() is if the tsvector has been stripped of position information.
> ts_rank_cd is documented to return 0 in that situation. Your patch
> would have the effect of causing it to return some nonzero, but quite
> bogus, ranking.
>
> regards, tom lane