Re: Ts_rank internals

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Ts_rank internals
Date: 2007-09-11 07:19:51
Message-ID: Pine.LNX.4.64.0709111118150.2767@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 11 Sep 2007, Teodor Sigaev wrote:

>> I tried to understand how ts_rank works, but I failed. What does Cover
>> function do? How does it work? What is the DocRepresentation data
>> structure like? I can see the definition of the struct, and the
>> get_docrep function to convert to that format, but by reading those I
>> can't figure out what the resulting DocRepresentation looks like.
>> I wonder if we could get rid of the istrue flag in QueryOperand, and use
>> a local BitmapSet variable instead? It seems wrong to have a temporary
>> flag that's only used in one function, in a struct that's used everywhere.
> It's a play around CDR algorithms (Cover Density Ranking).
>
> Based on paper Clarke et al., Relevance Ranking for One to Three Term
> Queries. " (http://citeseer.ist.psu.edu/clarke00relevance.html. Sorry, I
> lost the article itself, but may be Oleg has it. Simple and short description
> is placed at http://www2002.org/CDROM/refereed/643/node7.html.
>
> We change original algorithm to support weight of lexeme, details are on
> Oleg's site: http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking

Actually, we used two papers
http://citeseer.ist.psu.edu/clarke00relevance.html
and
http://portal.acm.org/ft_gateway.cfm?id=333137&type=pdf&dl=GUIDE&dl=ACM
I can send you the latter if you have no access to the ACM.

>
> Array of DocRepresentation is a representation of document, it contains only
> lexemes from both tsvector and tsquery, and lexemes are ordered by position -
> as in original doc. Each DocRepresentation has links to corresponding
> QueryOperand to optimize query execution while extent search. When we
> enlarge current extent for one word then we set istrue flag for corresponding
> QueryOperand and execution tsquery from cover becomes very simple task.
>
> It's possible to eliminate istrue flag, but it's needed to implement
> algorithm to execute tsquery over continuos part of document, not over whole
> document.
>
>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Albe Laurenz 2007-09-11 07:41:34 Re: invalidly encoded strings
Previous Message Tatsuo Ishii 2007-09-11 07:17:06 Re: invalidly encoded strings