Re: [GENERAL] Fragments in tsearch2 headline

From: Sushant Sinha <sushant354(at)gmail(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pierre-Yves Strub <pierre(dot)yves(dot)strub(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] Fragments in tsearch2 headline
Date: 2008-07-16 23:08:53
Message-ID: 1216249733.5842.5.camel@dragflick
Lists: pgsql-general pgsql-hackers

I will add test queries and their results for the corner cases in a
separate file. The only thing I am unsure about is how headline
generation should behave when query items are words shorter than
ShortWord. My guess is that ShortWord should be ignored for query words,
but let me know if the answer is different.

-Sushant.
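To make the ShortWord question concrete, here is a minimal Python sketch of the rule proposed above: short words are trimmed from both ends of a fragment, except when they are query words. The function name `trim_short_words`, the word-list representation, and the `<=` length comparison are assumptions for illustration; this is not the tsearch2 C code.

```python
def trim_short_words(words, query_words, short_word=3):
    """Hypothetical sketch: drop short words from both ends of a fragment.

    A word counts as "short" if len(word) <= short_word (assumed comparison).
    Query words are never dropped, whatever their length, which is the
    behavior proposed in the message above.
    """
    def droppable(word):
        return len(word) <= short_word and word not in query_words

    start, end = 0, len(words)
    # Advance the left edge past droppable short words.
    while start < end and droppable(words[start]):
        start += 1
    # Pull the right edge back past droppable short words.
    while end > start and droppable(words[end - 1]):
        end -= 1
    return words[start:end]


# "1" and "3" are query words, so they survive trimming even though
# they are shorter than ShortWord; "the" and "an" are removed.
frag = ["the", "1", "of", "big", "3", "an"]
print(trim_short_words(frag, {"1", "3"}))
# ['1', 'of', 'big', '3']
```

Words in the interior of the fragment ("of") are left alone; the rule only affects where the fragment starts and ends.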

On Thu, 2008-07-17 at 02:53 +0400, Oleg Bartunov wrote:
> Sushant,
>
> first, please provide simple test queries that demonstrate correct behavior
> in the corner cases. This will help reviewers test your patch and
> help you make sure your new version is OK. For example:
>
> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery);
> ts_headline
> ------------------------------------------------------
> <b>1</b> 2 <b>3</b> 4 5 <b>1</b> 2 <b>3</b> <b>1</b>
>
> This select breaks your code:
>
> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery,'maxfragments=2');
> ts_headline
> --------------
> ... 2 ...
>
> and so on ....
>
>
> Oleg
> On Tue, 15 Jul 2008, Sushant Sinha wrote:
>
> > Attached a new patch that:
> >
> > 1. fixes the previous bug
> > 2. better handles the case when the cover size is greater than MaxWords.
> > Basically, it divides a cover longer than MaxWords into fragments of
> > MaxWords words, resizes each fragment so that both of its ends are
> > query words, and then picks the best fragments by the number of query
> > words each contains. In case of a tie it picks the smaller fragment.
> > This lets the headline show more query words across multiple fragments
> > when a single cover is larger than MaxWords.
> >
> > Resizing a fragment so that each end is a query word leaves room for
> > stretching both sides of the fragment. This (hopefully) gives a better
> > picture of the context in which query words appear in the document. If a
> > cover is smaller than MaxWords, the cover is treated as a single fragment.
> >
> > Let me know if you have any more suggestions or anything is not clear.
> >
> > I have not yet added the regression tests. The existing regression suite
> > seems to check only that the function works. How many tests should I
> > add? Is there any other place where I need to add test cases for
> > the function?
> >
> > -Sushant.
> >
> >
> >> Nice. But it would be good to resolve the following issues:
> >> 1) The patch contains mistakes; I didn't investigate or read it carefully.
> >> Get http://www.sai.msu.su/~megera/postgres/fts/apod.dump.gz and load it
> >> into a database.
> >>
> >> Queries
> >> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
> >> from apod where to_tsvector(body) @@ plainto_tsquery('black hole');
> >>
> >> and
> >>
> >> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
> >> from apod;
> >>
> >> crash PostgreSQL :(
> >>
> >> 2) Please include documentation and regression tests in your patch.
> >>
> >>
> >>> Another change that I was thinking:
> >>>
> >>> Right now, if the cover size > max_words, I just cut the trailing words.
> >>> Instead, I was thinking that we should split the cover into more
> >>> fragments so that each fragment contains a few query words. Each
> >>> fragment will then not contain all the query words, but the headline
> >>> will show more occurrences of them. I would like to know your opinion
> >>> on this.
> >>>
> >>
> >> Agreed.
> >>
> >>
> >> --
> >> Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
> >> WWW: http://www.sigaev.ru/
> >>
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
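Sushant's July 15 note describes how the patch handles a cover longer than MaxWords: split it into MaxWords-sized chunks, shrink each chunk so both ends are query words, rank by query-word count, and break ties in favor of the smaller fragment. The Python sketch below is one reading of that description, not the patch's C code. The names `split_cover`, `count_query_words`, `best_fragments` and `render`, the inclusive `(start, end)` tuples, and the choice to discard a chunk left with no query word are all hypothetical. That last choice means the sketch cannot emit a query-free fragment like the `... 2 ...` output Oleg reported; whether the patch should behave this way is not settled in the thread.

```python
# Hypothetical sketch of fragment selection for covers longer than MaxWords.
# A fragment is an inclusive (start, end) pair of word positions.


def split_cover(words, query, cover, max_words):
    """Split a cover into chunks of at most max_words words, then shrink
    each chunk so that both of its ends are query words."""
    start, end = cover
    if end - start + 1 <= max_words:
        return [cover]  # a short cover is used as a fragment unchanged
    fragments = []
    for chunk_start in range(start, end + 1, max_words):
        lo, hi = chunk_start, min(chunk_start + max_words - 1, end)
        # Move the left end right until it sits on a query word.
        while lo <= hi and words[lo] not in query:
            lo += 1
        # Move the right end left until it sits on a query word.
        while hi >= lo and words[hi] not in query:
            hi -= 1
        if lo <= hi:  # drop chunks that contain no query word at all
            fragments.append((lo, hi))
    return fragments


def count_query_words(words, query, frag):
    lo, hi = frag
    return sum(1 for w in words[lo:hi + 1] if w in query)


def best_fragments(words, query, covers, max_words, max_fragments):
    """Rank candidate fragments and return up to max_fragments of them."""
    candidates = set()
    for cover in covers:
        candidates.update(split_cover(words, query, cover, max_words))
    # More query words first; on a tie, the shorter fragment wins
    # (start position is only a final, deterministic tie-breaker).
    ranked = sorted(
        candidates,
        key=lambda f: (-count_query_words(words, query, f), f[1] - f[0], f[0]),
    )
    return sorted(ranked[:max_fragments])  # present in document order


def render(words, query, frags):
    """Bold query words and join fragments with ' ... '."""
    return " ... ".join(
        " ".join(f"<b>{w}</b>" if w in query else w for w in words[lo:hi + 1])
        for lo, hi in frags
    )


words = "1 2 3 4 5 1 2 3 1".split()
query = {"1", "3"}
# Assumption: treat the whole text as one cover, with an artificially small MaxWords.
frags = best_fragments(words, query, [(0, len(words) - 1)], max_words=4, max_fragments=2)
print(render(words, query, frags))
# <b>1</b> 2 <b>3</b> ... <b>1</b> 2 <b>3</b>
```

Collecting candidates in a set only removes exact duplicates; overlapping fragments from different covers are not merged, and the stretching of both ends that Sushant mentions is left out of this sketch.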
