Re: [GENERAL] Fragments in tsearch2 headline

From: Sushant Sinha <sushant354(at)gmail(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pierre-Yves Strub <pierre(dot)yves(dot)strub(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] Fragments in tsearch2 headline
Date: 2008-08-02 15:50:18
Message-ID: 1217692218.6000.8.camel@dragflick
Lists: pgsql-general pgsql-hackers

Sorry for the delay. Here is the patch with the FragmentDelimiter option.
It requires an extra option in HeadlineParsedText and uses that option
during generateHeadline.

Implementing a notion of fragments in HeadlineParsedText, with a separate
function to join them, seems more complicated. So for the time being I
just emit the FragmentDelimiter whenever a new fragment (other than the
first one) starts.
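
To illustrate the idea only (this is not the code from the attached patch):
a minimal sketch of writing the delimiter before every fragment except the
first, which gives the same result as a join. The helper name
join_fragments and its plain C string interface are hypothetical; the real
code works on HeadlineParsedText inside generateHeadline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: copy nfrags fragments into
 * out, writing delim before every fragment except the first one.
 *
 * Returns the number of bytes written (not counting the terminating NUL),
 * or (size_t) -1 if out is too small to hold the result.
 */
size_t
join_fragments(const char *const *frags, size_t nfrags,
               const char *delim, char *out, size_t outsz)
{
    size_t      used = 0;
    size_t      dlen = strlen(delim);
    size_t      i;

    assert(out != NULL && outsz > 0);

    for (i = 0; i < nfrags; i++)
    {
        size_t      flen = strlen(frags[i]);
        /* the delimiter is only needed when a second or later fragment starts */
        size_t      need = flen + (i > 0 ? dlen : 0);

        /* keep one byte free for the terminating NUL */
        if (need >= outsz - used)
            return (size_t) -1;

        if (i > 0)
        {
            memcpy(out + used, delim, dlen);
            used += dlen;
        }
        memcpy(out + used, frags[i], flen);
        used += flen;
    }

    out[used] = '\0';
    return used;
}
```

With the fragments "a b" and "c d" and the delimiter " ... ", this produces
"a b ... c d": nothing is written before the first fragment or after the
last one.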

The patch also contains the updated regression tests and expected results,
a new test for the FragmentDelimiter option, and the documentation for the
new options.

I have also attached a separate file that tests different aspects of the
new headline generation function.

Let me know if anything else is needed.

-Sushant.

On Thu, 2008-07-24 at 00:28 +0400, Oleg Bartunov wrote:
> On Wed, 23 Jul 2008, Sushant Sinha wrote:
>
> > I guess it is more readable to add cover separator at the end of a fragment
> > than in the front. Let me know what you think and I can update it.
>
> FragmentsDelimiter should *separate* fragments and that says all.
> Not very difficult algorithmic problem, it's like perl's
> join(FragmentsDelimiter, @array)
>
> >
> > I think the right place for cover separator is in the structure
> > HeadlineParsedText just like startsel and stopsel. This will enable users to
> > specify their own cover separators. But this will require changes to the
> > structure as well as to the generateHeadline function. This option will
> > also not play well with the default headline generation function.
>
> As soon as we introduce FragmentsDelimiter we should make it
> configurable.
>
> >
> > The default MaxWords = 35 seems a bit high for this headline generation
> > function and 20 seems to be more reasonable. Any thoughts?
>
> I think we should not change default value because it could change
> behaviour of existing applications. I'm not sure if it'd be useful and
> possible to define default values in CREATE TEXT SEARCH PARSER
>
> >
> > -Sushant.
> >
> > On Wed, Jul 23, 2008 at 7:44 AM, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> >
> >> btw, is it intentional to have '....' in headline ?
> >>
> >> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&4'::tsquery,'MaxFragments=1');
> >> ts_headline
> >> -------------------------
> >> ... <b>4</b> 5 <b>1</b>
> >>
> >>
> >>
> >> Oleg
> >>
> >> On Wed, 23 Jul 2008, Teodor Sigaev wrote:
> >>
> >>>> Let me know of any other changes that are needed.
> >>>
> >>> Looks like ready to commit, but documentation is needed.
> >>>
> >> Regards,
> >> Oleg
> >> _____________________________________________________________
> >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> >> Sternberg Astronomical Institute, Moscow University, Russia
> >> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> >> phone: +007(495)939-16-83, +007(495)939-23-83
> >>
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83

Attachment            Content-Type  Size
headlines_v0.9.patch  text/x-patch  23.1 KB
headlines_test.txt    text/x-vhdl   8.4 KB
