Re: text search patch status update?

Lists: pgsql-hackers
From: Sushant Sinha <sushant354(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: text search patch status update?
Date: 2008-09-16 04:47:32
Message-ID: 1221540452.10731.3.camel@dragflick
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Any status updates on the following patches?

1. Fragments in tsearch2 headlines:
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00043.php

2. Bug in hlCover:
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00089.php

-Sushant.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Sushant Sinha <sushant354(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: text search patch status update?
Date: 2008-09-16 15:27:09
Message-ID: 20080916152709.GF9229@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sushant Sinha escribió:
> Any status updates on the following patches?
>
> 1. Fragments in tsearch2 headlines:
> http://archives.postgresql.org/pgsql-hackers/2008-08/msg00043.php
>
> 2. Bug in hlCover:
> http://archives.postgresql.org/pgsql-hackers/2008-08/msg00089.php

Are these ready for review? If so, please add them to this commitfest,
http://wiki.postgresql.org/wiki/CommitFest:2008-09

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Sushant Sinha" <sushant354(at)gmail(dot)com>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: text search patch status update?
Date: 2008-09-16 16:00:06
Message-ID: 9fb559330809160900q6705e436g92e75cf9cf721c5d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Patch #1. Teodor was fine with the previous version of the patch. After that
I modified it slightly to allow a FragmentDelimiter option and Teodor may
have to look at that.

Patch #2. I think this is a straigt forward bug fix.

-Sushant.

On Tue, Sep 16, 2008 at 11:27 AM, Alvaro Herrera <alvherre(at)commandprompt(dot)com
> wrote:

> Sushant Sinha escribió:
> > Any status updates on the following patches?
> >
> > 1. Fragments in tsearch2 headlines:
> > http://archives.postgresql.org/pgsql-hackers/2008-08/msg00043.php
> >
> > 2. Bug in hlCover:
> > http://archives.postgresql.org/pgsql-hackers/2008-08/msg00089.php
>
> Are these ready for review? If so, please add them to this commitfest,
> http://wiki.postgresql.org/wiki/CommitFest:2008-09
>
> --
> Alvaro Herrera
> http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Sushant Sinha <sushant354(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: text search patch status update?
Date: 2008-09-17 10:32:29
Message-ID: 48D0DCBD.7030904@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sushant Sinha wrote:
> Patch #2. I think this is a straigt forward bug fix.

Yes, I think you're right. In hlCover(), *q is 0 when the only match is
the first item in the text, and we shouldn't bail out with "return
false" in that case.

But there seems to be something else going on here as well:

postgres=# select ts_headline('1 2 3 4 5', '2'::tsquery, 'MinWords=2,
MaxWords=3');
ts_headline
--------------
<b>2</b> 3 4
(1 row)

postgres=# select ts_headline('aaa1 aaa2 aaa3 aaa4
aaa5','aaa2'::tsquery, 'MinWords=2, MaxWords=3');
ts_headline
------------------
<b>aaa2</b> aaa3
(1 row)

In the first example, you get three words, and in the 2nd, just two. It
must be because of the default ShortWord setting of 3. Also, if only the
last word matches, and it's a "short word", you get the whole text:

postgres=# select ts_headline('1 2 3 4 5','5'::tsquery, 'MinWords=2,
MaxWords=3');
ts_headline
------------------
1 2 3 4 <b>5</b>
(1 row)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: sushant354(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: text search patch status update?
Date: 2008-09-17 16:06:07
Message-ID: 48D12AEF.9080506@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I remember about that but right now I havn't time to make final review. Sorry.
Will return soon.

Sushant Sinha wrote:
> Any status updates on the following patches?
>
> 1. Fragments in tsearch2 headlines:
> http://archives.postgresql.org/pgsql-hackers/2008-08/msg00043.php
>
> 2. Bug in hlCover:
> http://archives.postgresql.org/pgsql-hackers/2008-08/msg00089.php
>
> -Sushant.
>
>

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Sushant Sinha <sushant354(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: text search patch status update?
Date: 2009-01-08 01:50:20
Message-ID: 200901080150.n081oKX20489@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Uh, where are we on this? I see the same output in CVS HEAD as Heikki,
and I assume he thought at least one of them was wrong. ;-)

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> Sushant Sinha wrote:
> > Patch #2. I think this is a straigt forward bug fix.
>
> Yes, I think you're right. In hlCover(), *q is 0 when the only match is
> the first item in the text, and we shouldn't bail out with "return
> false" in that case.
>
> But there seems to be something else going on here as well:
>
> postgres=# select ts_headline('1 2 3 4 5', '2'::tsquery, 'MinWords=2,
> MaxWords=3');
> ts_headline
> --------------
> <b>2</b> 3 4
> (1 row)
>
> postgres=# select ts_headline('aaa1 aaa2 aaa3 aaa4
> aaa5','aaa2'::tsquery, 'MinWords=2, MaxWords=3');
> ts_headline
> ------------------
> <b>aaa2</b> aaa3
> (1 row)
>
> In the first example, you get three words, and in the 2nd, just two. It
> must be because of the default ShortWord setting of 3. Also, if only the
> last word matches, and it's a "short word", you get the whole text:
>
> postgres=# select ts_headline('1 2 3 4 5','5'::tsquery, 'MinWords=2,
> MaxWords=3');
> ts_headline
> ------------------
> 1 2 3 4 <b>5</b>
> (1 row)
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Sushant Sinha <sushant354(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: text search patch status update?
Date: 2009-01-08 02:42:05
Message-ID: 1231382525.8322.17.camel@dragflick
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The default headline generation function is complicated. It checks a lot
of cases to determine the best headline to be displayed. So Heikki's
examples just say that headline generation function may not be very
intuitive. However, his examples were not affected by the bug.

Because of the bug, hlcover was not returning a cover when the query
item was the first lexeme in the text. And so the headline generation
function will return just MINWORDS rather than the actual headline as
per the logic.

After the patch you will see the difference in the example:

http://archives.postgresql.org/pgsql-hackers/2008-07/msg00785.php

-Sushant.

On Wed, 2009-01-07 at 20:50 -0500, Bruce Momjian wrote:
> Uh, where are we on this? I see the same output in CVS HEAD as Heikki,
> and I assume he thought at least one of them was wrong. ;-)
>
> ---------------------------------------------------------------------------
>
> Heikki Linnakangas wrote:
> > Sushant Sinha wrote:
> > > Patch #2. I think this is a straigt forward bug fix.
> >
> > Yes, I think you're right. In hlCover(), *q is 0 when the only match is
> > the first item in the text, and we shouldn't bail out with "return
> > false" in that case.
> >
> > But there seems to be something else going on here as well:
> >
> > postgres=# select ts_headline('1 2 3 4 5', '2'::tsquery, 'MinWords=2,
> > MaxWords=3');
> > ts_headline
> > --------------
> > <b>2</b> 3 4
> > (1 row)
> >
> > postgres=# select ts_headline('aaa1 aaa2 aaa3 aaa4
> > aaa5','aaa2'::tsquery, 'MinWords=2, MaxWords=3');
> > ts_headline
> > ------------------
> > <b>aaa2</b> aaa3
> > (1 row)
> >
> > In the first example, you get three words, and in the 2nd, just two. It
> > must be because of the default ShortWord setting of 3. Also, if only the
> > last word matches, and it's a "short word", you get the whole text:
> >
> > postgres=# select ts_headline('1 2 3 4 5','5'::tsquery, 'MinWords=2,
> > MaxWords=3');
> > ts_headline
> > ------------------
> > 1 2 3 4 <b>5</b>
> > (1 row)
> >
> > --
> > Heikki Linnakangas
> > EnterpriseDB http://www.enterprisedb.com
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: sushant354(at)gmail(dot)com
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: text search patch status update?
Date: 2009-01-08 02:49:51
Message-ID: 200901080249.n082npl28558@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sushant Sinha wrote:
> The default headline generation function is complicated. It checks a lot
> of cases to determine the best headline to be displayed. So Heikki's
> examples just say that headline generation function may not be very
> intuitive. However, his examples were not affected by the bug.
>
> Because of the bug, hlcover was not returning a cover when the query
> item was the first lexeme in the text. And so the headline generation
> function will return just MINWORDS rather than the actual headline as
> per the logic.
>
> After the patch you will see the difference in the example:
>
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg00785.php

Ah, thank you for the clarification. I now realize Heikki was more
saying the code can be improved rather than reporting a bug. Thanks.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: sushant354(at)gmail(dot)com, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: text search patch status update?
Date: 2009-01-08 07:38:00
Message-ID: 4965AD58.5020001@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> Sushant Sinha wrote:
>> The default headline generation function is complicated. It checks a lot
>> of cases to determine the best headline to be displayed. So Heikki's
>> examples just say that headline generation function may not be very
>> intuitive. However, his examples were not affected by the bug.
>>
>> Because of the bug, hlcover was not returning a cover when the query
>> item was the first lexeme in the text. And so the headline generation
>> function will return just MINWORDS rather than the actual headline as
>> per the logic.
>>
>> After the patch you will see the difference in the example:
>>
>> http://archives.postgresql.org/pgsql-hackers/2008-07/msg00785.php
>
> Ah, thank you for the clarification. I now realize Heikki was more
> saying the code can be improved rather than reporting a bug. Thanks.

No, it does still look wrong to me.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Sushant Sinha <sushant354(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: text search patch status update?
Date: 2009-01-08 14:34:42
Message-ID: 200901081434.n08EYgf09060@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


OK, Heikki still believe the behavior below is a bug. Can I get
feedback from anyone else on this? TODO item?

---------------------------------------------------------------------------

> Heikki Linnakangas wrote:
> > Sushant Sinha wrote:
> > > Patch #2. I think this is a straigt forward bug fix.
> >
> > Yes, I think you're right. In hlCover(), *q is 0 when the only match is
> > the first item in the text, and we shouldn't bail out with "return
> > false" in that case.
> >
> > But there seems to be something else going on here as well:
> >
> > postgres=# select ts_headline('1 2 3 4 5', '2'::tsquery, 'MinWords=2,
> > MaxWords=3');
> > ts_headline
> > --------------
> > <b>2</b> 3 4
> > (1 row)
> >
> > postgres=# select ts_headline('aaa1 aaa2 aaa3 aaa4
> > aaa5','aaa2'::tsquery, 'MinWords=2, MaxWords=3');
> > ts_headline
> > ------------------
> > <b>aaa2</b> aaa3
> > (1 row)
> >
> > In the first example, you get three words, and in the 2nd, just two. It
> > must be because of the default ShortWord setting of 3. Also, if only the
> > last word matches, and it's a "short word", you get the whole text:
> >
> > postgres=# select ts_headline('1 2 3 4 5','5'::tsquery, 'MinWords=2,
> > MaxWords=3');
> > ts_headline
> > ------------------
> > 1 2 3 4 <b>5</b>
> > (1 row)
> >
> > --
> > Heikki Linnakangas
> > EnterpriseDB http://www.enterprisedb.com
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + If your life is a hard drive, Christ can be your backup. +
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +