Re: Fix for stop words in thesaurus file

Lists: pgsql-committerspgsql-hackerspgsql-patches
From: tgl(at)postgresql(dot)org (Tom Lane)
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Temporarily modify tsearch regression tests to suppress notice
Date: 2007-09-23 15:58:58
Message-ID: 20070923155858.DC134753E4C@cvs.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers pgsql-patches

Log Message:
-----------
Temporarily modify tsearch regression tests to suppress notice that comes
out at erratic times, because it is creating a totally unacceptable level
of noise in our buildfarm results. This patch can be reverted when and if
the code is fixed to not issue notices during cache reload events.

Modified Files:
--------------
pgsql/src/backend/tsearch:
thesaurus_sample.ths (r1.1 -> r1.2)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/tsearch/thesaurus_sample.ths?r1=1.1&r2=1.2)
pgsql/src/test/regress/expected:
tsdicts.out (r1.1 -> r1.2)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/tsdicts.out?r1=1.1&r2=1.2)


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)postgresql(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Temporarily modify tsearch regression tests to suppress notice
Date: 2007-09-27 05:41:19
Message-ID: 200709270541.l8R5fJI27959@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers pgsql-patches


I just talked to Teodor and we discussed this problem. My idea is to
have a special marker in the synonym table, perhaps "*" to indicate the
presence of _any_ stop word at that location. This will not produce any
warnings because it is clearly intentional. The original warning for a
literal stop word will remain but will happen only when the user users
an actual unintentional use of a stop word. The synonym examples will
need to be updated to use "*" as stop word markers.

---------------------------------------------------------------------------

Tom Lane wrote:
> Log Message:
> -----------
> Temporarily modify tsearch regression tests to suppress notice that comes
> out at erratic times, because it is creating a totally unacceptable level
> of noise in our buildfarm results. This patch can be reverted when and if
> the code is fixed to not issue notices during cache reload events.
>
> Modified Files:
> --------------
> pgsql/src/backend/tsearch:
> thesaurus_sample.ths (r1.1 -> r1.2)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/tsearch/thesaurus_sample.ths?r1=1.1&r2=1.2)
> pgsql/src/test/regress/expected:
> tsdicts.out (r1.1 -> r1.2)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/tsdicts.out?r1=1.1&r2=1.2)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Temporarily modify tsearch regression tests to suppress notice
Date: 2007-09-27 05:49:52
Message-ID: 17301.1190872192@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> I just talked to Teodor and we discussed this problem. My idea is to
> have a special marker in the synonym table, perhaps "*" to indicate the
> presence of _any_ stop word at that location. This will not produce any
> warnings because it is clearly intentional.

That's not fixing the problem, unless your proposal includes never
issuing any warnings at all, for anything.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Temporarily modify tsearch regression tests to suppress notice
Date: 2007-09-27 05:55:49
Message-ID: 200709270555.l8R5tnH03017@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > I just talked to Teodor and we discussed this problem. My idea is to
> > have a special marker in the synonym table, perhaps "*" to indicate the
> > presence of _any_ stop word at that location. This will not produce any
> > warnings because it is clearly intentional.
>
> That's not fixing the problem, unless your proposal includes never
> issuing any warnings at all, for anything.

No warning for "*" because it is intentional, but warning for actual
stop words.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [COMMITTERS] pgsql: Temporarily modify tsearch regression tests to suppress notice
Date: 2007-09-27 14:08:32
Message-ID: 27596.1190902112@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom Lane wrote:
>> That's not fixing the problem, unless your proposal includes never
>> issuing any warnings at all, for anything.

> No warning for "*" because it is intentional, but warning for actual
> stop words.

No, you're focusing on one symptom not the problem. The problem is
that we've got user-visible behavior going on during what's effectively
a chance event, ie, a cache reload.

One possible real solution would be to tweak the dictionary APIs so
that the dictionaries can find out whether this is the first load during
a session, or a reload, and emit notices only in the first case.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [COMMITTERS] pgsql: Temporarily modify tsearch regression tests to suppress notice
Date: 2007-09-29 11:14:25
Message-ID: 200709291114.l8TBEPk27086@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Tom Lane wrote:
> >> That's not fixing the problem, unless your proposal includes never
> >> issuing any warnings at all, for anything.
>
> > No warning for "*" because it is intentional, but warning for actual
> > stop words.
>
> No, you're focusing on one symptom not the problem. The problem is
> that we've got user-visible behavior going on during what's effectively
> a chance event, ie, a cache reload.
>
> One possible real solution would be to tweak the dictionary APIs so
> that the dictionaries can find out whether this is the first load during
> a session, or a reload, and emit notices only in the first case.

Yea, that would work too. Or just throw an error for a stop word in the
file and then you never get a reload (use "*" instead).

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [COMMITTERS] pgsql: Temporarily modify tsearch regression tests to suppress notice
Date: 2007-09-29 14:50:34
Message-ID: 3377.1191077434@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom Lane wrote:
>> One possible real solution would be to tweak the dictionary APIs so
>> that the dictionaries can find out whether this is the first load during
>> a session, or a reload, and emit notices only in the first case.

> Yea, that would work too. Or just throw an error for a stop word in the
> file and then you never get a reload (use "*" instead).

Hm, that's a thought --- it'd be a way to solve the problem without an
API change for dictionaries, which is something to avoid at this late
stage of the 8.3 cycle. Come to think of it, does the ts_cache stuff
work properly when an error is thrown in dictionary load (ie, is the
cache entry left in a sane state)?

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Fix for stop words in thesaurus file
Date: 2007-11-09 02:32:09
Message-ID: 200711090232.lA92W9614295@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Tom Lane wrote:
> >> One possible real solution would be to tweak the dictionary APIs so
> >> that the dictionaries can find out whether this is the first load during
> >> a session, or a reload, and emit notices only in the first case.
>
> > Yea, that would work too. Or just throw an error for a stop word in the
> > file and then you never get a reload (use "*" instead).
>
> Hm, that's a thought --- it'd be a way to solve the problem without an
> API change for dictionaries, which is something to avoid at this late
> stage of the 8.3 cycle. Come to think of it, does the ts_cache stuff
> work properly when an error is thrown in dictionary load (ie, is the
> cache entry left in a sane state)?

I have developed the attached patch which uses "?" to mark stop words in
the thesaurus file. ("*" was already in use in the file.) I updated
the docs to use "?", which makes the documentation clearer too.

The patch also reenables testing of stop words in the thesuarus file.

FYI, there is no longer a NOTICE for stop words in the thesaurus file;
it throws an error now, and says to use "?" instead.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/pgpatches/ts_thesaurus text/x-diff 6.5 KB

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Fix for stop words in thesaurus file
Date: 2007-11-09 09:16:54
Message-ID: 1194599814.4251.362.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers pgsql-patches

On Thu, 2007-11-08 at 21:32 -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > Tom Lane wrote:
> > >> One possible real solution would be to tweak the dictionary APIs so
> > >> that the dictionaries can find out whether this is the first load during
> > >> a session, or a reload, and emit notices only in the first case.
> >
> > > Yea, that would work too. Or just throw an error for a stop word in the
> > > file and then you never get a reload (use "*" instead).
> >
> > Hm, that's a thought --- it'd be a way to solve the problem without an
> > API change for dictionaries, which is something to avoid at this late
> > stage of the 8.3 cycle. Come to think of it, does the ts_cache stuff
> > work properly when an error is thrown in dictionary load (ie, is the
> > cache entry left in a sane state)?
>
> I have developed the attached patch which uses "?" to mark stop words in
> the thesaurus file. ("*" was already in use in the file.) I updated
> the docs to use "?", which makes the documentation clearer too.
>
> The patch also reenables testing of stop words in the thesuarus file.
>
> FYI, there is no longer a NOTICE for stop words in the thesaurus file;
> it throws an error now, and says to use "?" instead.

So this fix requires people to have a different dictionary file from 8.2
to 8.3, and to manually edit the file? That makes upgrade even harder
and more error prone.

Seems easier to do it the way Tom suggested and only emit notices in the
first case.

I notice there's still a placeholder in the docs for how to upgrade.
Perhaps if we wrote that now it would make it clearer where the
difficulties lie?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Fix for stop words in thesaurus file
Date: 2007-11-09 16:39:03
Message-ID: 200711091639.lA9Gd3I28633@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers pgsql-patches

Simon Riggs wrote:
> > I have developed the attached patch which uses "?" to mark stop words in
> > the thesaurus file. ("*" was already in use in the file.) I updated
> > the docs to use "?", which makes the documentation clearer too.
> >
> > The patch also reenables testing of stop words in the thesuarus file.
> >
> > FYI, there is no longer a NOTICE for stop words in the thesaurus file;
> > it throws an error now, and says to use "?" instead.
>
> So this fix requires people to have a different dictionary file from 8.2
> to 8.3, and to manually edit the file? That makes upgrade even harder
> and more error prone.
>
> Seems easier to do it the way Tom suggested and only emit notices in the
> first case.

There is no easy way to do that so I have not heard that proposed as an
actual solution.

> I notice there's still a placeholder in the docs for how to upgrade.
> Perhaps if we wrote that now it would make it clearer where the
> difficulties lie?

Yes, this would be one of the many changes in the API. Of course, if
you did use a stop word you would now get an error and it would tell you
explicitly to use "?" for stop words, so I don't see a problem here.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Fix for stop words in thesaurus file
Date: 2007-11-10 15:40:34
Message-ID: 200711101540.lAAFeYS20251@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers pgsql-patches

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > Tom Lane wrote:
> > >> One possible real solution would be to tweak the dictionary APIs so
> > >> that the dictionaries can find out whether this is the first load during
> > >> a session, or a reload, and emit notices only in the first case.
> >
> > > Yea, that would work too. Or just throw an error for a stop word in the
> > > file and then you never get a reload (use "*" instead).
> >
> > Hm, that's a thought --- it'd be a way to solve the problem without an
> > API change for dictionaries, which is something to avoid at this late
> > stage of the 8.3 cycle. Come to think of it, does the ts_cache stuff
> > work properly when an error is thrown in dictionary load (ie, is the
> > cache entry left in a sane state)?
>
> I have developed the attached patch which uses "?" to mark stop words in
> the thesaurus file. ("*" was already in use in the file.) I updated
> the docs to use "?", which makes the documentation clearer too.
>
> The patch also reenables testing of stop words in the thesuarus file.
>
> FYI, there is no longer a NOTICE for stop words in the thesaurus file;
> it throws an error now, and says to use "?" instead.

Patch applied. I added documentation on this API change too.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +