tsearch_core for inclusion

Lists: pgsql-hackers
From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: tsearch_core for inclusion
Date: 2007-03-23 11:58:10
Message-ID: 4603C0D2.90701@sigaev.ru

http://www.sigaev.ru/misc/tsearch_core-0.41.gz
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/

Changes
1) added command
ALTER FULLTEXT MAPPING ON cfgname [FOR lexemetypename[, ...]] REPLACE
olddictname TO newdictname;
2) added operator class for text and varchar
CREATE INDEX idxname ON tblname USING GIN ( textcolumn );
3) changed definition of the @@ operator: {tsvector|varchar|text} @@ {text|tsquery}
SELECT * FROM tblname WHERE textcolumn @@ text;

We have two questions:

1. If the pg_catalog schema is not explicitly specified in search_path, it is
implicitly placed as the first schema to search. To what extent is that intentional?

2. At present, visibility of FTS objects conforms to the standard PostgreSQL
rule and is defined by the search_path variable.

For a given schema and server locale, it is possible to have several FTS
configurations, but only one of them (the one with a special flag enabled)
can be used as the default. The current (active) FTS configuration is stored
in the GUC variable tsearch_conf_name. If it is not set, search_path is searched
for an FTS configuration that matches the server's locale and has the default
flag enabled.

By default, the first visible schema is pg_catalog, so system FTS
objects always mask the user's. To change that, one needs to specify
pg_catalog explicitly in search_path.

This can confuse people, especially inexperienced users. Imagine a user who creates
a public.fts configuration for the ru_RU.UTF-8 locale and marks it as the default:

CREATE FULLTEXT CONFIGURATION public.fts LIKE pg_catalog.russian_utf8 AS DEFAULT;

but with the default search_path the default configuration will still be
pg_catalog.russian_utf8, and she has to redefine search_path to use
public.fts. Then she can create an index for "simple" search (without creating
a tsvector column) on a TEXT column:

CREATE INDEX pgweb_idx ON pgweb USING gin(body);

Notice that there is no way to specify the FTS configuration, so CREATE INDEX will
use the pg_catalog.russian_utf8 configuration and, consequently, its specific
dictionaries, stop words, etc. Next time she has to remember about search_path, or
else she will be very confused, because pg_catalog.russian_utf8 will be used in

SELECT title FROM pgweb WHERE body @@ plainto_tsquery('create table');
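
The lookup rule described above can be modelled in a few lines. This is only
an illustrative Python sketch, not tsearch_core code: the FtsConfig record, the
two function names, and the one-element search_path are assumptions made for
the example. It shows which default configuration is picked when pg_catalog is
left implicit and when it is listed explicitly after public.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FtsConfig:
    """Hypothetical record for one FTS configuration."""
    schema: str
    name: str
    locale: str
    is_default: bool


def effective_search_path(search_path: List[str]) -> List[str]:
    # pg_catalog is searched first unless it is listed explicitly.
    if "pg_catalog" in search_path:
        return list(search_path)
    return ["pg_catalog", *search_path]


def resolve_default_config(
    configs: List[FtsConfig],
    search_path: List[str],
    locale: str,
    tsearch_conf_name: Optional[str] = None,
) -> Optional[str]:
    # An explicitly set tsearch_conf_name takes precedence over the lookup.
    if tsearch_conf_name is not None:
        return tsearch_conf_name
    # Otherwise walk the schemas in order and take the first configuration
    # that matches the locale and carries the default flag.
    for schema in effective_search_path(search_path):
        for cfg in configs:
            if cfg.schema == schema and cfg.locale == locale and cfg.is_default:
                return f"{cfg.schema}.{cfg.name}"
    return None


CONFIGS = [
    FtsConfig("pg_catalog", "russian_utf8", "ru_RU.UTF-8", True),
    FtsConfig("public", "fts", "ru_RU.UTF-8", True),
]

# pg_catalog left implicit: the system configuration masks public.fts.
default_path = resolve_default_config(CONFIGS, ["public"], "ru_RU.UTF-8")
# pg_catalog listed explicitly after public: public.fts is found first.
explicit_path = resolve_default_config(
    CONFIGS, ["public", "pg_catalog"], "ru_RU.UTF-8"
)

print(default_path)   # pg_catalog.russian_utf8
print(explicit_path)  # public.fts
```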

Of course, there are several ways to avoid this kind of error, but
we want to minimize this possible source of confusion, so we ask the community:
is it worth making user-created FTS configurations visible ahead of the
system configurations in pg_catalog when pg_catalog is not *explicitly*
specified in search_path?
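
To make the question concrete, the two orderings can be compared side by side.
This is a hedged Python sketch of the idea only; both function names are
hypothetical, and the second one models the behaviour being asked about, not
anything that exists in the patch.

```python
from typing import List


def current_fts_lookup_order(search_path: List[str]) -> List[str]:
    # Present rule: an implicit pg_catalog is searched before everything else.
    if "pg_catalog" in search_path:
        return list(search_path)
    return ["pg_catalog", *search_path]


def proposed_fts_lookup_order(search_path: List[str]) -> List[str]:
    # Rule under discussion: for FTS configurations only, an implicit
    # pg_catalog is searched last, so user-created configurations win.
    if "pg_catalog" in search_path:
        return list(search_path)
    return [*search_path, "pg_catalog"]


print(current_fts_lookup_order(["public"]))
print(proposed_fts_lookup_order(["public"]))
```

When pg_catalog is listed explicitly, both functions return the path unchanged,
so the user's chosen order is respected in either case.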

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: "Florian G(dot) Pflug" <fgp(at)master(dot)phlo(dot)org>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch_core for inclusion
Date: 2007-03-23 12:49:42
Message-ID: 4603CCE6.8090800@master.phlo.org

Teodor Sigaev wrote:
> For given schema and server's locale, it's possible to have several FTS
> configurations, but the only one (with special flag enabled)
> could be used as default. Current (active) FTS configuration contains
> in GUC variable tsearch_conf_name. If it's not defined, then FTS
> configuration
> is looked in search_path to match server's locale with default flag
> enabled.

Isn't the real problem that only _one_ configuration per locale should
be marked as DEFAULT at any time, no matter what schema it is in?

Having one DEFAULT configuration per schema per locale will necessarily
cause confusion if search_path is not set carefully I think.

greetings, Florian Pflug


From: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch_core for inclusion
Date: 2007-03-23 12:51:37
Message-ID: 4603CD59.6080001@phlo.org

Teodor Sigaev wrote:
> For given schema and server's locale, it's possible to have several FTS
> configurations, but the only one (with special flag enabled)
> could be used as default. Current (active) FTS configuration contains
> in GUC variable tsearch_conf_name. If it's not defined, then FTS
> configuration
> is looked in search_path to match server's locale with default flag
> enabled.

Isn't the real problem that only _one_ configuration per locale should
be marked as DEFAULT at any time, no matter what schema it is in?

Having one DEFAULT configuration per schema per locale will necessarily
cause confusion if search_path is not set carefully I think.

greetings, Florian Pflug


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch_core for inclusion
Date: 2007-03-26 14:32:10
Message-ID: Pine.LNX.4.64.0703261825530.12152@sn.sai.msu.ru

On Fri, 23 Mar 2007, Florian G. Pflug wrote:

> Teodor Sigaev wrote:
>> For given schema and server's locale, it's possible to have several FTS
>> configurations, but the only one (with special flag enabled)
>> could be used as default. Current (active) FTS configuration contains
>> in GUC variable tsearch_conf_name. If it's not defined, then FTS
>> configuration
>> is looked in search_path to match server's locale with default flag
>> enabled.
>
> Isn't the real problem that only _one_ configuration per locale should
> be marked as DEFAULT at any time, no matter what schema it is in?

I'm not sure I understand you correctly (it's a bit complex :), but only
_one_ DEFAULT configuration is allowed per schema and per locale. So
visibility is determined by search_path for a given locale.

>
> Having one DEFAULT configuration per schema per locale will necessarily
> cause confusion if search_path is not set carefully I think.

That's what we're worried about, and we are trying to avoid possible confusion.

>
> greetings, Florian Pflug

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch_core for inclusion
Date: 2007-03-26 15:57:29
Message-ID: 4607ED69.9030401@phlo.org

Oleg Bartunov wrote:
> On Fri, 23 Mar 2007, Florian G. Pflug wrote:
>
>> Teodor Sigaev wrote:
>>> For given schema and server's locale, it's possible to have several
>>> FTS configurations, but the only one (with special flag enabled)
>>> could be used as default. Current (active) FTS configuration contains
>>> in GUC variable tsearch_conf_name. If it's not defined, then FTS
>>> configuration
>>> is looked in search_path to match server's locale with default flag
>>> enabled.
>>
>> Isn't the real problem that only _one_ configuration per locale should
>> be marked as DEFAULT at any time, no matter what schema it is in?
>
> I'm not sure I understand you correct (a bit complex :), but it's allowed
> to have only _one_ DEFAULT configuration per schema/per locale. So,
> visibility is defined by search_path for given locale.

Yes, but why is that needed? Wouldn't one DEFAULT configuration
per database be sufficient, and avoid the search_path problems?

Sorry if I'm being stupid - I just can't see what having a different
DEFAULT configuration per schema buys you.

greetings, Florian Pflug


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch_core for inclusion
Date: 2007-03-26 16:07:22
Message-ID: Pine.LNX.4.64.0703262000530.12152@sn.sai.msu.ru

On Mon, 26 Mar 2007, Florian G. Pflug wrote:

> Oleg Bartunov wrote:
>> On Fri, 23 Mar 2007, Florian G. Pflug wrote:
>>
>>> Teodor Sigaev wrote:
>>>> For given schema and server's locale, it's possible to have several FTS
>>>> configurations, but the only one (with special flag enabled)
>>>> could be used as default. Current (active) FTS configuration contains
>>>> in GUC variable tsearch_conf_name. If it's not defined, then FTS
>>>> configuration
>>>> is looked in search_path to match server's locale with default flag
>>>> enabled.
>>>
>>> Isn't the real problem that only _one_ configuration per locale should
>>> be marked as DEFAULT at any time, no matter what schema it is in?
>>
>> I'm not sure I understand you correct (a bit complex :), but it's allowed
>> to have only _one_ DEFAULT configuration per schema/per locale. So,
>> visibility is defined by search_path for given locale.
>
> Yes, but why is that needed? Wouldn't one DEFAULT configuration
> per database be sufficient, and avoid the search_path problems?
>
> Sorry if I'm being stupid - I just can't see what having a different
> DEFAULT configuration per schema buys you.

It's what people asked for. Think about several sub-projects which share one
database, for example. They all may need different configurations.
It's not difficult to specify the schema-qualified name of an FTS configuration,
but the problem arises when using "simple search", since there is no
way to specify the FTS configuration name in the CREATE INDEX command.

Regards,
Oleg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch_core for inclusion
Date: 2007-03-26 17:55:37
Message-ID: 3816.1174931737@sss.pgh.pa.us

Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
> On Fri, 23 Mar 2007, Florian G. Pflug wrote:
>> Isn't the real problem that only _one_ configuration per locale should
>> be marked as DEFAULT at any time, no matter what schema it is in?

> I'm not sure I understand you correct (a bit complex :), but it's allowed
> to have only _one_ DEFAULT configuration per schema/per locale. So,
> visibility is defined by search_path for given locale.

Not sure that that's a good idea at all. We used to have
search-path-dependent rules for deciding which opclass was default,
and found that that was not good. Also, I do not understand how
the queries and the indexes are tied together --- but doesn't an
index need to be built using the same rules that are later expected
by the queries? If that varies on search_path it'll be too fragile.
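
The fragility can be illustrated with a toy inverted index. This is a minimal
Python sketch under stated assumptions, not how GIN or tsearch_core works
internally: a "configuration" is reduced to a stop-word set, the stop word
"table" is invented for the example, and all function names are hypothetical.
The index is built under one configuration and then queried under another.

```python
from typing import Dict, Set


def to_lexemes(text: str, stop_words: Set[str]) -> Set[str]:
    # Toy normalisation: lowercase, split on whitespace, drop stop words.
    return {word for word in text.lower().split() if word not in stop_words}


def build_index(docs: Dict[int, str], stop_words: Set[str]) -> Dict[str, Set[int]]:
    # Map each lexeme to the set of documents that contain it.
    index: Dict[str, Set[int]] = {}
    for doc_id, body in docs.items():
        for lexeme in to_lexemes(body, stop_words):
            index.setdefault(lexeme, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str, stop_words: Set[str]) -> Set[int]:
    # AND semantics: a document must contain every lexeme of the query.
    lexemes = to_lexemes(query, stop_words)
    if not lexemes:
        return set()
    postings = [index.get(lexeme, set()) for lexeme in lexemes]
    return set.intersection(*postings)


SYSTEM_STOP_WORDS = {"table"}   # hypothetical: system configuration drops "table"
USER_STOP_WORDS: Set[str] = set()  # hypothetical: user configuration keeps it

DOCS = {1: "create table syntax", 2: "drop index syntax"}

# The index is built with the system configuration.
index = build_index(DOCS, SYSTEM_STOP_WORDS)

# Same configuration at query time: document 1 is found.
consistent = search(index, "create table", SYSTEM_STOP_WORDS)
# Different configuration at query time: "table" was never indexed, so nothing matches.
mismatched = search(index, "create table", USER_STOP_WORDS)

print(consistent)
print(mismatched)
```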

regards, tom lane


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch_core for inclusion
Date: 2007-03-26 19:09:45
Message-ID: Pine.LNX.4.64.0703262259090.12152@sn.sai.msu.ru

On Mon, 26 Mar 2007, Tom Lane wrote:

> Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
>> On Fri, 23 Mar 2007, Florian G. Pflug wrote:
>>> Isn't the real problem that only _one_ configuration per locale should
>>> be marked as DEFAULT at any time, no matter what schema it is in?
>
>> I'm not sure I understand you correct (a bit complex :), but it's allowed
>> to have only _one_ DEFAULT configuration per schema/per locale. So,
>> visibility is defined by search_path for given locale.
>
> Not sure that that's a good idea at all. We used to have
> search-path-dependent rules for deciding which opclass was default,
> and found that that was not good. Also, I do not understand how
> the queries and the indexes are tied together --- but doesn't an
> index need to be built using the same rules that are later expected
> by the queries? If that varies on search_path it'll be too fragile.

FTS is a very rich application, and the rules for creating an index and for
running queries could be different. One index could be used for searching
with or without taking stop words into account, for example.
But, in general, the index and the queries should be processed by the same
parsers and dictionaries. It can be less fragile if we somehow store the
FTS information (the FTS configuration name) and display it, say, in the \di command.
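
As a rough illustration of that idea, the configuration name could be kept
next to the index metadata. The Python sketch below is hypothetical
throughout: the IndexInfo record, the describe() output format, and the
config_matches() check are assumptions for the example and do not reflect the
real catalog layout or the actual \di output. The check goes one step beyond
display and is included only to show what the stored name would make possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexInfo:
    """Hypothetical index metadata with the FTS configuration recorded."""
    name: str
    table: str
    fts_config: str  # configuration in effect when the index was built


def describe(index: IndexInfo) -> str:
    # One line per index, loosely in the spirit of a \di listing.
    return f"{index.name} on {index.table} (fts configuration: {index.fts_config})"


def config_matches(index: IndexInfo, active_config: str) -> bool:
    # True when queries would use the configuration the index was built with.
    return index.fts_config == active_config


pgweb_idx = IndexInfo("pgweb_idx", "pgweb", "pg_catalog.russian_utf8")

print(describe(pgweb_idx))
print(config_matches(pgweb_idx, "public.fts"))
```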

I repeat, I see a potential problem (confusion) only for the "simple" FTS index,
which is created on TEXT/VARCHAR data with the CREATE INDEX command, since it's
impossible to explicitly specify which FTS configuration to use.

Regards,
Oleg