Re: tsearch_core for inclusion

Lists: pgsql-hackers
From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: tsearch_core for inclusion
Date: 2007-03-15 16:17:54
Message-ID: 45F971B2.5060208@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Last try there was a fight about syntax of introduced commands. And we (Oleg and
me) developed variant of patch with another syntax. We will not change docs
until agreement will be reached, current version
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/

Following demonstrates subset of FTS syntax using example from
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-complete-tut.html.

Syntax which was proposed:
begin;
CREATE FULLTEXT CONFIGURATION public.pg ( LOCALE = 'ru_RU.UTF-8' ) LIKE english
WITH MAP;
CREATE FULLTEXT DICTIONARY pg_dict ( OPT = 'pg_dict.txt' ) LIKE synonym;
CREATE FULLTEXT DICTIONARY en_ispell
( OPT = 'DictFile="/usr/local/share/dicts/ispell/english-utf8.dict",
AffFile="/usr/local/share/dicts/ispell/english-utf8.aff",
StopFile="/usr/local/share/dicts/ispell/english-utf8.stop"'
) LIKE ispell_template;
ALTER FULLTEXT DICTIONARY en_stem SET
OPT='/usr/local/share/dicts/ispell/english-utf8.stop';
ALTER FULLTEXT MAPPING ON pg FOR lword,lhword,lpart_hword
WITH pg_dict,en_ispell,en_stem;
DROP FULLTEXT MAPPING ON pg FOR email, url, sfloat, uri, float;
end;
Patch is http://www.sigaev.ru/misc/tsearch_core-0.38.gz

Making above FTS syntax as Peter suggested, we removed unnecessary parenthesis,
equal sign, OPT becomes OPTION.
begin;
CREATE FULLTEXT CONFIGURATION public.pg LOCALE 'ru_RU.UTF-8' LIKE english WITH
MAP;
CREATE FULLTEXT DICTIONARY pg_dict OPTION 'pg_dict.txt' LIKE synonym;
CREATE FULLTEXT DICTIONARY en_ispell
OPTION 'DictFile="/usr/local/share/dicts/ispell/english-utf8.dict",
AffFile="/usr/local/share/dicts/ispell/english-utf8.aff",
StopFile="/usr/local/share/dicts/ispell/english-utf8.stop"'
LIKE ispell_template;
ALTER FULLTEXT DICTIONARY en_stem SET OPTION '/usr/local/share/dicts/ispell/engl
ish-utf8.stop';
ALTER FULLTEXT MAPPING ON pg FOR lword,lhword,lpart_hword
WITH pg_dict,en_ispell,en_stem;
DROP FULLTEXT MAPPING ON pg FOR email, url, sfloat, uri, float;
end;
Patch is http://www.sigaev.ru/misc/tsearch_core-0.38.1.gz

Comparing that syntaxes with current tsearch2 is placed at
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-syntax-compare.html

So, which is syntax more attractive? And is there some another objections?

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 03:03:13
Message-ID: 200703152303.13860.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 15 March 2007 12:17, Teodor Sigaev wrote:
> Last try there was a fight about syntax of introduced commands. And we
> (Oleg and me) developed variant of patch with another syntax. We will not
> change docs until agreement will be reached, current version
> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/
>
> Following demonstrates subset of FTS syntax using example from
> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-complete-tut.html.
>

This is nice.

<snip>
> Comparing that syntaxes with current tsearch2 is placed at
> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-syntax-compare.html
>
> So, which is syntax more attractive?

Honestly I don't find any of these syntax's to be head and shoulders above the
others, but would probably lean toward the original syntax, since it has some
level of familiarity among the existing user base.

> And is there some another objections?

Most people whom I talk to about tsearch who want the syntax changed to make
it easier want something akin to just "CREATE INDEX fti1 on t1(c1) USING
FULLTEXT" and then be done with it. This patch isn't going to give people
that.

I'm also concerned about the stability of the tsearch api in general wrt
including it in core. Currently the recommended upgrade practice is to
dump/reload without tsearch, installing the new servers version of tsearch
instead. IMHO this is not an acceptable solution for core-included features.
So is this actually going to be improved in a core tsearch? system tables
are not dumped by default, so that seems easier, until you consider that your
custom tsearch install will then be lost on upgrade... oops!

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 03:16:52
Message-ID: 45FA0C24.3080108@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> And is there some another objections?
>
> Most people whom I talk to about tsearch who want the syntax changed to make
> it easier want something akin to just "CREATE INDEX fti1 on t1(c1) USING
> FULLTEXT" and then be done with it. This patch isn't going to give people
> that.

+1 (again) Although I would see something like this:

CREATE INDEX fti1 on t1(c1) USING FULLTEXT [WITH] GIST | GIN

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 03:29:24
Message-ID: 436.1174015764@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> +1 (again) Although I would see something like this:
> CREATE INDEX fti1 on t1(c1) USING FULLTEXT [WITH] GIST | GIN

Surely the CREATE INDEX syntax has got enough warts on it already.
Can't we express this as a particular operator class, or something
that doesn't add any new syntax?

regards, tom lane


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 03:53:47
Message-ID: 45FA14CB.6000601@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>> +1 (again) Although I would see something like this:
>> CREATE INDEX fti1 on t1(c1) USING FULLTEXT [WITH] GIST | GIN
>
> Surely the CREATE INDEX syntax has got enough warts on it already.
> Can't we express this as a particular operator class, or something
> that doesn't add any new syntax?

Do you mean something like:

CREATE INDEX fti1 ON t1 USING GIST|GIN(c1) WITH (FULLTEXT)

The above I got from the idea of FILLFACTOR so you could have something
like WITH (FILLFACTOR = 70, FULLTEXT) (not even sure if that is relevant)

OR

CREATE INDEX ftil ON t1 USING GIST|GIN(C1 FULLTEXT);

Where FULLTEXT is like VARCHAR OPS?

I could live with that.

Sincerely,

Joshua D. Drake

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 04:11:45
Message-ID: 964.1174018305@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> Surely the CREATE INDEX syntax has got enough warts on it already.

> Do you mean something like:
> CREATE INDEX ftil ON t1 USING GIST|GIN(C1 FULLTEXT);
> Where FULLTEXT is like VARCHAR OPS?

Yeah, that one. It might be more consistent to spell it as "fulltext_ops"
but I wouldn't insist on it.

Of course the issue not addressed here is where you specify all the
secondary configuration data (the stuff currently handled by config
tables in the contrib implementation). Perhaps the WITH clause would
work for that, though in the current code WITH is targeted at the index
AM not individual opclasses.

regards, tom lane


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 04:32:44
Message-ID: 45FA1DEC.70109@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>> Tom Lane wrote:
>>> Surely the CREATE INDEX syntax has got enough warts on it already.
>
>> Do you mean something like:
>> CREATE INDEX ftil ON t1 USING GIST|GIN(C1 FULLTEXT);
>> Where FULLTEXT is like VARCHAR OPS?
>
> Yeah, that one. It might be more consistent to spell it as "fulltext_ops"
> but I wouldn't insist on it.

*shrug* fulltext_ops is probably more accurate but FULLTEXT is more
friendly :). I find you normally can't have both, my vote would probably
be consistency.

>
> Of course the issue not addressed here is where you specify all the
> secondary configuration data (the stuff currently handled by config
> tables in the contrib implementation). Perhaps the WITH clause would
> work for that, though in the current code WITH is targeted at the index
> AM not individual opclasses.

Not sure what to say here. WITH seems logical and I don't think we want
to add yet another keyword but I certainly see your point.

Sincerely,

Joshua D. Drake

>
> regards, tom lane
>

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 05:31:41
Message-ID: Pine.LNX.4.64.0703160813320.400@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 15 Mar 2007, Robert Treat wrote:

> On Thursday 15 March 2007 12:17, Teodor Sigaev wrote:
>> Last try there was a fight about syntax of introduced commands. And we
>> (Oleg and me) developed variant of patch with another syntax. We will not
>> change docs until agreement will be reached, current version
>> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/
>>
>> Following demonstrates subset of FTS syntax using example from
>> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-complete-tut.html.
>>
>
> This is nice.
>
> <snip>
>> Comparing that syntaxes with current tsearch2 is placed at
>> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-syntax-compare.html
>>
>> So, which is syntax more attractive?
>
> Honestly I don't find any of these syntax's to be head and shoulders above the
> others, but would probably lean toward the original syntax, since it has some
> level of familiarity among the existing user base.
>
>> And is there some another objections?
>
> Most people whom I talk to about tsearch who want the syntax changed to make
> it easier want something akin to just "CREATE INDEX fti1 on t1(c1) USING
> FULLTEXT" and then be done with it. This patch isn't going to give people
> that.

Since we use standard postgresql-ish CREATE INDEX command, I assume
people want to skip creation of tsvector column ? How they could manage
complex document indexing, when document is a combination (with different weights)
of many text attributes from several tables, for example ? There are several
other issues with that approach, for example, we need to store positional
information somewhere for ranking information. It's awkward to parse document
every time to get this information.
>
> I'm also concerned about the stability of the tsearch api in general wrt
> including it in core. Currently the recommended upgrade practice is to
> dump/reload without tsearch, installing the new servers version of tsearch
> instead. IMHO this is not an acceptable solution for core-included features.
> So is this actually going to be improved in a core tsearch? system tables
> are not dumped by default, so that seems easier, until you consider that your
> custom tsearch install will then be lost on upgrade... oops!

This is exact reason why we want to include tsearch into core, it was discussed
several times.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 05:33:17
Message-ID: Pine.LNX.4.64.0703160831530.400@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 15 Mar 2007, Joshua D. Drake wrote:

>
>>> And is there some another objections?
>>
>> Most people whom I talk to about tsearch who want the syntax changed to make
>> it easier want something akin to just "CREATE INDEX fti1 on t1(c1) USING
>> FULLTEXT" and then be done with it. This patch isn't going to give people
>> that.
>
> +1 (again) Although I would see something like this:
>
> CREATE INDEX fti1 on t1(c1) USING FULLTEXT [WITH] GIST | GIN

Yet another syntax addition :)
What's wrong with the standard CREATE INDEX ?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: "Magnus Hagander" <magnus(at)hagander(dot)net>
To: oleg(at)sai(dot)msu(dot)su
Cc: "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 06:19:19
Message-ID: 20070316061914.01EEFDCC3EF@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> > Most people whom I talk to about tsearch who want the syntax changed to make
> > it easier want something akin to just "CREATE INDEX fti1 on t1(c1) USING
> > FULLTEXT" and then be done with it. This patch isn't going to give people
> > that.
>
> Since we use standard postgresql-ish CREATE INDEX command, I assume
> people want to skip creation of tsvector column ?

That would be great.

> How they could manage
> complex document indexing, when document is a combination (with different weights)
> of many text attributes from several tables, for example ?

Just to give you some more work, could I have both, please.

Seriously, if the current powerful functionality could be combined with a dead simple solution for new users and those who don't need it, that would be very
good.

This could be an auto generated hidden column or something, as long as the user doesn't need to see or care about it in the simple case.

/Magnus


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 06:56:28
Message-ID: Pine.LNX.4.64.0703160939410.400@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 16 Mar 2007, Magnus Hagander wrote:

>>> Most people whom I talk to about tsearch who want the syntax changed to make
>>> it easier want something akin to just "CREATE INDEX fti1 on t1(c1) USING
>>> FULLTEXT" and then be done with it. This patch isn't going to give people
>>> that.
>>
>> Since we use standard postgresql-ish CREATE INDEX command, I assume
>> people want to skip creation of tsvector column ?
>
> That would be great.
>
>> How they could manage
>> complex document indexing, when document is a combination (with different weights)
>> of many text attributes from several tables, for example ?
>
> Just to give you some more work, could I have both, please.
>
> Seriously, if the current powerful functionality could be combined with a dead simple solution for new users and those who don't need it, that would be very
> good.
>
> This could be an auto generated hidden column or something, as long as the user doesn't need to see or care about it in the simple case.

hmm, then we should think about very generic fts configuration behind such
simple solution. We need to know what requirements and features should
be supported. for example, "CREATE INDEX fti1 on t1(c1) USING FULLTEXT"
syntax assume default configuration only.

Joshua, Tome proposed something like
CREATE INDEX fti1 on t1(c1) USING FULLTEXT [WITH] GIST | GIN,
so is't worth to extend it to specify fts configuration like
CREATE INDEX fti1 on t1(c1) USING FULLTEXT [public.pg] [WITH] GIST | GIN
Normally, fts configuration is used when creating tsvector, so
CREATE INDEX doesn't need to know it. Hmm, looks rather messy.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 08:40:32
Message-ID: 45FA5800.7090202@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Yeah, that one. It might be more consistent to spell it as "fulltext_ops"
> but I wouldn't insist on it.

Hmm, you are prompting an idea to me how to simplify usage of full text index in
simple cases.

CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops);

Fulltext_ops opclass parses the document similarly to_tsvector nad stores
lexemes in gin index. It's a full equalent of
CREATE INDEX ... ( to_tsvector( textcolumn ) )

And, let we define operation text @ text, which is equivalent of text @@
plainto_tsquery(text), so, queries will look like
SELECT * FROM tblname WHERE textcolumn @ textquery;

Fulltext_ops can speedup both operation, text @@ tsquery and text @ text.
Because gin API has extractQuery method which calls once per index scan and it
can parse query to lexemes.

Some disadvantage: with that way it isn't possible make fast ranking - there is
no stored parsed text. And, fulltext_ops may be done for GiST index too, but
fulltext opclass will be lossy which means slow search due to reparse texts for
each index match.

BTW, simple syntax sugar for CREATE INDEX (fulltext_ops) may be done:
CREATE INDEX idxname ON tblname USING FULLTEXT (textcolumn)

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 08:44:58
Message-ID: 45FA590A.3050709@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I'm also concerned about the stability of the tsearch api in general wrt
> including it in core. Currently the recommended upgrade practice is to
> dump/reload without tsearch, installing the new servers version of tsearch
That is because pg_ts* tables changes, function names and internal API. Putting
tsearch in core discards a lot of such problem. For example, who notices changes
in pg_am table from release to release? Really it was a developers/hackers, not
a usual users

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 13:37:13
Message-ID: 200703160937.13955.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Friday 16 March 2007 04:44, Teodor Sigaev wrote:
> > I'm also concerned about the stability of the tsearch api in general wrt
> > including it in core. Currently the recommended upgrade practice is to
> > dump/reload without tsearch, installing the new servers version of
> > tsearch
>
> That is because pg_ts* tables changes, function names and internal API.
> Putting tsearch in core discards a lot of such problem. For example, who
> notices changes in pg_am table from release to release? Really it was a
> developers/hackers, not a usual users

I've ran into problems on very vanilla setups that I am sure other users are
going to run across... see the following for details:

http://people.planetpostgresql.org/xzilla/index.php?/archives/291-The-pain-that-is-tsearch2-8.1-8.2-upgrading.html

I don't see how the proposal is going to solve that type of problem, but maybe
I am overlooking something?

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 14:42:39
Message-ID: 6427.1174056159@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> Hmm, you are prompting an idea to me how to simplify usage of full text index in
> simple cases.

> CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops);

+1 ... makes the easy cases easy, doesn't make the hard cases any
harder.

> BTW, simple syntax sugar for CREATE INDEX (fulltext_ops) may be done:
> CREATE INDEX idxname ON tblname USING FULLTEXT (textcolumn)

Not in favor of this, because FULLTEXT isn't an AM ... how would you
know whether to use GIST or GIN? Actually, if you wanted to simplify
life a bit, you could mark fulltext_ops as being the default opclass
for text (and varchar I guess) under GIST and GIN. Then it reduces
to just

CREATE INDEX idxname ON tblname USING gin (textcolumn);

regards, tom lane


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 14:45:39
Message-ID: 45FAAD93.7050909@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I don't see how the proposal is going to solve that type of problem, but maybe
> I am overlooking something?

The same way as other system tables objects, they don't dump, they don't
restore. In 8.3, seems, API to index AM will be changed - will anybody except
pghackers see that? New opclass layout, new opfamily table - users don't that
changes at all.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 15:01:09
Message-ID: 45FAB135.5000805@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev wrote:
> CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops);
>
> Fulltext_ops opclass parses the document similarly to_tsvector nad
> stores lexemes in gin index. It's a full equalent of
> CREATE INDEX ... ( to_tsvector( textcolumn ) )
>
> And, let we define operation text @ text, which is equivalent of text @@
> plainto_tsquery(text), so, queries will look like
> SELECT * FROM tblname WHERE textcolumn @ textquery;
>
> Fulltext_ops can speedup both operation, text @@ tsquery and text @
> text. Because gin API has extractQuery method which calls once per index
> scan and it can parse query to lexemes.
>
> Some disadvantage: with that way it isn't possible make fast ranking -
> there is no stored parsed text. And, fulltext_ops may be done for GiST
> index too, but fulltext opclass will be lossy which means slow search
> due to reparse texts for each index match.
Just a thought:

If the patch that implements the "GENERATED ALWAYS" syntax is accepted,
than creating a seperate field that hold the parsed text and an index
on that column becomes as easy as:
alter table t1 add column text_parsed generated always as to_tsvector(text);
create index idx on t1 using gin (text_parsed fulltext_ops);

I know that there is a trigger function in tsearch that support something
similar, but I really like the simplicity of the statements above.

One a related note - will to_tsvector and to_tsquery be renamed to
something like ft_parse_text() and ft_parse_query() if tsearch2 goes
into core? It seems like the "ts" part of those names would be the only
referenced left to the name "tsearch" if they are not, which could be
somewhat confusing for users.

greetings, Florian Pflug


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 15:12:37
Message-ID: 45FAB3E5.7020306@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> One a related note - will to_tsvector and to_tsquery be renamed to
> something like ft_parse_text() and ft_parse_query() if tsearch2 goes

Furthering this... perhaps even:

ft_search()
ft_query()

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 15:24:40
Message-ID: 45FAB6B8.3020301@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Actually, if you wanted to simplify
> life a bit, you could mark fulltext_ops as being the default opclass
> for text (and varchar I guess) under GIST and GIN. Then it reduces
> to just
>
> CREATE INDEX idxname ON tblname USING gin (textcolumn);
>
>

Nice. This gets my vote.

cheers

andrew


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 16:16:15
Message-ID: Pine.LNX.4.64.0703161909080.400@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 16 Mar 2007, Joshua D. Drake wrote:

>
>> One a related note - will to_tsvector and to_tsquery be renamed to
>> something like ft_parse_text() and ft_parse_query() if tsearch2 goes
>
> Furthering this... perhaps even:
>
> ft_search()
> ft_query()

ts_ means Text Search, I don't think ft_ (Full Text) is better.
Going further it should be fts_ (Full Text Search), but we have
many concerns about compatibility and stability of api, so I'd prefer
to stay with ts_.

The more important is what syntax we should accept for the final patch ?
Original proposed or modified version ?
On
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-syntax-compare.html
they are D.1.2 and D.1.3

As I understood Teodor's proposal about CREATE INDEX command is ok for all.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 16:18:55
Message-ID: 45FAC36F.7020109@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Florian G. Pflug wrote:
> Teodor Sigaev wrote:
>> CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops);
>>
>> Fulltext_ops opclass parses the document similarly to_tsvector nad
>> stores lexemes in gin index. It's a full equalent of
>> CREATE INDEX ... ( to_tsvector( textcolumn ) )
>>
>> And, let we define operation text @ text, which is equivalent of text
>> @@ plainto_tsquery(text), so, queries will look like
>> SELECT * FROM tblname WHERE textcolumn @ textquery;
>>
>> Fulltext_ops can speedup both operation, text @@ tsquery and text @
>> text. Because gin API has extractQuery method which calls once per
>> index scan and it can parse query to lexemes.
>>
>> Some disadvantage: with that way it isn't possible make fast ranking -
>> there is no stored parsed text. And, fulltext_ops may be done for GiST
>> index too, but fulltext opclass will be lossy which means slow search
>> due to reparse texts for each index match.
> Just a thought:
>
> If the patch that implements the "GENERATED ALWAYS" syntax is accepted,
> than creating a seperate field that hold the parsed text and an index
> on that column becomes as easy as:
> alter table t1 add column text_parsed generated always as
> to_tsvector(text);
> create index idx on t1 using gin (text_parsed fulltext_ops);

or to take tom's idea into consideration:

ALTER TABLE t1 ADD COLUMN text_parsed GENERATED ALWAYS AS to_tsvector(text);
CREATE INDEX idxname ON t1 USING gin (text_parsed);

which looks pretty nice and simple to me

>
> I know that there is a trigger function in tsearch that support something
> similar, but I really like the simplicity of the statements above.
>
> One a related note - will to_tsvector and to_tsquery be renamed to
> something like ft_parse_text() and ft_parse_query() if tsearch2 goes
> into core? It seems like the "ts" part of those names would be the only
> referenced left to the name "tsearch" if they are not, which could be
> somewhat confusing for users.

well either renaming those functions (and completely destroy the upgrade
path for any current users) or just refer to it as "text search" in the
docs (so that the prefix makes sense).

Stefan


From: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 16:25:19
Message-ID: 45FAC4EF.4080109@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Oleg Bartunov wrote:
> On Fri, 16 Mar 2007, Joshua D. Drake wrote:
>
>>
>>> One a related note - will to_tsvector and to_tsquery be renamed to
>>> something like ft_parse_text() and ft_parse_query() if tsearch2 goes
>>
>> Furthering this... perhaps even:
>>
>> ft_search()
>> ft_query()
>
> ts_ means Text Search, I don't think ft_ (Full Text) is better.
> Going further it should be fts_ (Full Text Search), but we have many
> concerns about compatibility and stability of api, so I'd prefer
> to stay with ts_.

Hm, so it could be fts_parse_query() and fts_parse_text()
You could alias it to to_tsvector() and to_tsquery() to
archive api compatibility.

I agree that the names of these functions are really a minor
issue, and api compatibility is more important. But confusing
names can be the source of a lot of errors for new users, so
there *is* a point is naming things consistenly. And if the
cost is basically an entry in pg_proc, why not do it?

greetings, Florian Pflug


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 19:38:45
Message-ID: 200703161538.45803.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Friday 16 March 2007 10:45, Teodor Sigaev wrote:
> > I don't see how the proposal is going to solve that type of problem, but
> > maybe I am overlooking something?
>
> The same way as other system tables objects, they don't dump, they don't
> restore. In 8.3, seems, API to index AM will be changed - will anybody
> except pghackers see that? New opclass layout, new opfamily table - users
> don't that changes at all.

If I have configured my tsearch install for spanish (spanish
dictionary/stemmers/synonyms/etc...) aren't I going to lose all that on the
next database upgrade?

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch_core for inclusion
Date: 2007-03-16 22:09:03
Message-ID: 45FB157F.3010501@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Treat wrote:
> On Friday 16 March 2007 10:45, Teodor Sigaev wrote:
>>> I don't see how the proposal is going to solve that type of problem, but
>>> maybe I am overlooking something?
>> The same way as other system tables objects, they don't dump, they don't
>> restore. In 8.3, seems, API to index AM will be changed - will anybody
>> except pghackers see that? New opclass layout, new opfamily table - users
>> don't that changes at all.
>
> If I have configured my tsearch install for spanish (spanish
> dictionary/stemmers/synonyms/etc...) aren't I going to lose all that on the
> next database upgrade?

I believe what teodor meant is that the *tables* won't be dumped as such
by pg_dump (the same as all other tables in pg_catalog). But pg_dump would
of course need to dump the information stored in the tables - it would
put some "CREATE FULLTEXT ... " statements into your dump. Its really
the same as for any other database object like a function, a type, ...

greetings, Florian Pflug