Re: Full text Indexing -out of contrib and into main..

Lists: pgsql-hackers
From: "John Huttley" <John(at)mwk(dot)co(dot)nz>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 01:51:43
Message-ID: 004001c058dd$c2482aa0$1401a8c0@MWK.co.nz

>
> Maybe asking 'Why isn't the contrib full-text-indexer not in the main
> tree?' would be more productive on that front.

Well, yes. Why isn't it?

Full text indexing should be just as much a feature as any other key
feature in PG. With the advent of unlimited file and record lengths in
7.1, this would be a good time to include it.

FTI is particularly useful in the context of web content engines.

> How did you attempt to build it under the RPM install? I assume you had
> the postgresql-devel package installed, and the include paths set
> properly.... Of course, most of what is in contrib assumes a full source
> tree is lying around (argh)....ie, it's wanting to include
> Makefile.global in its Makefile. And, on the RPM dist, Makefile.global
> isn't (yet) packaged.

Yes, I have the devel RPM, but FTI couldn't find its include files. Or its
libraries. Or something.

Building from a source tree has always been better for me. The catch is
mixing that with the RPMs, which put things in unholy locations.
It's very hard (for me at least) to update an RPM version with a version
compiled from the source.

I recently updated my PG system from 6.5.3 to 7.0.3, still RPMs (another
fun job), and have not tried to compile FTI subsequently.

However if I tried hard enough I'm sure I could fix it. For the moment, I'm on
another job so I'm not worrying.

Regards

John


From: Don Baccus <dhogaza(at)pacifier(dot)com>
To: "John Huttley" <John(at)mwk(dot)co(dot)nz>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 02:27:04
Message-ID: 3.0.1.32.20001127182704.01b54420@mail.pacifier.com

At 02:51 PM 11/28/00 +1300, John Huttley wrote:
>>
>> Maybe asking 'Why isn't the contrib full-text-indexer not in the main
>> tree?' would be more productive on that front.
>
>Well, yes. Why isn't it?
>
>Full text indexing should be just as much a feature as any other key
>feature in PG. With the advent of unlimited file and record lengths in
>7.1, this would be a good time to include it.
>
>FTI is particularly useful in the context of web content engines.

Well ... it's pretty inadequate, actually. That might be one reason it's only
in contrib.

- Don Baccus, Portland OR <dhogaza(at)pacifier(dot)com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.


From: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>
To: John Huttley <John(at)mwk(dot)co(dot)nz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 02:50:35
Message-ID: 3A231D7B.AB56CED1@wgcr.org

John Huttley wrote:
> > Maybe asking 'Why isn't the contrib full-text-indexer not in the main
> > tree?' would be more productive on that front.

> Well, yes. Why isn't it?

I'm hoping to see the answer to that one myself, as that is outside my
scope currently. I just RPMize things... Although, I didn't intend for
my statement to appear as harsh as it does -- for that I apologize.

> Yes, I have the devel RPM, but FTI couldn't find its include files. Or its
> libraries. Or something.

Makefile.global. Tried it here. The RPMset hasn't heretofore needed
Makefile.global. I may package that, amongst other stuff necessary to
build certain things in the devel package -- once I find out how to go
about doing it.

> Building from a source tree has always been better for me. The catch is
> mixing that with the RPMs, which put things in unholy locations.
> It's very hard (for me at least) to update an RPM version with a version
> compiled from the source.

I recommend completely removing the RPM version and installing from
source rather than trying to upgrade from an RPM distribution to the
from-source distribution. Or just install the next RPM version. Let
RPM work the headaches for you. If you want to run from a 'from-source'
build, then nix the RPMset altogether and don't worry about it
afterward.

Although I am going to consider a pre-built set of contribs -- most
notably, geodistance is likely to find its way into an RPM in the
future. I just haven't decided whether to split out to individual
contribs or to just make a single 'postgresql-contrib' subpackage. I'm
open to suggestions.

You can, however, install a source-tree preconfigured and built for the
RPM modifications by installing the _source_ RPM, and then issuing, as
root, 'rpm -bi postgresql.spec' from within the /usr/src/redhat/SPECS
dir. You will then have a source tree primed for building whatever in
/usr/src/redhat/BUILD/postgresql-x.y.z (where x.y.z is the version, of
course). You will need python-devel installed in order to do that,
however.

HTH.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Don Baccus <dhogaza(at)pacifier(dot)com>
Cc: John Huttley <John(at)mwk(dot)co(dot)nz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 02:53:11
Message-ID: 200011280253.VAA02228@candle.pha.pa.us

> >Well, yes. Why isn't it?
> >
> >Full text indexing should be just as much a feature as any other key
> >feature in PG. With the advent of unlimited file and record lengths in
> >7.1, this would be a good time to include it.
> >
> >FTI is particularly useful in the context of web content engines.
>
> Well ... it's pretty inadequate, actually. That might be one reason it's only
> in contrib.

OK, can someone collect suggestions, add the code, and integrate it for
7.1?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: The Hermit Hacker <scrappy(at)hub(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Don Baccus <dhogaza(at)pacifier(dot)com>, John Huttley <John(at)mwk(dot)co(dot)nz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 03:06:01
Message-ID: Pine.BSF.4.21.0011272305500.425-100000@thelab.hub.org

On Mon, 27 Nov 2000, Bruce Momjian wrote:

> > >Well, yes. Why isn't it?
> > >
> > >Full text indexing should be just as much a feature as any other key
> > >feature in PG. With the advent of unlimited file and record lengths in
> > >7.1, this would be a good time to include it.
> > >
> > >FTI is particularly useful in the context of web content engines.
> >
> > Well ... it's pretty inadequate, actually. That might be one reason it's only
> > in contrib.
>
> OK, can someone collect suggestions, add the code, and integrate it for
> 7.1?

too late in cycle ...


From: Don Baccus <dhogaza(at)pacifier(dot)com>
To: The Hermit Hacker <scrappy(at)hub(dot)org>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: John Huttley <John(at)mwk(dot)co(dot)nz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 05:06:03
Message-ID: 3.0.1.32.20001127210603.01b5b440@mail.pacifier.com

At 11:06 PM 11/27/00 -0400, The Hermit Hacker wrote:
>On Mon, 27 Nov 2000, Bruce Momjian wrote:

>> OK, can someone collect suggestions, add the code, and integrate it for
>> 7.1?
>
>too late in cycle ...

Yes...

- Don Baccus, Portland OR <dhogaza(at)pacifier(dot)com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.


From: "Mitch Vincent" <mitch(at)venux(dot)net>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 06:26:10
Message-ID: 004901c05904$19a13370$0200000a@windows

I modified the FTI trigger for my own use a while ago (indexes whole
words, eliminates duplicates, and a few other things) -- I'm not sure if
it would do anyone any good but you're welcome to it. To whom should I
send it?

-Mitch

----- Original Message -----
From: "The Hermit Hacker" <scrappy(at)hub(dot)org>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Don Baccus" <dhogaza(at)pacifier(dot)com>; "John Huttley" <John(at)mwk(dot)co(dot)nz>;
<pgsql-hackers(at)postgresql(dot)org>
Sent: Monday, November 27, 2000 7:06 PM
Subject: Re: [HACKERS] Full text Indexing -out of contrib and into main..

> On Mon, 27 Nov 2000, Bruce Momjian wrote:
>
> > > >Well, yes. Why isn't it?
> > > >
> > > > >Full text indexing should be just as much a feature as any other key
> > > > >feature in PG.
> > > > >With the advent of unlimited file and record lengths in 7.1, this
> > > > >would be a good time to include it.
> > > > >
> > > > >FTI is particularly useful in the context of web content engines.
> > > >
> > > > Well ... it's pretty inadequate, actually. That might be one reason
> > > > it's only in contrib.
> >
> > OK, can someone collect suggestions, add the code, and integrate it for
> > 7.1?
>
> too late in cycle ...
>
>
>


From: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
To: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>
Cc: John Huttley <John(at)mwk(dot)co(dot)nz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 06:29:49
Message-ID: 3A2350DD.CEAF6D82@alumni.caltech.edu

> > > Maybe asking 'Why isn't the contrib full-text-indexer not in the main
> > > tree?' would be more productive on that front.
> > Well, yes. Why isn't it?

I believe that it is appropriate for contrib/ because it is a good demo
of FTI-like capabilities. But nothing more, yet. For at least a couple
of reasons:

1) It generates the "index" as a table, not a PostgreSQL index or
index-like thing.

2) It has a hardcoded list of non-indexed words. This should come from a
table, to allow it to be tuned to the application requirements.
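
To make the two points above concrete, here is a minimal sketch, in Python
rather than in the contrib code, of an "index" kept as plain (word, doc_id)
rows with the stop-word list passed in as data instead of being hardcoded.
The names index_rows, lookup and DEFAULT_STOP_WORDS are hypothetical and
chosen only for illustration; nothing here is taken from the contrib module.

```python
import re

# Hypothetical default list. In the proposal above it would live in a
# database table so that each application could tune it.
DEFAULT_STOP_WORDS = {"a", "an", "and", "the", "of", "in"}


def index_rows(doc_id, text, stop_words=DEFAULT_STOP_WORDS):
    """Return sorted (word, doc_id) rows, as a trigger might insert them."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted((w, doc_id) for w in words if w not in stop_words)


def lookup(rows, word):
    """Return the ids of documents containing `word` by scanning the rows."""
    return sorted({doc_id for w, doc_id in rows if w == word.lower()})


# Two sample documents indexed into one flat list of rows.
rows = index_rows(1, "The cat and the dog") + index_rows(2, "A dog in the park")
```

Because the rows form an ordinary table, the query planner treats a lookup
as a join against that table rather than as an index scan, which is the
limitation point 1 describes.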

Comments?

- Thomas


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Mitch Vincent <mitch(at)venux(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 06:56:26
Message-ID: 200011280656.BAA08030@candle.pha.pa.us

> I modified the FTI trigger for my own use a while ago (indexes whole
> words, eliminates duplicates, and a few other things) -- I'm not sure if
> it would do anyone any good but you're welcome to it. To whom should I
> send it?

Is full-word optional or mandatory? It has to be an option.
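
A rough sketch of what such an option could look like, assuming (purely for
illustration, not as a description of the contrib trigger or of Mitch's
patch) that substring mode stores every suffix of a word so that a prefix
match finds the query anywhere inside it. The names index_terms, matches,
whole_words and min_len are hypothetical.

```python
def index_terms(word, whole_words=False, min_len=2):
    """Return the terms to store for one word under the chosen mode."""
    word = word.lower()
    if whole_words:
        return [word]
    # Every suffix of at least min_len characters. Words shorter than
    # min_len are kept whole so they are not silently dropped.
    return [word[i:] for i in range(len(word) - min_len + 1)] or [word]


def matches(query, word, whole_words=False):
    """Prefix-match `query` against the stored terms for `word`."""
    query = query.lower()
    return any(t.startswith(query) for t in index_terms(word, whole_words))
```

With whole_words=True the index is smaller but only finds word prefixes;
with the default it also finds text in the middle of a word, at the cost of
several rows per word.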

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: "john huttley" <john(at)mwk(dot)co(dot)nz>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 07:53:42
Message-ID: 002a01c05910$53ca75a0$ca5fa8c0@hisdad.org.nz

> > OK, can someone collect suggestions, add the code, and integrate it for
> > 7.1?
>
> too late in cycle ...

How about first thing for 7.2 then? While it lies in limbo,
it's never going to get the attention it deserves.

Regards


From: "john huttley" <john(at)mwk(dot)co(dot)nz>
To: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 08:09:38
Message-ID: 008701c05912$8daa3740$ca5fa8c0@hisdad.org.nz


> I believe that it is appropriate for contrib/ because it is a good demo
> of FTI-like capabilities. But nothing more, yet. For at least a couple
> of reasons:
>
> 1) It generates the "index" as a table, not a PostgreSQL index or
> index-like thing.
>
> 2) It has a hardcoded list of non-indexed words. This should come from a
> table, to allow it to be tuned to the application requirements.
>
> Comments?
>
> - Thomas
>

In general..
a) Considering that I was coding up the same thing with triggers and such,
things could only get better.

b) Check out MSSQL 7's capabilities and weep.

c) It would be a start. Once it's in the tree, it gets used more, gets
improved.

It would be a while yet before 7.2 starts, plenty of time then to develop
it further.

Regards

John


From: Hannu Krosing <hannu(at)tm(dot)ee>
To: john huttley <john(at)mwk(dot)co(dot)nz>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Don Baccus <dhogaza(at)pacifier(dot)com>, Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 09:27:51
Message-ID: 3A237A97.D10C9E20@tm.ee

john huttley wrote:
>
> > I believe that it is appropriate for contrib/ because it is a good demo
> > of FTI-like capabilities. But nothing more, yet. For at least a couple
> > of reasons:
> >
> > 1) It generates the "index" as a table, not a PostgreSQL index or
> > index-like thing.
> >
> > 2) It has a hardcoded list of non-indexed words. This should come from a
> > table, to allow it to be tuned to the application requirements.
> >
> > Comments?
> >
> > - Thomas
> >
>
> In general..
> a) Considering that I was coding up the same thing with triggers and such,
> things could only get better.

AFAIK, the one in contrib _is_ the same thing coded up with triggers and
such ;)

> b) Check out MSSQL 7's capabilities and weep.

BTW, have you studied MSSQL enough to tell me whether it has a
separate/standalone (as a process) FTI engine or just another index type?

I have been contemplating implementing FTI for Postgres for some time,
and my current plan would be to implement an out-of-process FTI engine
(API + sample implementation, in the spirit of PostgreSQL's
extensibility) that could postpone the actual indexing but still help
with queries even for not yet fully indexed stuff.

That will probably need some choreography, but it is essential for high
performance.

You generally don't want to wait for all index entries of an inverted
index to be saved.
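
The deferred-indexing idea can be sketched in a few lines of Python. This
is an in-process toy under stated assumptions, not the out-of-process engine
described above: the class name DeferredIndex and its methods are
hypothetical. Writes only queue the document; a later flush builds the
inverted-index entries; searches combine indexed hits with a direct scan of
whatever is still pending.

```python
import re
from collections import defaultdict, deque


def tokenize(text):
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class DeferredIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of indexed documents
        self.pending = deque()            # (doc_id, text) not yet indexed

    def add(self, doc_id, text):
        """Accept a document without waiting for its index entries."""
        self.pending.append((doc_id, text))

    def flush(self, limit=None):
        """Index up to `limit` pending documents (all of them if None)."""
        done = 0
        while self.pending and (limit is None or done < limit):
            doc_id, text = self.pending.popleft()
            for word in tokenize(text):
                self.postings[word].add(doc_id)
            done += 1
        return done

    def search(self, word):
        """Combine index hits with a direct scan of pending documents."""
        word = word.lower()
        hits = set(self.postings.get(word, ()))
        hits.update(d for d, text in self.pending if word in tokenize(text))
        return sorted(hits)
```

The scan of the pending queue is what keeps results complete before
indexing has caught up; it stays cheap only as long as the queue is short.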

Also, the thing should be more general than the one in contrib, being
able to index both fields and full records and to support functional
indexes.

Is there a way to make the PostgreSQL optimiser aware of the
selectivity/cost of a function, so that it can do the right thing for a
query like

SELECT * FROM ARTICLES
WHERE ADATE BETWEEN YESTERDAY AND TOMORROW
AND ARTICLES.FTI_MATCHES('(CAT & DOG) ! PRESIDENT')

It would be almost automatic if functions could return sets and then be
used like

SELECT * FROM ARTICLE
WHERE ADATE BETWEEN YESTERDAY AND TOMORROW
AND ARTICLE_ID = ARTICLE.FTI_MATCHING_IDS('(CAT & DOG) ! PRESIDENT')

and somehow the optimiser would know that it can join on the returned
ids, but this is probably not the case ;)
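
As an illustration of the set-returning form, here is a small Python sketch
that evaluates '(CAT & DOG) ! PRESIDENT' with set algebra over an inverted
index and then restricts the returned ids by date. Reading `!` as "and not"
is an assumption, and the sample data and the names ARTICLES,
build_postings, fti_matching_ids and search are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data: article_id -> (adate, body).
ARTICLES = {
    1: (date(2000, 11, 28), "cat chases dog"),
    2: (date(2000, 11, 28), "president adopts cat and dog"),
    3: (date(2000, 10, 1), "dog meets cat"),
    4: (date(2000, 11, 27), "cat sleeps"),
}


def build_postings(articles):
    """Build an inverted index: word -> set of article ids."""
    postings = defaultdict(set)
    for article_id, (_, body) in articles.items():
        for word in body.lower().split():
            postings[word].add(article_id)
    return postings


def fti_matching_ids(postings):
    """Evaluate '(CAT & DOG) ! PRESIDENT', reading `!` as "and not"."""
    return (postings["cat"] & postings["dog"]) - postings["president"]


def search(articles, postings, start, end):
    """Keep only the matching ids whose date falls within [start, end]."""
    ids = fti_matching_ids(postings)
    return sorted(i for i in ids if start <= articles[i][0] <= end)
```

Here the text match runs first and the date filter second. A planner that
knew the relative selectivity of the two conditions could pick whichever
order touches fewer rows, which is the question being asked above.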

> c) It would be a start. Once it's in the tree, it gets used more, gets
> improved.

But it is not a _real_ full text index, just a PostgreSQL sample
application that implements a full text index using an SQL database.

----------
Hannu