Re: 8.3 full text search docs

Lists: pgsql-hackers
From: andy <andy(at)squeakycode(dot)net>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: 8.3 full text search docs
Date: 2007-10-13 22:00:06
Message-ID: 47113FE6.5030001@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I have two doc updates I'd like to offer. I see we have two example
sections: creating rule-based dict's and creating parsers. When I was
starting I would have liked to see an example usage.

I'd like to offer: "example usage" and "Upgrading".
This is my first draft, if anyone has suggestions I'd be interested in
hearing them. Also, I'm not sure where or who to send this to, so I
hope -hacker is ok.

----- Example Usage -----
Staring a new project with Full Text Searching is easy. There is
nothing to install anymore, its all built in (in fact, don't install the
contrib module tsearch2 because it will conflict with the tsearch2 built
into the core).

We need to add a new column of type tsvector to the table you'd like to
search. In this example we'll use a table called notes. If your table
exists use:

alter table notes add searchvec tsvector;

If not use:

create table notes (
rowid integer,
note text,
searchvec tsvector
);

The searchvec column is what we will use for searching, so you probably
want to create an index on it... from another place in the manual:

(http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html)

GiST indexes are very good for dynamic data and fast if the number of
unique words (lexemes) is under 100,000,
GIN indexes are best for static data with +100,000 lexemes.

For our example, I'll create a gist index:

create index notesvec on notes using gist(searchvec);

If you have existing data, we need to fill the searchvec column:

update notes set searchvec = to_tsvector(note);

After the update, any rows are inserted or updated will not have their
searchvec column set automatically, for that we need to create a trigger:

create trigger notevecupdate
before insert or update on notes
for each row
execute procedure
tsvector_update_trigger(searchvec, 'pg_catalog.english', note);

Some data:
insert into notes(rowid, note) values(1, 'this is a test');

insert into notes(rowid, note)
values(2, 'I do not like green eggs and ham');

insert into notes(rowid, note) values(3, 'the cat in the hat');

insert into notes(rowid, note)
values(4, 'rage against the dying of the light');

And now we can query it:

select * from notes where searchvec @@ to_tsquery('light');

or

select * from notes, to_tsquery('test') as q where searchvec @@ q;

writing it this way lets you reuse the tsquery "q" like this:

select note, ts_rank(searchvec, q)
from notes, to_tsquery('test') as q
where searchvec @@ q
order by ts_rank(searchvec, q);
----- Example Usage -----

----- Upgrade from prior versions -----

When tsearch2 was put into core, some functions and types were renamed,
among other things. A simple backup and restore will not work to
migrate your database from versions below 8.3 to 8.3.

In general, the way to do it is backup all your data without the
tsearch2 stuff, restore just the data, then recreate the tsearch2 stuff
by hand.

(Its easier to think of this as an upgrade from tsearch2 to tsearch3,
but without the whole renaming it to tsearch3 thing)

To make it a little easier, there is a way using the pg_restore to
selective restore everything that is not in the old tsearch2.

First you must use "pg_dump -Fc" to backup your existing database.

Then we will create an item list of things we want to restore using this
perl script. It will strip out all the things that look like tsearch2,
and return (to stdout) a list of things you should restore.

For example:

pg_dump -Fc -h oldserver -f ubberbase.bak ubberbase
perl maketoc.pl ubberbase.bak > toc
# now restore just the items in toc
pg_restore -Fc --create -d postgres --use-list toc -f ubberbse.bak

There is one thing that will fail, that's the trigger you used to update
the tsvector column. Its because the function tsearch2 was renamed to
tsvector_update_trigger. You'll need to recreate the trigger by hand.

Now that the structures and data are restored, you'll need to go through
and redo the tsearch2 stuff by hand.

After you get the database fixed up, you'll also need to update your
client programs (php, perl, etc). For the most part, just renameing
things (like rank to ts_rank) should be all that's required.

Oleg Bartunov has an incomplete list of items that have been renamed:

http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes

----- Upgrade from prior versions -----

Attachment Content-Type Size
maketoc.pl text/plain 3.4 KB

From: Albert Cervera i Areny <albert(at)nan-tic(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: andy <andy(at)squeakycode(dot)net>
Subject: Re: 8.3 full text search docs
Date: 2007-10-13 22:24:49
Message-ID: 200710140024.49931.albert@nan-tic.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andy,
note that documentation is discussed in the pgsql-docs list and patches
usually are submitted to the pgsql-patches list. Nice to see both new
sections, by the way.

A Diumenge 14 Octubre 2007, andy va escriure:
> I have two doc updates I'd like to offer. I see we have two example
> sections: creating rule-based dict's and creating parsers. When I was
> starting I would have liked to see an example usage.
>
> I'd like to offer: "example usage" and "Upgrading".
> This is my first draft, if anyone has suggestions I'd be interested in
> hearing them. Also, I'm not sure where or who to send this to, so I
> hope -hacker is ok.
>
> ----- Example Usage -----
> Staring a new project with Full Text Searching is easy. There is
> nothing to install anymore, its all built in (in fact, don't install the
> contrib module tsearch2 because it will conflict with the tsearch2 built
> into the core).
>
> We need to add a new column of type tsvector to the table you'd like to
> search. In this example we'll use a table called notes. If your table
> exists use:
>
> alter table notes add searchvec tsvector;
>
> If not use:
>
> create table notes (
> rowid integer,
> note text,
> searchvec tsvector
> );
>
> The searchvec column is what we will use for searching, so you probably
> want to create an index on it... from another place in the manual:
>
> (http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html)
>
> GiST indexes are very good for dynamic data and fast if the number of
> unique words (lexemes) is under 100,000,
> GIN indexes are best for static data with +100,000 lexemes.
>
> For our example, I'll create a gist index:
>
> create index notesvec on notes using gist(searchvec);
>
> If you have existing data, we need to fill the searchvec column:
>
> update notes set searchvec = to_tsvector(note);
>
> After the update, any rows are inserted or updated will not have their
> searchvec column set automatically, for that we need to create a trigger:
>
> create trigger notevecupdate
> before insert or update on notes
> for each row
> execute procedure
> tsvector_update_trigger(searchvec, 'pg_catalog.english', note);
>
>
> Some data:
> insert into notes(rowid, note) values(1, 'this is a test');
>
> insert into notes(rowid, note)
> values(2, 'I do not like green eggs and ham');
>
> insert into notes(rowid, note) values(3, 'the cat in the hat');
>
> insert into notes(rowid, note)
> values(4, 'rage against the dying of the light');
>
> And now we can query it:
>
> select * from notes where searchvec @@ to_tsquery('light');
>
> or
>
> select * from notes, to_tsquery('test') as q where searchvec @@ q;
>
> writing it this way lets you reuse the tsquery "q" like this:
>
> select note, ts_rank(searchvec, q)
> from notes, to_tsquery('test') as q
> where searchvec @@ q
> order by ts_rank(searchvec, q);
> ----- Example Usage -----
>
>
>
> ----- Upgrade from prior versions -----
>
> When tsearch2 was put into core, some functions and types were renamed,
> among other things. A simple backup and restore will not work to
> migrate your database from versions below 8.3 to 8.3.
>
> In general, the way to do it is backup all your data without the
> tsearch2 stuff, restore just the data, then recreate the tsearch2 stuff
> by hand.
>
> (Its easier to think of this as an upgrade from tsearch2 to tsearch3,
> but without the whole renaming it to tsearch3 thing)
>
> To make it a little easier, there is a way using the pg_restore to
> selective restore everything that is not in the old tsearch2.
>
> First you must use "pg_dump -Fc" to backup your existing database.
>
> Then we will create an item list of things we want to restore using this
> perl script. It will strip out all the things that look like tsearch2,
> and return (to stdout) a list of things you should restore.
>
> For example:
>
> pg_dump -Fc -h oldserver -f ubberbase.bak ubberbase
> perl maketoc.pl ubberbase.bak > toc
> # now restore just the items in toc
> pg_restore -Fc --create -d postgres --use-list toc -f ubberbse.bak
>
> There is one thing that will fail, that's the trigger you used to update
> the tsvector column. Its because the function tsearch2 was renamed to
> tsvector_update_trigger. You'll need to recreate the trigger by hand.
>
> Now that the structures and data are restored, you'll need to go through
> and redo the tsearch2 stuff by hand.
>
> After you get the database fixed up, you'll also need to update your
> client programs (php, perl, etc). For the most part, just renameing
> things (like rank to ts_rank) should be all that's required.
>
>
> Oleg Bartunov has an incomplete list of items that have been renamed:
>
> http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes
>
> ----- Upgrade from prior versions -----

--
Albert Cervera i Areny
http://www.NaN-tic.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: andy <andy(at)squeakycode(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 8.3 full text search docs
Date: 2007-10-16 01:24:27
Message-ID: 29857.1192497867@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

andy <andy(at)squeakycode(dot)net> writes:
> I'd like to offer: "example usage" and "Upgrading".

I did some hacking on your perl script to make it a bit more
bulletproof; I was worried about removing any function named 'concat'
for instance. Attached is what it looks like now.

I'm not entirely sure what to do with the script --- perhaps we should
put it into contrib/tsearch2 and have it get installed from there?

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 8.1 KB

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: andy <andy(at)squeakycode(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 8.3 full text search docs
Date: 2007-10-16 03:28:06
Message-ID: Pine.LNX.4.64.0710160726320.25678@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 15 Oct 2007, Tom Lane wrote:

> andy <andy(at)squeakycode(dot)net> writes:
>> I'd like to offer: "example usage" and "Upgrading".
>
> I did some hacking on your perl script to make it a bit more
> bulletproof; I was worried about removing any function named 'concat'
> for instance. Attached is what it looks like now.
>
> I'm not entirely sure what to do with the script --- perhaps we should
> put it into contrib/tsearch2 and have it get installed from there?

yes, contrib/tsearch2 looks ok. As a bonus we could back-port it to the
previous releases and mention in release notes.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83