Fulltext index

Lists: pgsql-general
From: Andreas Kraftl <andreas(dot)kraftl(at)kraftl(dot)at>
To: pgsql-general(at)postgresql(dot)org
Subject: Fulltext index
Date: 2008-11-08 08:44:17
Message-ID: 1226133857.9831.7.camel@zeus
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hello again,

my previous post wasn't answered. I think i told my question wrong :-).

I have a table like
a | b
--------------------
'de' | Hallo welt
'en' | Hello world

How can I create a full text index over b?
CREATE INDEX idx ON table USING gin(
to_tsvector(
case
when a = 'de' then 'german'
when a = 'en' then 'english'
else 'english'
end
), b);

This doesn't work. Error Message in german:
FEHLER: Zugriffsmethode »gin« unterstützt keine mehrspaltigen Indexe
SQL state: 0A000
means, gin doesn't accept multicolumn indexes.

Any ideas?

Greetings Andreas
--
Kraftl EDV - Dienstleistungen
Linux, Linuxschulungen, Webprogrammierung
Autofabrikstraße 16/6
1230 Wien


From: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Fulltext index
Date: 2008-11-08 13:18:29
Message-ID: 20081108141829.0c71abbb@dawn.webthatworks.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sat, 08 Nov 2008 09:44:17 +0100
Andreas Kraftl <andreas(dot)kraftl(at)kraftl(dot)at> wrote:

> Hello again,

> my previous post wasn't answered. I think i told my question
> wrong :-).
>
> I have a table like
> a | b
> --------------------
> 'de' | Hallo welt
> 'en' | Hello world
>
> How can I create a full text index over b?
> CREATE INDEX idx ON table USING gin(
> to_tsvector(
> case
> when a = 'de' then 'german'
> when a = 'en' then 'english'
> else 'english'
> end
> ), b);
>
> This doesn't work. Error Message in german:
> FEHLER: Zugriffsmethode »gin« unterstützt keine mehrspaltigen
> Indexe SQL state: 0A000
> means, gin doesn't accept multicolumn indexes.
>
> Any ideas?

Multicolumn indexes should get into 8.4.

You may add a column tsvector and compute it with a trigger that
chose the correct language when generating the tsvector.
Then you'll have to pick up the correct language when you generate
the tsquery in your search.

http://www.sigaev.ru/gin/fastinsert_and_multicolumn_GIN.pdf

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Andreas Kraftl <andreas(dot)kraftl(at)kraftl(dot)at>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Fulltext index
Date: 2008-11-08 17:14:27
Message-ID: Pine.LNX.4.64.0811082003481.15810@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sat, 8 Nov 2008, Andreas Kraftl wrote:

> Hello again,
>
> my previous post wasn't answered. I think i told my question wrong :-).
>
> I have a table like
> a | b
> --------------------
> 'de' | Hallo welt
> 'en' | Hello world
>
> How can I create a full text index over b?

use concatenation operator: tsvector('de',b_de) || tsvector('en',b_en)

> CREATE INDEX idx ON table USING gin(
> to_tsvector(
> case
> when a = 'de' then 'german'
> when a = 'en' then 'english'
> else 'english'
> end
> ), b);
>
> This doesn't work. Error Message in german:
> FEHLER: Zugriffsmethode ЪЪginЪЪ unterstЪЪtzt keine mehrspaltigen Indexe
> SQL state: 0A000
> means, gin doesn't accept multicolumn indexes.
>
> Any ideas?
>
> Greetings Andreas
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Kraftl <andreas(dot)kraftl(at)kraftl(dot)at>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Fulltext index
Date: 2008-11-08 17:57:15
Message-ID: 18922.1226167035@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Andreas Kraftl <andreas(dot)kraftl(at)kraftl(dot)at> writes:
> CREATE INDEX idx ON table USING gin(
> to_tsvector(
> case
> when a = 'de' then 'german'
> when a = 'en' then 'english'
> else 'english'
> end
> ), b);

> This doesn't work. Error Message in german:
> FEHLER: Zugriffsmethode gin untersttzt keine mehrspaltigen Indexe
> SQL state: 0A000
> means, gin doesn't accept multicolumn indexes.

You've got the parentheses in the wrong place.

The way I'd suggest doing this is

regression=# create table tab (a regconfig, b text);
CREATE TABLE
regression=# create index idx on tab using gin(to_tsvector(a,b));
CREATE INDEX
regression=# explain select * from tab where to_tsvector(a,b) @@ to_tsquery('english','foo');
QUERY PLAN
----------------------------------------------------------------
Index Scan using idx on tab (cost=0.00..8.27 rows=1 width=36)
Index Cond: (to_tsvector(a, b) @@ '''foo'''::tsquery)
(2 rows)

If you want to use abbreviations like 'en' and 'de', create text search
configurations named that way instead of inserting a run-time
conversion.

regards, tom lane


From: Andreas Kraftl <andreas(dot)kraftl(at)kraftl(dot)at>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Fulltext index
Date: 2008-11-10 08:14:21
Message-ID: d064e$4917ed5d$506ccc24$22153@news.chello.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Am Sat, 08 Nov 2008 09:44:17 +0100 schrieb Andreas Kraftl:
> How can I create a full text index over b?

Hello

thanks for the answers. But nothing matches my problem.

I read the manual again and decide me for an other way.
I change my table that it looks like:

lang | text
----------------------
german | hallo welt
english | hello world

Then i follow the manual which says:
http://www.postgresql.org/docs/8.3/interactive/textsearch-
tables.html#TEXTSEARCH-TABLES-INDEX
It is possible to set up more complex expression indexes wherein the
configuration name is specified by another column, e.g.:
CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(config_name, body));

It is possible to set up more complex expression indexes wherein the
configuration name is specified by another column, e.g.:

I did a
CREATE INDEX idx ON test.test1 USING gin(to_tsvector(lang, "text"));
and
CREATE INDEX idx ON test.test1 USING gin(to_tsvector(lang::regconfig,
"text"));

For both I get this error in german:
FEHLER: Funktionen im Indexausdruck müssen als IMMUTABLE markiert sein
SQL state: 42P17
means in english
functions in index expression must be marked IMMUTABLE

Now I have no idea. My experience with databases and postgresql are too
less to decide if this is a bug, or myself is the bug ;-).

Thanks again for help
Andreas


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Fulltext index
Date: 2008-11-10 11:26:13
Message-ID: 20081110112613.GZ2459@frubble.xen.chris-lamb.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Nov 10, 2008 at 09:14:21AM +0100, Andreas Kraftl wrote:
> Am Sat, 08 Nov 2008 09:44:17 +0100 schrieb Andreas Kraftl:
> > How can I create a full text index over b?
>
> thanks for the answers. But nothing matches my problem.

I'm not sure what's wrong with Oleg's suggestion--he's the guy who wrote
most of the code so his suggestions should be reasonable! I'm just
learning about this stuff myself, so it may be somewhat sub-optimal.
That said, I got things working when doing the following:

CREATE TABLE test (
lang TEXT,
text TEXT
);

INSERT INTO test VALUES
('german', 'hallo welt'),
('english', 'hello world');

CREATE INDEX idx ON test USING gin(tsvector_concat(
to_tsvector('german', CASE lang WHEN 'german' THEN text ELSE '' END),
to_tsvector('english', CASE lang WHEN 'english' THEN text ELSE '' END)));

"text" as a column name gets a bit confusing to read, but I'm trying
to follow your names. Also my version of PG didn't seem to know that
the '||' operator knows how to concat tsvectors, so I had to spell out
tsvector_concat in full. Querying is a bit awkward, but works:

SELECT *
FROM test
WHERE tsvector_concat(
to_tsvector('english', CASE lang WHEN 'english' THEN text ELSE '' END),
to_tsvector('german', CASE lang WHEN 'german' THEN text ELSE '' END))
@@ to_tsquery('english', 'hello');

Putting most of the above into a query would work, as would having PG
automatically maintaining a column of type TSVECTOR.

> I read the manual again and decide me for an other way.
> I change my table that it looks like:
>
> lang | text
> ----------------------
> german | hallo welt
> english | hello world

What types do these columns have? if "lang" is of type REGCONFIG all
works for me:

CREATE TABLE test (
lang REGCONFIG,
text TEXT
);

INSERT INTO test VALUES
('german', 'hallo welt'),
('english', 'hello world');

CREATE INDEX idx ON test USING gin(to_tsvector(lang, text));

SELECT *
FROM test
WHERE to_tsvector(lang, text) @@ to_tsquery('english', 'hello');

This all seems much easier than having "lang" as a TEXT column.

> Now I have no idea. My experience with databases and postgresql are too
> less to decide if this is a bug, or myself is the bug ;-).

I think the awkward thing is that text/strings are visually indistin-
guishable from arbitrary literals in SQL. The 'english' that's going
into the to_tsquery() call above is actually of type REGCONFIG, but it
looks like a string literal. I think that could be why you were getting
confused before.

Hope that all makes sense and helps a bit!

Sam


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Fulltext index
Date: 2008-11-10 11:57:59
Message-ID: 20081110115759.GB2459@frubble.xen.chris-lamb.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Before waking up properly, I wrote:
> Querying is a bit awkward, but works:
>
> SELECT *
> FROM test
> WHERE tsvector_concat(
> to_tsvector('english', CASE lang WHEN 'english' THEN text ELSE '' END),
> to_tsvector('german', CASE lang WHEN 'german' THEN text ELSE '' END))
> @@ to_tsquery('english', 'hello');
>
> Putting most of the above into a query would work

That should be "Putting most of the above into a VIEW would work"! The
example would be:

CREATE VIEW test_v AS
SELECT lang, text, tsvector_concat(
to_tsvector('english', CASE lang WHEN 'english' THEN text ELSE '' END),
to_tsvector('german', CASE lang WHEN 'german' THEN text ELSE '' END))
AS tsvec
FROM test;

Allowing you to do:

SELECT lang, text
FROM test_v
WHERE tsvec @@ to_tsquery('english', 'hello');

Sam