ERROR: Gin doesn't support full scan due to it's awful inefficiency

Lists: pgsql-general
From: Charlie Savage <cfis(at)savagexi(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: ERROR: Gin doesn't support full scan due to it's awful inefficiency
Date: 2006-09-05 22:23:25
Message-ID: 44FDF8DD.9000703@savagexi.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I've run across another GIN index issue - using postgresql 8.1.4 on
Window/Linux with the GIN/tsearch2 patch.

I have two tables like this:

CREATE TABLE maps
(
id serial,
query tsearch2.tsquery
)

CREATE TABLE features
(
id serial,
vector tsearch2.tsvector
)

CREATE INDEX idx_features_tags_vector ON features USING gin (tags_vector);

Where maps.query contains cached tsquery (they are cached for
performance reasons).

When I run this query:

select *
from maps, features
where to_tsquery('test') @@ features.tags_vector

I get this error:

ERROR: Gin doesn't support full scan due to it's awful inefficiency

Here is explain (from a very small test database):

Nested Loop (cost=0.00..1878.71 rows=370 width=208)
-> Seq Scan on maps (cost=0.00..14.80 rows=480 width=136)
-> Index Scan using idx_features_tags_vector on features
(cost=0.00..3.87 rows=1 width=72)
Index Cond: ("outer".query @@ features.tags_vector)

I thought that this would solve my problem:

set enable_indexscan to off;

But it does not.

Interestingly, this does work:

select *
from features
where to_tsquery('test') @@ features.tags_vector;

Explain:

Index Scan using idx_features_tags_vector on features (cost=0.00..3.87
rows=1 width=72)
Index Cond: ('''test'''::tsquery @@ tags_vector)

At first I thought the issue was that you couldn't use an Index Scan on
gin index, but that now seems like an incorrect conclusion.

So, two things:

1. How do I work around this issue?
2. Seems like postgresql should be smart enough to pick a query that
will run.

Thanks,

Charlie


From: Charlie Savage <cfis(at)savagexi(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: ERROR: Gin doesn't support full scan due to it's awful
Date: 2006-09-05 22:27:37
Message-ID: 44FDF9D9.5020903@savagexi.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sorry, mistyped the query causing the problem. It is:

select *
from maps, features
where maps.query @@ features.tags_vector;

Thanks,

Charlie

Charlie Savage wrote:
> I've run across another GIN index issue - using postgresql 8.1.4 on
> Window/Linux with the GIN/tsearch2 patch.
>
> I have two tables like this:
>
> CREATE TABLE maps
> (
> id serial,
> query tsearch2.tsquery
> )
>
>
> CREATE TABLE features
> (
> id serial,
> vector tsearch2.tsvector
> )
>
> CREATE INDEX idx_features_tags_vector ON features USING gin (tags_vector);
>
>
> Where maps.query contains cached tsquery (they are cached for
> performance reasons).
>
> When I run this query:
>
> select *
> from maps, features
> where to_tsquery('test') @@ features.tags_vector
>
> I get this error:
>
> ERROR: Gin doesn't support full scan due to it's awful inefficiency
>
> Here is explain (from a very small test database):
>
> Nested Loop (cost=0.00..1878.71 rows=370 width=208)
> -> Seq Scan on maps (cost=0.00..14.80 rows=480 width=136)
> -> Index Scan using idx_features_tags_vector on features
> (cost=0.00..3.87 rows=1 width=72)
> Index Cond: ("outer".query @@ features.tags_vector)
>
> I thought that this would solve my problem:
>
> set enable_indexscan to off;
>
> But it does not.
>
> Interestingly, this does work:
>
> select *
> from features
> where to_tsquery('test') @@ features.tags_vector;
>
> Explain:
>
> Index Scan using idx_features_tags_vector on features (cost=0.00..3.87
> rows=1 width=72)
> Index Cond: ('''test'''::tsquery @@ tags_vector)
>
> At first I thought the issue was that you couldn't use an Index Scan on
> gin index, but that now seems like an incorrect conclusion.
>
> So, two things:
>
> 1. How do I work around this issue?
> 2. Seems like postgresql should be smart enough to pick a query that
> will run.
>
> Thanks,
>
> Charlie
>
>
>
>
>
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Charlie Savage <cfis(at)savagexi(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ERROR: Gin doesn't support full scan due to it's awful
Date: 2006-09-05 22:43:47
Message-ID: 29410.1157496227@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Charlie Savage <cfis(at)savagexi(dot)com> writes:
> Sorry, mistyped the query causing the problem. It is:

> select *
> from maps, features
> where maps.query @@ features.tags_vector;

In that case it's fair to ask what query values you have stored in maps.
In particular I imagine that you'll find that a specific query is
causing the problem ...

regards, tom lane


From: Charlie Savage <cfis(at)savagexi(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ERROR: Gin doesn't support full scan due to it's awful
Date: 2006-09-05 23:26:40
Message-ID: 44FE07B0.3050803@savagexi.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi Tom,

Thanks for the quick reply.

>> Sorry, mistyped the query causing the problem. It is:
>
>> select *
>> from maps, features
>> where maps.query @@ features.tags_vector;
>
> In that case it's fair to ask what query values you have stored in maps.
> In particular I imagine that you'll find that a specific query is
> causing the problem ...
>
> regards, tom lane

Interesting...that seems to be the case. For example, this will fail:

explain analyze
select *
from test.features
where to_tsquery('') @@ features.vector

ERROR: Gin doesn't support full scan due to it's awful inefficiency

Interestingly this works:

explain analyze
select *
from test.features
where NULL @@ features.vector

Here is a slightly bigger test case:

--drop schema test cascade;
create schema test;

CREATE TABLE test.maps
(
id serial,
query tsquery
);

CREATE TABLE test.features
(
id serial,
vector tsvector
);

CREATE INDEX features_vector ON test.features USING gin (vector);

INSERT INTO test.maps (query)
VALUES (to_tsquery(''));

INSERT INTO test.features (vector)
VALUES (to_tsvector('test'));

analyze test.maps;
analyze test.features;

----------

Now try this, which won't work (ERROR: Gin doesn't support full scan
due to it's awful inefficiency):

set enable_seqscan to off;

explain
select *
from test.maps, test.features
where features.vector @@ maps.query

Nested Loop (cost=100000000.00..100000004.04 rows=1 width=36)
-> Seq Scan on maps (cost=100000000.00..100000001.01 rows=1 width=12)
-> Index Scan using features_vector on features (cost=0.00..3.01
rows=1 width=24)
Index Cond: (features.vector @@ "outer".query)

However, this works:

set enable_seqscan to on;
set enable_indexscan to off;
set enable_bitmapscan to off;

explain analyze
select *
from test.maps, test.features
where features.vector @@ maps.query

Nested Loop (cost=200000000.00..200000002.03 rows=1 width=36) (actual
time=0.055..0.055 rows=0 loops=1)
Join Filter: ("inner".vector @@ "outer".query)
-> Seq Scan on maps (cost=100000000.00..100000001.01 rows=1
width=12) (actual time=0.011..0.014 rows=1 loops=1)
-> Seq Scan on features (cost=100000000.00..100000001.01 rows=1
width=24) (actual time=0.006..0.010 rows=1 loops=1)
Total runtime: 0.129 ms

You see the same things if you put a NULL in the query column (unlike
above). If instead, you do this in the script above:

INSERT INTO test.maps (query)
VALUES (to_tsquery('test'));

Then it always works.

Seems like the moral of the story, tsquery values of '' or NULL don't
work.

That is surprising to me - maybe the documentation should point out this
issue?

Thanks,

Charlie


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Charlie Savage <cfis(at)savagexi(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: ERROR: Gin doesn't support full scan due to it's awful
Date: 2006-09-06 07:42:46
Message-ID: 44FE7BF6.8090407@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> explain analyze
> select *
> from test.features
> where to_tsquery('') @@ features.vector
>
> ERROR: Gin doesn't support full scan due to it's awful inefficiency

Look:
contrib_regression=# select '{1,2,3}'::int4[] @ '{}'; --contains
?column?
----------
t
(1 row)

contrib_regression=# select '{1,2,3}'::int4[] && '{}'; --overlap
?column?
----------
f
(1 row)

contrib_regression=# select to_tsvector('asdasd') @@ ''::tsquery;
NOTICE: query doesn't contain lexeme(s)
?column?
----------
f
(1 row)

Semantic of different operation with void (but not NULL) argument is very
different. If query doesn't contain any entry (returned by extractQuery() index
support method), then GIN, in any case, doesn't know what it should return:
whole set of pointers or nothing. But GIN can't return all - it will be
very-very slow, because there is a lot of pointers in GIN index to each table's row.

It seems to me that message makes confuse about reason of error...

>
> Interestingly this works:
>
> explain analyze
> select *
> from test.features
> where NULL @@ features.vector

That is because @@ is marked as 'returns NULL on NULL input', ie index will not
be used.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Charlie Savage <cfis(at)savagexi(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: ERROR: Gin doesn't support full scan due to it's awful
Date: 2006-09-06 07:59:42
Message-ID: 44FE7FEE.5070003@savagexi.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thanks for the clarification Teodor - it makes much more sense now.

I'd agree the error message is confusing, particularly since this works:

set enable_seqscan to on;
set enable_indexscan to off;
set enable_bitmapscan to off;

select *
from test.features
where to_tsquery('') @@ features.vector;

I see why it would now. But it would be nice if the message mentioned
the root cause of the problem - that the tsquery value does not contain
any lexemes and thus is not valid for doing a search.

Thanks,

Charlie

Teodor Sigaev wrote:
>> explain analyze
>> select *
>> from test.features
>> where to_tsquery('') @@ features.vector
>>
>> ERROR: Gin doesn't support full scan due to it's awful inefficiency
>
> Look:
> contrib_regression=# select '{1,2,3}'::int4[] @ '{}'; --contains
> ?column?
> ----------
> t
> (1 row)
>
> contrib_regression=# select '{1,2,3}'::int4[] && '{}'; --overlap
> ?column?
> ----------
> f
> (1 row)
>
> contrib_regression=# select to_tsvector('asdasd') @@ ''::tsquery;
> NOTICE: query doesn't contain lexeme(s)
> ?column?
> ----------
> f
> (1 row)
>
> Semantic of different operation with void (but not NULL) argument is
> very different. If query doesn't contain any entry (returned by
> extractQuery() index support method), then GIN, in any case, doesn't
> know what it should return: whole set of pointers or nothing. But GIN
> can't return all - it will be very-very slow, because there is a lot of
> pointers in GIN index to each table's row.
>
> It seems to me that message makes confuse about reason of error...
>
>
>
>>
>> Interestingly this works:
>>
>> explain analyze
>> select *
>> from test.features
>> where NULL @@ features.vector
>
> That is because @@ is marked as 'returns NULL on NULL input', ie index
> will not be used.
>


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Charlie Savage <cfis(at)savagexi(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: ERROR: Gin doesn't support full scan due to it's awful
Date: 2006-09-06 08:11:00
Message-ID: 44FE8294.9060301@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> I see why it would now. But it would be nice if the message mentioned
> the root cause of the problem - that the tsquery value does not contain
> any lexemes and thus is not valid for doing a search.

... doing a index search :(. GIN core doesn't know anything about
tsvector/tsquery/tsearch - so, error message should mention something like this:
GIN index doesn't support search with void argument.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/