GIN vs. Partial Indexes

Lists: pgsql-hackers
From: Josh Berkus <josh(at)agliodbs(dot)com>
To: postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: GIN vs. Partial Indexes
Date: 2010-10-08 00:49:22
Message-ID: 4CAE6A92.4060703@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

All,

I thought we fixed this in 8.4.4, but apparently not. In the event that
you have a GIN index containing a WHERE clause which is sufficiently
restrictive, PostgreSQL will attempt to use the index even though it
can't. Since this is completely out of the control of the user, it
effectively prohibits using partial GIN indexes:

Setup:

postgres=# select version();
version

--------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.0.0 on i386-apple-darwin9.8.0, compiled by GCC
i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5465), 32-bit

postgres=# create table gin_test ( id serial not null primary key, type
CREATE TABLE

DO $$
DECLARE i INT := 1;
r INT;
qts tsvector;
del BOOLEAN;
BEGIN

qts := to_tsvector('The most anticipated PostgreSQL version in five
years has been released. With built-in binary replication and over a
dozen new major features, PostgreSQL 9.0 has compelling reasons to
upgrade or migrate for every database user and developer.');

WHILE i < 1000 LOOP
r := ( random() * 20 )::INT;
INSERT INTO gin_test ( "type", deleted, some_ts )
VALUES ( r,
( r % 2 ) = 0,
qts );
i := i + 1;
END LOOP;

END;$$;

create index gin_test_type ON gin_test("type");
create index gin_test_text ON gin_test USING GIN ( some_ts)
WHERE deleted = FALSE AND "type" = 1;

postgres=# SELECT COUNT(*) from gin_test WHERE deleted = FALSE and
"type" = 1;
ERROR: GIN indexes do not support whole-index scans

postgres-# EXPLAIN SELECT COUNT(*) from gin_test WHERE deleted = FALSE
and "type" = 1;
QUERY PLAN

------------------------------------------------------------------------------------
Aggregate (cost=54.01..54.02 rows=1 width=0)
-> Bitmap Heap Scan on gin_test (cost=12.38..53.95 rows=24 width=0)
Recheck Cond: ((NOT deleted) AND (type = 1))
-> Bitmap Index Scan on gin_test_text (cost=0.00..12.37
rows=24 width=0)
(4 rows)

I find the above error interesting, because: (a) I didn't actually
select the some_ts column, and (b) I can do an actual TS search which
hits the whole index with no problem:

postgres=# SELECT COUNT(*) from gin_test WHERE some_ts @@
to_tsquery('replication');
count
-------
999

Note that if I add the perfect index for that query, it works:

postgres=# create index gin_test_type_undeleted on gin_test("type")
where not deleted;
CREATE INDEX

postgres=# SELECT COUNT(*) from gin_test WHERE deleted = FALSE and
"type" = 1;
count
-------
46
(1 row)

postgres=# EXPLAIN

SELECT COUNT(*) from gin_test
WHERE deleted = FALSE and "type" = 1;
QUERY PLAN

---------------------------------------------------------------------------------------------
Aggregate (cost=46.23..46.24 rows=1 width=0)
-> Bitmap Heap Scan on gin_test (cost=4.60..46.17 rows=24 width=0)
Recheck Cond: ((type = 1) AND (NOT deleted))
-> Bitmap Index Scan on gin_test_type_undeleted
(cost=0.00..4.60 rows=24 width=0)
Index Cond: (type = 1)
(5 rows)

Clearly the answer here seems to be that our planner should not pick GIN
indexes for any query in which the indexed column is not referenced. Is
that practical to implement?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: GIN vs. Partial Indexes
Date: 2010-10-08 02:52:08
Message-ID: 11352.1286506328@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> I thought we fixed this in 8.4.4, but apparently not. In the event that
> you have a GIN index containing a WHERE clause which is sufficiently
> restrictive, PostgreSQL will attempt to use the index even though it
> can't.

We could probably kluge the planner to avoid that case, but in view
of the other issues explained here:
http://developer.postgresql.org/pgdocs/postgres/gin-limit.html
I'm not sure it's worth the trouble. There's nothing the planner can do
to guard against the equivalent issue of non-restrictive queries, ie
there is a WHERE clause but it's something like "array-column contains
empty-array". The fact that the comparison value is empty might not be
known until runtime.

IMO, what's needed is to fix GIN so it doesn't go insane for empty
values or non-restrictive queries, by ensuring there's at least one
index entry for every row. This has been discussed before; see the TODO
section for GIN.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GIN vs. Partial Indexes
Date: 2010-10-08 16:27:46
Message-ID: 4CAF4682.1080808@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> IMO, what's needed is to fix GIN so it doesn't go insane for empty
> values or non-restrictive queries, by ensuring there's at least one
> index entry for every row. This has been discussed before; see the TODO
> section for GIN.

Well, what is that fix waiting on, then? Oleg, Teodor? We may even
have a modest budget for a patch, so if funding would help, I'll ask.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: GIN vs. Partial Indexes
Date: 2010-10-08 17:37:55
Message-ID: AANLkTi=tYR5z+DJkTk3Ffom9vmMN95QajyOmGFQraMVb@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 7, 2010 at 10:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> I thought we fixed this in 8.4.4, but apparently not.  In the event that
>> you have a GIN index containing a WHERE clause which is sufficiently
>> restrictive, PostgreSQL will attempt to use the index even though it
>> can't.
>
> We could probably kluge the planner to avoid that case, but in view
> of the other issues explained here:
> http://developer.postgresql.org/pgdocs/postgres/gin-limit.html
> I'm not sure it's worth the trouble.  There's nothing the planner can do
> to guard against the equivalent issue of non-restrictive queries, ie
> there is a WHERE clause but it's something like "array-column contains
> empty-array".  The fact that the comparison value is empty might not be
> known until runtime.
>
> IMO, what's needed is to fix GIN so it doesn't go insane for empty
> values or non-restrictive queries, by ensuring there's at least one
> index entry for every row.  This has been discussed before; see the TODO
> section for GIN.

That seems like it could waste an awful lot of disk space (and
therefore I/O, etc.). No?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: GIN vs. Partial Indexes
Date: 2010-10-08 17:47:36
Message-ID: 13837.1286560056@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Oct 7, 2010 at 10:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> IMO, what's needed is to fix GIN so it doesn't go insane for empty
>> values or non-restrictive queries, by ensuring there's at least one
>> index entry for every row. This has been discussed before; see the TODO
>> section for GIN.

> That seems like it could waste an awful lot of disk space (and
> therefore I/O, etc.). No?

How so? In a typical application, there would not likely be very many
such rows --- we're talking about cases like documents containing zero
indexable words. In any case, the problem right now is that GIN has
significant functional limitations because it fails to make any index
entry at all for such rows. Even if there are in fact no such rows
in a particular table, it has to fail on some queries because there
*might* be such rows. There is no way to fix those limitations
unless it undertakes to have some index entry for every row. That
will take disk space, but it's *necessary*. (To adapt the old saw,
I can make this index arbitrarily small if it doesn't have to give
the right answers.)

In any case, I would expect that GIN could actually do this quite
efficiently. What we'd probably want is a concept of a "null word",
with empty indexable rows entered in the index as if they contained the
null word. So there'd be just one index entry with a posting list of
however many such rows there are.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: GIN vs. Partial Indexes
Date: 2010-10-08 21:44:50
Message-ID: AANLkTi=kQK3UFrQ5G82uMpdyy0bnS9KWSKAaovvyJNaZ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 8, 2010 at 1:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Oct 7, 2010 at 10:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> IMO, what's needed is to fix GIN so it doesn't go insane for empty
>>> values or non-restrictive queries, by ensuring there's at least one
>>> index entry for every row.  This has been discussed before; see the TODO
>>> section for GIN.
>
>> That seems like it could waste an awful lot of disk space (and
>> therefore I/O, etc.).  No?
>
> How so?  In a typical application, there would not likely be very many
> such rows --- we're talking about cases like documents containing zero
> indexable words.  In any case, the problem right now is that GIN has
> significant functional limitations because it fails to make any index
> entry at all for such rows.  Even if there are in fact no such rows
> in a particular table, it has to fail on some queries because there
> *might* be such rows.  There is no way to fix those limitations
> unless it undertakes to have some index entry for every row.  That
> will take disk space, but it's *necessary*.  (To adapt the old saw,
> I can make this index arbitrarily small if it doesn't have to give
> the right answers.)
>
> In any case, I would expect that GIN could actually do this quite
> efficiently.  What we'd probably want is a concept of a "null word",
> with empty indexable rows entered in the index as if they contained the
> null word.  So there'd be just one index entry with a posting list of
> however many such rows there are.

<thinks about it more>

Yeah, I think you're right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: GIN vs. Partial Indexes
Date: 2010-10-09 02:23:15
Message-ID: 417D10D7-0AA8-4B0C-94E2-4A2DE863051E@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Oct 8, 2010, at 1:47 PM, Tom Lane wrote:

> How so? In a typical application, there would not likely be very many
> such rows --- we're talking about cases like documents containing zero
> indexable words. In any case, the problem right now is that GIN has
> significant functional limitations because it fails to make any index
> entry at all for such rows. Even if there are in fact no such rows
> in a particular table, it has to fail on some queries because there
> *might* be such rows. There is no way to fix those limitations
> unless it undertakes to have some index entry for every row. That
> will take disk space, but it's *necessary*. (To adapt the old saw,
> I can make this index arbitrarily small if it doesn't have to give
> the right answers.)

And could you not keep it the same with a partial index?

Best,

David


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: GIN vs. Partial Indexes
Date: 2010-10-23 19:01:21
Message-ID: 4CC33101.1000205@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/08/2010 02:44 PM, Robert Haas wrote:
>> In any case, I would expect that GIN could actually do this quite
>> efficiently. What we'd probably want is a concept of a "null word",
>> with empty indexable rows entered in the index as if they contained the
>> null word. So there'd be just one index entry with a posting list of
>> however many such rows there are.

So, given the lack of objections to this idea, do we have a plan for
fixing GIN?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: GIN vs. Partial Indexes
Date: 2010-11-12 18:11:06
Message-ID: 201011121811.oACIB6Q15546@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> On 10/08/2010 02:44 PM, Robert Haas wrote:
> >> In any case, I would expect that GIN could actually do this quite
> >> efficiently. What we'd probably want is a concept of a "null word",
> >> with empty indexable rows entered in the index as if they contained the
> >> null word. So there'd be just one index entry with a posting list of
> >> however many such rows there are.
>
> So, given the lack of objections to this idea, do we have a plan for
> fixing GIN?

Is this a TODO?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: GIN vs. Partial Indexes
Date: 2010-11-12 18:14:51
Message-ID: 4CDD841B.5090202@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/12/2010 01:11 PM, Bruce Momjian wrote:
> Josh Berkus wrote:
>> On 10/08/2010 02:44 PM, Robert Haas wrote:
>>>> In any case, I would expect that GIN could actually do this quite
>>>> efficiently. What we'd probably want is a concept of a "null word",
>>>> with empty indexable rows entered in the index as if they contained the
>>>> null word. So there'd be just one index entry with a posting list of
>>>> however many such rows there are.
>> So, given the lack of objections to this idea, do we have a plan for
>> fixing GIN?
> Is this a TODO?

Yes.

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: GIN vs. Partial Indexes
Date: 2010-11-12 18:16:02
Message-ID: 201011121816.oACIG2a17174@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
>
> On 11/12/2010 01:11 PM, Bruce Momjian wrote:
> > Josh Berkus wrote:
> >> On 10/08/2010 02:44 PM, Robert Haas wrote:
> >>>> In any case, I would expect that GIN could actually do this quite
> >>>> efficiently. What we'd probably want is a concept of a "null word",
> >>>> with empty indexable rows entered in the index as if they contained the
> >>>> null word. So there'd be just one index entry with a posting list of
> >>>> however many such rows there are.
> >> So, given the lack of objections to this idea, do we have a plan for
> >> fixing GIN?
> > Is this a TODO?
>
> Yes.

OK, can you add it or give me wording, or it is already on the TODO
list?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: GIN vs. Partial Indexes
Date: 2010-11-12 18:31:44
Message-ID: 13839.1289586704@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> OK, can you add it or give me wording, or it is already on the TODO
> list?

It's already there.

regards, tom lane