Re: [PERFORM] GIST versus GIN indexes for intarrays

Lists: pgsql-hackerspgsql-performance
From: Rusty Conover <rconover(at)infogears(dot)com>
To: psql performance <pgsql-performance(at)postgresql(dot)org>
Subject: GIST versus GIN indexes for intarrays
Date: 2009-02-12 20:09:14
Message-ID: BD68A118-B850-4659-AFBB-9C154F99F3DA@infogears.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Hi Guys,

I'm a bit confused when the proper way to use GIST versus GIN indexes
with integer arrays.

The documentation states:

http://www.postgresql.org/docs/current/static/intarray.html

The choice between GiST and GIN indexing depends on the relative
performance characteristics of GiST and GIN, which are discussed
elsewhere. As a rule of thumb, a GIN index is faster to search than a
GiST index, but slower to build or update; so GIN is better suited for
static data and GiST for often-updated data.

Since 100% of my queries are for retrieval, I should use GIN but it
never appears to be used unlike how GIST indexes are:

gearbuyer_ig=# select version();
version
----------------------------------------------------------------------------------------------------
PostgreSQL 8.3.6 on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20070925 (Red Hat 4.1.2-33)
(1 row)

With just a GIN index I get this plan (no use of GIN):

gearbuyer_ig=# explain select count(*) from items where
items.fast_colors @> ARRAY[0];
QUERY PLAN
-----------------------------------------------------------------
Aggregate (cost=21194.27..21194.28 rows=1 width=0)
-> Seq Scan on items (cost=0.00..21193.64 rows=251 width=0)
Filter: (fast_colors @> '{0}'::integer[])
(3 rows)

With a GIST index created like:

gearbuyer_ig=# CREATE INDEX items_fast_colors_rdtree2_idx ON items
USING gist (fast_colors gist__int_ops);

gearbuyer_ig=# explain select count(*) from items where
items.fast_colors @> ARRAY[0];
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Aggregate (cost=929.81..929.82 rows=1 width=0)
-> Bitmap Heap Scan on items (cost=14.30..929.18 rows=251 width=0)
Recheck Cond: (fast_colors @> '{0}'::integer[])
-> Bitmap Index Scan on items_fast_colors_rdtree2_idx
(cost=0.00..14.24 rows=251 width=0)
Index Cond: (fast_colors @> '{0}'::integer[])
(5 rows)

Any insight is greatly appreciated. Could this be a regression from
8.3.5 and 8.3.6?

Thanks,

Rusty
--
Rusty Conover
rconover(at)infogears(dot)com
InfoGears Inc / GearBuyer.com / FootwearBuyer.com
http://www.infogears.com
http://www.gearbuyer.com
http://www.footwearbuyer.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rusty Conover <rconover(at)infogears(dot)com>
Cc: psql performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: GIST versus GIN indexes for intarrays
Date: 2009-02-12 20:54:16
Message-ID: 16433.1234472056@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Rusty Conover <rconover(at)infogears(dot)com> writes:
> Since 100% of my queries are for retrieval, I should use GIN but it
> never appears to be used unlike how GIST indexes are:

You haven't shown us either the table or the index declaration,
so it's a bit tough to comment on that. It's worth noting though
that your GIST example appears to rely on a nonstandard operator class.

regards, tom lane


From: Rusty Conover <rconover(at)infogears(dot)com>
To: psql performance <pgsql-performance(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: GIST versus GIN indexes for intarrays
Date: 2009-02-12 21:05:02
Message-ID: 1363CC06-551F-46E7-A662-8DABB42A8910@infogears.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


On Feb 12, 2009, at 1:54 PM, Tom Lane wrote:

> Rusty Conover <rconover(at)infogears(dot)com> writes:
>> Since 100% of my queries are for retrieval, I should use GIN but it
>> never appears to be used unlike how GIST indexes are:
>
> You haven't shown us either the table or the index declaration,
> so it's a bit tough to comment on that. It's worth noting though
> that your GIST example appears to rely on a nonstandard operator
> class.
>
> regards, tom lane
>

Hi Tom,

My apologies, below is the table definition, and the GIN index creation.

The gist__int_ops is the default operator class for integer[] arrays,
as shown at:

http://www.postgresql.org/docs/current/static/intarray.html

gearbuyer_ig=# \d items
Table "public.items"
Column | Type | Modifiers
-------------------------+-----------
+---------------------------------------------------
item_id | integer | not null default
nextval('generic_seq'::regclass)
gb_product_url | text | not null
group_id | integer |
category_id | integer |
product_name | text | not null
gender | text | not null
description_extract | text | not null
sort_price | real | not null
price_range | text | not null
brand_id | integer | not null
xapian_doc_id | integer |
average_rating | uint1 |
reviews_count | smallint |
store_count | uint1 |
default_image_id | integer |
available_sizes | integer[] |
fast_colors | integer[] |
has_coupons | boolean | not null default false
age_low | uint1 |
sale_percentage_low | uint1 |
store_count_low | uint1 |
price_range_low | smallint |
offering_stores | integer[] |
subclassification_ids | integer[] |
popularity_rank | integer |
default_similarity_type | uint1 |
default_similarity_id | integer |
gc_lookup_id | integer |

The GIN index was created via:

CREATE INDEX items_fast_colors_rdtree_idx ON items USING gin
(fast_colors);

Cheers,

Rusty
--
Rusty Conover
rconover(at)infogears(dot)com
InfoGears Inc / GearBuyer.com / FootwearBuyer.com
http://www.infogears.com
http://www.gearbuyer.com
http://www.footwearbuyer.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rusty Conover <rconover(at)infogears(dot)com>
Cc: psql performance <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [PERFORM] GIST versus GIN indexes for intarrays
Date: 2009-02-12 21:29:38
Message-ID: 17021.1234474178@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Rusty Conover <rconover(at)infogears(dot)com> writes:
> The gist__int_ops is the default operator class for integer[] arrays,
> as shown at:
> http://www.postgresql.org/docs/current/static/intarray.html

Ah, so you have contrib/intarray installed.

[ pokes at it... ] Seems like what we have here is another iteration
of this ancient bug:
http://archives.postgresql.org/pgsql-committers/2004-01/msg00073.php
to wit, contrib/intarray is defining its own @> and <@ operators that
conflict with those since added to the core. In the case Rusty is
showing, the @> gets resolved as intarray's @> (because that's an
exact match, where the core provides anyarray @> anyarray) and then
this operator is NOT a member of the core-provided GIN opclass for
integer arrays.

The short-term workaround for Rusty is probably to create his GIN
index using the intarray-provided gin__int_ops opclass. But it
seems to me that we ought to get rid of intarray's @> and <@ operators
and have the module depend on the core anyarray operators, just as we
have already done for = and <>. Comments?

regards, tom lane


From: Rusty Conover <rconover(at)infogears(dot)com>
To: psql performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: GIST versus GIN indexes for intarrays
Date: 2009-02-13 01:32:43
Message-ID: 4EB92524-5225-44EC-8ABB-CD8CF162708A@infogears.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


On Feb 12, 2009, at 2:29 PM, Tom Lane wrote:

> Rusty Conover <rconover(at)infogears(dot)com> writes:
>> The gist__int_ops is the default operator class for integer[] arrays,
>> as shown at:
>> http://www.postgresql.org/docs/current/static/intarray.html
>
> Ah, so you have contrib/intarray installed.
>
> [ pokes at it... ] Seems like what we have here is another iteration
> of this ancient bug:
> http://archives.postgresql.org/pgsql-committers/2004-01/msg00073.php
> to wit, contrib/intarray is defining its own @> and <@ operators that
> conflict with those since added to the core. In the case Rusty is
> showing, the @> gets resolved as intarray's @> (because that's an
> exact match, where the core provides anyarray @> anyarray) and then
> this operator is NOT a member of the core-provided GIN opclass for
> integer arrays.
>
> The short-term workaround for Rusty is probably to create his GIN
> index using the intarray-provided gin__int_ops opclass. But it
> seems to me that we ought to get rid of intarray's @> and <@ operators
> and have the module depend on the core anyarray operators, just as we
> have already done for = and <>. Comments?

Hi Tom,

For the record using the GIN opclass does resolve the problem for me.
The indexes are now seeing usage.

Thanks for the help,

Rusty
--
Rusty Conover
rconover(at)infogears(dot)com
InfoGears Inc / GearBuyer.com / FootwearBuyer.com
http://www.infogears.com
http://www.gearbuyer.com
http://www.footwearbuyer.com


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Rusty Conover <rconover(at)infogears(dot)com>, psql performance <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: [PERFORM] GIST versus GIN indexes for intarrays
Date: 2009-02-13 13:12:53
Message-ID: 499571D5.6000906@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> The short-term workaround for Rusty is probably to create his GIN
> index using the intarray-provided gin__int_ops opclass. But it
Right
> seems to me that we ought to get rid of intarray's @> and <@ operators
> and have the module depend on the core anyarray operators, just as we
> have already done for = and <>. Comments?
Agree, will do. Although built-in anyarray operators have ~N^2 behaviour while
intarray's version - only N*log(N)
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Rusty Conover <rconover(at)infogears(dot)com>, psql performance <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: [PERFORM] GIST versus GIN indexes for intarrays
Date: 2009-02-13 14:04:58
Message-ID: 20090213140458.GA4134@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, Feb 13, 2009 at 04:12:53PM +0300, Teodor Sigaev wrote:
>> The short-term workaround for Rusty is probably to create his GIN
>> index using the intarray-provided gin__int_ops opclass. But it
> Right
>> seems to me that we ought to get rid of intarray's @> and <@ operators
>> and have the module depend on the core anyarray operators, just as we
>> have already done for = and <>. Comments?
> Agree, will do. Although built-in anyarray operators have ~N^2 behaviour
> while intarray's version - only N*log(N)
Is there a way to have the buily-in anyarray opeators be N*log(N)?

Ken


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Rusty Conover <rconover(at)infogears(dot)com>, psql performance <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: [PERFORM] GIST versus GIN indexes for intarrays
Date: 2009-02-13 17:20:58
Message-ID: 4854.1234545658@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> seems to me that we ought to get rid of intarray's @> and <@ operators
>> and have the module depend on the core anyarray operators, just as we
>> have already done for = and <>. Comments?

> Agree, will do. Although built-in anyarray operators have ~N^2 behaviour while
> intarray's version - only N*log(N)

Really? isort() looks like a bubble sort to me.

But in any case, a pre-sort is probably actually *slower* for small
numbers of array elements. I wonder where the crossover is. In
principle we could make the core implementation do a sort when working
with a sortable datatype, but I'm unsure it's worth the trouble.

regards, tom lane