contrib/intarray vs empty arrays

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: contrib/intarray vs empty arrays
Date: 2009-04-05 01:11:16
Message-ID: 7706.1238893876@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I just got rid of the contrib/intarray duplicates of <@ and @>,
as we discussed here:
http://archives.postgresql.org/message-id/17021.1234474178@sss.pgh.pa.us

While I was testing this I realized that I wasn't getting quite the same
answers :-(. In particular, it seems that the core operators consider
an empty array to be contained in anything else, while intarray will
only return true for two nonempty arrays.

I would just go change that, except that the *real* problem is it means
GIN index searches behave differently from the operators themselves.
Using intarray's regression database,

contrib_regression=# select * from test__int where a <@ array[1,2];
a
-------
{1,1}
(1 row)

contrib_regression=# drop index text_idx;
DROP INDEX
contrib_regression=# select * from test__int where a <@ array[1,2];
a
-------
{}
{}
{}
{}
{}
{}
{}
{}
{}
{1,1}
(10 rows)

From what I understand of GIN's internal workings, this is unfixable
because there is nothing to make an index entry on when looking at
an empty array. Unless you know of a trick to get around that,
we've got a problem here.

[ pokes around ... ] Oh dear, it doesn't work with intarray's
GIST opclasses either.

Do we have to document these operators as being not quite real
containment?

regards, tom lane


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane), pgsql-hackers(at)postgresql(dot)org
Subject: Re: contrib/intarray vs empty arrays
Date: 2009-04-05 04:29:53
Message-ID: 87fxgnvi2m.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

[empty arrays and containment ops]

Tom> From what I understand of GIN's internal workings, this is
Tom> unfixable because there is nothing to make an index entry on
Tom> when looking at an empty array. Unless you know of a trick to
Tom> get around that, we've got a problem here.

Umm. In theory, could the extract function return some distinct
value when applied to an empty array, and the extract_query function
include the same value when extracting a query?

It's not very clean, since it means that the element type for an
index of sometype[] is no longer sometype, but it would allow the
semantics to be kept consistent.

[whether it is worth the pain of doing this is a separate question]

--
Andrew (irc:RhodiumToad)


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: contrib/intarray vs empty arrays
Date: 2009-04-09 13:19:47
Message-ID: 49DDF5F3.8030700@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> While I was testing this I realized that I wasn't getting quite the same
> answers :-(. In particular, it seems that the core operators consider
> an empty array to be contained in anything else, while intarray will
> only return true for two nonempty arrays.

Urgh. We (with Oleg) digged a bit around that.

8.1 and earlier versions (contrib/intarray) assume that empty array neither
contained in nor overlap with any array.
_int_contains(PG_FUNCTION_ARGS)
{
....
if (ARRISVOID(a) || ARRISVOID(b))
return FALSE;

_int_overlap(PG_FUNCTION_ARGS)
{
....
if (ARRISVOID(a) || ARRISVOID(b))
return FALSE;

However, that versions have an *implementation* bug: ARRISVOID returns false for
empty array :(. 8.2 and later versions have built-in assumption that empty array
is contained in any array, although the bug in contrib/intarray was fixed while
defending intarray from nulls in array:
http://archives.postgresql.org/pgsql-committers/2005-11/msg00420.php
But I don't remember any complaints about changed behaviour of contains operator.

On other hand, geometrical containment operator guarantees that arguments will
overlaps, whereas this is not true for arrays - empty array is contained in any
array but doesn't overlap.

Since results of operations with empty array are not documented and we didn't
find definition of "contains" and "overlap" in the documentation, we suggest
following strict definitions:
contains - all elements of second array are contained in the first one.
Empty array has no element, so it can't be contained.
overlap - there is at least one common element in both arrays. Empty array has
no element, so it can't overlap.

Then we need to fix:
- ARRISVOID in contrib/intarray in 8.1 and later
- contains/contained in 8.2 and later

> From what I understand of GIN's internal workings, this is unfixable
Yes :(

> [ pokes around ... ] Oh dear, it doesn't work with intarray's
> GIST opclasses either.
That's easy fixable

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Greg Stark <stark(at)enterprisedb(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: contrib/intarray vs empty arrays
Date: 2009-04-09 13:29:33
Message-ID: 4136ffa0904090629o1aa3f5daw435b7ff58df9fea9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/4/9 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
> contains - all elements of second array are contained in the first one.
> Empty array has no element, so it can't be contained.

That sounds wrong. A <contains> B should surely always be true if B is
empty. ie "for all x, x in B implies x in A". Or put another way,
"contains" just means "is a superset of" and all sets are supersets of
the empty set (even the empty set).

--
greg