BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist

Lists: pgsql-bugs
From: "Martin Atukunda" <matlads(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
Date: 2010-12-05 06:50:51
Message-ID: 201012050650.oB56opvM090787@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5784
Logged by: Martin Atukunda
Email address: matlads(at)gmail(dot)com
PostgreSQL version: 8.4
Operating system: Linux
Description: CREATE INDEX USING GIN complains about array containing
null values yet none exist
Details:

CREATE INDEX USING GIN complains about array containing null values yet none
exist.

Take the following SQL as an example.

-- >8 --

BEGIN;
CREATE TABLE t (id serial primary key, apps bigint[]);
INSERT INTO t (apps) VALUES ('{1,2,3,4}');
INSERT INTO t (apps) VALUES ('{1,2,3,4}');

-- insert an array with one element as a null value
INSERT INTO t (apps) VALUES ('{1,2,3,NULL}');

-- remove the null value
UPDATE t SET apps[4] = -1 WHERE apps[4] IS NULL;

-- the create index fails
CREATE INDEX t_apps_idx ON t USING GIN(apps);

COMMIT;

-- 8< --

note that index creatiion succeeds if I first create the index without
inserting a row containing an array with a null element.

- Martin -


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Cc: "Martin Atukunda" <matlads(at)gmail(dot)com>
Subject: Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
Date: 2010-12-05 11:39:23
Message-ID: 201012051239.23510.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi,
On Sunday 05 December 2010 07:50:51 Martin Atukunda wrote:
>
> CREATE INDEX USING GIN complains about array containing null values yet
> none exist.
>
> Take the following SQL as an example.
>
> -- >8 --
>
> BEGIN;
> CREATE TABLE t (id serial primary key, apps bigint[]);
> INSERT INTO t (apps) VALUES ('{1,2,3,4}');
> INSERT INTO t (apps) VALUES ('{1,2,3,4}');
>
> -- insert an array with one element as a null value
> INSERT INTO t (apps) VALUES ('{1,2,3,NULL}');
>
> -- remove the null value
> UPDATE t SET apps[4] = -1 WHERE apps[4] IS NULL;
>
> -- the create index fails
> CREATE INDEX t_apps_idx ON t USING GIN(apps);
>
> COMMIT;
>
> -- 8< --
>
>
> note that index creatiion succeeds if I first create the index without
> inserting a row containing an array with a null element.
Due to the wonders of MVCC the old row is still available in the heap. Best
read the docs about what MVCC means. And as pg's indexes don't care about
visibility it will still try to index the "old" row.

Andres


From: Martin Atukunda <matlads(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
Date: 2010-12-05 12:07:23
Message-ID: AANLkTikSHA0CvwZsQLNFUXCBVELEmXG_bPCqTnHqwXEN@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

> Due to the wonders of MVCC the old row is still available in the heap. Best
> read the docs about what MVCC means. And as pg's indexes don't care about
> visibility it will still try to index the "old" row.
>
>
Thanks andreas,

so, basically, the only way out of this would be to:

1. copy out all the rows
2. truncate the Tables
3. then create the index
4. copy in the rows

- Martin -


From: Andres Freund <andres(at)anarazel(dot)de>
To: Martin Atukunda <matlads(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
Date: 2010-12-05 12:29:35
Message-ID: 201012051329.35665.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sunday 05 December 2010 13:07:23 Martin Atukunda wrote:
> > Due to the wonders of MVCC the old row is still available in the heap.
> > Best read the docs about what MVCC means. And as pg's indexes don't care
> > about visibility it will still try to index the "old" row.
>
> Thanks andreas,
>
> so, basically, the only way out of this would be to:
>
> 1. copy out all the rows
> 2. truncate the Tables
> 3. then create the index
> 4. copy in the rows
Something like:

ALTER TABLE t ALTER apps TYPE text[];
ALTER TABLE t ALTER apps TYPE bigint[] USING apps::bigint[];

could work.

Andres


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Martin Atukunda <matlads(at)gmail(dot)com>
Subject: Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
Date: 2010-12-05 12:58:22
Message-ID: 201012051358.22813.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sunday 05 December 2010 13:29:35 Andres Freund wrote:
> On Sunday 05 December 2010 13:07:23 Martin Atukunda wrote:
> > > Due to the wonders of MVCC the old row is still available in the heap.
> > > Best read the docs about what MVCC means. And as pg's indexes don't
> > > care about visibility it will still try to index the "old" row.
> >
> > Thanks andreas,
> >
> > so, basically, the only way out of this would be to:
> >
> > 1. copy out all the rows
> > 2. truncate the Tables
> > 3. then create the index
> > 4. copy in the rows
>
> Something like:
>
> ALTER TABLE t ALTER apps TYPE text[];
> ALTER TABLE t ALTER apps TYPE bigint[] USING apps::bigint[];
On further thought the second one ought to be enough.

Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org, Martin Atukunda <matlads(at)gmail(dot)com>
Subject: Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
Date: 2010-12-05 16:42:59
Message-ID: 26122.1291567379@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Andres Freund <andres(at)anarazel(dot)de> writes:
>> On Sunday 05 December 2010 13:07:23 Martin Atukunda wrote:
>>>> so, basically, the only way out of this would be to:

I think the reason the given example fails is just that it's all being
done in one transaction. If the null-containing row were known dead
it wouldn't get indexed. So: commit.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org, Martin Atukunda <matlads(at)gmail(dot)com>
Subject: Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
Date: 2010-12-05 17:06:20
Message-ID: 201012051806.20719.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sunday 05 December 2010 17:42:59 Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> >> On Sunday 05 December 2010 13:07:23 Martin Atukunda wrote:
> >>>> so, basically, the only way out of this would be to:
> I think the reason the given example fails is just that it's all being
> done in one transaction. If the null-containing row were known dead
> it wouldn't get indexed. So: commit.
Um I doubt it.

test=# \i /tmp/test.sql
DROP TABLE IF EXISTS t;
DROP TABLE
Time: 36.070 ms
BEGIN;
BEGIN
Time: 0.122 ms
CREATE TABLE t (id serial primary key, apps bigint[]);
psql:/tmp/test.sql:3: NOTICE: 00000: CREATE TABLE will create implicit
sequence "t_id_seq" for serial column "t.id"
LOCATION: transformColumnDefinition, parse_utilcmd.c:341
psql:/tmp/test.sql:3: NOTICE: 00000: CREATE TABLE / PRIMARY KEY will create
implicit index "t_pkey" for table "t"
LOCATION: DefineIndex, indexcmds.c:438
CREATE TABLE
Time: 102.843 ms
INSERT INTO t (apps) VALUES ('{1,2,3,4}');
INSERT 0 1
Time: 0.408 ms
INSERT INTO t (apps) VALUES ('{1,2,3,4}');
INSERT 0 1
Time: 0.111 ms
INSERT INTO t (apps) VALUES ('{1,2,3,NULL}');
INSERT 0 1
Time: 0.127 ms
UPDATE t SET apps[4] = -1 WHERE apps[4] IS NULL;
UPDATE 1
Time: 0.271 ms
COMMIT;
COMMIT
Time: 0.086 ms
BEGIN;
BEGIN
Time: 0.192 ms
CREATE INDEX t_apps_idx ON t USING GIN(apps);
psql:/tmp/test.sql:18: ERROR: 22004: array must not contain null values
LOCATION: ginarrayextract, ginarrayproc.c:53
COMMIT;
ROLLBACK
Time: 0.076 ms

I remembered seeing that before....

Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org, Martin Atukunda <matlads(at)gmail(dot)com>
Subject: Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
Date: 2010-12-05 17:26:45
Message-ID: 26741.1291570005@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Andres Freund <andres(at)anarazel(dot)de> writes:
> On Sunday 05 December 2010 17:42:59 Tom Lane wrote:
>> I think the reason the given example fails is just that it's all being
>> done in one transaction. If the null-containing row were known dead
>> it wouldn't get indexed. So: commit.

> Um I doubt it.

[ gets out gdb... ] Oh: the reason GIN is complaining is that it's just
looking at ARR_HASNULL(), and the array's has-nulls flag is still set
because we don't bother to try to clear it after replacing one element
of the array. (Which in general would be an expensive thing to try to
do...)

If we were intending to leave GIN in its current nulls-hating state,
the thing to do would be to replace the stupid ARR_HASNULL check with
something more intelligent. But really it needs to be fixed to handle
nulls properly, so I'm thinking that might be a dead-end patch.

regards, tom lane


From: Martin Atukunda <matlads(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
Date: 2010-12-05 19:06:21
Message-ID: AANLkTimYnW6=wG+uf+UH1hEcGbde5M3DnMc2kzVkQ74b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sun, Dec 5, 2010 at 7:58 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On Sunday 05 December 2010 13:29:35 Andres Freund wrote:
> > On Sunday 05 December 2010 13:07:23 Martin Atukunda wrote:
> > > > Due to the wonders of MVCC the old row is still available in the
> heap.
> > > > Best read the docs about what MVCC means. And as pg's indexes don't
> > > > care about visibility it will still try to index the "old" row.
> > >
> > > Thanks andreas,
> > >
> > > so, basically, the only way out of this would be to:
> > >
> > > 1. copy out all the rows
> > > 2. truncate the Tables
> > > 3. then create the index
> > > 4. copy in the rows
> >
> > Something like:
> >
> > ALTER TABLE t ALTER apps TYPE text[];
> > ALTER TABLE t ALTER apps TYPE bigint[] USING apps::bigint[];
> On further thought the second one ought to be enough.
>

Actually on my tests here both are required, though for the large tables -
writing them twice makes the process very long. The copy out, truncate,
create index, copy in approach seems to work best in this case.

- Martin -


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-bugs(at)postgresql(dot)org, Martin Atukunda <matlads(at)gmail(dot)com>
Subject: Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
Date: 2010-12-06 22:20:46
Message-ID: 4CFD61BE.7060301@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 05.12.2010 18:26, Tom Lane wrote:
> Andres Freund<andres(at)anarazel(dot)de> writes:
>> On Sunday 05 December 2010 17:42:59 Tom Lane wrote:
>>> I think the reason the given example fails is just that it's all being
>>> done in one transaction. If the null-containing row were known dead
>>> it wouldn't get indexed. So: commit.
>
>> Um I doubt it.
>
> [ gets out gdb... ] Oh: the reason GIN is complaining is that it's just
> looking at ARR_HASNULL(), and the array's has-nulls flag is still set
> because we don't bother to try to clear it after replacing one element
> of the array. (Which in general would be an expensive thing to try to
> do...)
>
> If we were intending to leave GIN in its current nulls-hating state,
> the thing to do would be to replace the stupid ARR_HASNULL check with
> something more intelligent. But really it needs to be fixed to handle
> nulls properly, so I'm thinking that might be a dead-end patch.

Sounds like we'd still want to just replace ARR_HASNULL() with something
more intelligent in back-branches though.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-bugs(at)postgresql(dot)org, Martin Atukunda <matlads(at)gmail(dot)com>
Subject: Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
Date: 2010-12-27 21:14:19
Message-ID: 201012272114.oBRLEJs04232@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Heikki Linnakangas wrote:
> On 05.12.2010 18:26, Tom Lane wrote:
> > Andres Freund<andres(at)anarazel(dot)de> writes:
> >> On Sunday 05 December 2010 17:42:59 Tom Lane wrote:
> >>> I think the reason the given example fails is just that it's all being
> >>> done in one transaction. If the null-containing row were known dead
> >>> it wouldn't get indexed. So: commit.
> >
> >> Um I doubt it.
> >
> > [ gets out gdb... ] Oh: the reason GIN is complaining is that it's just
> > looking at ARR_HASNULL(), and the array's has-nulls flag is still set
> > because we don't bother to try to clear it after replacing one element
> > of the array. (Which in general would be an expensive thing to try to
> > do...)
> >
> > If we were intending to leave GIN in its current nulls-hating state,
> > the thing to do would be to replace the stupid ARR_HASNULL check with
> > something more intelligent. But really it needs to be fixed to handle
> > nulls properly, so I'm thinking that might be a dead-end patch.
>
> Sounds like we'd still want to just replace ARR_HASNULL() with something
> more intelligent in back-branches though.

Added to TODO:

Improve GIN's handling of NULL array values

* http://archives.postgresql.org/pgsql-bugs/2010-12/msg00032.php

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +