Re: consistency check on SPI tuple count failed

Lists: pgsql-hackers
From: "Gaetano Mendola" <mendola(at)bigfoot(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: consistency check on SPI tuple count failed
Date: 2003-08-08 01:34:40
Message-ID: 000a01c35d4d$3d086090$10d4a8c0@mm.eutelsat.org

Hi all,
The following code worked properly under Postgres 7.3.X.
I'm now running my regression tests with Postgres 7.4beta1 and I'm
getting the error in the subject line.

CREATE TABLE test ( a integer, b integer );

INSERT INTO test VALUES ( 1 );

CREATE OR REPLACE FUNCTION foo(INTEGER)
RETURNS INTEGER AS'
BEGIN
    RETURN $1 + 1;
END;
' LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION bar()
RETURNS INTEGER AS'
DECLARE
    my_ret RECORD;
BEGIN

    FOR my_ret IN
        SELECT foo(a) AS ret
        FROM test
    LOOP
        IF my_ret.ret = 3 THEN
            RETURN -1;
        END IF;

    END LOOP;

    RETURN 0;

END;
' LANGUAGE 'plpgsql';

Regards
Gaetano Mendola


From: "Gaetano Mendola" <mendola(at)bigfoot(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: consistency check on SPI tuple count failed
Date: 2003-08-08 01:40:28
Message-ID: 001b01c35d4e$0c7e4560$10d4a8c0@mm.eutelsat.org

I forgot to say that you need to run:

select bar()

at the end!

Gaetano


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Gaetano Mendola" <mendola(at)bigfoot(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: consistency check on SPI tuple count failed
Date: 2003-08-08 15:55:27
Message-ID: 27674.1060358127@sss.pgh.pa.us

"Gaetano Mendola" <mendola(at)bigfoot(dot)com> writes:
> the following code was working properly under Postgres 7.3.X
> I'm now running my regression test with Postgres 7.4beta1 and I'm
> having the error in subj.

I tried this and got

regression=# select bar();
bar
-----
0
(1 row)

regression=#

Anyone else see the problem?

regards, tom lane


From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gaetano Mendola <mendola(at)bigfoot(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: consistency check on SPI tuple count failed
Date: 2003-08-08 16:10:57
Message-ID: 20030808090919.I71867-100000@megazone.bigpanda.com


On Fri, 8 Aug 2003, Tom Lane wrote:

> "Gaetano Mendola" <mendola(at)bigfoot(dot)com> writes:
> > the following code was working properly under Postgres 7.3.X
> > I'm now running my regression test with Postgres 7.4beta1 and I'm
> > having the error in subj.
>
> I tried this and got
>
> regression=# select bar();
> bar
> -----
> 0
> (1 row)
>
> regression=#
>
> Anyone else see the problem?

I got the same thing as Gaetano on my just prior to beta1 system.


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gaetano Mendola <mendola(at)bigfoot(dot)com>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: consistency check on SPI tuple count failed
Date: 2003-08-08 16:12:22
Message-ID: 1060359141.97914.36.camel@jester

On Fri, 2003-08-08 at 11:55, Tom Lane wrote:
> "Gaetano Mendola" <mendola(at)bigfoot(dot)com> writes:
> > the following code was working properly under Postgres 7.3.X
> > I'm now running my regression test with Postgres 7.4beta1 and I'm
> > having the error in subj.
>
> I tried this and got
>
> regression=# select bar();
> bar
> -----
> 0
> (1 row)
>
> regression=#
>
> Anyone else see the problem?

Bar gives 0 for me as well.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: Gaetano Mendola <mendola(at)bigfoot(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: consistency check on SPI tuple count failed
Date: 2003-08-08 16:26:14
Message-ID: 27910.1060359974@sss.pgh.pa.us

Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:
> I got the same thing as Gaetano on my just prior to beta1 system.

Well, we couldn't have fixed it since beta1 --- there's been no changes
anywhere near SPI. I'm thinking it must be platform-dependent. What
are you guys using, exactly?

regards, tom lane


From: "Mendola Gaetano" <mendola(at)bigfoot(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: consistency check on SPI tuple count failed
Date: 2003-08-08 16:33:29
Message-ID: 04b001c35dca$cce1e880$152aa8c0@GMENDOLA2

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Gaetano Mendola" <mendola(at)bigfoot(dot)com> writes:
> > the following code was working properly under Postgres 7.3.X
> > I'm now running my regression test with Postgres 7.4beta1 and I'm
> > having the error in subj.
>
> I tried this and got
>
> regression=# select bar();
> bar
> -----
> 0
> (1 row)
>
> regression=#
>
> Anyone else see the problem?
>
> regards, tom lane

Hard to believe, but after playing around, that function started
to work. I'm not crazy.

I deleted the DB.
Stopped postgres.
Restarted postgres.
Created the DB.
Created the language.
Inserted my example.

Again the error:

kalman=# select bar();
ERROR: consistency check on SPI tuple count failed
CONTEXT: PL/pgSQL function "bar" line 5 at for over select rows
kalman=# select bar();
ERROR: consistency check on SPI tuple count failed
CONTEXT: PL/pgSQL function "bar" line 5 at for over select rows
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Gaetano


From: "Mendola Gaetano" <mendola(at)bigfoot(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: consistency check on SPI tuple count failed
Date: 2003-08-08 16:35:32
Message-ID: 04b501c35dcb$1623e020$152aa8c0@GMENDOLA2

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:
> > I got the same thing as Gaetano on my just prior to beta1 system.
>
> Well, we couldn't have fixed it since beta1 --- there's been no changes
> anywhere near SPI. I'm thinking it must be platform-dependent. What
> are you guys using, exactly?
>
> regards, tom lane

kalman=# select version();
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 7.4beta1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
(1 row)

Regards
Gaetano Mendola


From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gaetano Mendola <mendola(at)bigfoot(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: consistency check on SPI tuple count failed
Date: 2003-08-08 16:36:33
Message-ID: 20030808093313.F72940-100000@megazone.bigpanda.com


On Fri, 8 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:
> > I got the same thing as Gaetano on my just prior to beta1 system.
>
> Well, we couldn't have fixed it since beta1 --- there's been no changes
> anywhere near SPI. I'm thinking it must be platform-dependent. What
> are you guys using, exactly?

I'm using RedHat 9.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mendola Gaetano" <mendola(at)bigfoot(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: consistency check on SPI tuple count failed
Date: 2003-08-08 17:02:25
Message-ID: 28162.1060362145@sss.pgh.pa.us

"Mendola Gaetano" <mendola(at)bigfoot(dot)com> writes:
> Again the error:

> kalman=# select bar();
> ERROR: consistency check on SPI tuple count failed
> CONTEXT: PL/pgSQL function "bar" line 5 at for over select rows
> kalman=# select bar();
> ERROR: consistency check on SPI tuple count failed
> CONTEXT: PL/pgSQL function "bar" line 5 at for over select rows
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

After adding a second row to the test table, I am able to reproduce
the above (including the core dump after second try) on an intel/linux
box, but *not* on HPUX.

I now suspect a memory-stomp kind of problem, like someone writing one
too many bytes in a struct. HPUX tends to mask these in situations
where intel will not, because it uses MAXALIGN 8 rather than 4.

I have also just traced through _SPI_cursor_operation() in spi.c,
watched PortalRunFetch return 2, and then watched _SPI_checktuples read
zero from _SPI_current->processed. How the heck could that happen?
Compiler bug, or am I just crazy?

regards, tom lane
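
To make the alignment hypothesis in this message concrete, here is a minimal sketch in C. It only illustrates the general idea that rounding sizes up to a larger alignment leaves padding which can absorb a small overrun. `align_up` and `slack` are hypothetical helpers written for this note; they are not PostgreSQL's MAXALIGN macro or its allocator, and the 12-byte size is an arbitrary example.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of align.
 * Assumption: align is a nonzero power of two (4 or 8 in this thread). */
static size_t align_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Padding bytes that follow an object of size len once it has been
 * rounded up to the given alignment. */
static size_t slack(size_t len, size_t align)
{
    return align_up(len, align) - len;
}
```

With these helpers, a 12-byte object has no trailing padding at alignment 4 but four bytes of padding at alignment 8. A write one byte past the end would therefore hit the neighbouring object in the first case and unused padding in the second, which is one way a platform with the larger alignment could hide such a bug.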


From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mendola Gaetano <mendola(at)bigfoot(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: consistency check on SPI tuple count failed
Date: 2003-08-08 18:24:30
Message-ID: 20030808112122.R75184-100000@megazone.bigpanda.com


On Fri, 8 Aug 2003, Tom Lane wrote:

> "Mendola Gaetano" <mendola(at)bigfoot(dot)com> writes:
> > Again the error:
>
> > kalman=# select bar();
> > ERROR: consistency check on SPI tuple count failed
> > CONTEXT: PL/pgSQL function "bar" line 5 at for over select rows
> > kalman=# select bar();
> > ERROR: consistency check on SPI tuple count failed
> > CONTEXT: PL/pgSQL function "bar" line 5 at for over select rows
> > server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > The connection to the server was lost. Attempting reset: Failed.
>
> After adding a second row to the test table, I am able to reproduce
> the above (including the core dump after second try) on an intel/linux
> box, but *not* on HPUX.
>
> I now suspect a memory-stomp kind of problem, like someone writing one
> too many bytes in a struct. HPUX tends to mask these in situations
> where intel will not, because it uses MAXALIGN 8 rather than 4.
>
> I have also just traced through _SPI_cursor_operation() in spi.c,
> watched PortalRunFetch return 2, and then watched _SPI_checktuples read
> zero from _SPI_current->processed. How the heck could that happen?
> Compiler bug, or am I just crazy?

Not sure, but I got the same thing. When I changed it to store the
result in a temporary int variable and then assign that to the field,
it started working for me (returning 0); reverting to the original
made it fail again. I'm going to try -O0 and see what happens there.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: Mendola Gaetano <mendola(at)bigfoot(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: consistency check on SPI tuple count failed
Date: 2003-08-08 19:05:13
Message-ID: 29030.1060369513@sss.pgh.pa.us

Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:
> On Fri, 8 Aug 2003, Tom Lane wrote:
>> I have also just traced through _SPI_cursor_operation() in spi.c,
>> watched PortalRunFetch return 2, and then watched _SPI_checktuples read
>> zero from _SPI_current->processed. How the heck could that happen?
>> Compiler bug, or am I just crazy?

> Not sure, but I got the same thing. When I changed it to put the
> result in a temporary int variable and then put it in it started
> working for me (returning 0), reverting to the original made it fail
> again. I'm going to try -O0 and see what happens there.

Oooohhhh ...

<lightbulb>
SPI_stack can move around as functions are entered/exited.
</lightbulb>

Wonder why we've not seen that kind of failure happen before? Someone
(doubtless me) must have changed the coding of this routine since 7.3.

regards, tom lane
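
The failure mode described in this message can be sketched in a few lines of C. This is a simplified model under stated assumptions, not the code in spi.c: `Frame`, `push_frame`, `pop_frame`, `fetch_rows`, `run_buggy`, and `run_fixed` are hypothetical names, and the old array is deliberately kept allocated after a move so that the stale write stays well defined instead of touching freed memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a growable call stack. */
typedef struct Frame { int processed; } Frame;

static Frame *stack = NULL;      /* growable array of frames            */
static Frame *old_block = NULL;  /* previous array, kept alive on purpose */
static Frame *current = NULL;    /* always points at stack[depth - 1]    */
static int depth = 0, capacity = 0;

static void reset_state(void)
{
    free(stack);
    free(old_block);
    stack = old_block = current = NULL;
    depth = capacity = 0;
}

/* Enter a function: push a frame, moving the array when it is full. */
static void push_frame(void)
{
    if (depth == capacity) {
        int new_cap = capacity ? capacity * 2 : 1;
        Frame *bigger = calloc((size_t) new_cap, sizeof(Frame));
        assert(bigger != NULL);
        if (stack != NULL)
            memcpy(bigger, stack, (size_t) depth * sizeof(Frame));
        free(old_block);
        old_block = stack;       /* a real allocator would release this now */
        stack = bigger;
        capacity = new_cap;
    }
    stack[depth].processed = 0;
    depth++;
    current = &stack[depth - 1];
}

/* Leave a function: pop the top frame. */
static void pop_frame(void)
{
    depth--;
    current = (depth > 0) ? &stack[depth - 1] : NULL;
}

/* Fetching rows evaluates a nested function, so the stack may move. */
static int fetch_rows(void)
{
    push_frame();
    pop_frame();
    return 2;
}

/* Destination pointer captured before the call: the store lands in the
 * old array, and the live frame still reads zero. */
int run_buggy(void)
{
    reset_state();
    push_frame();
    Frame *stale = current;
    stale->processed = fetch_rows();
    return current->processed;
}

/* Result held in a temporary, pointer re-read after the call. */
int run_fixed(void)
{
    reset_state();
    push_frame();
    int n = fetch_rows();
    current->processed = n;
    return current->processed;
}
```

In this model `run_buggy()` returns 0 even though two rows were fetched, which is the kind of mismatch the consistency check reports, while `run_fixed()` returns 2. The temporary-variable change reported earlier in the thread has the same shape as `run_fixed()`. If the old array were actually freed, as one would expect in real code, the stale store could clobber unrelated memory instead.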


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mendola Gaetano" <mendola(at)bigfoot(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: consistency check on SPI tuple count failed
Date: 2003-08-08 19:21:42
Message-ID: 29147.1060370502@sss.pgh.pa.us

"Mendola Gaetano" <mendola(at)bigfoot(dot)com> writes:
> Hard to believe, but after playing around, that function started
> to work. I'm not crazy.

Yeah, it was a problem with storing into a possibly-obsolete pointer ---
the visible effects could range from nothing to a core dump depending on
whether the pointer was really out-of-date and what got clobbered if it
was.

Fix is in CVS.

regards, tom lane