Gcc 4.4 causes abort in plpython.

Lists: pgsql-hackers
From: Kurt Roeckx <kurt(at)roeckx(dot)be>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Gcc 4.4 causes abort in plpython.
Date: 2008-12-26 17:47:50
Message-ID: 20081226174750.GA26150@roeckx.be

Hi,

I've been trying a gcc 4.4 snapshot (20081213) on buildfarm member
panda. It gets an abort during the pl-install-check part.

Here is the backtrace:
Core was generated by `postgres: build-farm pl_regression [local] SELECT '.
Program terminated with signal 6, Aborted.
[New process 3588]
#0 0x00002b41e7662ed5 in raise () from /lib/libc.so.6
(gdb) bt
#0 0x00002b41e7662ed5 in raise () from /lib/libc.so.6
#1 0x00002b41e76643f3 in abort () from /lib/libc.so.6
#2 0x00000000006a889d in ExceptionalCondition (
conditionName=<value optimized out>, errorType=<value optimized out>,
fileName=<value optimized out>, lineNumber=<value optimized out>)
at assert.c:57
#3 0x00000000006c8033 in MemoryContextAlloc (context=0x0, size=112)
at mcxt.c:507
#4 0x00000000006abe82 in CopyErrorData () at elog.c:1082
#5 0x00002b41ea61a755 in PLy_spi_execute_plan (ob=<value optimized out>,
list=<value optimized out>, limit=<value optimized out>) at plpython.c:2587
#6 0x00002b41ea61a9a6 in PLy_spi_execute (self=<value optimized out>,
args=0x2b41eae11d20) at plpython.c:2477
#7 0x00002b41ea8e5fdd in PyEval_EvalFrameEx ()
from /usr/lib/libpython2.5.so.1.0
#8 0x00002b41ea8e7385 in PyEval_EvalFrameEx ()
from /usr/lib/libpython2.5.so.1.0
#9 0x00002b41ea8e7bfd in PyEval_EvalCodeEx ()
from /usr/lib/libpython2.5.so.1.0
#10 0x00002b41ea8e7df2 in PyEval_EvalCode () from /usr/lib/libpython2.5.so.1.0
#11 0x00002b41ea61b89b in PLy_procedure_call (proc=0xc62880,
kargs=<value optimized out>, vargs=<value optimized out>) at plpython.c:962
#12 0x00002b41ea61eaae in PLy_function_handler (fcinfo=<value optimized out>,
proc=<value optimized out>) at plpython.c:790
#13 0x00002b41ea61f359 in plpython_call_handler (fcinfo=<value optimized out>)
at plpython.c:355
#14 0x000000000054f171 in ExecMakeFunctionResult (
fcache=<value optimized out>, econtext=<value optimized out>,
isNull=0xbdd3d0 "\177~\177\177\177\177\177\177", isDone=0xbdd488)
at execQual.c:1635
#15 0x000000000054a39b in ExecProject (projInfo=<value optimized out>,
isDone=<value optimized out>) at execQual.c:4922
#16 0x000000000055dfab in ExecResult (node=0xbdc7d8) at nodeResult.c:155
#17 0x0000000000549928 in ExecProcNode (node=0xbdc7d8) at execProcnode.c:338
#18 0x00000000005474c9 in standard_ExecutorRun (
queryDesc=<value optimized out>, direction=ForwardScanDirection,
count=<value optimized out>) at execMain.c:1343
#19 0x00000000005fc878 in PortalRunSelect (portal=0xbd6c58,
forward=<value optimized out>, count=0, dest=0xbd4c60) at pquery.c:942
#20 0x00000000005fdd30 in PortalRun (portal=<value optimized out>,
count=<value optimized out>, isTopLevel=<value optimized out>,
dest=<value optimized out>, altdest=<value optimized out>,
completionTag=<value optimized out>) at pquery.c:768
#21 0x00000000005f90cd in exec_simple_query (
query_string=<value optimized out>) at postgres.c:992
#22 0x00000000005fa707 in PostgresMain (argc=<value optimized out>,
argv=<value optimized out>, username=<value optimized out>)
at postgres.c:3569
#23 0x00000000005c7227 in ServerLoop () at postmaster.c:3258
#24 0x00000000005c963d in PostmasterMain (argc=3, argv=0xaf3720)
at postmaster.c:1031
#25 0x0000000000571695 in main (argc=3, argv=0xaf3710) at main.c:188
(gdb) frame 3
#3 0x00000000006c8033 in MemoryContextAlloc (context=0x0, size=112)
at mcxt.c:507
507 AssertArg(MemoryContextIsValid(context));
(gdb) p context
$1 = (MemoryContext) 0x0

I've tried looking at it, but I have no idea what could be wrong.

Note that this might be a compiler bug, and it would be nice
if someone could figure out if it's a bug in pgsql or gcc.

kurt



From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Kurt Roeckx <kurt(at)roeckx(dot)be>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Gcc 4.4 causes abort in plpython.
Date: 2008-12-29 12:25:47
Message-ID: 20081229122547.GC4545@alvh.no-ip.org

Kurt Roeckx wrote:

> #3 0x00000000006c8033 in MemoryContextAlloc (context=0x0, size=112)
> at mcxt.c:507
> #4 0x00000000006abe82 in CopyErrorData () at elog.c:1082
> #5 0x00002b41ea61a755 in PLy_spi_execute_plan (ob=<value optimized out>,
> list=<value optimized out>, limit=<value optimized out>) at plpython.c:2587

It's calling CopyErrorData with CurrentMemoryContext pointing to NULL,
which is possible because the GCC-inlined version of
MemoryContextSwitchTo does not check for a NULL argument (the
non-inlined version does -- should we fix that?).

The question is why is that memory context set to NULL. The code looks
like this:

PLy_spi_execute_plan( ... )
{
    MemoryContext oldcontext;
    ...
    oldcontext = CurrentMemoryContext;
    PG_TRY();
    {
        ...
    }
    PG_CATCH();
    {
        MemoryContextSwitchTo(oldcontext);
        CopyErrorData();
        ...
    }

This has been like this for quite a while, which I find surprising
because I got scolded for a similar coding pattern a while back. I think
I found that the longjmp call reverted the variable to the value it had
on entering the block. (IIRC Tom complained because his
compiler threw a "variable might be clobbered by longjmp" warning). We
at Command Prompt also had a similar case on the then-proprietary
Replicator code.

I think a simplistic solution is to declare the variable volatile.
Would you test that and report back?
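
For illustration, here is a minimal standalone sketch of the longjmp
clobbering hazard and the volatile workaround. It assumes plain
setjmp/longjmp in place of PG_TRY/PG_CATCH (which are built on
sigsetjmp/siglongjmp). The names raise_error and
saved_value_after_longjmp are hypothetical and not PostgreSQL code.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical stand-in for code that reports an error via longjmp. */
static void raise_error(void)
{
    longjmp(env, 1);
}

/* Returns the local's value as seen in the "catch" branch. */
int saved_value_after_longjmp(void)
{
    /*
     * The local is modified between setjmp and longjmp.  Without
     * volatile its value after the jump would be indeterminate;
     * volatile forces every access to go through memory.
     */
    volatile int saved = 0;

    if (setjmp(env) == 0)
    {
        saved = 42;         /* changed after setjmp, before longjmp */
        raise_error();
        return -1;          /* not reached */
    }

    /* "catch" branch: reached via longjmp */
    return saved;
}
```

With the volatile qualifier, a call to saved_value_after_longjmp()
returns 42. Dropping the qualifier leaves the result unspecified by the
C standard, because the variable was changed after setjmp.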

Thanks.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Kurt Roeckx <kurt(at)roeckx(dot)be>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Gcc 4.4 causes abort in plpython.
Date: 2008-12-29 14:24:16
Message-ID: 20081229142416.GA10372@roeckx.be

On Mon, Dec 29, 2008 at 09:25:47AM -0300, Alvaro Herrera wrote:
>
> I think a simplistic solution is to declare the variable volatile.
> Would you test that and report back?

Yes, making oldcontext volatile makes the test pass.

It now fails at the ECPG-Check stage, but it seems that is a common
problem.

Kurt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kurt Roeckx <kurt(at)roeckx(dot)be>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Gcc 4.4 causes abort in plpython.
Date: 2008-12-29 16:19:56
Message-ID: 8388.1230567596@sss.pgh.pa.us

Kurt Roeckx <kurt(at)roeckx(dot)be> writes:
> On Mon, Dec 29, 2008 at 09:25:47AM -0300, Alvaro Herrera wrote:
>> I think a simplistic solution is to declare the variable volatile.
>> Would you test that and report back?

> Yes, making oldcontext volatile makes the test pass.

This is a gcc bug and you should report it. Since the variable is
not assigned within the try-block, volatile marking should not be
necessary.
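
As a rough sketch of that rule, assuming plain setjmp/longjmp rather
than the PG_TRY machinery (fail and unmodified_local_after_longjmp are
hypothetical names): a non-volatile local that is assigned only before
setjmp keeps its value after longjmp under the C standard, which is the
same shape as oldcontext in the plpython code.

```c
#include <setjmp.h>

static jmp_buf env2;

/* Hypothetical stand-in for an error raised inside the try-block. */
static void fail(void)
{
    longjmp(env2, 1);
}

int unmodified_local_after_longjmp(void)
{
    /* Assigned before setjmp and never changed afterwards. */
    int saved = 7;

    if (setjmp(env2) == 0)
    {
        fail();             /* jumps back; saved is not touched here */
    }

    /*
     * Only non-volatile automatic objects changed between setjmp and
     * longjmp become indeterminate, so saved must still be 7 here on a
     * conforming compiler.
     */
    return saved;
}
```

A conforming compiler must return 7 here without any volatile marking.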

regards, tom lane


From: Kurt Roeckx <kurt(at)roeckx(dot)be>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Gcc 4.4 causes abort in plpython.
Date: 2008-12-29 17:26:34
Message-ID: 20081229172634.GA26149@roeckx.be

On Mon, Dec 29, 2008 at 11:19:56AM -0500, Tom Lane wrote:
> Kurt Roeckx <kurt(at)roeckx(dot)be> writes:
> > On Mon, Dec 29, 2008 at 09:25:47AM -0300, Alvaro Herrera wrote:
> >> I think a simplistic solution is to declare the variable volatile.
> >> Would you test that and report back?
>
> > Yes, making oldcontext volatile makes the test pass.
>
> This is a gcc bug and you should report it. Since the variable is
> not assigned within the try-block, volatile marking should not be
> necessary.

Reported as:
http://gcc.gnu.org/PR38660

kurt