Re: pl/python do not delete function arguments

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pl/python do not delete function arguments
Date: 2011-02-26 05:17:32
Message-ID: AANLkTinX_Cs-UK5DgHXRm-k3hrTqTVkS7_4fS+DzYoQh@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 15, 2011 at 6:04 PM, Jan Urbański <wulczer(at)wulczer(dot)org> wrote:
> On 15/02/11 20:39, Peter Eisentraut wrote:
>> On tis, 2011-02-15 at 09:58 +0100, Jan Urbański wrote:
>>> Because the invocation that actually recurses sets up the scene for
>>> failure.
>>
>> That's what we're observing, but I can't figure out why it is.  If you
>> can, could you explain it?
>>
>> It actually makes sense to me that the arguments should be deleted at
>> the end of the call.  The data belongs to that call only, and
>> PLy_procedure_delete() that would otherwise clean it up is only called
>> rarely.
>>
>> Apparently, the recursive call ends up deleting the wrong arguments, but
>> it's not clear to me why that would affect the next top-level call,
>> because that would set up its own arguments again anyway.  In any case,
>> perhaps the right fix is to fix PLy_function_delete_args() to delete the
>> args correctly.
>
> Aaah, ok, I got it (again). Let me write this in full before I forget
> and spend another hour chasing that bug (and boy, bugs that disappear
> because you're doing things in the debugger are so annoying). And
> actually, my patch doesn't fix it fully :| Let me demonstrate:
>
> CREATE FUNCTION rec(n integer) RETURNS integer AS $$
> if n == 0:
>    return
> plpy.notice("before n is %d" % n)
> plpy.execute("select rec(0)")
> plpy.notice("after n is %d" % n)
> $$ LANGUAGE plpythonu;
>
> Without the patch the second plpy.notice raises a NameError. With the
> patch the output is:
>
> NOTICE:  before n is 4
> CONTEXT:  PL/Python function "rec"
> NOTICE:  after n is 0
> CONTEXT:  PL/Python function "rec"
>
> What happens? In PLy_function_handler, PLy_function_build_args is
> called, and proc->globals is set. After that PLy_procedure_call is
> called, which starts executing Python code. The Python code does a call
> into C with plpy.execute, and PLy_function_handler gets called (a
> reentrant call).
>
> Then PLy_function_build_args is called again. It overwrites the "n"
> entry in proc->globals and then PLy_procedure_call gets called, which
> drops us back into Python (on the stack there's now C, Python, C,
> Python). This second invocation exits quickly because n == 0, and we're
> back in C.
>
> Now without my patch, the next thing to happen was deleting the
> arguments, which removed "n" from the proc->globals dict. The rest of C
> code runs and finally plpy.execute returns and we;re back in Python (the
> stack is C, Python).
>
> The second plpy.notice is run, which fetches "n" from the globals, and
> not finding it, raises a NameError. With the patch it simply fetches the
> overwritten value, namely 0.
>
> The KeyError was a red herring - that's how Python reacted when
> evaluating "n in (0, 1)", and if you look in the server log you'll see a
> RuntimeWarning complaining about something internal, that doesn't
> matter. The bottom line is that PLy_procedure_call is not reentrant
> because of proc->globals, and it has to be.
>
> Now when fixing this bug I tries copying the globals dict and restoring
> it, but ran into issues (I think the problem was that the function
> didn't like running with different globals then the one it has been
> compiled with). Not sure what to do with this :( Document it as a caveat
> (with or without my patch) and carry on? That sucks quite badly...

From this discussion I gather that we have a problem here that we
don't exactly know how to fix, so I'm inclined to suggest that we mark
this Returned with Feedback in the CommitFest and instead add it to
the TODO. Since this is a pre-existing bug and not a new regression,
it should not be something we hold up beta for.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-02-26 05:20:10 Re: wCTE: about the name of the feature
Previous Message Tom Lane 2011-02-26 05:15:58 Re: Review: Fix snapshot taking inconsistencies