Quick Links

Re: crash in plancache with subtransactions

Lists:	pgsql-hackers

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	crash in plancache with subtransactions
Date:	2010-10-21 21:05:52
Message-ID:	1287694169-sup-7188@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

A customer was hitting some misbehavior in one of their internal tests and
I tracked it down to plancache not behaving properly with
subtransactions: in particular, a plan is not being marked "dead" when
the subtransaction on which it is planned rolls back. It was reported
in 8.4, but I can reproduce the problem on 9.0 too with this small
script:

drop schema alvherre cascade;
drop schema test cascade;
create schema test;
create schema alvherre;
set search_path = 'alvherre';

create or replace function dummy(text) returns text language sql
as $$ SELECT relname::text FROM pg_class c WHERE c.oid = $1::regclass $$;

create or replace function broken(p_name_table text) returns void
language plpgsql as $$
declare
v_table_full text := alvherre.dummy(p_name_table);
begin
return;
end;
$$;

BEGIN;
create table test.stuffs (stuff text);
SAVEPOINT a;
select broken('nonexistant.stuffs');

ROLLBACK TO a;
select broken('test.stuffs');

rollback;

The symptom is that the second call to broken() fails with this error
message:

ERROR: relation "" does not exist
CONTEXT: SQL function "dummy" statement 1
PL/pgSQL function "broken" line 3 during statement block local variable initialization

Note that this is totally bogus, because the relation being referenced
does indeed exist. In fact, if you commit the transaction and call the
function again, it works.

Also, the state after the first call is a bit bogus: if you repeat the
whole sequence starting at the BEGIN line, it causes a crash on 8.4.

I hacked up plancache a bit so that it marks plans as dead when the
subtransaction resource owner releases it. It adds a new arg to
ReleaseCachedPlan(); if true, the plan is marked dead. All current
callers, except the one in ResourceOwnerReleaseInternal(), use false
thus preserving the current behavior. resowner sets this as true when
aborting a (sub)transaction.

I have to admit that it seems somewhat the wrong API, but I don't see a
better way. (I thought above relcache or syscache inval, but as far as
I can't tell there isn't any here). I'm open to suggestions.

Patch attached.

--
Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>

Attachment	Content-Type	Size
0001-Mark-a-cache-plan-as-dead-when-aborting-its-creating.patch	application/octet-stream	7.3 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-10-21 22:36:07
Message-ID:	7979.1287700567@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> A customer was hitting some misbehavior in one of their internal tests and
> I tracked it down to plancache not behaving properly with
> subtransactions: in particular, a plan is not being marked "dead" when
> the subtransaction on which it is planned rolls back.

I don't believe that it's plancache's fault; the real problem is that
plpgsql is keeping "simple expression" execution trees around longer
than it should. Your patch masks the problem by forcing those trees to
be rebuilt, but it's the execution trees not the plan trees that contain
stale data.

I'm not immediately sure why plpgsql_subxact_cb is failing to clean up
correctly in this example, but that seems to be where to look.

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-10-21 23:01:08
Message-ID:	1287701882-sup-6965@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> > A customer was hitting some misbehavior in one of their internal tests and
> > I tracked it down to plancache not behaving properly with
> > subtransactions: in particular, a plan is not being marked "dead" when
> > the subtransaction on which it is planned rolls back.
>
> I don't believe that it's plancache's fault; the real problem is that
> plpgsql is keeping "simple expression" execution trees around longer
> than it should. Your patch masks the problem by forcing those trees to
> be rebuilt, but it's the execution trees not the plan trees that contain
> stale data.

Ahh, this probably explains why I wasn't been able to reproduce the
problem without involving subxacts, or prepared plans, that seemed to
follow mostly the same paths around plancache cleanup.

It's also the likely cause that this hasn't ben reported earlier.

> I'm not immediately sure why plpgsql_subxact_cb is failing to clean up
> correctly in this example, but that seems to be where to look.

Will take a look ... if the girls let me ...

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-10-22 00:26:01
Message-ID:	1287707071-sup-9015@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010:

> I'm not immediately sure why plpgsql_subxact_cb is failing to clean up
> correctly in this example, but that seems to be where to look.

I think the reason is that one econtext is pushed for function
execution, and another one for blocks that contain exceptions. The
example function does not contain exceptions -- the savepoints are
handled by the external SQL code.

I'll have a closer look tomorrow.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-10-22 03:10:47
Message-ID:	12392.1287717047@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010:
>> I don't believe that it's plancache's fault; the real problem is that
>> plpgsql is keeping "simple expression" execution trees around longer
>> than it should. Your patch masks the problem by forcing those trees to
>> be rebuilt, but it's the execution trees not the plan trees that contain
>> stale data.

> Ahh, this probably explains why I wasn't been able to reproduce the
> problem without involving subxacts, or prepared plans, that seemed to
> follow mostly the same paths around plancache cleanup.

> It's also the likely cause that this hasn't ben reported earlier.

I traced through the details and found that the proximate cause of the
crash is that this bit in fmgr_sql() gets confused:

/*
* Convert params to appropriate format if starting a fresh execution. (If
* continuing execution, we can re-use prior params.)
*/
if (es && es->status == F_EXEC_START)
postquel_sub_params(fcache, fcinfo);

After the error in the first subtransaction, the execution state tree
for the "public.dummy(p_name_table)" expression has a fn_extra link
that is pointing to a SQLFunctionCache that's in F_EXEC_RUN state.
So when the second call of broken() tries to re-use the state tree,
fmgr_sql() thinks it's continuing the execution of a set-returning
function, and doesn't bother to re-initialize its ParamListInfo
struct. So it merrily tries to execute using a text datum that's
pointing at long-since-pfree'd storage for the original
'nonexistant.stuffs' argument string.

I had always felt a tad uncomfortable with the way that plpgsql re-uses
execution state trees for simple expressions; it seemed to me that it
was entirely unsafe in recursive usage. But I'd never been able to
prove that it was broken. Now that I've seen this example, I know how
to break it: recurse indirectly through a SQL function. For instance,
this will dump core immediately:

create or replace function recurse(float8) returns float8 as
$$
begin
raise notice 'recurse(%)', $1;
if ($1 < 10) then
return sql_recurse($1 + 1);
else
return $1;
end if;
end
$$ language plpgsql;

-- "limit" is to prevent this from being inlined
create or replace function sql_recurse(float8) returns float8 as
$$ select recurse($1) limit 1; $$ language sql;

select recurse(0);

Notice the lack of any subtransaction or error condition. The reason
this fails is *not* plancache misfeasance or failure to clean up after
error. Rather, it's that the inner execution of recurse() is trying to
re-use an execution state tree that is pointing at an already-active
execution of sql_recurse(). In general, what plpgsql is doing is
entirely unsafe whenever a called function tries to keep changeable
execution state in storage pointed to by fn_extra. We've managed to
miss the problem because plpgsql doesn't try to use this technique on
functions returning set (see exec_simple_check_node), and the vast
majority of non-SRF functions that use fn_extra at all use it to cache
catalog lookup results, which don't change from call to call. But
there's no convention that says a function can't keep execution status
data in fn_extra --- in fact, there isn't anyplace else for it to keep
such data.

Right at the moment I'm not seeing any way that the present
exec_eval_simple_expr approach can be fixed to work safely in the
presence of recursion. What I think we might have to do is give up
on the idea of caching execution state trees across calls, instead
using them just for the duration of a single plpgsql function call.
I'm not sure what sort of runtime penalty might ensue. The whole design
predates the plancache, and I think it was mostly intended to prevent
having to re-parse-and-plan simple expressions every time. So a lot of
that overhead has gone away anyway given the plancache, and maybe we
shouldn't sweat too much about paying what remains. (But on the third
hand, what are we gonna do for back-patching to versions without the
plancache?)

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-10-22 05:50:14
Message-ID:	4CC12616.6010406@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 22.10.2010 06:10, Tom Lane wrote:
> Right at the moment I'm not seeing any way that the present
> exec_eval_simple_expr approach can be fixed to work safely in the
> presence of recursion. What I think we might have to do is give up
> on the idea of caching execution state trees across calls, instead
> using them just for the duration of a single plpgsql function call.
> I'm not sure what sort of runtime penalty might ensue. The whole design
> predates the plancache, and I think it was mostly intended to prevent
> having to re-parse-and-plan simple expressions every time. So a lot of
> that overhead has gone away anyway given the plancache, and maybe we
> shouldn't sweat too much about paying what remains.

We should test and measure that.

> (But on the third
> hand, what are we gonna do for back-patching to versions without the
> plancache?)

One simple idea is to keep a flag along with the executor state to
indicate that the executor state is currently in use. Set it just before
calling ExecEvalExpr, and reset afterwards. If the flag is already set
in the beginning of exec_eval_simple_expr, we have recursed, and must
create a new executor state.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-10-22 15:19:48
Message-ID:	23453.1287760788@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 22.10.2010 06:10, Tom Lane wrote:
>> (But on the third
>> hand, what are we gonna do for back-patching to versions without the
>> plancache?)

> One simple idea is to keep a flag along with the executor state to
> indicate that the executor state is currently in use. Set it just before
> calling ExecEvalExpr, and reset afterwards. If the flag is already set
> in the beginning of exec_eval_simple_expr, we have recursed, and must
> create a new executor state.

Yeah, the same thought occurred to me in the shower this morning.
I'm concerned about possible memory leakage during repeated recursion,
but maybe that can be dealt with. I'll look into this issue soon,
though probably not today (Red Hat work calls :-().

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-10-26 20:48:46
Message-ID:	2736.1288126126@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> One simple idea is to keep a flag along with the executor state to
>> indicate that the executor state is currently in use. Set it just before
>> calling ExecEvalExpr, and reset afterwards. If the flag is already set
>> in the beginning of exec_eval_simple_expr, we have recursed, and must
>> create a new executor state.

> Yeah, the same thought occurred to me in the shower this morning.
> I'm concerned about possible memory leakage during repeated recursion,
> but maybe that can be dealt with.

I spent some time poking at this today, and developed the attached
patch, which gets rid of all the weird assumptions associated with
"simple expressions" in plpgsql, in favor of just doing another
ExecInitExpr per expression in each call of the function. While this is
a whole lot cleaner than what we have, I'm afraid that it's unacceptably
slow. I'm seeing an overall slowdown of 2X to 3X on function execution
with examples like:

create or replace function speedtest10(x float8) returns float8 as $$
declare
z float8 := x;
begin
z := z * 2 + 1;
z := z * 2 + 1;
z := z * 2 + 1;
z := z * 2 + 1;
z := z * 2 + 1;
z := z * 2 + 1;
z := z * 2 + 1;
z := z * 2 + 1;
z := z * 2 + 1;
z := z * 2 + 1;
return z;
end
$$
language plpgsql immutable;

Now, this is about the worst case for the patch. This function's
runtime depends almost entirely on the speed of simple expressions,
and because there's no internal looping, we only get to use the result
of each ExecInitExpr once per function call. So probably "typical" use
cases wouldn't be quite so bad; but still it seems like we can't go this
route. We need to be able to use the ExecInitExpr results across
successive calls one way or another.

I'll look into creating an in-use flag next.

regards, tom lane

Attachment	Content-Type	Size
simplify-simple-expressions.patch	text/x-patch	21.9 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-10-27 21:18:06
Message-ID:	28696.1288214286@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I wrote:
>> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>>> One simple idea is to keep a flag along with the executor state to
>>> indicate that the executor state is currently in use. Set it just before
>>> calling ExecEvalExpr, and reset afterwards. If the flag is already set
>>> in the beginning of exec_eval_simple_expr, we have recursed, and must
>>> create a new executor state.

>> Yeah, the same thought occurred to me in the shower this morning.
>> I'm concerned about possible memory leakage during repeated recursion,
>> but maybe that can be dealt with.

I spent quite a bit of time trying to deal with the memory-leakage
problem without adding still more bookkeeping overhead. It wasn't
looking good, and then I had a sudden insight: if we see that the in-use
flag is set, we can simply return FALSE from exec_eval_simple_expr.
That causes exec_eval_expr to run the expression using the "non simple"
code path, which is perfectly safe because it isn't trying to reuse
state that might be dirty. Thus the attached patch, which fixes
both of the failure cases discussed in this thread.

Advantages:
1. The slowdown for "normal" cases (non-recursive, non-error-inducing)
is negligible.
2. It's simple enough to back-patch without fear.

Disadvantages:
1. Recursive cases get noticeably slower, about 4X slower according
to tests with this example:

create or replace function factorial(n int) returns float8 as $$
begin
if (n > 1) then
return n * factorial(n - 1);
end if;
return 1;
end
$$ language plpgsql strict immutable;

The slowdown is only for the particular expression that actually has
invoked a recursive call, so the above is probably much the worst case
in practice. I doubt many people really use plpgsql this way, but ...

2. Cases where we're re-executing an expression that failed earlier in
the same transaction likewise get noticeably slower. This is only a
problem if you're using subtransactions to catch errors, and the
overhead of the subtransaction is going to be large enough to partially
hide the extra eval cost anyway. So I didn't bother to make a timing
test case --- it's not going to be as bad as the example above.

I currently think that we don't have much choice except to use this
patch for the back branches: any better-performing alternative is
going to require enough surgery that back-patching would be dangerous.
Maybe somebody can come up with a better answer for HEAD, but I don't
have one.

Comments?

regards, tom lane

Attachment	Content-Type	Size
simple-expr-fix-2.patch	text/x-patch	4.5 KB

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-10-29 15:40:47
Message-ID:	1288366783-sup-3609@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Tom Lane's message of mié oct 27 18:18:06 -0300 2010:
> I wrote:
> >> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> >>> One simple idea is to keep a flag along with the executor state to
> >>> indicate that the executor state is currently in use. Set it just before
> >>> calling ExecEvalExpr, and reset afterwards. If the flag is already set
> >>> in the beginning of exec_eval_simple_expr, we have recursed, and must
> >>> create a new executor state.
>
> >> Yeah, the same thought occurred to me in the shower this morning.
> >> I'm concerned about possible memory leakage during repeated recursion,
> >> but maybe that can be dealt with.
>
> I spent quite a bit of time trying to deal with the memory-leakage
> problem without adding still more bookkeeping overhead. It wasn't
> looking good, and then I had a sudden insight: if we see that the in-use
> flag is set, we can simply return FALSE from exec_eval_simple_expr.

I tried the original test cases that were handed to me (quite different
from what I submitted here) and they are fixed also. Thanks.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-10-29 15:54:52
Message-ID:	2717.1288367692@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of mi oct 27 18:18:06 -0300 2010:
>> I spent quite a bit of time trying to deal with the memory-leakage
>> problem without adding still more bookkeeping overhead. It wasn't
>> looking good, and then I had a sudden insight: if we see that the in-use
>> flag is set, we can simply return FALSE from exec_eval_simple_expr.

> I tried the original test cases that were handed to me (quite different
> from what I submitted here) and they are fixed also. Thanks.

It'd be interesting to know if there's any noticeable slowdown on
affected real-world cases. (Of course, if they invariably crashed
before, there might not be a way to measure their previous speed...)

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-10-29 20:28:11
Message-ID:	1288383478-sup-8441@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Tom Lane's message of vie oct 29 12:54:52 -0300 2010:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:

> > I tried the original test cases that were handed to me (quite different
> > from what I submitted here) and they are fixed also. Thanks.
>
> It'd be interesting to know if there's any noticeable slowdown on
> affected real-world cases. (Of course, if they invariably crashed
> before, there might not be a way to measure their previous speed...)

Yeah, the cases that were reported failed with one of

ERROR: invalid memory alloc request size 18446744073482534916
ERROR: could not open relation with OID 0
ERROR: buffer 228 is not owned by resource owner Portal

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Jim Nasby <Jim(at)Nasby(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-11-01 14:14:03
Message-ID:	DBDF579B-36DF-4EBE-A0DC-26C62C550199@Nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct 29, 2010, at 10:54 AM, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Excerpts from Tom Lane's message of mié oct 27 18:18:06 -0300 2010:
>>> I spent quite a bit of time trying to deal with the memory-leakage
>>> problem without adding still more bookkeeping overhead. It wasn't
>>> looking good, and then I had a sudden insight: if we see that the in-use
>>> flag is set, we can simply return FALSE from exec_eval_simple_expr.
>
>> I tried the original test cases that were handed to me (quite different
>> from what I submitted here) and they are fixed also. Thanks.
>
> It'd be interesting to know if there's any noticeable slowdown on
> affected real-world cases. (Of course, if they invariably crashed
> before, there might not be a way to measure their previous speed...)

I should be able to get Alvaro something he can use to test the performance. Our patch framework uses a recursive function to follow patch dependencies (of course that can go away in 8.4 thanks to WITH). I know we've got some other recursive calls but I don't think any are critical (it'd be nice if there was a way to find out if a function was recursive, I guess theoretically that could be discovered during compilation but I don't know how hairy it would be).

One question: What happens if you have multiple paths to the same function within another function? For example, we have an assert function that's used all over the place; it will definitely be called from multiple places in a call stack.

FWIW, I definitely run into cases where recursion makes for cleaner code than looping, so it'd be great to avoid making it slower than it needs to be. But I've always assumed that recursion is slower than looping so I avoid it for anything I know could be performance sensitive.

(looking at original case)... the original bug wasn't actually recursive. It's not clear to me how it actually got into this case. The original error report is:

psql:sql/code.lookup_table_dynamic.sql:23: ERROR: buffer 2682 is not owned by resource owner Portal
CONTEXT: SQL function "table_schema_and_name" statement 1
SQL function "table_full_name" statement 1
PL/pgSQL function "getsert" line 9 during statement block local variable initialization
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Line 23 is:

SELECT code.getsert( 'test.stuffs', 'stuff' );

The functions are below. The duplicity of full_name_table and table_full_name is because the function was originally called full_name_table, but I decided to rename it after creating other table functions. In any case, I don't see any obvious recursion or re-entry, unless perhaps tools.table_schema_and_name ends up getting called twice by tools.table_full_name?

-[ RECORD 1 ]-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Schema | code
Name | getsert
Result data type | void
Argument data types | p_table_name text, p_lookup text
Volatility | volatile
Owner | cnuadmin
Language | plpgsql
Source code |
: DECLARE
: v_object_class text := 'getsert';
: v_function_name text := p_table_name || '__' || v_object_class;
:
: v_table_full text := tools.full_name_table( p_table_name );
: v_schema text;
: v_table text;
:
: BEGIN
: SELECT INTO v_schema, v_table * FROM tools.split_schema( v_table_full );
:
: PERFORM code_internal.create_object( v_function_name, 'FUNCTION', v_object_class, array[ ['schema', v_schema], ['table', v_table], ['lookup', p_lookup] ] );
: END;
:
Description | Creates a function that will lookup an ID based on a text lookup value (p_lookup). If no record exists, one will be created.
:
: Parameters:
: p_table_name Name of the table to lookup the value in
: p_lookup Name of the field to use for the lookup value
:
: Results:
: Creates function %p_table_name%__getsert( %p_lookup% with a type matching the p_lookup field in p_table_name ). The function returns an ID as an int.
: Revokes all on the function from public and grants execute to cnuapp_role.
:

test_us(at)workbook(dot)local=# \df+ tools.table_full_name
List of functions
-[ RECORD 1 ]-------+-------------------------------------------------------------------------------
Schema | tools
Name | table_full_name
Result data type | text
Argument data types | p_table_name text
Volatility | volatile
Owner | su
Language | sql
Source code | SELECT schema_name || '.' || table_name FROM tools.table_schema_and_name( $1 )
Description |

test_us(at)workbook(dot)local=# \df+ tools.table_schema_and_name
List of functions
-[ RECORD 1 ]-------+------------------------------------------------------------------------------
Schema | tools
Name | table_schema_and_name
Result data type | record
Argument data types | p_table_name text, OUT schema_name text, OUT table_name text
Volatility | volatile
Owner | su
Language | sql
Source code |
: SELECT quote_ident(nspname), quote_ident(relname)
: FROM pg_class c
: JOIN pg_namespace n ON n.oid = c.relnamespace
: WHERE c.oid = $1::regclass
: AND tools.assert( relkind = 'r', 'Relation ' || $1 || ' is not a table' )
:
Description |

--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jim Nasby <Jim(at)Nasby(dot)net>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crash in plancache with subtransactions
Date:	2010-11-01 16:25:08
Message-ID:	21519.1288628708@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim Nasby <Jim(at)Nasby(dot)net> writes:
> (looking at original case)... the original bug wasn't actually
> recursive.

No, there are two different cases being dealt with here. If the first
call of an expression results in an error, and then we come back and try
to re-use the expression state tree, we can have trouble because the
state tree contains partially-updated internal state for the called
function. This doesn't require any recursion but it leads to pretty
much the same problems as the recursive case.

regards, tom lane