Re: Question about MemoryContexts and functions that returns

Lists: pgsql-hackers
From: Thomas Hallgren <thomas(at)tada(dot)se>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Question about MemoryContexts and functions that returns sets.
Date: 2006-03-20 10:47:41
Message-ID: 441E884D.7010707@tada.se

Hi,
A PL/Java user reports that his backend runs out of memory when he uses
PL/Java to execute huge queries towards a remote database and return the
result. PL/Java is designed not to collect data in memory when it
returns result sets. Each call to the function handler will be
dispatched to the corresponding 'ResultSet.next()' in order to retrieve
and propagate one row at a time. Yet, it seems the data is collected
somewhere. An excerpt from the user at the time he runs out of memory
looks like this:

SPI Proc: 273670144 total in 67 blocks; 3840 free (29 chunks); 273666304 used
...
ExecutorState: 1141402016 total in 204 blocks; 7923848 free (3633 chunks); 1133478168 used
...

so obviously, I'm doing something wrong in my code. Any advice on what I
should be looking for?

Kind Regards,
Thomas Hallgren


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Thomas Hallgren <thomas(at)tada(dot)se>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about MemoryContexts and functions that returns sets.
Date: 2006-03-20 11:03:40
Message-ID: 20060320110340.GD21428@svana.org

On Mon, Mar 20, 2006 at 11:47:41AM +0100, Thomas Hallgren wrote:
> Hi,
> A PL/Java user reports that his backend runs out of memory when he uses
> PL/Java to execute huge queries towards a remote database and return the
> result. PL/Java is designed not to collect data in memory when it
> returns result sets. Each call to the function handler will be
> dispatched to the corresponding 'ResultSet.next()' in order to retrieve
> and propagate one row at a time. Yet, it seems the data is collected
> somewhere. An excerpt from the user at the time he runs out of memory
> looks like this:

It's not clear exactly what you are doing, but the Datum you return
points to memory allocated *somewhere*. If you have palloc()ed it in
your own MemoryContext then you must free it the next time you are
called. Normally this is achieved by resetting your context each time,
although you could free if you wished.
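
The effect of resetting a context on every call can be illustrated with a
toy model in plain C. This is only a sketch under stated assumptions: it
does not use PostgreSQL's MemoryContext API, and ToyContext, toy_alloc,
toy_reset and peak_usage are hypothetical names invented for the
illustration. The model counts live bytes instead of allocating real
memory, and compares a producer that resets its context before each row
with one that never resets.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory context (hypothetical, not PostgreSQL's type):
 * it only tracks how many bytes are currently considered live. */
typedef struct ToyContext {
    size_t used;   /* bytes allocated since the last reset */
    size_t peak;   /* high-water mark of 'used' */
} ToyContext;

/* Account for an allocation of nbytes in the context. */
void toy_alloc(ToyContext *cx, size_t nbytes)
{
    assert(cx != NULL);
    cx->used += nbytes;
    if (cx->used > cx->peak)
        cx->peak = cx->used;
}

/* Drop everything allocated in the context since the last reset. */
void toy_reset(ToyContext *cx)
{
    assert(cx != NULL);
    cx->used = 0;
}

/* Simulate returning nrows rows, one per call, where each call needs
 * row_bytes of temporary memory. Returns the peak number of live bytes. */
size_t peak_usage(size_t nrows, size_t row_bytes, int reset_per_call)
{
    ToyContext cx = {0, 0};
    size_t i;

    for (i = 0; i < nrows; i++)
    {
        if (reset_per_call)
            toy_reset(&cx);        /* free the previous call's row first */
        toy_alloc(&cx, row_bytes); /* build the row for this call */
    }
    return cx.peak;
}
```

In this model, 1000 rows of 64 bytes each peak at 64000 live bytes without
the reset and at 64 bytes with it, which is the difference between a
context that grows with the result set and one that stays bounded.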

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Thomas Hallgren <thomas(at)tada(dot)se>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about MemoryContexts and functions that returns
Date: 2006-03-20 11:36:58
Message-ID: 441E93DA.7040600@tada.se

Martijn van Oosterhout wrote:
> On Mon, Mar 20, 2006 at 11:47:41AM +0100, Thomas Hallgren wrote:
>
>> Hi,
>> A PL/Java user reports that his backend runs out of memory when he uses
>> PL/Java to execute huge queries towards a remote database and return the
>> result. PL/Java is designed not to collect data in memory when it
>> returns result sets. Each call to the function handler will be
>> dispatched to the corresponding 'ResultSet.next()' in order to retrieve
>> and propagate one row at a time. Yet, it seems the data is collected
>> somewhere. An excerpt from the user at the time he runs out of memory
>> looks like this:
>>
>
> It's not clear exactly what you are doing, but the Datum you return
> points to memory allocated *somewhere*. If you have palloc()ed it in
> your own MemoryContext then you must free it the next time you are
> called. Normally this is achieved by resetting your context each time,
> although you could free if you wished.
>
> Hope this helps,
>
The function in question uses the SRF_ family of macros. I'm always
returning datums allocated in the context that was current when the
function was called.

But, hrrm. I see that I use the durable 'multi_call_memory_ctx'
throughout the whole procedure. I set it up during SRF_IS_FIRSTCALL()
and then I reinstate it for the duration of each call, in effect
preserving every temporary allocation that is made until the set is
completely returned. Oops! That would account for one of the contexts
being filled up, but not both. Exactly what is stored in the
'ExecutorState' and the 'SPI Proc' contexts? What is their life-cycle?

Regards,
Thomas Hallgren


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Thomas Hallgren <thomas(at)tada(dot)se>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about MemoryContexts and functions that returns sets.
Date: 2006-03-20 12:17:15
Message-ID: 20060320121715.GE21428@svana.org

On Mon, Mar 20, 2006 at 12:36:58PM +0100, Thomas Hallgren wrote:
> The function in question uses the SRF_ family of macros. I'm always
> returning datums allocated in the context that was current when the
> function was called.
>
> But, hrrm. I see that I use the durable 'multi_call_memory_ctx'
> throughout the whole procedure. I set it up during SRF_IS_FIRSTCALL()
> and then I reinstate it for the duration of each call, in effect
> preserving every temporary allocation that is made until the set is
> completely returned. Oops! That would account for one of the contexts
> being filled up, but not both. Exactly what is stored in the
> 'ExecutorState' and the 'SPI Proc' contexts? What is their life-cycle?

Hmm, without seeing the code it's hard to tell. However, many functions
expect to be called from short-lived contexts and tend to leak small
amounts of memory. If you're allocating everything into a long-lived
context, it's possible these little allocations are getting you.

Consider the string btree comparison function. Normally it could be
sloppy about how it allocates memory, but when PostgreSQL does a CREATE
INDEX, it calls that function a *lot* without any intervening context
reset, causing memory issues. So that particular function is coded carefully to
avoid this.

SPI Proc is the context allocated for that procedure; it is never
reset explicitly AFAICS. SPI calls execute in another context which is
reset regularly. ExecutorState is the tag given to all exec nodes, not
sure how to find out exactly what it's for.

Not sure what else to say. Perhaps tracing the context each allocation
is done in might help.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Thomas Hallgren <thomas(at)tada(dot)se>
To: David Fetter <david(at)fetter(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about MemoryContexts and functions that returns
Date: 2006-03-21 18:52:47
Message-ID: 44204B7F.1090005@tada.se

David,
Thanks for the tip. A diff on the plperl source was really helpful.

As it turns out, I'm not supposed to allocate the returned tuple in the
caller context. Apparently, PostgreSQL will always make a copy of it. I
find this a bit inconsistent with how other return values are handled.
PL/Java initially had some problems when I trusted that values were
copied when in fact they were not. Have the function call semantics
changed in this respect?

Kind Regards,
Thomas Hallgren

David Fetter wrote:
> On Mon, Mar 20, 2006 at 11:47:41AM +0100, Thomas Hallgren wrote:
>
>> Hi,
>> A PL/Java user reports that his backend runs out of memory when he uses
>> PL/Java to execute huge queries towards a remote database and return the
>> result. PL/Java is designed not to collect data in memory when it
>> returns result sets. Each call to the function handler will be
>> dispatched to the corresponding 'ResultSet.next()' in order to retrieve
>> and propagate one row at a time. Yet, it seems the data is collected
>> somewhere. An excerpt from the user at the time he runs out of memory
>> looks like this:
>>
>
> A similar thing happened in PL/Perl up until recently. Check Neil
> Conway's patches to that for hints :)
>
> Cheers,
> D
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Hallgren <thomas(at)tada(dot)se>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about MemoryContexts and functions that returns
Date: 2006-03-21 21:14:32
Message-ID: 29467.1142975672@sss.pgh.pa.us

Thomas Hallgren <thomas(at)tada(dot)se> writes:
> As it turns out, I'm not supposed to allocate the returned tuple in the
> caller context.

Where do you get that from? plpgsql and plperl both do it that way AFAICS.

Are you testing in an --enable-cassert build? The memory-clobber
behavior that that turns on is really essential for finding
dangling-pointer problems ...
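
Why clobbering freed memory exposes dangling pointers can be sketched
with a toy arena in plain C. This is an illustration only, not
PostgreSQL code: ToyArena, arena_alloc, arena_reset_clobber and
stale_pointer_is_clobbered are hypothetical names, the 0x7F fill byte is
an assumption made for the sketch, and the bump allocator ignores
alignment because it only hands out character buffers here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE   256
#define CLOBBER_BYTE 0x7F   /* assumption: fill pattern chosen for this sketch */

/* Toy arena (hypothetical): a fixed buffer with a bump pointer. */
typedef struct ToyArena {
    unsigned char buf[ARENA_SIZE];
    size_t        used;
} ToyArena;

/* Bump-allocate nbytes from the arena; returns NULL when it is full. */
void *arena_alloc(ToyArena *a, size_t nbytes)
{
    void *p;

    if (nbytes > ARENA_SIZE - a->used)
        return NULL;
    p = a->buf + a->used;
    a->used += nbytes;
    return p;
}

/* Reset the arena and overwrite everything that had been handed out,
 * so that stale pointers no longer see their old contents. */
void arena_reset_clobber(ToyArena *a)
{
    memset(a->buf, CLOBBER_BYTE, a->used);
    a->used = 0;
}

/* Returns 1 if a pointer kept across a reset now reads the fill byte.
 * Reading it is well defined here only because the arena buffer itself
 * is still alive; that is what makes the toy safe to run. */
int stale_pointer_is_clobbered(void)
{
    ToyArena a = { {0}, 0 };
    char    *row = arena_alloc(&a, 16);

    assert(row != NULL);
    strcpy(row, "row 1");
    arena_reset_clobber(&a);    /* 'row' is now a dangling pointer */
    return (unsigned char) row[0] == CLOBBER_BYTE;
}
```

Without the fill step the stale pointer would still read "row 1" and the
bug would go unnoticed; with it, any later use of the pointer sees
obviously wrong data right away.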

regards, tom lane


From: Thomas Hallgren <thomas(at)tada(dot)se>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about MemoryContexts and functions that returns
Date: 2006-03-21 23:45:13
Message-ID: 44209009.3060506@tada.se

Tom Lane wrote:
> Thomas Hallgren <thomas(at)tada(dot)se> writes:
>
>> As it turns out, I'm not supposed to allocate the returned tuple in the
>> caller context.
>>
>
> Where do you get that from? plpgsql and plperl both do it that way AFAICS.
>
> Are you testing in an --enable-cassert build? The memory-clobber
> behavior that that turns on is really essential for finding
> dangling-pointer problems ...
>
>

I use --enable-cassert. I don't think my problem is a dangling pointer.

I just created a dummy C-function that short circuits the
java_call_handler. It calls my real java function with the correct
parameters. When I register this function with language C and use it
instead of the normal function that calls via the java call handler,
there's no memory leak. It only leaks memory when I go through the call
handler. The call handler doesn't execute many lines of code and from
what I can tell, it doesn't manipulate contexts at all. Nor does it
allocate anything. Furthermore, I can prevent the leak by allocating
the returned tuple in a context of my own and freeing it on the next call.

Is there a difference in how the executor treats a C function and a
function using a call handler that can cause this behavior?

Regards,
Thomas Hallgren


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Hallgren <thomas(at)tada(dot)se>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about MemoryContexts and functions that returns
Date: 2006-03-22 00:06:21
Message-ID: 8240.1142985981@sss.pgh.pa.us

Thomas Hallgren <thomas(at)tada(dot)se> writes:
> Is there a difference in how the executor treats a C function and a
> function using a call handler that can cause this behavior?

Can't think of one. You'd better take a closer look at your call
handler.

gdb'ing with a watchpoint on writes to CurrentMemoryContext might be
helpful at seeing whether the context is changing unexpectedly.

regards, tom lane


From: Thomas Hallgren <thomas(at)tada(dot)se>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about MemoryContexts and functions that returns
Date: 2006-03-22 06:33:32
Message-ID: 4420EFBC.2030700@tada.se

Tom Lane wrote:
> Thomas Hallgren <thomas(at)tada(dot)se> writes:
>
>> Is there a difference in how the executor treats a C function and a
>> function using a call handler that can cause this behavior?
>>
>
> Can't think of one. You'd better take a closer look at your call
> handler.
>
> gdb'ing with a watchpoint on writes to CurrentMemoryContext might be
> helpful at seeing whether the context is changing unexpectedly.
>
>

Yes, that was helpful. My fault of course. I had a comment in place that
explained exactly what ought to happen. Then the code did the exact
opposite. An excerpt:

/* a class loader or other mechanism might have connected already.
 * This connection must be dropped since its parent context is wrong.
 */
if(self->isMultiCall && SRF_IS_FIRSTCALL())
    Invocation_assertConnect();

The Invocation_assertConnect() performs an SPI_connect(). Sigh...
Comments are dangerous :-)

Thanks for your help.

Kind Regards,
Thomas Hallgren