Re: Performance of aggregates over set-returning functions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "John Smith" <sodgodofall(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of aggregates over set-returning functions
Date: 2008-01-09 03:51:04
Message-ID: 25822.1199850664@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"John Smith" <sodgodofall(at)gmail(dot)com> writes:
>> It's pipelined either way. But int8 is a pass-by-reference data type,
>> and it sounds like we have a memory leak for this case.

> Thanks for your reply. How easy is it to fix this? Which portion of
> the code should we look to change?

I was just looking at that. The issue looks to me that nodeResult.c
(and other plan node types that support SRFs in their targetlists)
do this:

/*
* Check to see if we're still projecting out tuples from a previous scan
* tuple (because there is a function-returning-set in the projection
* expressions). If so, try to project another one.
*/
if (node->ps.ps_TupFromTlist)
{
resultSlot = ExecProject(node->ps.ps_ProjInfo, &isDone);
if (isDone == ExprMultipleResult)
return resultSlot;
/* Done with that source tuple... */
node->ps.ps_TupFromTlist = false;
}

/*
* Reset per-tuple memory context to free any expression evaluation
* storage allocated in the previous tuple cycle. Note this can't happen
* until we're done projecting out tuples from a scan tuple.
*/
ResetExprContext(econtext);

whereas there would be no memory leak if these two chunks of code were
in the other order. The question is whether resetting the context first
would throw away any data that we *do* still need for the repeated
ExecProject calls. That second comment block implies there's something
we do need.

I'm not sure why it's like this. Some digging in the CVS history shows
that indeed the code used to be in the other order, and I switched it
(and added the second comment block) in this old commit:

http://archives.postgresql.org/pgsql-committers/2000-08/msg00218.php

I suppose that the SQL-function support at the time required that its
calling memory context be persistent until it returned ExprEndResult,
but I sure don't recall any details. It's entirely possible that that
requirement no longer exists, or could easily be eliminated given all
the other changes that have happened since then. nodeFunctionscan.c
seems to reset the current context for each call of a SRF, so I'd think
that anything that can't cope with that should have been flushed out
by now.

If you feel like poking at this, I *strongly* recommend doing your
testing in an --enable-cassert build. You'll have no idea whether you
freed stuff too early if you don't have CLOBBER_FREED_MEMORY enabled.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Anoo Sivadasan Pillai 2008-01-09 04:03:57 After Vacuum Analyse - Procedure performance not improved - Innner select is faster
Previous Message John Smith 2008-01-09 03:33:33 Re: Performance of aggregates over set-returning functions