Re: proposal: fix corner use case of variadic fuctions usage

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Vik Reykja <vikreykja(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: fix corner use case of variadic fuctions usage
Date: 2013-01-18 16:55:16
Message-ID: 9185.1358528116@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> [ quick review of patch ]

On reflection it seems to me that this is probably not a very good
approach overall. Our general theory for functions taking ANY has
been that the core system just computes the arguments and leaves it
to the function to make sense of them. Why should this be different
in the one specific case of VARIADIC ANY with a variadic array?

The approach is also inherently seriously inefficient. Not only
does ExecMakeFunctionResult have to build a separate phony Const
node for each array element, but the variadic function itself can
have no idea that those Consts are all the same type, which means
it's going to do datatype lookup over again for each array element.
(format(), for instance, would have to look up the same type output
function over and over again.) This might not be noticeable on toy
examples with a couple of array entries, but it'll be a large
performance problem on large arrays.

The patch also breaks any attempt that a variadic function might be
making to cache info about its argument types across calls. I suppose
that the copying of the FmgrInfo is a hack to avoid crashes due to
called functions supposing that their flinfo->fn_extra data is still
valid for the new argument set. Of course that's another pretty
significant performance hit, compounding the per-element hit. Whereas
an ordinary variadic function that is given N arguments in a particular
query will probably do N datatype lookups and cache the info for the
life of the query, a variadic function called with this approach will
do one datatype lookup for each array element in each row of the query;
and there is no way to optimize that.

But large arrays have a worse problem: the approach flat out does
not work for arrays of more than FUNC_MAX_ARGS elements, because
there's no place to put their values in the FunctionCallInfo struct.
(As submitted, if the array is too big the patch will blithely stomp
all over memory with user-supplied data, making it not merely a crash
risk but probably an exploitable security hole.)

This last problem is, so far as I can see, unfixable within this
approach; surely we are not going to accept a feature that only works
for small arrays. So I am going to mark the CF item rejected not just
RWF.

I believe that a workable approach would require having the function
itself act differently when the VARIADIC flag is set. Currently there
is no way for ANY-taking functions to do this because we don't record
the use of the VARIADIC flag in FuncExpr, but it'd be reasonable to
change that, and probably then add a function similar to
get_fn_expr_rettype to dig it out of the FmgrInfo. I don't know if
we could usefully provide any infrastructure beyond that for the case,
but it'd be worth thinking about whether there's any common behavior
for such functions.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Robert James 2013-01-18 17:29:46 Temp table's effect on performance
Previous Message Igor Neyman 2013-01-18 16:37:22 Re: speeding up a join query that utilizes a view

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-01-18 17:08:57 Re: Event Triggers: adding information
Previous Message Robert Haas 2013-01-18 16:48:43 Re: logical changeset generation v4