Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Date: 2011-02-04 19:49:53
Message-ID: AANLkTimdLgEs8+1wDH7bb9n6aVcgMQhzscZG=mr_r4XX@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2011/2/4 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Fri, Jan 28, 2011 at 2:19 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> 2011/1/28 Robert Haas <robertmhaas(at)gmail(dot)com>:
>>> On Tue, Jan 25, 2011 at 11:29 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>> variant a) my first patch - detoast on first usage with avoiding to
>>>> useless detoast checking
>>>> variant b) my first patch - detoast on first usage without avoiding to
>>>> useless detoast checking
>>>>
>>>> time for 1 - about 300 ms, a is bout 1.5% faster than b
>>>> time for 2 - about 30000 ms, a is about 3% faster than b
>>>
>>> This makes your approach sound pretty good, but it sounds like we
>>> might need to find a better way to structure the code.
>>>
>>
>> do you have a any idea?
>
> At first blush, the should_be_detoasted flag looks completely
> unnecessary to me.  I don't see why we have to set a flag in one part
> of the code to tell some other part of the code whether or not it
> should detoast.  Why is that necessary or a good idea?  Why can't we
> just make the decision at the point where we're detoasting?

a logic about detoasting isn't exactly simple, and you see on tests,
so "bypass" save a 3%. I can't to say, if it's much or less. Just take
a 3%. Isn't nothing simpler than remove these flags.

>
> Also, I believe I'm agreeing with Tom when I say that
> exec_eval_datum() doesn't look like the right place to do this.  Right
> now that function is a very simple switch, and I've found that it's
> usually a bad idea to stick complex code into the middle of such
> things.  It looks like function only has three callers, so maybe we
> should look at each caller individually and see whether it needs this
> logic or not.  For example, I'm guessing make_tuple_from_row()
> doesn't, because it's only calling exec_eval_datum() so that it can
> pass the results to heap_form_tuple(), which already contains some
> logic to detoast in certain cases.  It's not clear why we'd want
> different logic here than we do anywhere else.  The other callers are
> exec_assign_value(), where this looks like it could be important to
>

first we have to decide when we will do a detoasting - there are two variants

a) when we write a toasted data to variable
b) when we use a toasted data

@a needs to modify: copy parameters and exec_assign_value,
@b needs to modify plpgsql_param_fetch or exec_eval_datum. Really
important is plpgsql_param_fetch, because it is only point, where we
know, so some Datum will be used, but wasn't detoasted yet. Problem is
"wrong" memory context. And plpgsql_param_fetch is callback, that is
emitted from different SPI functions so is hard to move a some lines
to function, that will run under good context.

There isn't necessary to detoast Datum, when value is only copied.

The disadvantage of @a can be so we remove 95% of pathologic cases,
and we create a 5% new. @b is terrible because the most important
point (main work) is done under parser memory context. I don't
understand you what is complex on (this code can be moved to
function). But @b needs only one place for detosting, because it's
nice centralised, that you and Tom dislike.

oldctx = MemoryContextSwitchTo(fcecxt);
if (!var->typbyval && var->typlen == -1)
x = detoast_datum(var);
if (x != var)
{
pfree(var)
var = x;
}
MemoryContextSwitchTo(oldctx);

You can put this code to plpgsql_param_fetch or exec_eval_datum or
you have to copy this code two or three times.

I have not a idea how to design it well to by you and Tom happy :(

When I thinking about it, then I am sure, so best solution is really
global cache of detoasted values. It cannot be well solved on local
area. On local area I can do correct decision only when I know, so I
am inside a cycle and I work with some Datum repeatedly. Detoasting
on assign can has impacts on FOR stmt, iteration over cursors ...

FOR a,b,c IN SELECT * FROM foo
LOOP
if a THEN CONTINUE; END IF;
process b,c -- b and c are toastable values.
END LOOP;

> avoid repeatedly detoasting the array, and plpgsql_param_fetch(),
> which I'm not sure about, but perhaps you have an idea or can
> benchmark it.

It's hard to benchmark it. I can create a use case, where @a win or @b
win. I believe so in almost all cases @a or @b will be better than
current state. And I afraid, so there can be situation where implicit
detoast can be bad.

Regards

Pavel Stehule

p.s. as one idea is a flag for function, that can to specify, if
inside function can be detoasting lazy or aggressive

.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2011-02-04 19:51:38 Re: Use a separate pg_depend.deptype for extension membership?
Previous Message Tom Lane 2011-02-04 19:48:44 Re: Extensions support for pg_dump, patch v27