Re: PATCH: decreasing memory needlessly consumed by array_agg

From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: decreasing memory needlessly consumed by array_agg
Date: 2015-01-08 07:53:44
Message-ID: CACQjQLoYZeBBH3noRoRxafoMbxXpreBCe6JiyYN6iLTSm3Ep2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In the CF, the status becomes "Needs Review". Let's continue our discussion
of makeArrayResult* behavior if subcontext=false and release=true (more
below):
2014-12-22 8:08 GMT+07:00 Ali Akbar <the(dot)apaan(at)gmail(dot)com>:

>
> With this API, i think we should make it clear if we call initArrayResult
> with subcontext=false, we can't call makeArrayResult, but we must use
> makeMdArrayResult directly.
>
> Or better, we can modify makeArrayResult to release according to
> astate->private_cxt:
>
>> @@ -4742,7 +4742,7 @@ makeArrayResult(ArrayBuildState *astate,
>> dims[0] = astate->nelems;
>> lbs[0] = 1;
>>
>> - return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, true);
>> + return makeMdArrayResult(astate, ndims, dims, lbs, rcontext,
>> astate->private_cxt);
>>
>
> Or else we implement what you suggest below (more comments below):
>
> Thinking about the 'release' flag a bit more - maybe we could do this
>> instead:
>>
>> if (release && astate->private_cxt)
>> MemoryContextDelete(astate->mcontext);
>> else if (release)
>> {
>> pfree(astate->dvalues);
>> pfree(astate->dnulls);
>> pfree(astate);
>> }
>>
>> i.e. either destroy the whole context if possible, and just free the
>> memory when using a shared memory context. But I'm afraid this would
>> penalize the shared memory context, because that's intended for cases
>> where all the build states coexist in parallel and then at some point
>> are all converted into a result and thrown away. Adding pfree() calls is
>> no improvement here, and just wastes cycles.
>>
>
> As per Tom's comment, i'm using "parent memory context" instead of "shared
> memory context" below.
>
> In the future, if some code writer decided to use subcontext=false, to
> save memory in cases where there are many array accumulation, and the
> parent memory context is long-living, current code can cause memory leak.
> So i think we should implement your suggestion (pfreeing astate), and warn
> the implication in the API comment. The API user must choose between
> release=true, wasting cycles but preventing memory leak, or managing memory
> from the parent memory context.
>
> In one possible use case, for efficiency maybe the caller will create a
> special parent memory context for all array accumulation. Then uses
> makeArrayResult* with release=false, and in the end releasing the parent
> memory context once for all.
>

As for the v6 patch:
- the patch applies cleanly to master
- make check is successfull
- memory benefit is still there
- performance benefit i think is negligible

Reviewing the code, found this:

> @@ -573,7 +578,22 @@ array_agg_array_transfn(PG_FUNCTION_ARGS)
> elog(ERROR, "array_agg_array_transfn called in non-aggregate
> context");
> }
>
> - state = PG_ARGISNULL(0) ? NULL : (ArrayBuildStateArr *)
> PG_GETARG_POINTER(0);
> +
> + if (PG_ARGISNULL(0))
> + {
> + Oid element_type = get_element_type(arg1_typeid);
> +
> + if (!OidIsValid(element_type))
> + ereport(ERROR,
> + (errcode(ERRCODE_DATATYPE_MISMATCH),
> + errmsg("data type %s is not an array type",
> + format_type_be(arg1_typeid))));
>

digging more, it looks like those code required because accumArrayResultArr
checks the element type:

> /* First time through --- initialize */
> Oid element_type = get_element_type(array_type);
>
> if (!OidIsValid(element_type))
> ereport(ERROR,
> (errcode(ERRCODE_DATATYPE_MISMATCH),
> errmsg("data type %s is not an array type",
> format_type_be(array_type))));
> astate = initArrayResultArr(array_type, element_type, rcontext,
> true);
>

I think it should be in initArrayResultArr, because it is an
initialization-only check, shouldn't it?

Regards,
--
Ali Akbar

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2015-01-08 08:07:27 Re: Transactions involving multiple postgres foreign servers
Previous Message Michael Paquier 2015-01-08 05:03:35 Re: Fillfactor for GIN indexes