Re: Function array_agg(array)

From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Function array_agg(array)
Date: 2014-10-23 02:00:58
Message-ID: CACQjQLqn27V32jBibu-m=6WhGSXEDBSjLny1=MSDkVVCSOwo=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2014-10-22 22:48 GMT+07:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

> 2014-10-22 16:58 GMT+02:00 Ali Akbar <the(dot)apaan(at)gmail(dot)com>:
>
>> Thanks for the review
>>
>> 2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>>
>>> I agree with your proposal. I have a few comments to design:
>>>
>>
>>> 1. patch doesn't hold documentation and regress tests, please append it.
>>>
>> OK, i'll add the documentation and regression test
>>
>>
>>> 2. this functionality (multidimensional aggregation) can be interesting
>>> more times, so maybe some interface like array builder should be preferred.
>>>
>> We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
>> haven't we?
>>
>> Actually array_agg(anyarray) can be implemented by using accumArrayResult
>> and makeMdArrayResult, but in the process we will deconstruct the array
>> data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we
>> will reconstruct it through makeMdArrayResult. I think this way it's not
>> efficient, so i decided to reimplement it and memcpy the array data and
>> null flags as-is.
>>
>
> aha, so isn't better to fix a performance for accumArrayResult ?
>

Ok, i'll go this route. While reading the code of array(subselect) as you
pointed below, i think the easiest way to implement support for both
array_agg(anyarray) and array(subselect) is to change accumArrayResult so
it operates both with scalar datum(s) and array datums, with performance
optimization for the latter.

In other places, i think it's clearer if we just use accumArrayResult and
>> makeArrayResult/makeMdArrayResult as API for building (multidimensional)
>> arrays.
>>
>>
>>> 3. array_agg was consistent with array(subselect), so it should be fixed
>>> too
>>>
>>> postgres=# select array_agg(a) from test;
>>> array_agg
>>> -----------------------
>>> {{1,2,3,4},{1,2,3,4}}
>>> (1 row)
>>>
>>> postgres=# select array(select a from test);
>>> ERROR: could not find array type for data type integer[]
>>>
>>
>> I'm pretty new in postgresql development. Can you point out where is
>> array(subselect) implemented?
>>
>
> where you can start?
>
> postgres=# explain select array(select a from test);
> ERROR: 42704: could not find array type for data type integer[]
> LOCATION: exprType, nodeFuncs.c:116
>
> look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
> postgresql sources this keyword
>

Found it. Thanks.

On a side note in postgres array type for integer[] is still integer[], but
in pg_type, integer[] has no array type. I propose we consider to change it
so array type of anyarray is itself (not in this patch, of course, because
it is a big change). Consider the following code in nodeFuncs.c:109

if (sublink->subLinkType == ARRAY_SUBLINK)
{
type = get_array_type(type);
if (!OidIsValid(type))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("could not find array type for data type %s",
format_type_be(exprType((Node *) tent->expr)))));
}

to implement array(subselect) you pointed above, we must specially checks
for array types. Quick search on get_array_type returns 10 places.

I'll think more about this later. For this patch, i'll go without changes
in pg_type.h.

4. why you use a magic constant (64) there?
>>>
>>> + astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
>>> + astate->aitems = 64 * nitems;
>>>
>>> + astate->nullbitmap = (bits8 *)
>>> + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);
>>>
>>
>> just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
>> astate->alen = 64; /* arbitrary starting array size */
>>
>> it can be any number not too small and too big. Too small, and we will
>> realloc shortly. Too big, we will end up wasting memory.
>>
>
> you can try to alloc 1KB instead as start -- it is used more times in
> Postgres. Then a overhead is max 1KB per agg call - what is acceptable.
>
> You take this value from accumArrayResult, but it is targeted for shorted
> scalars - you should to expect so any array will be much larger.
>

Ok.

Regards,
--
Ali Akbar

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2014-10-23 04:59:03 Re: Function array_agg(array)
Previous Message Peter Eisentraut 2014-10-23 01:45:52 Re: Simplify calls of pg_class_aclcheck when multiple modes are used