Re: Function array_agg(array)

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Function array_agg(array)
Date: 2014-10-22 15:48:17
Message-ID: CAFj8pRDurJzmwuBBvOKY7DQUE4fvGXFNbuT2VfLTdraQS53PdQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2014-10-22 16:58 GMT+02:00 Ali Akbar <the(dot)apaan(at)gmail(dot)com>:

> Thanks for the review
>
> 2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>
>> I agree with your proposal. I have a few comments to design:
>>
>
>> 1. patch doesn't hold documentation and regress tests, please append it.
>>
> OK, i'll add the documentation and regression test
>
>
>> 2. this functionality (multidimensional aggregation) can be interesting
>> more times, so maybe some interface like array builder should be preferred.
>>
> We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
> haven't we?
>
> Actually array_agg(anyarray) can be implemented by using accumArrayResult
> and makeMdArrayResult, but in the process we will deconstruct the array
> data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we
> will reconstruct it through makeMdArrayResult. I think this way it's not
> efficient, so i decided to reimplement it and memcpy the array data and
> null flags as-is.
>

aha, so isn't better to fix a performance for accumArrayResult ?

>
> In other places, i think it's clearer if we just use accumArrayResult and
> makeArrayResult/makeMdArrayResult as API for building (multidimensional)
> arrays.
>
>
>> 3. array_agg was consistent with array(subselect), so it should be fixed
>> too
>>
>> postgres=# select array_agg(a) from test;
>> array_agg
>> -----------------------
>> {{1,2,3,4},{1,2,3,4}}
>> (1 row)
>>
>> postgres=# select array(select a from test);
>> ERROR: could not find array type for data type integer[]
>>
>
> I'm pretty new in postgresql development. Can you point out where is
> array(subselect) implemented?
>

where you can start?

postgres=# explain select array(select a from test);
ERROR: 42704: could not find array type for data type integer[]
LOCATION: exprType, nodeFuncs.c:116

look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
postgresql sources this keyword

>
>
>
>> 4. why you use a magic constant (64) there?
>>
>> + astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
>> + astate->aitems = 64 * nitems;
>>
>> + astate->nullbitmap = (bits8 *)
>> + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);
>>
>
> just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
> astate->alen = 64; /* arbitrary starting array size */
>
> it can be any number not too small and too big. Too small, and we will
> realloc shortly. Too big, we will end up wasting memory.
>

you can try to alloc 1KB instead as start -- it is used more times in
Postgres. Then a overhead is max 1KB per agg call - what is acceptable.

You take this value from accumArrayResult, but it is targeted for shorted
scalars - you should to expect so any array will be much larger.

> Regards,
> --
> Ali Akbar
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2014-10-22 16:21:40 idea: allow AS label inside ROW constructor
Previous Message Kevin Grittner 2014-10-22 15:46:42 Re: Question about RI checks