Re: Function array_agg(array)

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Function array_agg(array)
Date: 2014-10-24 08:48:03
Message-ID: CAFj8pRCue-7X+gBZBXx6MZOQq7+Qe2+Q2PejoyfK6i-NiEFt9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

it looks well

doc:
http://www.postgresql.org/docs/9.4/static/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS
it should be fixed too

Regards

Pavel

2014-10-24 10:24 GMT+02:00 Ali Akbar <the(dot)apaan(at)gmail(dot)com>:

> Updated patch attached.
>
> 2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>>>>>
>>>>>> I agree with your proposal. I have a few comments to design:
>>>>>>
>>>>>
>>>>>> 1. patch doesn't hold documentation and regress tests, please append
>>>>>> it.
>>>>>>
>>>>>
> i've added some regression tests in arrays.sql and aggregate.sql.
>
> I've only found the documentation of array_agg. Where is the doc for
> array(subselect) defined?
>
>
>> 2. this functionality (multidimensional aggregation) can be interesting
>>>>>> more times, so maybe some interface like array builder should be preferred.
>>>>>>
>>>>> We already have accumArrayResult and
>>>>> makeArrayResult/makeMdArrayResult, haven't we?
>>>>>
>>>>> Actually array_agg(anyarray) can be implemented by using
>>>>> accumArrayResult and makeMdArrayResult, but in the process we will
>>>>> deconstruct the array data and null bit-flags into ArrayBuildState->dvalues
>>>>> and dnulls. And we will reconstruct it through makeMdArrayResult. I think
>>>>> this way it's not efficient, so i decided to reimplement it and memcpy the
>>>>> array data and null flags as-is.
>>>>>
>>>>
>>>> aha, so isn't better to fix a performance for accumArrayResult ?
>>>>
>>>
>>> Ok, i'll go this route. While reading the code of array(subselect) as
>>> you pointed below, i think the easiest way to implement support for both
>>> array_agg(anyarray) and array(subselect) is to change accumArrayResult so
>>> it operates both with scalar datum(s) and array datums, with performance
>>> optimization for the latter.
>>>
>>
> implemented it by modifying ArrayBuildState to ArrayBuildStateArray and
> ArrayBuildStateScalar (do you have any idea for better struct naming?)
>
>
>> In other places, i think it's clearer if we just use accumArrayResult and
>>>>> makeArrayResult/makeMdArrayResult as API for building (multidimensional)
>>>>> arrays.
>>>>>
>>>>>
>>>>>> 3. array_agg was consistent with array(subselect), so it should be
>>>>>> fixed too
>>>>>>
>>>>>> postgres=# select array_agg(a) from test;
>>>>>> array_agg
>>>>>> -----------------------
>>>>>> {{1,2,3,4},{1,2,3,4}}
>>>>>> (1 row)
>>>>>>
>>>>>> postgres=# select array(select a from test);
>>>>>> ERROR: could not find array type for data type integer[]
>>>>>>
>>>>>
>>>>> I'm pretty new in postgresql development. Can you point out where is
>>>>> array(subselect) implemented?
>>>>>
>>>>
>>>> where you can start?
>>>>
>>>> postgres=# explain select array(select a from test);
>>>> ERROR: 42704: could not find array type for data type integer[]
>>>> LOCATION: exprType, nodeFuncs.c:116
>>>>
>>>> look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
>>>> postgresql sources this keyword
>>>>
>>>
>> attention: probably we don't would to allow arrays everywhere.
>>
>
> I've changed the places where i think it's appropriate.
>
>
>> 4. why you use a magic constant (64) there?
>>>>>>
>>>>>> + astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
>>>>>> + astate->aitems = 64 * nitems;
>>>>>>
>>>>>> + astate->nullbitmap = (bits8 *)
>>>>>> + repalloc(astate->nullbitmap, (astate->aitems + 7) /
>>>>>> 8);
>>>>>>
>>>>>
>>>>> just follow the arbitrary size choosen in accumArrayResult
>>>>> (arrayfuncs.c):
>>>>> astate->alen = 64; /* arbitrary starting array size */
>>>>>
>>>>> it can be any number not too small and too big. Too small, and we will
>>>>> realloc shortly. Too big, we will end up wasting memory.
>>>>>
>>>>
>>>> you can try to alloc 1KB instead as start -- it is used more times in
>>>> Postgres. Then a overhead is max 1KB per agg call - what is acceptable.
>>>>
>>>> You take this value from accumArrayResult, but it is targeted for
>>>> shorted scalars - you should to expect so any array will be much larger.
>>>>
>>>
> this patch allocates 1KB (1024 bytes) if the ndatabytes is < 512bytes. If
> it is larger, it allocates 4 * size. For nullbitmap, it allocates 4 *
> number of items in array.
>
> Regards,
> --
> Ali Akbar
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ali Akbar 2014-10-24 09:24:32 Re: Function array_agg(array)
Previous Message Borodin Vladimir 2014-10-24 08:40:42 Re: ExclusiveLock on extension of relation with huge shared_buffers