Re: Patch: propose to include 3 new functions into intarray and intagg

Lists: pgsql-hackers
From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Dmitry Koterov" <dmitry(at)koterov(dot)ru>
Cc: Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: propose to include 3 new functions into intarray and intagg
Date: 2008-09-05 11:16:08
Message-ID: 87hc8u4zd2.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Regarding the patch listed on the commitfest "3 new functions into intarray
and intagg" (which I just noticed has a reviewer listed -- doh):

http://archives.postgresql.org/message-id/d7df81620808130429l2a75c895g5dd6fe8ae64cc23e@mail.gmail.com

I definitely like the int_array_append_aggregate function but I don't see
anything int[] specific about it. We should be able to have a generic
array_union() aggregate which uses the same IsA(fcinfo->context, AggState)
trick to scribble on its state variable. It don't even see any reason it
couldn't work for arrays of varlenas, though it would take a bit of
restructuring.

So I would be definitely for a adding this to core if it were rewritten to
work with generic arrays which, unless there are problems I'm not seeing, I
don't think would be very hard.

As far as detailed code commentary the only thing which jumps out at me is
that it's using MemoryContextAlloc to grow the array instead of repalloc which
seems like a waste. This isn't a new thing though, it was how intagg was
written and this patch just didn't change it.

I'm not against putting more functions into intagg and intarray and bidx and
the grouping/counting thing seem like they might be useful functionality. but
I have a feeling others might feel differently.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!


From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Dmitry Koterov <dmitry(at)koterov(dot)ru>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: propose to include 3 new functions into intarray and intagg
Date: 2008-09-05 11:25:56
Message-ID: 48C11744.2010103@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Gregory Stark wrote:
> Regarding the patch listed on the commitfest "3 new functions into intarray
> and intagg" (which I just noticed has a reviewer listed -- doh):

..well, just add your name as well, no?

> I definitely like the int_array_append_aggregate function but I don't see
> anything int[] specific about it. We should be able to have a generic
> array_union() aggregate which uses the same IsA(fcinfo->context, AggState)
> trick to scribble on its state variable. It don't even see any reason it
> couldn't work for arrays of varlenas, though it would take a bit of
> restructuring.

Yeah, the same idea was bugging me. Doesn't such code already exist?

> So I would be definitely for a adding this to core if it were rewritten to
> work with generic arrays which, unless there are problems I'm not seeing, I
> don't think would be very hard.
>
> As far as detailed code commentary the only thing which jumps out at me is
> that it's using MemoryContextAlloc to grow the array instead of repalloc which
> seems like a waste. This isn't a new thing though, it was how intagg was
> written and this patch just didn't change it.

Oh, good catch.

> I'm not against putting more functions into intagg and intarray and bidx and
> the grouping/counting thing seem like they might be useful functionality. but
> I have a feeling others might feel differently.

The naming 'bidx' seems a bit weired to me, but otherwise I'm also
optimistic about it.

Regards

Markus Wanner


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: Dmitry Koterov <dmitry(at)koterov(dot)ru>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: propose to include 3 new functions into intarray and intagg
Date: 2008-09-05 12:17:54
Message-ID: 87bpz24wi5.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Markus Wanner <markus(at)bluegap(dot)ch> writes:

> Hi,
>
> Gregory Stark wrote:
>> Regarding the patch listed on the commitfest "3 new functions into intarray
>> and intagg" (which I just noticed has a reviewer listed -- doh):
>
> ..well, just add your name as well, no?

Yeah, people should feel free to comment on items even if other people have or
are as well. It would have just been more useful for me to pick one that
didn't already have someone reading it is all.

>> As far as detailed code commentary the only thing which jumps out at me is
>> that it's using MemoryContextAlloc to grow the array instead of repalloc which
>> seems like a waste. This isn't a new thing though, it was how intagg was
>> written and this patch just didn't change it.
>
> Oh, good catch.

Actually on further thought what's going on is that it's resizing the array by
doubling its size when necessary. palloc/repalloc does that as well, so you're
getting two layers of extra space to handle reallocations. I think it's
simpler and cleaner to just repalloc just enough space at each growth and let
repalloc handle allocating extra space to handle future growth. I think that
would be necessary for handling varlenas also.

> The naming 'bidx' seems a bit weired to me, but otherwise I'm also optimistic
> about it.

If we wanted to put that in core it would make more sense to have a flag on
the array indicating whether it's sorted or not which is maintained whenever
we construct or alter an array. Then just have the regular _int_contains()
(which is an operator @>) take advantage of it if it's set and the data type
is fixed-size.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning


From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Dmitry Koterov <dmitry(at)koterov(dot)ru>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: propose to include 3 new functions into intarray and intagg
Date: 2008-09-05 12:35:24
Message-ID: 48C1278C.2020504@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Gregory Stark wrote:
>> The naming 'bidx' seems a bit weired to me, but otherwise I'm also optimistic
>> about it.
>
> If we wanted to put that in core

Uh.. ATM it's just a patch against contrib. I don't think 'bidx' needs
to go into core. Otherwise we'd have to move the whole intarr contrib
module as well, no?

> it would make more sense to have a flag on
> the array indicating whether it's sorted or not which is maintained whenever
> we construct or alter an array. Then just have the regular _int_contains()
> (which is an operator @>) take advantage of it if it's set and the data type
> is fixed-size.

Yeah, that sounds like a good thing. Currently, the intarr module
doesn't provide the optimized functions for the outside world
(_int_inter_inner() and such.. the _inner appendix really means "inside
intarr module only). I've ended up copy'n'pasting that code into my own
module, where I take care about ordering myself. But still want to
maintain the optimization.

However, that's probably not within the scope of this patch.

Regards

Markus Wanner


From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Dmitry Koterov <dmitry(at)koterov(dot)ru>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: propose to include 3 new functions into intarray and intagg
Date: 2008-09-05 12:40:26
Message-ID: 48C128BA.5050503@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Gregory Stark wrote:
> I definitely like the int_array_append_aggregate function but I don't see
> anything int[] specific about it. We should be able to have a generic
> array_union() aggregate which uses the same IsA(fcinfo->context, AggState)
> trick to scribble on its state variable. It don't even see any reason it
> couldn't work for arrays of varlenas, though it would take a bit of
> restructuring.

I've just noticed that this already is a todo item:

"Add array_accum() and array_to_set() functions for arrays

The standards specify array_agg() and UNNEST."

See also:
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00464.php

Regards

Markus Wanner