proposal for 8.5, listagg aggregate function, WIP patch

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: proposal for 8.5, listagg aggregate function, WIP patch
Date: 2009-12-25 18:29:50
Message-ID: 162867790912251029h2d0a7c5co13ae2a683ea521be@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

I am returning to discussion
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01378.php

I propose a new aggregate function - listagg. This function
concatenate values to string. If this function is used with two
parameters, then second parameter is used as delimiter. NULL input
values are ignored like other aggregates. If all values are NULL, then
result is NULL. When delimiter is omitted, then values are
concatenated without any delimiter.

This function could be replaced with
array_to_string(array_agg(),delimiter). It has same functionality, but
different implementation. listagg should be (and it is) faster,
because hasn't array overhead. In my tests - listagg is about 25%
faster. Mainly, listagg is shorter. Because we cannot well wrap
aggregates, I propose integrate this function. There are precedent -
function generate_sequences. It should be replaced by
generate_series(array_lower(), array_upper()), but it hasn't same
effectiveness.

Using:

postgres=# select * from country ;
town | state
------------+-------
Prague | cs
Brno | cs
Bratislava | sk
Kosice | sk
(4 rows)

postgres=# select listagg(town,',') from country group by state;
listagg
-------------------
Bratislava,Kosice
Prague,Brno
(2 rows)

Comments?

Regards
Pavel Stehule

Attachment Content-Type Size
listagg.diff text/x-patch 10.7 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal for 8.5, listagg aggregate function, WIP patch
Date: 2009-12-25 21:29:16
Message-ID: 10960.1261776556@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> I propose a new aggregate function - listagg. This function
> concatenate values to string. If this function is used with two
> parameters, then second parameter is used as delimiter. NULL input
> values are ignored like other aggregates. If all values are NULL, then
> result is NULL. When delimiter is omitted, then values are
> concatenated without any delimiter.

The main objection I have to this proposal is the name: listagg seems
pretty horrid. It's got nothing to do with lists, and it doesn't even
attempt to be consistent with other existing function names.

I can see a couple of different approaches that might make sense
for choosing a better name. One is "something_to_string", though
I'm not sure what "something" should be --- maybe "rows" or "set"?
The other approach is to name it something based on concat() on the
grounds that it's a form of concatenation, and we do have "concat"
in the standard in the guise of XMLCONCAT. For instance there's some
case for concat_agg() by analogy to array_agg(); though personally
I think array_agg() is a horrid name too and not one of the SQL
committee's better efforts.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal for 8.5, listagg aggregate function, WIP patch
Date: 2009-12-25 22:10:05
Message-ID: 162867790912251410i1a54ad25ib2757dd714697e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/12/25 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> I propose a new aggregate function - listagg. This function
>> concatenate values to string. If this function is used with two
>> parameters, then second parameter is used as delimiter. NULL input
>> values are ignored like other aggregates. If all values are NULL, then
>> result is NULL. When delimiter is omitted, then values are
>> concatenated without any delimiter.
>
> The main objection I have to this proposal is the name: listagg seems
> pretty horrid.  It's got nothing to do with lists, and it doesn't even
> attempt to be consistent with other existing function names.

It is list - on application level - without arrays - was used list of
values separated by comma. I don't have a experience with Oracle. But
I know this term in this meaning from MSSQL.

>
> I can see a couple of different approaches that might make sense
> for choosing a better name.  One is "something_to_string", though
> I'm not sure what "something" should be --- maybe "rows" or "set"?
> The other approach is to name it something based on concat() on the
> grounds that it's a form of concatenation, and we do have "concat"
> in the standard in the guise of XMLCONCAT.  For instance there's some
> case for concat_agg() by analogy to array_agg(); though personally
> I think array_agg() is a horrid name too and not one of the SQL
> committee's better efforts.

I don't know, who though up named this function in Oracle. This
functionality is known as "listagg" (Oracle) or "group_concat"
(MySQL). I don't thing we need a third name for it. group_concat has
enhanced syntax:

GROUP_CONCAT([DISTINCT] expr [,expr ...]
[ORDER BY {unsigned_integer | col_name | expr}
[ASC | DESC] [,col_name ...]]
[SEPARATOR str_val])

there are keyword SEPARATOR. Oracle's syntax is nearer to standard
PostgreSQL's syntax

http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/functions087.htm

But Oracle has own syntax too. After some experience today, I see some
advantage Oracle's syntax over standard :).

This function isn't standardised now, so we can use any name. I vote
name used in Oracle db.

Regards
Pavel Stehule

>
>                        regards, tom lane
>