Re: multiset patch review

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Takahiro Itagaki <itagaki(dot)takahiro(at)gmail(dot)com>
Subject: multiset patch review
Date: 2011-01-09 19:13:58
Message-ID: AANLkTimpPMC8Qt6v9eeMs7b_MC5+zakTmuCubiQsYy+H@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Patching:

patching file doc/src/sgml/func.sgml
Hunk #6 succeeded at 10567 (offset 1 line).
Hunk #7 succeeded at 10621 (offset 1 line).
Hunk #8 succeeded at 10721 (offset 1 line).
Hunk #9 succeeded at 10775 (offset 1 line).
patching file src/backend/nodes/makefuncs.c
patching file src/backend/parser/gram.y
patching file src/backend/utils/adt/array_userfuncs.c
patching file src/include/catalog/pg_aggregate.h
patching file src/include/catalog/pg_proc.h
patching file src/include/nodes/makefuncs.h
patching file src/include/parser/kwlist.h
patching file src/include/utils/array.h
Hunk #1 succeeded at 281 (offset 1 line).
patching file src/test/regress/expected/arrays.out
Hunk #1 succeeded at 1558 (offset 272 lines).
patching file src/test/regress/sql/arrays.sql
Hunk #1 succeeded at 438 (offset 12 lines).

Compilation:

there are no warning related to patch

regress tests failed

*** 1639,1647 ****
..
SELECT ARRAY[1, 2] SUBMULTISET OF ARRAY[1, NULL],
ARRAY[1, 2] SUBMULTISET OF ARRAY[3, NULL];
! submultiset_of | submultiset_of ^M
! ----------------+----------------^M
! | f^M
(1 row)
..
SELECT ARRAY[1, NULL] SUBMULTISET OF ARRAY[1, NULL];
--- 1639,1647 ----
..
SELECT ARRAY[1, 2] SUBMULTISET OF ARRAY[1, NULL],
ARRAY[1, 2] SUBMULTISET OF ARRAY[3, NULL];
! submultiset_of | submultiset_of.
! ----------------+----------------
! | f
(1 row)
..
SELECT ARRAY[1, NULL] SUBMULTISET OF ARRAY[1, NULL];

There is often used a fragment

+ <----><------>fn.arg[0] = values1[n1];
+ <----><------>fn.arg[1] = values2[n2];
+ <----><------>fn.argnull[0] = false;
+ <----><------>fn.argnull[1] = false;
+ <----><------>fn.isnull = false;
+ <----><------>r = DatumGetInt32(FunctionCallInvoke(&fn));

it can be moved to procedure?

Doc and regress tests are ok.

I see only one issue. There isn't documented, what is a MULTISET?

Regards

Pavel Stehule


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-01-12 04:52:12
Message-ID: AANLkTin_xGK8V9=ik8nRDaSoxxbbyr_Y3w2cEsiaZhew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for the review.

On Mon, Jan 10, 2011 at 04:13, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> regress tests failed
Fixed.

> There is often used a fragment
> + <----><------>fn.arg[0] = values1[n1];
> + <----><------>fn.arg[1] = values2[n2];
> + <----><------>fn.argnull[0] = false;
> + <----><------>fn.argnull[1] = false;
> + <----><------>fn.isnull = false;
> + <----><------>r = DatumGetInt32(FunctionCallInvoke(&fn));
> it can be moved to procedure?

Agreed. I use FunctionCall2() instead of the fragments.

> I see only one issue. There isn't documented, what is a MULTISET?

I added a short description about MULTISET and example of operators
in "Arrays > 8.14.7. Multiset Support" section in the docs.
Is it enough? or what kind of information do you want?

Separate patches for src and doc attached. It includes a few bug fixes
and cleanup. I changed the error code in trim_array() to
ERRCODE_ARRAY_ELEMENT_ERROR according to the spec.

--
Itagaki Takahiro

Attachment Content-Type Size
multiset-doc-20110112.patch application/octet-stream 14.3 KB
multiset-src-20110112.patch application/octet-stream 46.2 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-01-12 06:21:10
Message-ID: 1294813270.4389.1.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On ons, 2011-01-12 at 13:52 +0900, Itagaki Takahiro wrote:
> I added a short description about MULTISET and example of operators
> in "Arrays > 8.14.7. Multiset Support" section in the docs.
> Is it enough? or what kind of information do you want?
>
> Separate patches for src and doc attached. It includes a few bug fixes
> and cleanup. I changed the error code in trim_array() to
> ERRCODE_ARRAY_ELEMENT_ERROR according to the spec.
>

You may want to read this thread about the cardinality function are you
trying to add:

http://archives.postgresql.org/pgsql-hackers/2009-02/msg01388.php

Also, what happened to the idea of a separate MULTISET type?


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-01-12 10:05:47
Message-ID: AANLkTi=HC3dJapgLqGSGbHeXVWNZnqvYXBYWswjm_Txm@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 12, 2011 at 15:21, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> You may want to read this thread about the cardinality function are you
> trying to add:
>
> http://archives.postgresql.org/pgsql-hackers/2009-02/msg01388.php

Since our archive is split per month, this might be more readable:
http://postgresql.1045698.n5.nabble.com/cardinality-td2003172.html

We've discussed what number should cardinality() returns:
#1. The total number of elements. (It's currently implemented.)
#2. The length of the first dimension.
It's as same as array_length(array, 1) .

I prefer #1 because we have no easy way to retrieve the number.
array_dims() returns similar numbers, but calculate the total
number is a bit complex.

If we will support array of arrays (jugged array), cardinality()
can return the number of elements in the most outer array.
It's similar definition in multi-dimensional arrays in C#,
that has both array of arrays and multi-dimensional arrays.

http://msdn.microsoft.com/library/system.array.length(v=VS.100).aspx

We can compare those SQL functions and C# array methods:
* cardinality(array) <--> array.Length
* array_length(array. dim) <--> array.GetLength(dim)

> Also, what happened to the idea of a separate MULTISET type?

I don't have any plans to implement dedicated MULTISET type for now
because almost all functions and operators can work also for arrays.
If we have a true MULTISET data type, we can overload them with
MULTISET arguments.

One exception might be collect() aggregate function because
we might need to change the result type from array to multiset.
collect(anyelement) => anyarray for now
Note that fusion() won't be an issue because we can overload it:
fusion(anyarray) => anyarray and (anymultiset) => anymultiset

--
Itagaki Takahiro


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-01-12 11:18:45
Message-ID: AANLkTimVmho=VG_VTuOeQTdMQnvynuOsaVpNMtcmPX+d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

there is one issue - probably useless checking a type equality in
function check_comparable and check_concatinatable, because when your
function is registrated with arguments (anyarray, anyarray), then is
guaranteed so type of array1 is same as type of array2, and then you
don't need to check.

Regards

Pavel Stehule

2011/1/12 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> Thank you for the review.
>
> On Mon, Jan 10, 2011 at 04:13, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> regress tests failed
> Fixed.
>
>> There is often used a fragment
>> + <----><------>fn.arg[0] = values1[n1];
>> + <----><------>fn.arg[1] = values2[n2];
>> + <----><------>fn.argnull[0] = false;
>> + <----><------>fn.argnull[1] = false;
>> + <----><------>fn.isnull = false;
>> + <----><------>r = DatumGetInt32(FunctionCallInvoke(&fn));
>> it can be moved to procedure?
>
> Agreed. I use FunctionCall2() instead of the fragments.
>
>> I see only one issue. There isn't documented, what is a MULTISET?
>
> I added a short description about MULTISET and example of operators
> in "Arrays > 8.14.7. Multiset Support" section in the docs.
> Is it enough? or what kind of information do you want?
>
> Separate patches for src and doc attached. It includes a few bug fixes
> and cleanup. I changed the error code in trim_array() to
> ERRCODE_ARRAY_ELEMENT_ERROR according to the spec.
>
> --
> Itagaki Takahiro
>


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-01-12 11:35:50
Message-ID: AANLkTimJZcQpGua=3snpX4vMisgwWke7Bxk3OP-GRc_T@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 12, 2011 at 20:18, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> there is one issue - probably useless checking a type equality in
> function check_comparable and check_concatinatable, because when your
> function is registrated with arguments (anyarray, anyarray), then is
> guaranteed so type of array1 is same as type of array2, and then you
> don't need to check.

It's true for almost all cases, but we have "anyarray" columns in
pg_statistic.stavaluesN. When we pass them to those array functions,
element types of two anyarrays could be different.
I guess they are protections only for them.

=# SELECT A.stavalues1 SUBMULTISET OF B.stavalues1
FROM pg_statistic A, pg_statistic B
WHERE A.stakind1 = 2 AND B.stakind1 = 2;
ERROR: cannot compare incompatible arrays
DETAIL: Arrays with element types name and oid[] are not compatible
for comparison.

--
Itagaki Takahiro


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-01-12 12:07:22
Message-ID: AANLkTimTX=OscfZ1_CWUVcGCLAh6R+8JpGsYpQiHOjih@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/1/12 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> On Wed, Jan 12, 2011 at 20:18, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> there is one issue - probably useless checking a type equality in
>> function check_comparable and check_concatinatable, because when your
>> function is registrated with arguments (anyarray, anyarray), then is
>> guaranteed so type of array1 is same as type of array2, and then you
>> don't need to check.
>
> It's true for almost all cases, but we have "anyarray" columns in
> pg_statistic.stavaluesN. When we pass them to those array functions,
> element types of two anyarrays could be different.
> I guess they are protections only for them.
>
> =# SELECT A.stavalues1 SUBMULTISET OF B.stavalues1
>   FROM pg_statistic A, pg_statistic B
>   WHERE A.stakind1 = 2 AND B.stakind1 = 2;
> ERROR:  cannot compare incompatible arrays
> DETAIL:  Arrays with element types name and oid[] are not compatible
> for comparison.
>

ook

Pavel

> --
> Itagaki Takahiro
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-01-12 12:10:52
Message-ID: AANLkTi=oTXK==+AWH9tTrVNz281qOFXbXiHgEGhwK16d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/1/12 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> 2011/1/12 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
>> On Wed, Jan 12, 2011 at 20:18, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>> there is one issue - probably useless checking a type equality in
>>> function check_comparable and check_concatinatable, because when your
>>> function is registrated with arguments (anyarray, anyarray), then is
>>> guaranteed so type of array1 is same as type of array2, and then you
>>> don't need to check.
>>
>> It's true for almost all cases, but we have "anyarray" columns in
>> pg_statistic.stavaluesN. When we pass them to those array functions,
>> element types of two anyarrays could be different.
>> I guess they are protections only for them.
>>
>> =# SELECT A.stavalues1 SUBMULTISET OF B.stavalues1
>>   FROM pg_statistic A, pg_statistic B
>>   WHERE A.stakind1 = 2 AND B.stakind1 = 2;
>> ERROR:  cannot compare incompatible arrays
>> DETAIL:  Arrays with element types name and oid[] are not compatible
>> for comparison.
>>
>
> ook

so I think it is ready for commit

Regards

Pavel Stehule
>
> Pavel
>
>> --
>> Itagaki Takahiro
>>
>


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-01-12 14:29:37
Message-ID: 1294842472-sup-230@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Itagaki Takahiro's message of mié ene 12 01:52:12 -0300 2011:

> Separate patches for src and doc attached. It includes a few bug fixes
> and cleanup. I changed the error code in trim_array() to
> ERRCODE_ARRAY_ELEMENT_ERROR according to the spec.

Two small nitpicks:

+ static void
+ check_concatinatable(Oid element_type1, Oid element_type2)
+ {
+ if (element_type1 != element_type2)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("cannot concatenate incompatible arrays"),
+ errdetail("Arrays with element types %s and %s are not "
+ "compatible for concatenation.",
+ format_type_be(element_type1),
+ format_type_be(element_type2))));
+ }

I think the word is either "concatenable" or "concatenatable". Also
please don't break up the string in errdetail() even if it's longer than
80 chars. (The function below this one has this too)

I didn't read the patch in much more detail.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-01-13 01:40:49
Message-ID: AANLkTinw9=_Xdog2Q6-EECkBudXhM2fW+yTmbc=eSTbE@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 12, 2011 at 23:29, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Two small nitpicks:
> + check_concatinatable(Oid element_type1, Oid element_type2)
> +         ereport(ERROR,
> +                 (errcode(ERRCODE_DATATYPE_MISMATCH),
> +                  errmsg("cannot concatenate incompatible arrays"),
> +                  errdetail("Arrays with element types %s and %s are not "
> +                            "compatible for concatenation.",
>
> I think the word is either "concatenable" or "concatenatable".  Also
> please don't break up the string in errdetail() even if it's longer than
> 80 chars.  (The function below this one has this too)

OK, I'll fix them,
but the broken up messages come from the existing code.

--
Itagaki Takahiro


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-01-18 08:39:37
Message-ID: AANLkTi=Z6hBnsTTq3Py_UqwpZP9uzAVL_Mo+OubODKBY@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 13, 2011 at 10:40, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>> I think the word is either "concatenable" or "concatenatable".  Also
>> please don't break up the string in errdetail() even if it's longer than
>> 80 chars.  (The function below this one has this too)
>
> OK, I'll fix them,
> but the broken up messages come from the existing code.

The attached is a fixed version.

BTW, should we use an "operator" to represent SUBMULTISET OF ?
It is mapped to submultiset_of "function" for now. If GIN and GiST
indexes will support the predicate in the future in addition to <@,
SUBMULTISET OF should be mapped to an operator rather than a function.
We need to choose different operator from <@ and @> for the case
because the semantics are not the same. (ex. <& and &>)

Note that MEMBER OF is represented as "ANY =".

--
Itagaki Takahiro

Attachment Content-Type Size
multiset-20110118.patch application/octet-stream 60.2 KB

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-01-24 07:45:49
Message-ID: AANLkTikya-j2Ty-aN+pQ4eFr8rxbh83aNdR8xoLk4TpX@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 18, 2011 at 17:39, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> BTW, should we use an "operator" to represent SUBMULTISET OF ?

I did it in the attached patch. Also, I fixed a bug of NULL checks
in SUBMULTISET OF operator.

Now SUBMULTISET OF is an alias to the new <& operator. We also have
&> operator as the commutator. They are different from <@ and @>
operators because they considers the number of elements.

For example:
=# SELECT ARRAY[1,1] <@ ARRAY[1], ARRAY[1,1] <& ARRAY[1];
?column? | ?column?
----------+----------
t | f
(1 row)

GIN still doesn't support <& and &> operators because of NULL handling.
In the spec, all values including NULLs should be returned for an empty
array key (i.e, "WHERE ARRAY[] SUBMULTISET OF array_col" returns everything),
but the current GIN implementation won't return NULL values for non-NULL keys.
Since it requires changes in GIN, I'd like to postpone gin support to the next
development cycle for 9.2.

--
Itagaki Takahiro

Attachment Content-Type Size
multiset-20110124.patch application/octet-stream 62.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-01-24 11:49:58
Message-ID: AANLkTik84=sWJNtUuZL0VrqdM-bATT1Cy=8fcZ=2wqga@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 24, 2011 at 2:45 AM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> [ latest patch ]

I notice that this is adding keywords and syntax support for what is
basically a PostgreSQL extension (since we certainly can't possibly be
following the SQL standards given that we're not implementing a new
datatype. Is that really a good idea?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-01-24 12:27:48
Message-ID: AANLkTim7S-3v3DmeRiYt01oZnXd5oTT2tV3vovUmayUK@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 24, 2011 at 20:49, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I notice that this is adding keywords and syntax support for what is
> basically a PostgreSQL extension (since we certainly can't possibly be
> following the SQL standards given that we're not implementing a new
> datatype.  Is that really a good idea?

As I wrote here,
http://archives.postgresql.org/pgsql-hackers/2011-01/msg00829.php
I think we can follow the SQL standard incrementally because we
have function overloads.

One exception is the result type of collect() aggregate function.
It returns an array for now, but will return a multiset when we
support true multiset data type.

--
Itagaki Takahiro


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-01-30 16:52:39
Message-ID: AANLkTi=_QjKU7L5AXLFVBZXS3H0Fqhtee4S+GJA4V6--@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 24, 2011 at 7:27 AM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Mon, Jan 24, 2011 at 20:49, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I notice that this is adding keywords and syntax support for what is
>> basically a PostgreSQL extension (since we certainly can't possibly be
>> following the SQL standards given that we're not implementing a new
>> datatype.  Is that really a good idea?
>
> As I wrote here,
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg00829.php
> I think we can follow the SQL standard incrementally because we
> have function overloads.
>
> One exception is the result type of collect() aggregate function.
> It returns an array for now, but will return a multiset when we
> support true multiset data type.

So, the plan is to add this now with non-standard semantics and then
change the semantics later if and when we implement what the standard
requires? That's not something we usually do, and I don't see why
it's a better idea in this case than it is in general. It's OK to
have non-standard behavior with non-standard syntax, but I think
non-standard behavior with standard syntax is something we want to try
hard to avoid.

I'm in favor of rejecting this patch in its entirety. The
functionality looks useful, but once you remove the syntax support, it
could just as easily be distributed as a contrib module rather than in
core.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-01-30 17:16:51
Message-ID: AANLkTimop9q3cyNya=zb+Mxa6ogJWvt2tSFXKmC4Dyx6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/1/30 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Jan 24, 2011 at 7:27 AM, Itagaki Takahiro
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>> On Mon, Jan 24, 2011 at 20:49, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> I notice that this is adding keywords and syntax support for what is
>>> basically a PostgreSQL extension (since we certainly can't possibly be
>>> following the SQL standards given that we're not implementing a new
>>> datatype.  Is that really a good idea?
>>
>> As I wrote here,
>> http://archives.postgresql.org/pgsql-hackers/2011-01/msg00829.php
>> I think we can follow the SQL standard incrementally because we
>> have function overloads.
>>
>> One exception is the result type of collect() aggregate function.
>> It returns an array for now, but will return a multiset when we
>> support true multiset data type.
>
> So, the plan is to add this now with non-standard semantics and then
> change the semantics later if and when we implement what the standard
> requires?  That's not something we usually do, and I don't see why
> it's a better idea in this case than it is in general.  It's OK to
> have non-standard behavior with non-standard syntax, but I think
> non-standard behavior with standard syntax is something we want to try
> hard to avoid.
>
> I'm in favor of rejecting this patch in its entirety.  The
> functionality looks useful, but once you remove the syntax support, it
> could just as easily be distributed as a contrib module rather than in
> core.

Hello

It must not be a significant problem with compatibility, because
implemented operators and functions are implemented for arrays.
Functions from this patch are very useful - there are lot of
implementations in SQL language, and this implementation means a
significant speed. I can't to believe so there can be situation, when
pg will has a true support of collection and operations with arrays
will not offer similar functionality. I propose a remove collect()
aggregate, but all others functions and operators can stay.

And if this isn't acceptable for Robert, then I like implementation
of these functions without parser's changes as minimum. Function like
array_sort, array_distinct and some variants array_union are really
missing (should be in core).

Regards
Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-01-30 17:16:57
Message-ID: 1745.1296407817@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> So, the plan is to add this now with non-standard semantics and then
> change the semantics later if and when we implement what the standard
> requires? That's not something we usually do, and I don't see why
> it's a better idea in this case than it is in general. It's OK to
> have non-standard behavior with non-standard syntax, but I think
> non-standard behavior with standard syntax is something we want to try
> hard to avoid.

> I'm in favor of rejecting this patch in its entirety. The
> functionality looks useful, but once you remove the syntax support, it
> could just as easily be distributed as a contrib module rather than in
> core.

+1 ... if we're going to provide nonstandard behavior, it should be with
a different syntax. Also, with a contrib module we could keep on
providing the nonstandard behavior for people who still need it, even
after implementing the standard properly.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-01-30 17:34:47
Message-ID: AANLkTincCNFQPx_qq11Azbek3hFsgUeaQOp8VLDzkGsv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jan 30, 2011 at 12:16 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> So, the plan is to add this now with non-standard semantics and then
>> change the semantics later if and when we implement what the standard
>> requires?  That's not something we usually do, and I don't see why
>> it's a better idea in this case than it is in general.  It's OK to
>> have non-standard behavior with non-standard syntax, but I think
>> non-standard behavior with standard syntax is something we want to try
>> hard to avoid.
>
>> I'm in favor of rejecting this patch in its entirety.  The
>> functionality looks useful, but once you remove the syntax support, it
>> could just as easily be distributed as a contrib module rather than in
>> core.
>
> +1 ... if we're going to provide nonstandard behavior, it should be with
> a different syntax.  Also, with a contrib module we could keep on
> providing the nonstandard behavior for people who still need it, even
> after implementing the standard properly.

Good point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-01-30 18:46:15
Message-ID: AANLkTimj6WQeR+Vbop_nVG8nQd_m1hAyVUv0OdPCSgTq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 31, 2011 at 02:34, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> I'm in favor of rejecting this patch in its entirety.  The
>>> functionality looks useful, but once you remove the syntax support, it
>>> could just as easily be distributed as a contrib module rather than in
>>> core.
>>
>> +1 ... if we're going to provide nonstandard behavior, it should be with
>> a different syntax.  Also, with a contrib module we could keep on
>> providing the nonstandard behavior for people who still need it, even
>> after implementing the standard properly.
>
> Good point.

I agree for collect() function, that is the only function we cannot
provide compatibility when we have MULTISET. But others are still
reasonable because they won't provide nonstandard behavior.

The SQL standard seems to have abstract COLLECTION data type as a
super class of ARRAY and MULTISET. So, it's reasonable that
functions and operators that accept MULTISETs also accept ARRAYs.
For example, we will have cardinality(ARRAY) even if we have
cardinality(MULTISET). Also, trim_array() is in the SQL standard.

I can remove some parts in the patch, especially for parser changes,
but others should be still in the core.

--
Itagaki Takahiro


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-01-30 19:11:01
Message-ID: AANLkTinO6P=MSLSQU5YeKJwSohkkQG3GMGqmOGDxRFtF@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jan 30, 2011 at 1:46 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Mon, Jan 31, 2011 at 02:34, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> I'm in favor of rejecting this patch in its entirety.  The
>>>> functionality looks useful, but once you remove the syntax support, it
>>>> could just as easily be distributed as a contrib module rather than in
>>>> core.
>>>
>>> +1 ... if we're going to provide nonstandard behavior, it should be with
>>> a different syntax.  Also, with a contrib module we could keep on
>>> providing the nonstandard behavior for people who still need it, even
>>> after implementing the standard properly.
>>
>> Good point.
>
> I agree for collect() function, that is the only function we cannot
> provide compatibility when we have MULTISET. But others are still
> reasonable because they won't provide nonstandard behavior.
>
> The SQL standard seems to have abstract COLLECTION data type as a
> super class of ARRAY and MULTISET. So, it's reasonable that
> functions and operators that accept MULTISETs also accept ARRAYs.
> For example, we will have cardinality(ARRAY) even if we have
> cardinality(MULTISET). Also, trim_array() is in the SQL standard.
>
> I can remove some parts in the patch, especially for parser changes,
> but others should be still in the core.

Well, do you want to revise this and submit a stripped-down version?
I'm not averse to adding things that are required by the standard and
won't cause backward compatibility problems later.

The documentation for trim_array() in the current patch version is
pretty terrible. The documentation describes it

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-01-30 19:12:12
Message-ID: AANLkTinwBGGSko9dDBKcu9Ot46da0niCAxNz=fK0AT3W@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jan 30, 2011 at 2:11 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Jan 30, 2011 at 1:46 PM, Itagaki Takahiro
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>> On Mon, Jan 31, 2011 at 02:34, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>>> I'm in favor of rejecting this patch in its entirety.  The
>>>>> functionality looks useful, but once you remove the syntax support, it
>>>>> could just as easily be distributed as a contrib module rather than in
>>>>> core.
>>>>
>>>> +1 ... if we're going to provide nonstandard behavior, it should be with
>>>> a different syntax.  Also, with a contrib module we could keep on
>>>> providing the nonstandard behavior for people who still need it, even
>>>> after implementing the standard properly.
>>>
>>> Good point.
>>
>> I agree for collect() function, that is the only function we cannot
>> provide compatibility when we have MULTISET. But others are still
>> reasonable because they won't provide nonstandard behavior.
>>
>> The SQL standard seems to have abstract COLLECTION data type as a
>> super class of ARRAY and MULTISET. So, it's reasonable that
>> functions and operators that accept MULTISETs also accept ARRAYs.
>> For example, we will have cardinality(ARRAY) even if we have
>> cardinality(MULTISET). Also, trim_array() is in the SQL standard.
>>
>> I can remove some parts in the patch, especially for parser changes,
>> but others should be still in the core.
>
> Well, do you want to revise this and submit a stripped-down version?
> I'm not averse to adding things that are required by the standard and
> won't cause backward compatibility problems later.
>
> The documentation for trim_array() in the current patch version is
> pretty terrible.  The documentation describes it

Argh, sorry.

The documentation describe it as having the prototype
trim_array(anyarray), but it's called in the example as
trim(integer[], integer) - two arguments vs. one. Also the docs don't
say how it decides how many elements to remove, or what happens to a
multi-dimensional array.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-01-30 19:34:54
Message-ID: AANLkTi==t6mc3QxHfrhaSZC3Z+=+oZztO05sdL+uvcQJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 31, 2011 at 04:12, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Well, do you want to revise this and submit a stripped-down version?
>> I'm not averse to adding things that are required by the standard and
>> won't cause backward compatibility problems later.

Sure. I'll remove collect() function. I can also remove syntax support,
but it requires rework for documentation (for, IS A SET operator to
is_a_set function), so I'd like to wait for the final consensus a bit more.

> The documentation for trim_array() in the current patch version is
> pretty terrible.  The documentation describe it as having the prototype
> trim_array(anyarray), but it's called in the example as
> trim(integer[], integer) - two arguments vs. one.

Oops, it's just my mistake. trim(anyarray, integer) is correct.

> Also the docs don't
> say how it decides how many elements to remove, or what happens to a
> multi-dimensional array.

I wrote the description below in the patch:
+ In <function>array_sort</>, <function>set</>, and <function>trim_array</>
+ functions, input arrays are always flattened into one-dimensional arrays.
+ In addition, the lower bounds of the arrays are adjusted to 1.

I'm not sure what is the consistent behavior for MD arrays. For example,
array_concat() is very strict, but <@ and && operators don't care about
the dimensions. I interpreted the second argument for trim_array() as
a number of "elements", but of course we can redefine it as a number of
"rows" for MD arrays.

--
Itagaki Takahiro


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-01-31 06:49:07
Message-ID: AANLkTinXNEWcPnwfauAt6QPPCQmQeUp4Jz2R4AxR2q7x@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 31, 2011 at 04:34, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Mon, Jan 31, 2011 at 04:12, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> Well, do you want to revise this and submit a stripped-down version?
>>> I'm not averse to adding things that are required by the standard and
>>> won't cause backward compatibility problems later.

I removed collect() aggregate function because the result type is MULTISET
in the spec. I keep all of other functions and operators because they won't
break anything in the standard. When we will have true MULTISET data type,
we can overload those functions and operators for MULTISET and ARRAY.

>> The documentation for trim_array() in the current patch version is
>> pretty terrible.  The documentation describe it as having the prototype
>> trim_array(anyarray), but it's called in the example as
>> trim(integer[], integer) - two arguments vs. one.
>
> Oops, it's just my mistake. trim(anyarray, integer) is correct.

Fixed and add an example for a MD array.

>> Also the docs don't
>> say how it decides how many elements to remove, or what happens to a
>> multi-dimensional array.

Now it removes supplied number of slices at the end of array.
trim_array( ARRAY[[1,2],[3,4]], 1) ==> ARRAY[[1,2]]
Also, it keep lower-bounds of the input array, that is an advantage
over slice syntax. Slice syntax always reset lower-bounds to 1.

--
Itagaki Takahiro

Attachment Content-Type Size
multiset-20110131.patch application/octet-stream 62.1 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-04 17:29:59
Message-ID: AANLkTinzoU78_rp7hYD1Q87Z_g9As6iGs_OQ69f4_Ejd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 31, 2011 at 1:49 AM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> I removed collect() aggregate function because the result type is MULTISET
> in the spec. I keep all of other functions and operators because they won't
> break anything in the standard. When we will have true MULTISET data type,
> we can overload those functions and operators for MULTISET and ARRAY.

I am still not in favor of adding this syntax. I'd be in favor of
adding array_cardinality(), trim_array(), array_sort(), and
array_flatten(). [It sure is awkward that trim_array has the word
array second and all of our other array functions have it first - is
this required by the spec?]

array_to_set() and array_is_set() seem possibly useful, but I probably
would have called them array_remove_dups() and array_has_dups(). I
might be in the minority on that one, though.

I think array_subset(), array_union(), array_intersect(), and
array_except() are useful, but I think they should just be regular
functions, without any special syntax.

fusion() and intersection() also seem useful, but maybe it would be
more consistent to all them array_fusion() and array_intersection().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-04 18:02:51
Message-ID: AANLkTikc_MnCLtJW0ooB9sDQxrKqZNFkLr8RCFTN0-e0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 5, 2011 at 02:29, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I am still not in favor of adding this syntax.

I also don't like the syntax, but unfortunately, almost all of
them are in the SQL standard :-(. In addition, Oracle already
uses the same feature with the special syntax, though true multiset
data types are supported in it.

> [It sure is awkward that trim_array has the word
> array second and all of our other array functions have it first - is
> this required by the spec?]

Yes, again, but we could have array_trim() for the alias.

> array_to_set() and array_is_set() seem possibly useful, but I probably
> would have called them array_remove_dups() and array_has_dups().  I
> might be in the minority on that one, though.

array_to_set is the only function we can rename freely.
Another candidates might be array_unique (contrib/intarray uses uniq).

array_is_set() is an internal representation of "IS A SET" operator.
So, the name is not so important (and not documented.)

> I think array_subset(), array_union(), array_intersect(), and
> array_except() are useful, but I think they should just be regular
> functions, without any special syntax.

All of the special syntax are in the spec, except the argument
types should be multisets rather than arrays.

> fusion() and intersection() also seem useful, but maybe it would be
> more consistent to all them array_fusion() and array_intersection().

Both of the names are the standard... We could have array_fusion()
for array types and fusion() for multiset types, but I prefer
overloaded fusion() to have both names.

--
Itagaki Takahiro


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-04 18:04:10
Message-ID: AANLkTimNfhF+xJFydqi5j_uTYuazp8aMD6ZjAPvDUiKj@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 4, 2011 at 1:02 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Sat, Feb 5, 2011 at 02:29, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I am still not in favor of adding this syntax.
>
> I also don't like the syntax, but unfortunately, almost all of
> them are in the SQL standard :-(.

The standard specifies this syntax for arrays, or for multisets?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-04 18:15:53
Message-ID: AANLkTikT4E9wcHwaxkwEKb1zQMOT==OM_ZpE_bL253fa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 5, 2011 at 03:04, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> I am still not in favor of adding this syntax.
>>
>> I also don't like the syntax, but unfortunately, almost all of
>> them are in the SQL standard :-(.
>
> The standard specifies this syntax for arrays, or for multisets?

Multisets. But I chose the same function name and syntax because
arrays *are* multisets by definition.

--
Itagaki Takahiro


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-04 18:18:28
Message-ID: AANLkTinoMDfgyJTouKu-pp+qvoZvf=mBcQEVu6nndG6v@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 4, 2011 at 1:15 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Sat, Feb 5, 2011 at 03:04, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> I am still not in favor of adding this syntax.
>>>
>>> I also don't like the syntax, but unfortunately, almost all of
>>> them are in the SQL standard :-(.
>>
>> The standard specifies this syntax for arrays, or for multisets?
>
> Multisets. But I chose the same function name and syntax because
> arrays *are* multisets by definition.

In math class, maybe. But in programming, no. Multiset is a
datatype. Array is a different datatype. There is no reason why we
need to clutter our parser with extra keywords to support a
non-standard feature extension.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-04 19:24:33
Message-ID: m2mxmb38ni.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Feb 4, 2011 at 1:15 PM, Itagaki Takahiro
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>> Multisets. But I chose the same function name and syntax because
>> arrays *are* multisets by definition.
>
> In math class, maybe. But in programming, no. Multiset is a
> datatype. Array is a different datatype. There is no reason why we
> need to clutter our parser with extra keywords to support a
> non-standard feature extension.

My understanding is that we will have to have those functions defined
and user visible, and that we benefit from function overloading which is
not in the standard. So there's no reason not to provide those function
for arrays already, then extend to full multiset support.

Given PostgreSQL overloading, yes, arrays are multisets as far as
defining those standard compliant APIs is concerned. AFAIUI.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-05 02:11:22
Message-ID: AANLkTi=vJ6DDSDs6BUECzKnx8Xy1AMsd3KCHgL7FN5UW@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 5, 2011 at 04:24, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr> wrote:
>> In math class, maybe.  But in programming, no.  Multiset is a
>> datatype.  Array is a different datatype.  There is no reason why we
>> need to clutter our parser with extra keywords to support a
>> non-standard feature extension.
>
> My understanding is that we will have to have those functions defined
> and user visible, and that we benefit from function overloading which is
> not in the standard.  So there's no reason not to provide those function
> for arrays already, then extend to full multiset support.
>
> Given PostgreSQL overloading, yes, arrays are multisets as far as
> defining those standard compliant APIs is concerned.  AFAIUI.

Yes, I'd like to use overloading.
Choosing arbitrary names increases learning costs for users.

--
Itagaki Takahiro


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-11 15:35:39
Message-ID: AANLkTina7S-_3kuEFEPyaVc8cHquOxB-S1e8WFU=r2KD@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 4, 2011 at 9:11 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Sat, Feb 5, 2011 at 04:24, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr> wrote:
>>> In math class, maybe.  But in programming, no.  Multiset is a
>>> datatype.  Array is a different datatype.  There is no reason why we
>>> need to clutter our parser with extra keywords to support a
>>> non-standard feature extension.
>>
>> My understanding is that we will have to have those functions defined
>> and user visible, and that we benefit from function overloading which is
>> not in the standard.  So there's no reason not to provide those function
>> for arrays already, then extend to full multiset support.
>>
>> Given PostgreSQL overloading, yes, arrays are multisets as far as
>> defining those standard compliant APIs is concerned.  AFAIUI.
>
> Yes, I'd like to use overloading.
> Choosing arbitrary names increases learning costs for users.

Right, but making the parser slower has a cost, too.
ScanKeywordLookup() is already a hotspot in some workloads, and
there's overhead buried in the bison parser, too. I think it's a big
mistake to get into the business of adding keywords just so we can
provide an alternative syntax to call a function. Many people who use
these functions will never even have heard of the MULTISET stuff that
is part of the spec, and even those that have can figure out our
alternatives by spending five minutes with the documentation. I find
it really difficult to accept that it is worth slowing down parsing
for the 95% of users who are not going to use these functions to
provide a slightly nicer API for the ones that do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-11 15:50:19
Message-ID: 28774.1297439419@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Right, but making the parser slower has a cost, too.
> ScanKeywordLookup() is already a hotspot in some workloads, and
> there's overhead buried in the bison parser, too.

Yeah. Keep in mind that a bison parser fundamentally runs off a
two-dimensional array: one axis is parser state and the other is token
number. They have some tricks to compress the array a bit, but adding
keywords contributes directly to a bigger array, which means slower
parsing (more L1 cache misses). The parser's inner loop frequently
shows up as a hotspot in profiles I do, and I think that has to be more
about the amount of data it's touching than the cost of the loop per se.

Adding unnecessary keywords is something to be avoided.

regards, tom lane


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-11 16:17:35
Message-ID: AANLkTikcN0Y8UH9koFHBSRny0u2QiDXscZX1nctS_PKf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 12, 2011 at 00:50, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Right, but making the parser slower has a cost, too.
>> ScanKeywordLookup() is already a hotspot in some workloads, and
>> there's overhead buried in the bison parser, too.
>
> Yeah.  Keep in mind that a bison parser fundamentally runs off a
> two-dimensional array: one axis is parser state and the other is token
> number.  They have some tricks to compress the array a bit, but adding
> keywords contributes directly to a bigger array, which means slower
> parsing (more L1 cache misses).  The parser's inner loop frequently
> shows up as a hotspot in profiles I do, and I think that has to be more
> about the amount of data it's touching than the cost of the loop per se.

Did you measure the actual cost in the real world? If we are using
such a sensitive parser, it should be a problem even without the patch.

> Adding unnecessary keywords is something to be avoided.

Our conclusion is "we never support multiset syntax in the SQL standard",
right? If we think they are unnecessary, we cannot support it.

I will remove parser changes from the patch; it will add only a few array
functions. Then, please let me know functions you don't want to include
in the core, if any. I'll remove them at the same time.

--
Itagaki Takahiro


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-11 16:31:16
Message-ID: AANLkTik7vB55WWExUqdS2GFqtosOcyLa3wV_McTokT_e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 11, 2011 at 11:17 AM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> Did you measure the actual cost in the real world? If we are using
> such a sensitive parser, it should be a problem even without the patch.

It *is* a problem without the patch!

>> Adding unnecessary keywords is something to be avoided.
>
> Our conclusion is "we never support multiset syntax in the SQL standard",
> right?  If we think they are unnecessary, we cannot support it.

No, my conclusion is that if we're not really doing what the standard
says anyway, it's not worth the cost of the new keywords. Whether or
not we'd be willing to pay the cost for an implementation that
actually conformed with the standard is not something I believe we've
decided. But it's not going to happen any time soon, because adding
actual multiset types would bloat pg_type by another 50%, which is a
cost I doubt that we will be willing to pay. I really hope someone
will eventually fix things so that we don't need to add a new copy of
every composite type definition for every collection type we want to
support, but until that happens, there is not much chance of
implementing this feature in a way that is actually
standard-conforming. And until then, paying the extra parsing cost
for something that isn't going to be standard behavior anyway is not a
good trade-off.

We have spent countless hours figuring out how to redesign bits of
syntax so that they could use already-existing keywords instead of
adding new ones; and many more hours angsting about whether there is
any way to get rid of some of the keywords we already have. The new
options syntax for EXPLAIN and VACUUM exists *primarily* to reduce the
number of future keywords we'll need to create. This is a seriously
annoying problem, but it's not one we made up just for this patch. We
deal with it all the time. Do we sometimes add keywords? Sure, of
course we do. But we try to minimize it. It isn't free.

> I will remove parser changes from the patch; it will add only a few array
> functions. Then, please let me know functions you don't want to include
> in the core, if any. I'll remove them at the same time.

I posted my thoughts on this topic a week ago.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-11 20:01:31
Message-ID: 20110211200131.GU4116@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Itagaki Takahiro (itagaki(dot)takahiro(at)gmail(dot)com) wrote:
> I will remove parser changes from the patch; it will add only a few array
> functions. Then, please let me know functions you don't want to include
> in the core, if any. I'll remove them at the same time.

Seems like this should be 'waiting on author', but since it was marked
'needs review', I started taking a look at it. Without the grammar
changes, it's just adding a couple of functions. In general, I'm all
for that, but I don't like this:

Input arrays are always flattened into one-dimensional arrays.

That just strikes me as completely broken when it comes to PG Arrays.
What does the spec say about this, if anything? Is that required by
spec, or is the spec not relevant to this because MULTISETs are only one
dimensional..?

I would think that we would have a way to define what dimension or piece
of the array that would be sorted or returned as a set, etc. I could
see having a 'flatten' function which could be called first, if that's
really what you want.. Or maybe we just need a slice function whose
result could then be passed in to these functions, perhaps..

In my view, we should be throwing an error if we get called on a
multi-dim array and we can't perform the operation on that in an
obviously correct way, not forcing the input to match something we can
make work.

From above, that makes me not thrilled with the 'flatten' boolean for
array_cat_internal(), nor with how it was implemented, or how those
changes apparently didn't justfiy *any* comment updates..

I'm not thrilled with called ArrayGetNItems array_cardinality(). Why
not just provide a function with a name like "array_getnitems()"
instead?

trim_array() suffers from two problems: lack of comments, and no spell
checking done on those that are there.

Should get_type_cache() really live in array_userfuncs.c ?

There's more, primairly lack of comments and what I consider poor
function naming ("sort_or_unique()" ? really?), but in the end my
feeling is that this could survive just fine, for now, as a contrib
module, and that it really isn't close enough to being committable to
make it into 9.1 in any case.

I'll mark it waiting on author, since I think the formal 'returned with
feedback' needs to come from someone else, but that's where I think this
is headed.

Thanks,

Stephen


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-12 02:15:11
Message-ID: AANLkTinPmXF5nRWZAhNoJ_Gdr8ogC2-tpakjC6uMgaqF@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 12, 2011 at 05:01, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Input arrays are always flattened into one-dimensional arrays.
> That just strikes me as completely broken when it comes to PG Arrays.

Contains operators (<@, &&, @>) ignore multi-dimensions.
Array slice operator ([lo:hi]) always reset the index offsets.

> What does the spec say about this, if anything?  Is that required by
> spec, or is the spec not relevant to this because MULTISETs are only one
> dimensional..?

Yes. The SQL standard has only one-dimensional ARRAYs and MULTISETs ,
though it supports "collections of collections", that we don't have.

> I would think that we would have a way to define what dimension or piece
> of the array that would be sorted or returned as a set, etc.

It would be consistent if we return (ARRAY[][])[] => ARRAY[],
but we throw errors actually.

> In my view, we should be throwing an error if we get called on a
> multi-dim array and we can't perform the operation on that in an
> obviously correct way, not forcing the input to match something we can
> make work.

Agreed, I'll do so. We could also keep lower-bounds of arrays,
at least on sort.

> I'm not thrilled with called ArrayGetNItems array_cardinality().  Why
> not just provide a function with a name like "array_getnitems()"
> instead?

We must use the name, because it is in the SQL standard.
FUNCTION cardinality(collection) => number
We would have overloaded two versions for ARRAYs andMULTISETs.

> trim_array() suffers from two problems: lack of comments, and no spell
> checking done on those that are there.

What comments do you want?

> Should get_type_cache() really live in array_userfuncs.c ?

Do you find codes using the same operation in other files?

> There's more, primairly lack of comments and what I consider poor
> function naming ("sort_or_unique()" ?  really?),

Could you suggest better names?

--
Itagaki Takahiro


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: multiset patch review
Date: 2011-02-12 13:27:18
Message-ID: 20110212132718.GC4116@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Itagaki,

* Itagaki Takahiro (itagaki(dot)takahiro(at)gmail(dot)com) wrote:
> On Sat, Feb 12, 2011 at 05:01, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > What does the spec say about this, if anything?  Is that required by
> > spec, or is the spec not relevant to this because MULTISETs are only one
> > dimensional..?
>
> Yes. The SQL standard has only one-dimensional ARRAYs and MULTISETs ,
> though it supports "collections of collections", that we don't have.

Yeah, I was afraid of that.. :(

> > In my view, we should be throwing an error if we get called on a
> > multi-dim array and we can't perform the operation on that in an
> > obviously correct way, not forcing the input to match something we can
> > make work.
>
> Agreed, I'll do so. We could also keep lower-bounds of arrays,
> at least on sort.

Sounds good. I'm also fine with providing a 'flatten' function, I just
don't agree w/ doing it automatically.

> > I'm not thrilled with called ArrayGetNItems array_cardinality().  Why
> > not just provide a function with a name like "array_getnitems()"
> > instead?
>
> We must use the name, because it is in the SQL standard.

If we use the name, then we have to throw an error when it's not a
single dimension array, since that's what's in the SQL standard. In
that case, we might as well have another function which gives us
ArrayGetNItems anyway.

> > trim_array() suffers from two problems: lack of comments, and no spell
> > checking done on those that are there.
>
> What comments do you want?

Uhm, how about ones that explain what's going on in each paragraph of
code..? And in other places, commenting the functions, what they do,
what they're used for, and documenting each of the arguments that are
passed in..

> > Should get_type_cache() really live in array_userfuncs.c ?
>
> Do you find codes using the same operation in other files?

Not yet, but logically it's about gathering information about types and
could be needed beyond just arrays..

> > There's more, primairly lack of comments and what I consider poor
> > function naming ("sort_or_unique()" ?  really?),
>
> Could you suggest better names?

How about 'array_sort()' or similar? With appropriate arguments that
can be used to request unique'ing or not? Or is there a "just unique
it, but don't sort it" option for this function?

Thanks,

Stephen


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-02-15 09:31:19
Message-ID: AANLkTim4i+pvU0g_q+XFm6SRzWgot0y5C4b_woJwiVSi@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is a cleanup version of multiset function patch. But all syntax
for multiset has been removed, so the patch just adds additional array
functions and aggregations.

=== The standard-compatible functions ===
- cardinality(anyarray)
- trim_array(anyarray, integer)
=== Renamed version of MULTISET functions ===
- array_trim(anyarray, integer) <= alias to trim_array()
- array_sort(anyarray)
- array_unique(anyarray)
- array_is_unique(anyarray)
- array_union(anyarray, anyarray)
- array_union_all(anyarray, anyarray) <= alias to array_cat()
- array_intersect(anyarray, anyarray)
- array_intersect_all(anyarray, anyarray)
- array_except(anyarray, anyarray)
- array_except_all(anyarray, anyarray)
- array_fusion(SETOF anyarray)
- array_intersection(SETOF anyarray)

On Sat, Feb 5, 2011 at 02:29, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I am still not in favor of adding this syntax.  I'd be in favor of
> adding array_cardinality(), trim_array(), array_sort(), and
> array_flatten().  [It sure is awkward that trim_array has the word
> array second and all of our other array functions have it first - is
> this required by the spec?]

array_flatten() no longer exists. I added array_trim() as an alias
to trim_array() because it would be a FAQ.

> array_to_set() and array_is_set() seem possibly useful, but I probably
> would have called them array_remove_dups() and array_has_dups().  I
> might be in the minority on that one, though.

I prefer array_unique() and array_is_unique(), but will change
if objections.

> I think array_subset(), array_union(), array_intersect(), and
> array_except() are useful, but I think they should just be regular
> functions, without any special syntax.

I removed array_subset(). It might be re-added with index support in
the future. We need more discussions to have different definition of
contains/subset operators (<@ vs. SUBMULTISET OF-like operator) .

All special syntax are removed. I split those set-operation functions
into "all" and "non-all" versions. So, we will have:
array_union[_all], array_intersect[_all], and array_except[_all].

> fusion() and intersection() also seem useful, but maybe it would be
> more consistent to all them array_fusion() and array_intersection().

Renamed. One issue might be array_intersect (normal function) and
array_intersection (aggregate function) have similar names.

On Sat, Feb 12, 2011 at 22:27, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> > I'm not thrilled with called ArrayGetNItems array_cardinality().
>> We must use the name, because it is in the SQL standard.
> If we use the name, then we have to throw an error when it's not a
> single dimension array, since that's what's in the SQL standard.

Hmmm, in my understanding, we can define cardinality()
as "number of elements", that is reasonable for multi-dimensional arrays.

Other functions except cardinality() and trim_array() raise errors
if multi-dimensional arrays are passed. Also, trim_array(), array_sort(),
and array_unique() now keep lower bounds of the input arrays.

> > Should get_type_cache() really live in array_userfuncs.c ?
I think storing cache information in fcinfo is not so general solution.
So, the subroutine is still kept as a static function. Other comments
and variable names you suggested in the revised patch. Thanks.

--
Itagaki Takahiro

Attachment Content-Type Size
array_functions-20110215.patch application/octet-stream 46.8 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-02-15 12:13:21
Message-ID: AANLkTikU_AX+v1FoTrmiHuO8F=X9iekSDRAGTBGhnVmB@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 15, 2011 at 4:31 AM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> array_flatten() no longer exists. I added array_trim() as an alias
> to trim_array() because it would be a FAQ.

I don't like the alias thing - let's add one name or the other, not both.

Similarly, let's NOT add array_union_all as an alias for array_concat.

'cannot use multi-dimensional arrays' reads awkwardly to me. I think
it should say something like "sorting of multi-dimensional arrays is
not supported".

multi-demensional -> multi-dimensional

slaces -> slices

The formula in the trim_array comment is apparently misparenthesized.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multiset patch review
Date: 2011-02-16 15:38:43
Message-ID: AANLkTi=crjK=xzZoFna45Gy-q3+42LRTomQzciTwY85D@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 15, 2011 at 7:13 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Feb 15, 2011 at 4:31 AM, Itagaki Takahiro
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>> array_flatten() no longer exists. I added array_trim() as an alias
>> to trim_array() because it would be a FAQ.
>
> I don't like the alias thing - let's add one name or the other, not both.
>
> Similarly, let's NOT add array_union_all as an alias for array_concat.
>
> 'cannot use multi-dimensional arrays' reads awkwardly to me.  I think
> it should say something like "sorting of multi-dimensional arrays is
> not supported".
>
> multi-demensional -> multi-dimensional
>
> slaces -> slices
>
> The formula in the trim_array comment is apparently misparenthesized.

I think we're out of time to keep bikeshedding this. Let's revisit it for 9.2.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company