Re: MULTISET patch

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MULTISET patch
Date: 2011-01-06 11:54:38
Message-ID: AANLkTikhWYXKX1+J4LLqcGRdwXpQekk=nDDxyBSfJ2R6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here is an updated patch for MULTISET functions support.
- Fixed a bug in MULTISET EXCEPT ALL.
- Fixed a bug of NULL handling in SUBMULTISET OF.
- Removed array_flatten() from exported function list. It is still
used internally, but I doubt it would be useful in SQL level.

On Mon, Dec 27, 2010 at 21:27, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>> * trim_array - you use a deconstruct_array. It unpack  all fields and
>>> it could not be effective. Can we limit a unpacked array?

I modified the logic to use existing array_get_slice(),
that uses memcpy to make a sliced array without unpacking.

> DB2 use a both names.  Almost all array functions has a name array_* .
> Sure, standard is primary target. It's only my idea, so we can have a
> both names. It's not significant.

We also use array_xxx for array function names, but TRIM_ARRAY()
is the standard... Since slice syntax ARRAY[lo:hi] should be more
useful than trim_array(), I don't have a plan to add the alias.

> * three state boolean - true, false, -1. I am not sure, if this is
> correct style. Using a second variable can be a more clean

I modified the code to use two booleans.

> * you doesn't a realese a deconstructed array

I don't think it needs to be fixed because functions are executed
in per-tuple memory context in normal cases.

On Mon, Dec 27, 2010 at 17:05, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I think collect() is non-identical to fusion() but the tests don't
> highlight that, so I think we need more tests to highlight null
> handling.

collect() exactly equals to array_agg() in the current implementation.
It should return a MULTISET value instead of ARRAY in the spec,
but we don't have MULTISET data type for now.

> You mention that SQL standard doesn't handle multi-dim arrays,
> so I think we need some tests to define and check that behaviour.

I added additional tests for multi-dimensional arrays and NULL elements.

> Without any docs the only point of reference to understand the
> PostgreSQL implementation is by reading the tests. IMHO docs explain a
> patch and make a review possible; they aren't something that can be
> written after a review. Perhaps the nag doesn't apply as much when you
> supply external references, but reviewer shouldn't have to read all of
> those again. Of course, no need for docs to be perfect, just enough to
> understand.

I added some comments mainly in "Array Functions and Operators" section.
The most debatable part is whether arrays should be flattened or not.
I wrote the patch to flatten always -- for example,
trim_array( [[1,2],[3,4]], 1) => [1,2,3]
but we could design it to keep the dimension:
trim_array( [[1,2],[3,4]], 1) => [[1,2]]

However, it's not so easy for comparison function (sort, set, ...) to
decide what we should do. Note that the existing <@ operator also ignored
dimensions in arrays:

=# SELECT ARRAY[[1,3],[2,5]] <@ ARRAY[[1,2,3],[4,5,6]];
?column?
----------
t

--
Itagaki Takahiro

Attachment Content-Type Size
multiset-20110106.patch application/octet-stream 56.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-01-06 14:31:17 Re: sepgsql contrib module
Previous Message Marti Raudsepp 2011-01-06 11:47:20 Re: Streaming base backups