Re: Built-in binning functions

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Built-in binning functions
Date: 2014-08-25 14:15:07
Message-ID: 53FB44EB.4010800@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I finally had some time to get back to this.

I attached version3 of the patch which "fixes" Tom's complaint about
int8 version by removing the int8 version as it does not seem necessary
(the float8 can handle integers just fine).

This patch now basically has just one optimized function for float8 and
one for numeric datatypes, just like width_bucket.

> On 08/07/14 02:14, Tom Lane wrote:
> Also, I'm not convinced by this business of throwing an error for a
> NULL array element. Per spec, null arguments to width_bucket()
> produce a null result, not an error --- shouldn't this flavor act
> similarly? In any case I think the test needs to use
> array_contains_nulls() not just ARR_HASNULL.

I really think there should be difference between NULL array and NULL
inside array and that NULL inside the array is wrong input. I changed
the check to array_contains_nulls() though.

> On 08/07/14 02:14, Tom Lane wrote:
> Lastly, the spec defines behaviors for width_bucket that allow either
> ascending or descending buckets. We could possibly do something
> similar

I am not sure it's worth it here as we require input to be sorted
anyway. It might be worthwhile if we decided to do this as an aggregate
(since there input would not be required to be presorted) instead of
function but I am not sure if that's desirable either as that would
limit usability to only the single main use-case (group by and count()).

On 20/07/14 11:01, Simon Riggs wrote:
> On 16 July 2014 20:35, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>
>>> On 08/07/14 02:14, Tom Lane wrote:
>>>>
>>>> I didn't see any discussion of the naming question in this thread.
>>>> I'd like to propose that it should be just "width_bucket()"; we can
>>>> easily determine which function is meant, considering that the
>>>> SQL-spec variants don't take arrays and don't even have the same
>>>> number of actual arguments.
>>>
>>> I did mention in submission that the names are up for discussion, I am all
>>> for naming it just width_bucket.
>>
>> I had this idea too - but I am not sure if it is good idea. A distance
>> between ANSI SQL with_bucket and our enhancing is larger than in our
>> implementation of "median" for example.
>>
>> I can live with both names, but current name I prefer.
>
>
> So I suggest that we use the more generic function name bin(), with a
> second choice of discretize() (though that seems annoyingly easy to
> spell incorrectly)
>

I really don't think bin() is good choice here, bucket has same meaning
in this context and bin is often used as shorthand for binary which
would be confusing here.

Anyway I currently left the name as it is, I would not be against using
width_bucket() or even just bucket(), not sure about the discretize()
option.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
binning-fns-v3.patch text/x-diff 13.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-08-25 14:41:19 Re: Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
Previous Message Tom Lane 2014-08-25 13:48:59 Re: Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING