Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2013-12-16 09:55:50
Message-ID: CAApHDvrzxh2v9SUyVZKK19NQx+E-1bjgCCZ9dbLQ+XsyCSA_uw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 15, 2013 at 12:17 PM, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

>
> One thing that is currently on my mind is what to do when passing volatile
> functions to the aggregate. Since the number of times we execute a volatile
> function will much depend on the window frame options, I think we should
> include some sort of warning in the documentation that "The number of times
> that the expression is evaluated within the aggregate function when used in
> the context of a WINDOW is undefined". The other option would be to disable
> this optimisation if the aggregate expression contained a volatile
> function, but doing this, to me seems a bit weird as is anyone actually
> going to be depending on a volatile function being executed so many times?
>
>
>
I just wanted to bring this back into people's minds.
During writing this patch I found and removed a comment which was a todo
item to implement these negative transition functions. This comment said
about maybe disallowing the use of these if the expression of the function
contained a volatile function. I wondered why this was important and I
still don't really see why we would disallow this only to enforce that we
call that function an undefined number of times anyway.

nextval was the only volatile function that I could think of that would
allow me to give an example which was easy to understand what is going on
here.

CREATE SEQUENCE test_seq;
SELECT currval('test_seq'),
COUNT(*) OVER (ORDER BY x.x ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING),
SUM(nextval('test_seq')) OVER (ORDER BY x.x ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING)
FROM generate_series(1,2) x (x);
DROP SEQUENCE test_seq;

The results are:
currval | count | sum
---------+-------+-----
2 | 2 | 3
3 | 1 | 3

I've not looked to see if the spec has anything about this but, the first
row will have sum as 1+2 then the 2nd row will just have 1 row to aggregate
and the value will be 3 due to nextval returning 3, I could see an argument
that the results for this should actually be:

currval | count | sum
---------+-------+-----
2 | 2 | 3
3 | 1 | 2

If it was then the solution would have to be to materalise the expression
by evaluating it once for each tuple which sounds like a big change. I
thought maybe if we're going to be playing around with the number of times
these expressions are evaluated then we should stick a node in the docs to
tell our users not to depend on this.

Something like the attached maybe.

Attachment Content-Type Size
aggfuncwindow_note_doc.patch application/octet-stream 789 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2013-12-16 10:12:57 Re: autovacuum_work_mem
Previous Message David Rowley 2013-12-16 09:07:19 Re: [PATCH] Negative Transition Aggregate Functions (WIP)