Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-03-04 20:09:50
Message-ID: CAEZATCX3aBPmTp=go+4f-Q9i0Ko11HZtoL3a2r6CA62vdt-03g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3 March 2014 23:00, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> * In show_windowagg_info(), this calculation looks suspicious to me:
>>
>> double tperrow = winaggstate->aggfwdtrans /
>> (inst->nloops * inst->ntuples);
>>
>> If the node is executed multiple times, aggfwdtrans will be reset in
>> each loop, so the transitions per row figure will be under-estimated.
>> ISTM that if you want to report on this, you'd need aggfwdtrans to be
>> reset once per query, but I'm not sure exactly how to do that.
>>
>> ...
>>
>> Actually, I think it's misleading to only count forward transition
>> function calls, because a call to the inverse transition function
>> still represents a state transition, and is likely to be around the
>> same cost. For a window of size 2, there would not be much advantage
>> to using inverse transition functions, because it would be around 2
>> transitions per row either way.
>
> True. In fact, I pondered whether to avoid using the inverse transition
> function for windows of 2 rows. In the end, I didn't because I felt that
> it makes custom aggregates harder to test.
>
> On the question of whether to count inverse transition function calls -
> the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the
> number of state transitions, but rather to show whether the aggregation
> has O(n) or O(n^2) behaviour. The idea being that a value close to "1"
> means "inverse transition function works as expected", and larger values
> mean "not working so well".
>
> Regarding multiple evaluations - I think I based the behaviour on how
> ntuples works, which also only reports the value of the last evaluation
> I think. But maybe I'm confused about this.
>

No, it doesn't look like that's correct for multiple loops. Consider
this example:

explain (verbose, analyse)
select * from (values (10), (20), (30), (40)) v(x),
lateral
(select sum(i) over (rows between 4 preceding and current row)
from generate_series(1, x) i) t;

QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..170.06 rows=4000 width=12) (actual
time=0.027..0.414 rows=100 loops=1)
Output: "*VALUES*".column1, (sum(i.i) OVER (?))
-> Values Scan on "*VALUES*" (cost=0.00..0.05 rows=4 width=4)
(actual time=0.002..0.006 rows=4 loops=1)
Output: "*VALUES*".column1
-> WindowAgg (cost=0.00..22.50 rows=1000 width=4) (actual
time=0.019..0.094 rows=25 loops=4)
Output: sum(i.i) OVER (?)
Transitions Per Row: 0.2
-> Function Scan on pg_catalog.generate_series i
(cost=0.00..10.00 rows=1000 width=4) (actual time=0.010..0.015 rows=25
loops=4)
Output: i.i
Function Call: generate_series(1, "*VALUES*".column1)

It turns out that show_windowagg_info() is only called once at the
end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing
tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get
1, you'd have to use this formula:

double tperrow = winaggstate->aggfwdtrans / inst->ntuples;

I'm still not convinced that's the most useful thing to report though.
Personally, I'd prefer to just see the separate counts, e.g.:

-> WindowAgg (cost=0.00..22.50 rows=1000 width=4) (actual
time=0.019..0.094 rows=25 loops=4)
Output: sum(i.i) OVER (?)
Forward transitions: 25 Inverse transitions: 25

IMO that gives a clearer picture of what's going on.

Thoughts?

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabrízio de Royes Mello 2014-03-04 20:33:50 GSoC propostal - "CREATE SCHEMA ... LIKE ..."
Previous Message Tom Lane 2014-03-04 20:08:37 Re: ALTER TABLE lock strength reduction patch is unsafe Reply-To: