Re: TODO items for window functions

From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: TODO items for window functions
Date: 2008-12-28 21:29:24
Message-ID: 3CDAD71E9D70417290FCF66F0178D1E1@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane Wrote:
> The core window-functions patch is now committed and ready for wider
> testing. However, there are a number of unfinished items, at least
> some of which I'd like to see addressed before 8.4 release. In rough
> order of importance:
>
> * Support creation of user-defined window functions. I think this is
> a "must have" for 8.4 --- we are not in the habit of building
> nonextensible basic features. It doesn't seem that hard either.
> I think all we need do is to allow "WINDOW" as an attribute keyword
> in CREATE FUNCTION. Does anyone have an objection or a better idea?
>
> * Implement support for non-default window framing clauses. Most likely
> it's too late to consider getting the whole feature done for 8.4, but
> I wonder whether we could support the restricted case of allowing just
> these two combinations
> BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (which is default)
> BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> I said earlier that we didn't really need to address this for 8.4,
> but my thinking was flawed. The case I think is really important
> is to allow last_value() to do something useful, and you can hardly
> argue that it's useful without an ORDER BY to define which row in the
> partition is "last".
>
> * Investigate whether we should prohibit window functions in recursive
> terms; check whether any of the committed prohibitions are unnecessary.
>
> * Look at tuplestore performance issues. The tuplestore_in_memory()
> thing is just a band-aid, we ought to try to solve it properly.
> tuplestore_advance seems like a weak spot as well.
>
> * Do we really need so much duplicated code between Agg and WindowAgg?

Hitoshi and I did briefly talk about these two a few months back, but there
were other priorities at the time.

Unsure how difficult it is, maybe another one for a TODO, 8.4 or 8.5 I'm not
sure:

* Minimise sorts in a query such as:

david=# explain SELECT depname,
david-# SUM(salary) OVER (ORDER BY salary),
david-# SUM(salary) OVER (ORDER BY empno)
david-# FROM empsalary
david-# ORDER BY salary;
QUERY PLAN
----------------------------------------------------------------------------
----------------
Sort (cost=213.15..215.75 rows=1040 width=44)
Sort Key: salary
-> WindowAgg (cost=142.83..161.03 rows=1040 width=44)
-> Sort (cost=142.83..145.43 rows=1040 width=44)
Sort Key: empno
-> WindowAgg (cost=72.52..90.72 rows=1040 width=44)
-> Sort (cost=72.52..75.12 rows=1040 width=44)
Sort Key: salary
-> Seq Scan on empsalary (cost=0.00..20.40
rows=1040 width=44)

In the above case it would be more efficient to evaluate windows with the
same order by clause as the final results last to eliminate the final sort.

In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3
sorts. We also perform 3.

Also perhaps more difficult? Maybe 8.5...

* Teach planner to decide which window to evaluate first based on costs.
Currently the first window in the query is evaluated first, there may be no
index to help sort the first window, but perhaps there are for other windows
in the query. This may allow an index scan instead of a seqscan -> sort.

I Oracle 10g seems to have the above capability but Sybase seems evaluate
the first window first. I've yet to look at DB2.

This would stop performance critical queries from looking like:

SELECT id,sum_value2,sum_value
FROM (
SELECT id,
SUM(value) OVER (ORDER BY idxcol) AS sum_value,
SUM(value2) OVER (ORDER BY no_idxcol) AS sum_value2
FROM some_table
) t; -- Allow index scan by putting the indexed column as the 1st window

When they could look like:

SELECT id,
SUM(value2) OVER (ORDER BY no_idxcol) AS sum_value2,
SUM(value) OVER (ORDER BY idxcol) AS sum_value
FROM some_table; -- Planner has the option to eval 2nd window first due to
index.

David.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jaime Casanova 2008-12-28 21:30:00 Re: WIP: Automatic view update rules
Previous Message Tom Lane 2008-12-28 20:21:16 ecpg regression test failures caused by window functions patch