Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
Date: 2015-01-05 15:12:25
Message-ID: CAOeZVifqzpj5yafnB9OqXdAypvBrFhV3Am=Q2GAwjk2JgctyVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All,

Please forgive if this is a repost.

Please find attached patch for supporting ORDER BY clause in CREATE
FUNCTION for SRFs. Specifically:

CREATE OR REPLACE FUNCTION func1(OUT e int, OUT f int) returns setof record
as ' SELECT a,b FROM table1 ORDER BY a; ' language 'sql' ORDER BY e;

This shall allow for capturing information about existing preorder that
might be present inherently in the SRF's input or algorithm (the above
example and think generate_series).

This allows for eliminating sorts that can be based off the known existing
preorder. For eg:

SELECT * FROM correct_order_singlecol() ORDER BY e; # Does not need to sort
by e since existing preorder is known.

Eliminating such sorts can be a huge gain, especially if the expected input
to needed Sort node is large.

The obvious question that comes is what happens if specified ORDER BY
clause is false. For checking the order, a new node is added which is top
node of the plan and is responsible for projecting result rows. It tracks
the previous row seen and given a sort order, ensures that the current
tuple to be projected is in the required sort order.

So, for above example

EXPLAIN (COSTS OFF) SELECT * FROM correct_order_multicol() ORDER BY e;
QUERY PLAN
-----------------------------------------------
OrderCheck
-> Function Scan on correct_order_multicol
(2 rows)

If order of result rows is not the same as required, an error is raised:

SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
ERROR: Order not same as specified

Preorder columns are first transformed into SortGroupClauses first and then
stored directly in pg_proc.

This functionality is a user case seen functionality, and is especially
useful when SRF inputs are large and/or might be pipelined from another
function (SRFs are used in pipelines in analytical systems many times, with
large data).

The overhead of this patch is small. A new path is added for the preorder
keys, and OrderCheck node's additional cost is pretty low, given that it
only compares two rows and stores only a single row (previous row seen),
hence the memory footprint is minuscule.

In the inner joins thread, Tom mentioned having a new node which has
multiple plans and executor can decide which plan to execute given runtime
conditions. I played around with the idea, and am open to experiment having
a new node which has a Sort based plan and is executed in case OrderCheck
node sees that the inherent order of result tuples is not correct. Feedback
here would be very welcome.

I will add the patch to current commitfest.

Thoughts?

Regards,

Atri

Attachment Content-Type Size
orderbycreatefuncver1.patch.gz application/x-gzip 97.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vladimir Borodin 2015-01-05 15:15:52 Check that streaming replica received all data after master shutdown
Previous Message Robert Haas 2015-01-05 15:01:07 Re: Parallel Seq Scan