Re: Changed SRF in targetlist handling

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Changed SRF in targetlist handling
Date: 2016-08-01 08:23:46
Message-ID: 20160801082346.nfp2g7mg74alifdc@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016-05-25 16:55:23 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2016-05-25 15:20:03 -0400, Tom Lane wrote:
> >> We could certainly make a variant behavior in nodeFunctionscan.c that
> >> emulates that, if we feel that being exactly bug-compatible on the point
> >> is actually what we want. I'm dubious about that though, not least
> >> because I don't think *anyone* actually believes that that behavior isn't
> >> broken. Did you read my upthread message suggesting assorted compromise
> >> choices?
>
> > You mean https://www.postgresql.org/message-id/21076.1464034513@sss.pgh.pa.us ?
> > If so, yes.
>
> > If we go with rewriting this into LATERAL, I'd vote for 2.5 (trailed by
> > option 1), that'd keep most of the functionality, and would break
> > visibly rather than invisibly in the cases where not.
>
> 2.5 would be okay with me.
>
> > I guess you're not planning to work on this?
>
> Well, not right now, as it's clearly too late for 9.6. I might hack on
> it later if nobody beats me to it.

FWIW, as it's blocking my plans for executor related rework (expression
evaluation, batch processing) I started to hack on this.

I've an implementation that

1) turns all targetlist SRF (tSRF from now on) into ROWS FROM
expressions. If there's tSRFs in the argument of a tSRF those becomes
a separate, lateral, ROWS FROM expression.

2) If grouping/window functions are present, the entire query is wrapped
in a subquery RTE, except for the set-returning function. All
referenced Var|Aggref|GroupingFunc|WindowFunc|Param nodes in the
original targetlist are made to reference that subquery, which gets a
TargetEntry for them.

3) If sortClause does *not* reference any tSRFs the sorting is evaluated
in a subquery, to preserve the output ordering of SRFs in queries
like
SELECT id, generate_series(1,3) FROM (VALUES(1),(2)) d(id) ORDER BY id DESC;
if in contrast sortClause does reference the tSRF output, it's
evaluated in the outer SRF.

this seems to generally work, and allows to remove considerable amounts
of code.

So far I have one problem without an easy solution: Historically queries
like
=# SELECT id, generate_series(1,2) FROM (VALUES(1),(2)) few(id);
┌────┬─────────────────┐
│ id │ generate_series │
├────┼─────────────────┤
│ 1 │ 1 │
│ 1 │ 2 │
│ 2 │ 1 │
│ 2 │ 2 │
└────┴─────────────────┘
have preserved the SRF output ordering. But by turning the SRF into a
ROWS FROM, there's no guarantee that the cross join between "few" and
generate_series(1,3) above is implemented in that order. I.e. we can get
something like
┌────┬─────────────────┐
│ id │ generate_series │
├────┼─────────────────┤
│ 1 │ 1 │
│ 2 │ 1 │
│ 1 │ 2 │
│ 2 │ 2 │
└────┴─────────────────┘
because it's implemented as
┌──────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop (cost=0.00..35.03 rows=2000 width=8) │
│ -> Function Scan on generate_series (cost=0.00..10.00 rows=1000 width=4) │
│ -> Materialize (cost=0.00..0.04 rows=2 width=4) │
│ -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4) │
└──────────────────────────────────────────────────────────────────────────────┘

I right now see no easy and nice-ish way to constrain that.

Besides that I'm structurally wondering whether turning the original
query into a subquery is the right thing to do. It requires some kind of
ugly munching of Query->*, and has the above problem. One alternative
would be to instead perform the necessary magic in grouping_planner(),
by "manually" adding nestloop joins before/after create_ordered_paths()
(depending on SRFs being referenced in the sort clause). That'd create
plans we'd not have created so far, by layering NestLoop and
FunctionScan nodes above the normal query - that'd allow us to to easily
force the ordering of SRF evaluation.

If we go the subquery route, I'm wondering about where to tackle the
restructuring. So far I'm doing it very early in subquery_planner() -
otherwise the aggregation/sorting/... behaviour is easier to handle.
Perhaps doing it in standard_planner() itself would be better though.
An alternative approach would be to do this during parse-analysis, but I
think that might end up being confusing, because stored rules would
suddenly have a noticeably different structure, and it'd tie us a lot
more to the details of that transformation than I'd like.

Besides the removal of the least-common-multiple behaviour of tSRF queries,
there's some other consequences that using function scans have:
Previously if a tSRF was never evaluated, it didn't cause the number of
rows from being increased. E.g.
SELECT id, COALESCE(1, generate_series(1,2)) FROM (VALUES(1),(2)) few(id);
only produced two rows. But using joins means that a simple
implementation of using ROWS FROM returns four rows. We could try to
inject sufficient join conditions in that type of query, to prune down
the number of rows again, but I really don't want to go there - it's
kinda hard in the general case...

Comments?

Regards,

Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2016-08-01 08:39:34 Re: pg_basebackup wish list
Previous Message Shay Rojansky 2016-08-01 07:32:52 Re: Slowness of extended protocol