Re: Changed SRF in targetlist handling

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Changed SRF in targetlist handling
Date: 2016-08-02 23:02:38
Message-ID: 11228.1470178958@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> I've an implementation that

> 1) turns all targetlist SRF (tSRF from now on) into ROWS FROM
> expressions. If there's tSRFs in the argument of a tSRF those becomes
> a separate, lateral, ROWS FROM expression.

> 2) If grouping/window functions are present, the entire query is wrapped
> in a subquery RTE, except for the set-returning function. All
> referenced Var|Aggref|GroupingFunc|WindowFunc|Param nodes in the
> original targetlist are made to reference that subquery, which gets a
> TargetEntry for them.

FWIW, I'd be inclined to do the subquery RTE all the time, adding some
optimization fence to ensure it doesn't get folded back. That fixes
your problem here:

> So far I have one problem without an easy solution: Historically queries
> like
> =# SELECT id, generate_series(1,2) FROM (VALUES(1),(2)) few(id);
>
> id generate_series
>
> 1 1
> 1 2
> 2 1
> 2 2
>
> have preserved the SRF output ordering. But by turning the SRF into a
> ROWS FROM, there's no guarantee that the cross join between "few" and
> generate_series(1,3) above is implemented in that order.

> Besides that I'm structurally wondering whether turning the original
> query into a subquery is the right thing to do. It requires some kind of
> ugly munching of Query->*, and has the above problem.

It does not seem like it should be that hard, certainly no worse than
subquery pullup. Want to show code?

> An alternative approach would be to do this during parse-analysis, but I
> think that might end up being confusing, because stored rules would
> suddenly have a noticeably different structure, and it'd tie us a lot
> more to the details of that transformation than I'd like.

-1 on that; we do not want this transformation visible in stored rules.

> Besides the removal of the least-common-multiple behaviour of tSRF queries,
> there's some other consequences that using function scans have:
> Previously if a tSRF was never evaluated, it didn't cause the number of
> rows from being increased. E.g.
> SELECT id, COALESCE(1, generate_series(1,2)) FROM (VALUES(1),(2)) few(id);
> only produced two rows. But using joins means that a simple
> implementation of using ROWS FROM returns four rows.

Hmm. I don't mind changing behavior in that sort of corner case.
If we're prepared to discard the LCM behavior, this seems at least
an order of magnitude less likely to be worth worrying about.

Having said that, I do seem to recall a bug report about misbehavior when
a SRF was present in just one arm of a CASE statement. That would have
the same type of behavior as you describe here, and evidently there's at
least one person out there depending on it.

Would it be worth detecting SRFs below CASE/COALESCE/etc and throwing
an error? It would be easier to sell throwing an error than silently
changing behavior, I think.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-08-02 23:16:15 Re: [sqlsmith] Failed assertion in joinrels.c
Previous Message Tom Lane 2016-08-02 22:40:37 Re: Double invocation of InitPostmasterChild in bgworker with -DEXEC_BACKEND