Re: Changed SRF in targetlist handling

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changed SRF in targetlist handling
Date: 2016-05-23 20:32:07
Message-ID: CAKFQuwb7UQg5qztjx=0NkND4qWTycMcWFQohx--gbTzZE4nd5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 23, 2016 at 4:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> > On Mon, May 23, 2016 at 2:13 PM, David Fetter <david(at)fetter(dot)org> wrote:
> >> On Mon, May 23, 2016 at 01:28:11PM -0500, Merlin Moncure wrote:
> >>> +1 on removing LCM.
>
> >> As a green field project, that would make total sense. As a thing
> >> decades in, it's not clear to me that that would break less stuff or
> >> break it worse than simply disallowing SRFs in the target list, which
> >> has been rejected on bugward-compatibility grounds. I suspect it
> >> would be even worse because disallowing SRFs in target lists would at
> >> least be obvious and localized when it broke code.
>
> > If I'm reading this correctly, it sounds to me like you are making the
> > case that removing target list SRF completely would somehow cause less
> > breakage than say, rewriting it to a LATERAL based implementation for
> > example. With more than a little forbearance, let's just say I don't
> > agree.
>
> We should consider single and multiple SRFs in a targetlist as distinct
> use-cases; only the latter has got weird properties.
>
> There are several things we could potentially do with multiple SRFs in
> the same targetlist. In increasing order of backwards compatibility and
> effort required:
>
> 1. Throw error if there's more than one SRF.
>
> 2. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...). This would
> have the same behavior as before if the SRFs all return the same number
> of rows, and otherwise would behave differently.
>
> 3. Rewrite into some other construct that still ends up as a FunctionScan
> RTE node, but has the old LCM behavior if the SRFs produce different
> numbers of rows. (Perhaps we would not need to expose this construct
> as something directly SQL-visible.)
>
> It's certainly arguable that the common use-cases for SRF-in-tlist
> don't have more than one SRF per tlist, and thus that implementing #1
> would be an appropriate amount of effort. It's worth noting also that
> the LCM behavior has been repeatedly reported as a bug, and therefore
> that if we do #3 we'll be expending very substantial effort to be
> literally bug-compatible with ancient behavior that no one in the
> current development group thinks is well-designed. As far as #2 goes,
> it would have the advantage that code depending on the same-number-of-
> rows case would continue to work as before. David has a point that it
> would silently break application code that's actually depending on the
> LCM behavior, but how much of that is there likely to be, really?
>
> [ reflects a bit... ] I guess there is room for an option 2-and-a-half:
>
> 2.5. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...), but decorate
> the FunctionScan RTE to tell the executor to throw an error if the SRFs
> don't all return the same number of rows, rather than silently
> null-padding. This would have the same behavior as before for the sane
> case, and would be very not-silent about cases where apps actually invoked
> the LCM behavior. Again, we wouldn't necessarily have to expose such an
> option at the SQL level. (Though it strikes me that such a restriction
> could have value in its own right, analogous to the STRICT options that
> we've invented in some other places to allow insisting on the expected
> numbers of rows being returned. ROWS FROM STRICT (srf1(), srf2(), ...),
> anybody?)
>

​I'd let the engineers decide between 1, 2.5, and 3 - but if we accomplish
our goals while implementing #3 I'd say that would be the best outcome for
the community as whole.

We don't have the luxury of providing a safe-usage mode where people
writing new queries get the error but pre-existing queries are considered
OK. We will have to rely upon education and deal with the occasional bug
report but our long-time customers, even if only a minority would be
affected, will appreciate the effort taken to not break code that has been
working for a long time.

The minority is likely small enough to at least make options 1 and 2.5
viable though I'd think we make an effort to avoid #1.

​David J.​

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2016-05-23 20:33:41 Re: Changed SRF in targetlist handling
Previous Message Alvaro Herrera 2016-05-23 20:24:45 Re: Changed SRF in targetlist handling