Re: Allowing join removals for more join types

From: David Rowley <dgrowley(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
Subject: Re: Allowing join removals for more join types
Date: 2014-07-09 00:23:49
Message-ID: CAHoyFK_ef80mS2Jgk=GkoJNVoux3OwrHNdcBWw5VfG5fak=Njw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9 July 2014 09:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> > On Tue, Jul 8, 2014 at 4:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> I'm a bit skeptical as to whether testing for that case is actually
> worth
> >> any extra complexity. Do you have a compelling use-case? But anyway,
> >> if we do want to allow it, why does it take any more than adding a check
> >> for Consts to the loops in query_is_distinct_for? It's the targetlist
> >> entries where we'd want to allow Consts, not the join conditions.
>
> > I don't really have a compelling use-case, but you're right, it's just a
> > Const check in query_is_distinct_for(), it seems simple enough so I've
> > included that in my refactor of the patch to use query_is_distinct_for().
> > This allows the regression tests all to pass again.
>
> Meh. "I wrote a regression test that expects it" is a pretty poor
> rationale for adding logic. If you can't point to a real-world case
> where this is important, I'm inclined to take it out.
>
>
Ok, I'll pull that logic back out when I get home tonight.

> If we were actually serious about exploiting such cases, looking for
> bare Consts would be a poor implementation anyhow, not least because
> const-folding has not yet been applied to the sub-select. I think we'd
> want to do it for any pseudoconstant expression (no Vars, no volatile
> functions); which is a substantially more expensive test.
>
> > 1. The fast path code that exited in join_is_removable() for subquery's
> > when the subquery had no group or distinct clause is now gone. I wasn't
> too
> > sure that I wanted to assume too much about what query_is_distinct_for
> may
> > do in the future and I thought if I included some logic in
> > join_is_removable() to fast path, that one day it may fast path wrongly.
>
> Or put a quick-check subroutine next to query_is_distinct_for(). The code
> we're skipping here is not so cheap that I want to blow off skipping it.
>

Ok, good idea. I'll craft something up tonight along those lines.

> On review it looks like analyzejoins.c would possibly benefit from an
> earlier fast-path check as well.
>
>
Do you mean for non-subqueries? There already is a check to see if the
relation has no indexes.

> > 2. The patch I submitted here
> >
> http://www.postgresql.org/message-id/CAApHDvrfVkH0P3FAooGcckBy7feCJ9QFanKLkX7MWsBcxY2Vcg@mail.gmail.com
> > if that gets accepted then it makes the check for set returning functions
> > in join_is_removable void.
>
> Right (and done, if you didn't notice already).
>
>
Thanks, I noticed that this morning. I'll remove the (now) duplicate check
from the patch

> TBH I find the checks for FOR UPDATE and volatile functions to be
> questionable as well. We have never considered those things to prevent
> pushdown of quals into a subquery (cf subquery_is_pushdown_safe). I think
> what we're talking about here is pretty much equivalent to pushing an
> always-false qual into the subquery; if people haven't complained about
> that, why should they complain about this? Or to put it in slightly more
> principled terms, we've attempted to prevent subquery optimization from
> causing volatile expressions to be evaluated *more* times than the naive
> reading of the query would suggest, but we have generally not felt that
> we needed to prevent them from happening *fewer* times.
>
>
Well, that's a real tough one for me as I only added that based on what you
told me here:

On 20 May 2014 23:22, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
> I doubt you should drop a subquery containing FOR UPDATE, either.
> That's a side effect, just as much as a volatile function would be.
>
> regards, tom lane
>

As far as I know the FOR UPDATE check is pretty much void as of now anyway,
since the current state of query_is_distinct_for() demands that there's
either a DISTINCT, GROUP BY or just a plain old aggregate without any
grouping, which will just return a single row, neither of these will allow
FOR UPDATE anyway. I really just added the check just to protect the code
from possible future additions to query_is_distinct_for() which may add
logic to determine uniqueness by some other means.

So the effort here should be probably be more focused on if we should allow
the join removal when the subquery contains volatile functions. We should
probably think fairly hard on this now as I'm still planning on working on
INNER JOIN removals at some point and I'm thinking we should likely be
consistent between the 2 types of join for when it comes to FOR UPDATE and
volatile functions, and I'm thinking right now that for INNER JOINs that,
since they're INNER that we could remove either side of the join. In that
case maybe it would be harder for the user to understand why their volatile
function didn't get executed.

Saying that... off the top of my head I can't remember what we'd do in a
case like:

create view v_a as select a,volatilefunc(a) AS funcresult from a;

select a from v_a;

Since we didn't select funcresult, do we execute the function?

Perhaps we should base this on whatever that does?

I can't give much more input on that right now. I'll have a look at the
docs later to see if when mention anything about any guarantees about
calling volatile functions.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-07-09 00:59:00 Re: Allowing join removals for more join types
Previous Message Alvaro Herrera 2014-07-08 21:52:15 Re: [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder