Re: [PATCH] Simplify EXISTS subqueries containing LIMIT

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Simplify EXISTS subqueries containing LIMIT
Date: 2014-10-27 00:56:12
Message-ID: CAApHDvq2HKj_PArjsSPfZX3bnpnC3Lt1U=uwuX04gYt6RKjK1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 27, 2014 at 1:56 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:

> On Wed, Oct 22, 2014 at 1:37 PM, David Rowley <dgrowleyml(at)gmail(dot)com>
> wrote:
> > I've had a bit of a look at this and here's a couple of things:
>
> Thanks. Sorry I didn't back to you earlier, I almost forgot about the
> review.
>
> > /*
> > + * LIMIT clause can be removed if it's a positive constant or
> ALL, to
> > + * prevent it from being an optimization barrier. It's a common
> meme to put
> > + * LIMIT 1 within EXISTS subqueries.
> > + */
> >
> > I think this comment may be better explained along the lines of:
> >
> > "A subquery which has a LIMIT clause with a positive value is
> effectively a
> > no-op in this scenario. With EXISTS we only care about the first row
> anyway,
> > so any positive limit value will have no behavioral change to the query,
> so
> > we'll simply remove the LIMIT clause. If we're unable to prove that the
> > LIMIT value is a positive number then we'd better not touch it."
>
> That's a bit redundant. Here's what I wrote instead, does this sound
> better?
>
> * A subquery that has a LIMIT clause with a positive value or NULL causes
> * no behavioral change to the query. With EXISTS we only care about the
> * first row anyway, so we'll simply remove the LIMIT clause. If the LIMIT
> * value does not reduce to a constant that's positive or NULL then do not
> * touch it.
>
>
Sounds better.

> > + /* Checking for negative values is done later; 0 is just silly */
> > + if (! limit->constisnull && DatumGetInt64(limit->constvalue) <= 0)
> >
> > I'm a bit confused by the comment here. You're saying that we'll check
> for
> > negatives later... but you're checking for <= 0 on the next line... Did I
> > miss something or is this a mistake?
>
> I meant that the case when a negative value raises an error is checked
> later in the execution pass. But you're right, this code has nothing
> to do with that so I've removed the mention about negative values. It
> now says:
>
> /* If it's not NULL and not positive, keep it as a regular subquery */
>
>
Also better.

> > I guess here you're just testing to ensure that you've not broken this
> and
> > that the new code does not accidentally remove the LIMIT clause. I think
> it
> > also might be worth adding some tests to ensure that co-related EXISTS
> > clauses with a positive LIMIT value get properly changed into a SEMI JOIN
> > instead of remaining as a subplan. And also make sure that a LIMIT 0 or
> > LIMIT -1 gets left as a subplan. I'd say such a test would supersede the
> > above test, and I think it's also the case you were talking about
> optimising
> > anyway?
>
> Sure, this is a planner-only change so it makes sense to test EXPLAIN
> output.
>
> The dilemma I always have is: wouldn't this cause failures due to plan
> instability, if for example autoanalyze gets around to this table
> earlier or later and nudges it from a hash join to merge etc? Or is
> this not a problem?
>
>
I guess it could be a problem. You could write your tests to use tenk1
instead, it seems to be analyzed in copy.sql

I'm reasonably happy with the patch now, the only small niggles are maybe
the Assert() is probably not so much needed as transformLimitClause() seems
to coerce to int8 anyway, and recompute_limits() does not bother with the
Assert() when it does the same thing, either way, it's certainly doing no
harm. The other thing was the regression tests, I'd rather see the tests
use 2 separate tables for the EXISTS checks, but this is likely only
because I'm thinking of doing some work in the future to remove duplicate
joins from queries, so perhaps it's fine for now.

Good work. Marking as ready for committer.

Regards

David Rowley

Patch attached with all above changes.
>
> Regards,
> Marti
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2014-10-27 01:28:49 Re: pset_quoted_string is broken
Previous Message Tom Lane 2014-10-27 00:29:19 Re: pg_dump/pg_restore seem broken on hamerkop