updated join removal patch

Lists: pgsql-hackers
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: updated join removal patch
Date: 2009-09-06 03:16:43
Message-ID: 603c8f070909052016i4fe4c41ahf5ae4a09480e3648@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here we go again. Following Tom's advice to not insert crocks in
add_path(), I've instead introduced a noopjoin path type which ignores
its inner side. This could possibly be simplified to just a "noop"
path that doesn't even include an outer side, but the way I've done it
here doesn't really cost anything and might allow debug pprints to
print more useful information. If, in the future, we have some other
use for a noop path, we might want to reconsider, though.

Comments appreciated. This still needs some doc changes, and maybe
some regression tests, but I am not 100% sure that the basic
implementation will meet with general approval as yet, so don't want
to put the cart before the horse.

Thanks,

...Robert

Attachment Content-Type Size
join_removal.2009-09-05.patch text/x-patch 33.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-16 01:29:58
Message-ID: 26443.1253064598@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Here we go again. Following Tom's advice to not insert crocks in
> add_path(), I've instead introduced a noopjoin path type which ignores
> its inner side. This could possibly be simplified to just a "noop"
> path that doesn't even include an outer side, but the way I've done it
> here doesn't really cost anything and might allow debug pprints to
> print more useful information. If, in the future, we have some other
> use for a noop path, we might want to reconsider, though.

I took a closer look at this version of the patch. Some comments:

* I'm not really happy with the "NoopJoinPath" representation. In the
first place, it isn't a join and should not be carrying a useless inner
path. The other nasty thing about it is that it violates the rule that
path.pathtype is supposed to be the nodetag for the executable plan node
that corresponds to the path. My original thought about this was to just
use the left input's paths as-is. I am not totally sure that that works
--- up to now, all the members of a reloptinfo's path list have had that
same reloptinfo as their parent --- but it seems less ugly than this.

A possible compromise is to use T_SubqueryScan as the pathtype, with
the idea that we're pretending a SubqueryScan is to be inserted and then
immediately optimized away. But I don't want the inner path in there
in any case.

* Speaking of the left input, it's not good enough to copy up only the
cheapest startup and cheapest total paths. If there are any other paths
there at all, they're probably there because they have interesting sort
orders. With the patch as-is, it'd be possible for join removal to be a
net loss because it forces an extra sort at an upper level. Now it's
possible that some of those sort orderings are only interesting to use for
mergejoining in the same join that we're removing; in which case they
could safely be discarded. But I'm not sure that's worth checking for,
and I am sure that throwing everything away shouldn't be the base
behavior.

* I'm quite unimpressed with the refactoring you did to make a single
function deal with merge clauses, hash clauses, *and* join removal
clauses. I think that makes it close to unmaintainable and is buying
little if anything speedwise. We can afford another iteration over
the join clauses, especially if there's a GUC to let people turn it
off. (BTW, do we really need that GUC, or is it only there for testing
the patch? I'm leaning towards not having it.)

* I'm not sure about this, because surely you would have tested it,
but isn't it looking at the wrong side of the join clauses? I thought
the idea is to prove the nullable (inner) side of the join unique.

* Not entirely sure where to put the code that does the hard work
(relation_is_distinct_for). I see you put it in pathnode.c beside
query_is_distinct_for, but the *only* reason the latter is where it is
is that it has only one caller, namely create_unique_path, which is
(and belongs) in that module. Opening up pathnode's API to include a
function unrelated to its purpose doesn't seem right. So I'm inclined
to think they should both go someplace else, just not sure where.
There doesn't seem to be any planner file whose charter is to export
this type of knowledge; maybe we need to add one.

* It shouldn't be that hard to support expression indexes. I think the
only reason you can't do it right now is that you chose an intermediate
data structure that presumes simple Vars ... but if you refactor to
have bespoke code examining the indexes and joinclauses together,
I think it would work all right.

* I wonder whether all the relevant clauses are really to be found as
join clauses. Consider
tab1 left join tab2 on (tab1.a = tab2.x and tab2.y = 42)
where tab2 has a unique index on (x,y). This would at least suggest
examining the inner rel's baserestrictclauses along with the current
join's clauses.

* I wouldn't really bother with cost_noopjoin. It does not, and never
will, encapsulate any interesting knowledge. In other places where we
have similar issues (eg no-op UniquePaths in create_unique_path), we
just copy up the child path's costs without any folderol.

Do you want to have another go at this, or shall I?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-16 02:10:37
Message-ID: 603c8f070909151910x19c69651xda5b4854991301bd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 15, 2009 at 9:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Here we go again.  Following Tom's advice to not insert crocks in
>> add_path(), I've instead introduced a noopjoin path type which ignores
>> its inner side.  This could possibly be simplified to just a "noop"
>> path that doesn't even include an outer side, but the way I've done it
>> here doesn't really cost anything and might allow debug pprints to
>> print more useful information.  If, in the future, we have some other
>> use for a noop path, we might want to reconsider, though.
>
> I took a closer look at this version of the patch.  Some comments:
>
> * I'm not really happy with the "NoopJoinPath" representation.  In the
> first place, it isn't a join and should not be carrying a useless inner
> path.  The other nasty thing about it is that it violates the rule that
> path.pathtype is supposed to be the nodetag for the executable plan node
> that corresponds to the path.  My original thought about this was to just
> use the left input's paths as-is.  I am not totally sure that that works
> --- up to now, all the members of a reloptinfo's path list have had that
> same reloptinfo as their parent --- but it seems less ugly than this.
>
> A possible compromise is to use T_SubqueryScan as the pathtype, with
> the idea that we're pretending a SubqueryScan is to be inserted and then
> immediately optimized away.  But I don't want the inner path in there
> in any case.

I don't have strong feelings about it. I thought about making just a
Noop path type, but I couldn't see any clear reason to prefer it one
way or the other. The only reason we need a path type at all is
because you didn't like the crocks I inserted into add_path() to avoid
pfree()-ing paths that might still be pointed to from elsewhere. But
those crocks were VASTLY simpler than this, which feels completely
Rube Goldberg-esque to me. If it were up to me, we'd be either
reinserting those crocks, or looking for some less crock-y variant of
them.

> * Speaking of the left input, it's not good enough to copy up only the
> cheapest startup and cheapest total paths.  If there are any other paths
> there at all, they're probably there because they have interesting sort
> orders.  With the patch as-is, it'd be possible for join removal to be a
> net loss because it forces an extra sort at an upper level.  Now it's
> possible that some of those sort orderings are only interesting to use for
> mergejoining in the same join that we're removing; in which case they
> could safely be discarded.  But I'm not sure that's worth checking for,
> and I am sure that throwing everything away shouldn't be the base
> behavior.

Good point.

> * I'm quite unimpressed with the refactoring you did to make a single
> function deal with merge clauses, hash clauses, *and* join removal
> clauses.  I think that makes it close to unmaintainable and is buying
> little if anything speedwise.  We can afford another iteration over
> the join clauses, especially if there's a GUC to let people turn it
> off.  (BTW, do we really need that GUC, or is it only there for testing
> the patch?  I'm leaning towards not having it.)

You're the committer; I'm not. But I completely disagree. There
isn't any reason at all to duplicate this logic in two separate
places, let alone three. I'd actually be in favor of merging the
existing two cases even if we weren't adding join removal. The
existing code does the identical tests in hash_inner_and_outer() in a
slightly different order than select_mergejoin_clauses() for no reason
at all.

As for a GUC, I think it would be useful to have for debugging, or in
case of bugs. It's really another join strategy, and we have enable_*
flags for all the others. But if you don't want it for some reason,
I'm not in a position to twist your arm.

> * I'm not sure about this, because surely you would have tested it,
> but isn't it looking at the wrong side of the join clauses?  I thought
> the idea is to prove the nullable (inner) side of the join unique.

Grr. I think it's more broken than that. Wow, this is really embarassing.

> * Not entirely sure where to put the code that does the hard work
> (relation_is_distinct_for).  I see you put it in pathnode.c beside
> query_is_distinct_for, but the *only* reason the latter is where it is
> is that it has only one caller, namely create_unique_path, which is
> (and belongs) in that module.  Opening up pathnode's API to include a
> function unrelated to its purpose doesn't seem right.  So I'm inclined
> to think they should both go someplace else, just not sure where.
> There doesn't seem to be any planner file whose charter is to export
> this type of knowledge; maybe we need to add one.

Yeah, it seemed a little weird to me. For a while I was reusing some
of the support functions that query_is_distinct_for() calls, but the
final version doesn't. I wonder if it should just be moved to
joinpath.c; it seems to fit in better with what is going on there.

> * It shouldn't be that hard to support expression indexes.  I think the
> only reason you can't do it right now is that you chose an intermediate
> data structure that presumes simple Vars ... but if you refactor to
> have bespoke code examining the indexes and joinclauses together,
> I think it would work all right.

I'm not sure how to compare exprs, or what you mean by bespoke code,
but I'd be happy to have someone generalize the approach.

> * I wonder whether all the relevant clauses are really to be found as
> join clauses.  Consider
>        tab1 left join tab2 on (tab1.a = tab2.x and tab2.y = 42)
> where tab2 has a unique index on (x,y).  This would at least suggest
> examining the inner rel's baserestrictclauses along with the current
> join's clauses.

Also a good point.

> * I wouldn't really bother with cost_noopjoin.  It does not, and never
> will, encapsulate any interesting knowledge.  In other places where we
> have similar issues (eg no-op UniquePaths in create_unique_path), we
> just copy up the child path's costs without any folderol.

Don't care one way or the other.

> Do you want to have another go at this, or shall I?

Please feel free if you're so inclined.

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-16 02:25:18
Message-ID: 603c8f070909151925x4c930752wff88ab1b46c0c749@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 15, 2009 at 10:10 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> * I'm not sure about this, because surely you would have tested it,
>> but isn't it looking at the wrong side of the join clauses?  I thought
>> the idea is to prove the nullable (inner) side of the join unique.
>
> Grr.  I think it's more broken than that.  Wow, this is really embarassing.

Well, you're definitely right that it's looking at the wrong side of
the join clauses. Still trying to figure out if there is another bug,
too.

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-16 03:00:27
Message-ID: 603c8f070909152000h49187b90va14fb6f90efc2f3e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 15, 2009 at 10:25 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Sep 15, 2009 at 10:10 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> * I'm not sure about this, because surely you would have tested it,
>>> but isn't it looking at the wrong side of the join clauses?  I thought
>>> the idea is to prove the nullable (inner) side of the join unique.
>>
>> Grr.  I think it's more broken than that.  Wow, this is really embarassing.
>
> Well, you're definitely right that it's looking at the wrong side of
> the join clauses.  Still trying to figure out if there is another bug,
> too.

It looks to me like relation_is_distinct_for() is also horribly broken
in my previous version. I think the attached is how it is supposed to
work.

...Robert

Attachment Content-Type Size
join_removal.2009-09-15.fixes application/octet-stream 1.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-17 20:59:13
Message-ID: 3835.1253221153@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Sep 15, 2009 at 9:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> Here we go again. Following Tom's advice to not insert crocks in
>>> add_path(), I've instead introduced a noopjoin path type which ignores
>>> its inner side.

I've committed this with some revisions. As to the points discussed
earlier:

>> * I'm not really happy with the "NoopJoinPath" representation. ...
>> A possible compromise is to use T_SubqueryScan as the pathtype, with
>> the idea that we're pretending a SubqueryScan is to be inserted and then
>> immediately optimized away. But I don't want the inner path in there
>> in any case.

> I don't have strong feelings about it. I thought about making just a
> Noop path type, but I couldn't see any clear reason to prefer it one
> way or the other.

I ended up using a NoOpPath struct with pathtype = T_Join, which is a
dummy superclass nodetag that never really gets instantiated. The
SubqueryScan idea looked too messy after I remembered that createplan.c
likes to switch() on the pathtype, so having two different pathtype
uses of T_SubqueryScan seemed pretty ugly. But T_Join isn't really used
anywhere, despite being nominally a executable plan type. It's pretty
much a judgment call whether that's really cleaner than using the path's
own NodeTag...

>> * I'm quite unimpressed with the refactoring you did to make a single
>> function deal with merge clauses, hash clauses, *and* join removal
>> clauses. I think that makes it close to unmaintainable and is buying
>> little if anything speedwise.

> You're the committer; I'm not. But I completely disagree. There
> isn't any reason at all to duplicate this logic in two separate
> places, let alone three. I'd actually be in favor of merging the
> existing two cases even if we weren't adding join removal.

No, I still think this was a bad idea. There are *parts* of those
tests that are similar, but combining them all into one function is
just a recipe for bugs.

> As for a GUC, I think it would be useful to have for debugging, or in
> case of bugs. It's really another join strategy, and we have enable_*
> flags for all the others. But if you don't want it for some reason,
> I'm not in a position to twist your arm.

I left it out for the moment. If anyone can point to a case where the
join removal logic slows planning noticeably without gaining anything,
we can reconsider.

>> * Not entirely sure where to put the code that does the hard work
>> (relation_is_distinct_for).

> Yeah, it seemed a little weird to me. For a while I was reusing some
> of the support functions that query_is_distinct_for() calls, but the
> final version doesn't. I wonder if it should just be moved to
> joinpath.c; it seems to fit in better with what is going on there.

I ended up putting the index-specific logic in indxpath.c, which aside
from seeming reasonable made it easier to deal with expression indexes.
If we ever add another test method that doesn't depend on indexes, we
can put it in a reasonable place and have joinpath.c call that too.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-17 21:14:23
Message-ID: 603c8f070909171414p5135aa58q186397232ad341e1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 17, 2009 at 4:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Sep 15, 2009 at 9:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> Here we go again.  Following Tom's advice to not insert crocks in
>>>> add_path(), I've instead introduced a noopjoin path type which ignores
>>>> its inner side.
>
> I've committed this with some revisions.  As to the points discussed
> earlier:

Thanks, it looks nice. I wonder if we shouldn't add some documentation
or regression tests, though - any thoughts?

>>> * I'm not really happy with the "NoopJoinPath" representation. ...
>>> A possible compromise is to use T_SubqueryScan as the pathtype, with
>>> the idea that we're pretending a SubqueryScan is to be inserted and then
>>> immediately optimized away.  But I don't want the inner path in there
>>> in any case.
>
>> I don't have strong feelings about it.  I thought about making just a
>> Noop path type, but I couldn't see any clear reason to prefer it one
>> way or the other.
>
> I ended up using a NoOpPath struct with pathtype = T_Join, which is a
> dummy superclass nodetag that never really gets instantiated.  The
> SubqueryScan idea looked too messy after I remembered that createplan.c
> likes to switch() on the pathtype, so having two different pathtype
> uses of T_SubqueryScan seemed pretty ugly.  But T_Join isn't really used
> anywhere, despite being nominally a executable plan type.  It's pretty
> much a judgment call whether that's really cleaner than using the path's
> own NodeTag...

Yeah. I haven't been able to wrap my head around why it's good to
have two tags on some of these things.

>>> * I'm quite unimpressed with the refactoring you did to make a single
>>> function deal with merge clauses, hash clauses, *and* join removal
>>> clauses.  I think that makes it close to unmaintainable and is buying
>>> little if anything speedwise.
>
>> You're the committer; I'm not.  But I completely disagree.  There
>> isn't any reason at all to duplicate this logic in two separate
>> places, let alone three.  I'd actually be in favor of merging the
>> existing two cases even if we weren't adding join removal.
>
> No, I still think this was a bad idea.  There are *parts* of those
> tests that are similar, but combining them all into one function is
> just a recipe for bugs.

Having read your commit, it makes more sense to me. The fact that
we're now looking at innerrel->baserestrictinfo also is a pretty
powerful argument for your way.

>> As for a GUC, I think it would be useful to have for debugging, or in
>> case of bugs.  It's really another join strategy, and we have enable_*
>> flags for all the others.  But if you don't want it for some reason,
>> I'm not in a position to twist your arm.
>
> I left it out for the moment.  If anyone can point to a case where the
> join removal logic slows planning noticeably without gaining anything,
> we can reconsider.

Well, the purpose of such flags is for debugging, not production. The
only argument I can see for not having one for join removal versus
nestloop, mergejoin, etc. is that it's pretty easy to kill removal of
a particular join by tweaking the output list, whereas it is not
comparably easy to get the planner to pick a different join type
otherwise.

>>> * Not entirely sure where to put the code that does the hard work
>>> (relation_is_distinct_for).
>
>> Yeah, it seemed a little weird to me.  For a while I was reusing some
>> of the support functions that query_is_distinct_for() calls, but the
>> final version doesn't.  I wonder if it should just be moved to
>> joinpath.c; it seems to fit in better with what is going on there.
>
> I ended up putting the index-specific logic in indxpath.c, which aside
> from seeming reasonable made it easier to deal with expression indexes.
> If we ever add another test method that doesn't depend on indexes, we
> can put it in a reasonable place and have joinpath.c call that too.

Sounds fine to me. I think that the next project in this area should
be to try to support removal of INNER joins. (Removal of SEMI, ANTI,
and FULL joins seems unlikely ever to be interesting.) That will
require teaching the planner about foreign key constraints, which
interestingly opens up some other interesting optimization
opportunities. An INNER join that can never match zero rows can
alternatively be implemented as a LEFT join, which can be reordered in
some situations where an inner join can't. e.g. A LJ (B IJ C ON Pbc)
ON Pab can be implemented as (A LJ B ON Pab) LJ/IJ C ON Pbc if it is
the case that for every row in B, there will be at least one row in C
such that Pbc holds.

Thanks again for fixing this up, and committing it. I have a TON of
queries that will benefit from this, and it's one of the reasons why I
started reading this mailing list on a regular basis and getting
involved in development. For me, it's worth an upgrade by itself.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-18 17:26:51
Message-ID: 3531.1253294811@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Sep 17, 2009 at 4:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> You're the committer; I'm not. But I completely disagree. There
>>> isn't any reason at all to duplicate this logic in two separate
>>> places, let alone three. I'd actually be in favor of merging the
>>> existing two cases even if we weren't adding join removal.
>>
>> No, I still think this was a bad idea. There are *parts* of those
>> tests that are similar, but combining them all into one function is
>> just a recipe for bugs.

> Having read your commit, it makes more sense to me. The fact that
> we're now looking at innerrel->baserestrictinfo also is a pretty
> powerful argument for your way.

Looking at it some more, I think that there is some value in factoring
out the tests to see if the clause has the right variable membership,
so I did that.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-18 17:48:02
Message-ID: 603c8f070909181048jac93afaw71a8734d30011466@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 18, 2009 at 1:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Sep 17, 2009 at 4:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> You're the committer; I'm not.  But I completely disagree.  There
>>>> isn't any reason at all to duplicate this logic in two separate
>>>> places, let alone three.  I'd actually be in favor of merging the
>>>> existing two cases even if we weren't adding join removal.
>>>
>>> No, I still think this was a bad idea.  There are *parts* of those
>>> tests that are similar, but combining them all into one function is
>>> just a recipe for bugs.
>
>> Having read your commit, it makes more sense to me.  The fact that
>> we're now looking at innerrel->baserestrictinfo also is a pretty
>> powerful argument for your way.
>
> Looking at it some more, I think that there is some value in factoring
> out the tests to see if the clause has the right variable membership,
> so I did that.

Mmm, I like that. Putting that bunch of hairy logic in a subroutine
instead of repeating it in several places definitely seems better. I
don't really like the name "clause_matches_join", though. It's more
like "clause has well-defined sides, and mark which is which as a
side-effect".

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-18 17:58:23
Message-ID: 4066.1253296703@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Mmm, I like that. Putting that bunch of hairy logic in a subroutine
> instead of repeating it in several places definitely seems better. I
> don't really like the name "clause_matches_join", though. It's more
> like "clause has well-defined sides, and mark which is which as a
> side-effect".

It was the first thing that came to mind ... got a better idea?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-18 18:25:51
Message-ID: 603c8f070909181125k789d60c3ndce48f40fd088989@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 18, 2009 at 1:58 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Mmm, I like that.  Putting that bunch of hairy logic in a subroutine
>> instead of repeating it in several places definitely seems better.  I
>> don't really like the name "clause_matches_join", though.  It's more
>> like "clause has well-defined sides, and mark which is which as a
>> side-effect".
>
> It was the first thing that came to mind ... got a better idea?

clause_has_well_defined_sides()?

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-18 19:06:11
Message-ID: 4752.1253300771@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Sep 18, 2009 at 1:58 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> Mmm, I like that. Putting that bunch of hairy logic in a subroutine
>>> instead of repeating it in several places definitely seems better. I
>>> don't really like the name "clause_matches_join", though.

>> It was the first thing that came to mind ... got a better idea?

> clause_has_well_defined_sides()?

Nah ... they're "well defined" in any case, they might just not be what
we need for the current join. As an example,

(a.f1 + b.f2) = c.f3

would be usable if joining {A B} to {C}, but not when joining
{A} to {B C}.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-18 19:46:06
Message-ID: 603c8f070909181246i235f9a46s361c06fa13f85d51@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 18, 2009 at 3:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Sep 18, 2009 at 1:58 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> Mmm, I like that.  Putting that bunch of hairy logic in a subroutine
>>>> instead of repeating it in several places definitely seems better.  I
>>>> don't really like the name "clause_matches_join", though.
>
>>> It was the first thing that came to mind ... got a better idea?
>
>> clause_has_well_defined_sides()?
>
> Nah ... they're "well defined" in any case, they might just not be what
> we need for the current join.  As an example,
>
>        (a.f1 + b.f2) = c.f3
>
> would be usable if joining {A B} to {C}, but not when joining
> {A} to {B C}.

The clauses are well-defined, but they don't have well-defined sides.
I see now what you're going for with clause_matches_join, but
"matches" is a pretty broad term, IMO.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-18 19:52:15
Message-ID: 5254.1253303535@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> The clauses are well-defined, but they don't have well-defined sides.
> I see now what you're going for with clause_matches_join, but
> "matches" is a pretty broad term, IMO.

clause_sides_match_join?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-18 20:26:15
Message-ID: 603c8f070909181326g31b74c6ekbcfdac4ee5e66606@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 18, 2009 at 3:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> The clauses are well-defined, but they don't have well-defined sides.
>> I see now what you're going for with clause_matches_join, but
>> "matches" is a pretty broad term, IMO.
>
> clause_sides_match_join?

Yes, that's perfect.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated join removal patch
Date: 2009-09-19 17:48:56
Message-ID: 17763.1253382536@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Sep 18, 2009 at 3:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> clause_sides_match_join?

> Yes, that's perfect.

Going once ... going twice ... sold.

regards, tom lane