Re: Join Removal/ Vertical Partitioning

Lists: pgsql-hackers
From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Join Removal/ Vertical Partitioning
Date: 2008-06-26 10:57:07
Message-ID: 1214477827.3845.87.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

There are common cases where we want to remove unwanted joins from
queries, especially with view and Object Relational Mapping systems such
as Hibernate etc.. (I've mentioned this before on -hackers or -perform,
but I can't find the links)

Typical case is where we have a class that has a subclass present fairly
infrequently, so we want to physically separate the subclass for
performance. It's easy to separate the data, but then much harder to get
the SQL to recognise that is what we have done. The same problem also
occurs in Data Warehousing and is typically known as Vertical
Partitioning in that context.

The attached example shows this for one and two subclasses.
[joinrem.sql]

Why do we care? Because the extra join hurts performance considerably.
pgbench test files attached, executed using dual CPU system with
pgbench -n -f <filename> -c 2 - t 30000

Access to 1 table 14,500 tps
Access to 2 tables 9,000 tps
Access to 3 tables 7,000 tps

We might typically expect there to be many subclasses that follow this
pattern (as well as some subclasses that are so frequent we would want
to merge them with the main class table, but that is not relevant to
discussion here), so the number of subclasses could be quite large.

explain select c_c1 from wholeclass2tables where pk = 6;

QUERY
PLAN
---------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..16.55 rows=1 width=4)
Join Filter: (c.pk = sc.pk)
-> Index Scan using class_pkey on class c (cost=0.00..8.27 rows=1
width=8)
Index Cond: (pk = 6)
-> Index Scan using subclass_pkey on subclass sc (cost=0.00..8.27
rows=1 width=4)
Index Cond: (sc.pk = 6)
(6 rows)

The EXPLAIN shows that we access both tables, even though the access to
"subclass" is not required to correctly resolve the query in *all*
cases. Similar plan for 3+ table access.

The join is removable because

* rows are returned by the query whether or not rows exist in subclass
* every row in "class" matches at *most* one row in "subclass" because
the join columns are unique on both sides of the join.
* the join operator function is IMMUTABLE (and is therefore able to be
optimised away when circumstances allow)

(Note that the FK link between the tables is not required to prove this,
it is just shown because that is the way we would typically do this).

Why can't we do this manually? We can do the "join removal" manually by
creating different sets of views that mention or don't mention the
tables. If we have say 5 sub-classes, then to optimise queries we would
need to produce 5! = 120 views, all defined with the different
combinations of tables that we might want to access. This is a
management nightmare, but so is the idea that we might need to run 5
table joins when only direct access to a single table is required.

We can check for removal of a rel by

1. inspecting the target list for the query to see if there are rels
that do not provide any attributes. (We might also use equivalence
classes to recode the targetlist to minimise the numbers of tables
touched, but I think that might be overkill).

2. checking the rel is on the excluding side of an outer join (i.e. the
right hand table in a "left outer join")

3. checking that the join columns on the rel are all the columns of any
unique index on the rel. (If we join on *more* or less columns than are
in the index then we must still do the join.)

Once we have marked all rels that are removable we then need to check
that we can still make one rel using the remaining tables. In some cases
that may not be possible just yet, because of the limited inference
possibilities across outer join boundaries, a problem discussed
elsewhere on -hackers. But it seems possible to allow removal of rels
mentioned in exactly one join.
i.e. given A <-> B <-> C then B is not removable, A and C are.

We do join removal as part of the prep before join planning takes place.
First we would annotate which rels have attributes that form part of the
targetlist during build_base_rel_tlists(). During or immediately after
deconstruct_jointree() we can attempt to remove joins by testing the
other conditions in order.

It looks to me that there would be little additional planning time for
cases where this optimisation would not apply.

I'm not sure whether it would be best to attempt to remove joins before
we have established equivalence classes or afterwards. In the longer
term, afterwards would be best. ISTM it will be easier to remove joins
before we attempt to establish a join order, yet many of the tests run
during join order selection would need to be run to test join removal.
So some thoughts on where to attempt this would be very useful.

Are there specific problems foreseen with this? Would working on this be
sensible before the outer join equivalence problem has been solved?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachment Content-Type Size
joinrem.sql text/x-vhdl 891 bytes
joinrem1.pgb text/plain 75 bytes
joinrem2.pgb text/plain 87 bytes
joinrem3.pgb text/plain 87 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join Removal/ Vertical Partitioning
Date: 2008-06-26 17:42:49
Message-ID: 18593.1214502169@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> We can check for removal of a rel by

> 1. inspecting the target list for the query to see if there are rels
> that do not provide any attributes. (We might also use equivalence
> classes to recode the targetlist to minimise the numbers of tables
> touched, but I think that might be overkill).

More to the point, it would be wrong. Equivalence classes do not imply
that two values considered equivalent are equal for all purposes, and
since we don't know what the client is going to do with the returned
data, we can't substitute some other value for the one requested.

> So some thoughts on where to attempt this would be very useful.

The hard part of this is figuring out where to do the work. As you say,
doing it during prepjointree seems the nicest from an abstract code
structure point of view, but it requires a lot of information that is
not derived until later.

It might be possible to treat "ignore the RHS" as a join strategy and
try to apply it while forming join relations, which would be late enough
to have all the needed info available.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join Removal/ Vertical Partitioning
Date: 2008-06-27 08:31:43
Message-ID: 1214555503.3845.284.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 2008-06-26 at 13:42 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > We can check for removal of a rel by
>
> > 1. inspecting the target list for the query to see if there are rels
> > that do not provide any attributes. (We might also use equivalence
> > classes to recode the targetlist to minimise the numbers of tables
> > touched, but I think that might be overkill).
>
> More to the point, it would be wrong. Equivalence classes do not imply
> that two values considered equivalent are equal for all purposes, and
> since we don't know what the client is going to do with the returned
> data, we can't substitute some other value for the one requested.
>
> > So some thoughts on where to attempt this would be very useful.
>
> The hard part of this is figuring out where to do the work. As you say,
> doing it during prepjointree seems the nicest from an abstract code
> structure point of view, but it requires a lot of information that is
> not derived until later.

> It might be possible to treat "ignore the RHS" as a join strategy and
> try to apply it while forming join relations, which would be late enough
> to have all the needed info available.

Oh, actually have a join node that is a no-op, with a path cost of zero?
So we end up with an EXPLAIN like this:

QUERY
PLAN
---------------------------------------------------------------------------------------
Join Removed (cost=0.00..8.27 rows=1 width=4)
-> Index Scan using class_pkey on class c (cost=0.00..8.27 rows=1
width=8)
Index Cond: (pk = 6)
-> No Operation on subclass sc (cost=0.00..0.00 rows=0 width=0)
(4 rows)

That really does help, I think. The code allows us to say a join is
impossible, but not very easily to say a join doesn't exist.

I'll try it this way first. Maybe we'll see other ways as we go.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join Removal/ Vertical Partitioning
Date: 2008-06-27 21:50:44
Message-ID: 12000.1214603444@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> On Thu, 2008-06-26 at 13:42 -0400, Tom Lane wrote:
>> It might be possible to treat "ignore the RHS" as a join strategy and
>> try to apply it while forming join relations, which would be late enough
>> to have all the needed info available.

> Oh, actually have a join node that is a no-op, with a path cost of zero?

Not even that: just return the best path(s) for the LHS as the paths for
the joinrel.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join Removal/ Vertical Partitioning
Date: 2008-06-30 06:58:39
Message-ID: 1214809119.3845.442.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Fri, 2008-06-27 at 17:50 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > On Thu, 2008-06-26 at 13:42 -0400, Tom Lane wrote:
> >> It might be possible to treat "ignore the RHS" as a join strategy and
> >> try to apply it while forming join relations, which would be late enough
> >> to have all the needed info available.
>
> > Oh, actually have a join node that is a no-op, with a path cost of zero?
>
> Not even that: just return the best path(s) for the LHS as the paths for
> the joinrel.

Much neater. Cool.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join Removal/ Vertical Partitioning
Date: 2008-08-14 10:10:46
Message-ID: 1218708646.5343.448.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 2008-06-26 at 13:42 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > We can check for removal of a rel by...

OT comment: I just found a blog about Oracle's optimizermagic, which is
quite interesting. I notice there is a blog there about join removal,
posted about 12 hours later than my original post. Seems to validate the
theory anyway. Our posts have a wider audience than may be apparent :-)

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join Removal/ Vertical Partitioning
Date: 2008-08-14 10:22:24
Message-ID: 87zlnfuc9r.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:

> On Thu, 2008-06-26 at 13:42 -0400, Tom Lane wrote:
>> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>> > We can check for removal of a rel by...
>
> OT comment: I just found a blog about Oracle's optimizermagic, which is
> quite interesting. I notice there is a blog there about join removal,
> posted about 12 hours later than my original post. Seems to validate the
> theory anyway. Our posts have a wider audience than may be apparent :-)

Well turnabout's fair play... what's the URL?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join Removal/ Vertical Partitioning
Date: 2008-08-14 13:27:34
Message-ID: 603c8f070808140627g6c791768n6189939feee0cc3f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm guessing it's this... looks pretty interesting even if not.

http://optimizermagic.blogspot.com/2008/06/why-are-some-of-tables-in-my-query.html

...Robert


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join Removal/ Vertical Partitioning
Date: 2008-08-14 14:12:19
Message-ID: 1218723139.5343.486.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 2008-08-14 at 09:27 -0400, Robert Haas wrote:
> I'm guessing it's this... looks pretty interesting even if not.
>
> http://optimizermagic.blogspot.com/2008/06/why-are-some-of-tables-in-my-query.html

Yes, thanks for copying it in.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join Removal/ Vertical Partitioning
Date: 2008-08-16 00:40:11
Message-ID: 200808160040.m7G0eBm08316@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
>
> On Thu, 2008-06-26 at 13:42 -0400, Tom Lane wrote:
> > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > > We can check for removal of a rel by...
>
> OT comment: I just found a blog about Oracle's optimizermagic, which is
> quite interesting. I notice there is a blog there about join removal,
> posted about 12 hours later than my original post. Seems to validate the
> theory anyway. Our posts have a wider audience than may be apparent :-)

Yes, I think more people admire the work we do than we expect.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +