Re: Join optimization for inheritance tables

From: Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>, "hero(at)cs(dot)duke(dot)edu" <hero(at)cs(dot)duke(dot)edu>, "nedyalko(at)cs(dot)duke(dot)edu" <nedyalko(at)cs(dot)duke(dot)edu>
Subject: Re: Join optimization for inheritance tables
Date: 2009-08-08 00:34:06
Message-ID: 43826FCDC252204EA7823B2E7CF3CCEC25FC2F4B@Pandora.AsterData.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together.

Short description: when the optimizer considers join paths between two tables with child tables, it only creates join paths over two append paths. In this patch we extend the optimizer to consider joins between the child tables from the two parent relations, based on the child table constraints. The idea is that only child tables with overlapping constraints need to be joined, and not the entire hierarchies with each other. This optimization can provide significant benefits in terms of execution time. A short motivation example as well as the overall algorithm description could be found in the attached presentation.

Herodotos & Nedyalko

________________________________________
From: gsstark(at)gmail(dot)com [gsstark(at)gmail(dot)com] On Behalf Of Greg Stark [gsstark(at)mit(dot)edu]
Sent: Monday, July 06, 2009 3:14 PM
To: Nedyalko Borisov
Cc: Tom Lane; pgsql-hackers(at)postgresql(dot)org; Herodotos Herodotou; Eric Friedman; John Cieslewicz; Dheeraj Pandey
Subject: Re: Join optimization for inheritance tables

On Mon, Jul 6, 2009 at 10:23 PM, Nedyalko Borisov<nedyalko(at)asterdata(dot)com> wrote:
> For example, typical observed scenario is: optimizer chooses Merge Join for
> two partitioned tables like the plan below:
> Merge Cond: (table1.id = table2.id)
> -> Sort
> Sort Key: id
> -> Append
> -> Seq Scan on table1_part1
> -> Seq Scan on table1_part2
> -> ....
> -> Seq Scan on table1_partN
> -> Sort
> Sort Key: id
> -> Append
> -> Seq Scan on table2_part1
> -> Seq Scan on table2_part2
> -> ....
> -> Seq Scan on table2_partM
>
> This plan ignores the fact there are indexes on the table2 partitions and
> that the pairwise partitions joins (index nested loop or hash join) will be
> faster than scanning all the partitions and sorting them.

To some degree my merge-append patch would mitigate this case. It
would allow the use of indexes on some or all the partitions to avoid
the sorts.

However it would still force all the partitions to be appended on each
side and then merged. If we could match up all the partitions then I
think this plan would be faster with the Append on top and separate
merge joins for each pair of partitions.

Aside from skipping the cpu cost of the merge-append I think it would
win for a few other reasons as well. Each join would be able to come
up with much better statistics which would enable it to pick a better
join when one is available. Even if the planner still picks a merge
join it would be much more likely to finish early and skip the
remainder of a partition on one side or the other.

> OK, implementing a special "abstract"/"parent" table would make more sense.

I had in mind to tackle this in a bit of a roundabout way. If we mark
the parent table "read-only" then notice that all tuples (all 0 of
them) in the table are frozen then we can discard that table from the
plans. Since changing the "read-only" attribute would have to be
treated as a DDL operation which would invalidate any cached plans we
can trust that it won't change as long as the plan lives so no new
tuples can be inserted.

The reason I wanted to take such a roundabout route instead of having
an "abstract" or "empty" property is that a wanted to generalize this.
Once we know a table is read-only then there are lots of properties we
could find useful in planning aside from emptiness. We could have
statistics like the minimum and maximum value for a column which the
planner would be able to trust and exclude partitions without having
to explicitly declare constraints on every column.

This is all just my musings, not any kind of consensus. Does it make
sense to others or is it too baroque when a simple "abstract" flag
would do?

--
greg
http://mit.edu/~gsstark/resume.pdf

Attachment Content-Type Size
join_inheritance_presentation.pdf application/pdf 598.7 KB
join_inheritance_v1.txt text/plain 101.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Matthews 2009-08-08 00:36:53 Re: Fixing geometic calculation
Previous Message Tom Lane 2009-08-08 00:11:31 Re: [PATCH] 2PC state files on shared memory