Re: Join optimization for inheritance tables

Lists: pgsql-hackers
From: Nedyalko Borisov <nedyalko(at)asterdata(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Cc: Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>
Subject: Join optimization for inheritance tables
Date: 2009-06-26 17:08:58
Message-ID: 4A4500AA.3050206@asterdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

We are working with Aster for the summer and we would like to bounce some ideas that we are having for some possible PostgreSQL extensions. In order to describe our ideas we will use the following example:

create table msg(
msg_id int,
msg text );

create table receiver(
msg_id int,
user_id int,
ts timestamp );

create table msg_100(
check ( 1 <= msg_id and msg_id < 100 )
) inherits (msg);

create table msg_200(
check ( 100 <= msg_id and msg_id < 200 )
) inherits (msg);

create table msg_300(
check ( 200 <= msg_id and msg_id < 300 )
) inherits (msg);

create table receiver_100(
check ( 1 <= msg_id and msg_id < 100 )
) inherits (receiver);

create table receiver_200(
check ( 100 <= msg_id and msg_id < 200 )
) inherits (receiver);

create table receiver_300(
check ( 200 <= msg_id and msg_id < 300 )
) inherits (receiver);

When we are issuing queries on one of the parent tables, like,

SELECT * FROM msg WHERE msg_id BETWEEN 50 AND 70;

PostgreSQL is smart enough to filter out child tables with check constraints that are refuted by the filter conditions. In this example, the optimizer will pick a plan that only considers the parent table 'msg' and one of the child tables 'msg_100':

Result
-> Append
-> Seq Scan on msg
Filter: ((msg_id >= 50) AND (msg_id <= 70))
-> Seq Scan on msg_100 msg
Filter: ((msg_id >= 50) AND (msg_id <= 70))

Plan costs are removed for simplicity of the presentation.

Now, if we issue a join query between the two parent tables, like,

SELECT * FROM msg m JOIN receiver r ON m.msg_id = r.msg_id;

the execution plan will be:

Merge Join
Merge Cond: (m.msg_id = r.msg_id)
-> Sort
Sort Key: m.msg_id
-> Append
-> Seq Scan on msg m
-> Seq Scan on msg_100 m
-> Seq Scan on msg_200 m
-> Seq Scan on msg_300 m
-> Sort
Sort Key: r.msg_id
-> Append
-> Seq Scan on receiver r
-> Seq Scan on receiver_100 r
-> Seq Scan on receiver_200 r
-> Seq Scan on receiver_300 r

During the planning phase, the optimizer treats an entire hierarchy as a single entity. Hence, it first considers the most efficient way to create the append paths for the two hierarchies, and then the best way to join them. However, there are some optimizations that are possible here, similar to the table filtering described above. In particular, instead of joining the two appends, we could "push down" the join to the child relations - that is, create pairwise joins between the children and then append the join results together.

Based on the check conditions of the children and the join predicate, it is possible to filter out joins that cannot produce any results. For example, joining 'msg_100' with 'receiver_300' is redundant since the check constraints of these two tables do not overlap. Tuples in 'msg_100' have 'msg_id' between 1 and 100, whereas tuples in 'receiver_300' have 'msg_id' between 200 and 300. Therefore, no tuples can be produce from this join.

A plan with such optimizations could be:

Result
-> Append
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg msg
-> Hash
-> Seq Scan on receiver receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg msg
-> Hash
-> Seq Scan on receiver_100 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg msg
-> Hash
-> Seq Scan on receiver_200 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg msg
-> Hash
-> Seq Scan on receiver_300 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_100 msg
-> Hash
-> Seq Scan on receiver receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_100 msg
-> Hash
-> Seq Scan on receiver_100 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_200 msg
-> Hash
-> Seq Scan on receiver receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_200 msg
-> Hash
-> Seq Scan on receiver_200 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_300 msg
-> Hash
-> Seq Scan on receiver receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_300 msg
-> Hash
-> Seq Scan on receiver_300 receiver

The parent tables don't have any check constraints in the above example and hence will need to join with all the child relations. However, based on usage and feedback among Aster's customers, we believe that it is common practice not to store any data in the parent tables. Therefore, we were thinking of creating an "Empty Check Constraint". This constraint will enforce that a particular table is empty and will prevent the insertion of any rows into it. If the parent tables have the empty check constraint, then the optimizer will be able to further eliminate some of the child joins.

A plan with all optimizations and features suggested could be:

Result
-> Append
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_100 msg
-> Hash
-> Seq Scan on receiver_100 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_200 msg
-> Hash
-> Seq Scan on receiver_200 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_300 msg
-> Hash
-> Seq Scan on receiver_300 receiver

In summary, we are making two suggestions:
1. Extend the optimizer to consider joins between child tables when hierarchies are joined together.
2. Add the "Empty Check Constraint", which would enforce that a particular table is to remain empty.

We've created some basic implementation, just to verify the benefits of these ideas. The initial results seem promising as we observed decreased execution time and for large data set the query execution time could be several times faster than before.
We would greatly appreciate your comments, thoughts or suggestions that you might have regarding these ideas.

Regards,
Nedyalko Borisov and Herodotos Herodotou


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: nborisov(at)asterdata(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>
Subject: Re: Join optimization for inheritance tables
Date: 2009-06-26 23:25:13
Message-ID: 11854.1246058713@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Nedyalko Borisov <nedyalko(at)asterdata(dot)com> writes:
> In summary, we are making two suggestions:
> 1. Extend the optimizer to consider joins between child tables when hierarchies are joined together.

We already handle this for the case where the join is nestloop with
inner index scan, and I'm not convinced that there's any real gain to be
had for other join types.

> 2. Add the "Empty Check Constraint", which would enforce that a particular table is to remain empty.

The trouble with that is that a constraint that doesn't propagate to its
child tables is a weird beast that I'd just as soon not invent.

We are currently thinking about inventing an explicit notion of
partitioned tables. If we had that, it would be reasonable to have
a special kind of "parent" table for a partitioned set and refuse to
allow any data in that relation. But I'm not excited about contorting
the general constraint mechanism in the way that would be necessary to
express this as a constraint.

regards, tom lane


From: Nedyalko Borisov <nedyalko(at)asterdata(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Herodotos Herodotou" <Herodotos(dot)Herodotou(at)asterdata(dot)com>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>
Subject: Re: Join optimization for inheritance tables
Date: 2009-07-06 21:23:36
Message-ID: 4A526B58.8070607@asterdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Nedyalko Borisov <nedyalko(at)asterdata(dot)com> writes:
>
>> In summary, we are making two suggestions:
>> 1. Extend the optimizer to consider joins between child tables when hierarchies are joined together.
>>
>
> We already handle this for the case where the join is nestloop with
> inner index scan, and I'm not convinced that there's any real gain to be
> had for other join types.
>
>
From OLTP perspective this proposal won't introduce any benefits due to
the fact that queries operate on small parts of the data, so we can add
a flag that will disable/enable the inherited join.
However, the OLAP queries process significant amount of data and to
leverage this fact the DB admins partition the data. We think that the
optimizer should take advantage of this partitioning and consider plans
where the joins are performed on small parts of the data.

For example, typical observed scenario is: optimizer chooses Merge Join
for two partitioned tables like the plan below:
Merge Cond: (table1.id = table2.id)
-> Sort
Sort Key: id
-> Append
-> Seq Scan on table1_part1
-> Seq Scan on table1_part2
-> ....
-> Seq Scan on table1_partN
-> Sort
Sort Key: id
-> Append
-> Seq Scan on table2_part1
-> Seq Scan on table2_part2
-> ....
-> Seq Scan on table2_partM

This plan ignores the fact there are indexes on the table2 partitions
and that the pairwise partitions joins (index nested loop or hash join)
will be faster than scanning all the partitions and sorting them.

To see the effect of the pairwise joins we performed some experiments
with the initial implementation. The experiments consisted of joining
two partitioned tables where each of the tables have around 200 children
and the 2 int columns id and count. We generated data of different sizes
and measured the execution times and here are the results:
0.5 million records -> regular plan 0.69s -> modified plan 0.51
1 million records -> regular plan 2.9s -> modified plan 1
2.5 million records -> regular plan 4.17s -> modified plan 2.28
5 million records -> regular plan 11.25s -> modified plan 4.46

Increasing the data size or adding more columns will increase the
difference between the current plan that the database picks and the
proposed modification of the plans. Thus, we thing that it might be
useful if the optimizer considers plans with inherited joins.

>> 2. Add the "Empty Check Constraint", which would enforce that a particular table is to remain empty.
>>
>
> The trouble with that is that a constraint that doesn't propagate to its
> child tables is a weird beast that I'd just as soon not invent.
>
> We are currently thinking about inventing an explicit notion of
> partitioned tables. If we had that, it would be reasonable to have
> a special kind of "parent" table for a partitioned set and refuse to
> allow any data in that relation. But I'm not excited about contorting
> the general constraint mechanism in the way that would be necessary to
> express this as a constraint.
>
>
OK, implementing a special "abstract"/"parent" table would make more
sense. In this line of thoughts could you elaborate on the explicit
notion of partitioned tables or give us some references.

Thanks,
Nedyalko Borisov and Herodotos Herodotou

> regards, tom lane
>


From: Greg Stark <gsstark(at)mit(dot)edu>
To: nborisov(at)asterdata(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>
Subject: Re: Join optimization for inheritance tables
Date: 2009-07-06 22:14:13
Message-ID: 407d949e0907061514i5f1a044r691d1d74eaefb067@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 6, 2009 at 10:23 PM, Nedyalko Borisov<nedyalko(at)asterdata(dot)com> wrote:
> For example, typical observed scenario is: optimizer chooses Merge Join for
> two partitioned tables like the plan below:
> Merge Cond: (table1.id = table2.id)
>   -> Sort
>       Sort Key: id
>       -> Append
>           -> Seq Scan on table1_part1
>           -> Seq Scan on table1_part2
>       -> ....
>           -> Seq Scan on table1_partN
>      ->  Sort
>           Sort Key: id
>           -> Append
>              -> Seq Scan on table2_part1
>              -> Seq Scan on table2_part2
>              -> ....
>              -> Seq Scan on table2_partM
>
> This plan ignores the fact there are indexes on the table2 partitions and
> that the pairwise partitions joins (index nested loop or hash join) will be
> faster than scanning all the partitions and sorting them.

To some degree my merge-append patch would mitigate this case. It
would allow the use of indexes on some or all the partitions to avoid
the sorts.

However it would still force all the partitions to be appended on each
side and then merged. If we could match up all the partitions then I
think this plan would be faster with the Append on top and separate
merge joins for each pair of partitions.

Aside from skipping the cpu cost of the merge-append I think it would
win for a few other reasons as well. Each join would be able to come
up with much better statistics which would enable it to pick a better
join when one is available. Even if the planner still picks a merge
join it would be much more likely to finish early and skip the
remainder of a partition on one side or the other.

> OK, implementing a special "abstract"/"parent" table would make more sense.

I had in mind to tackle this in a bit of a roundabout way. If we mark
the parent table "read-only" then notice that all tuples (all 0 of
them) in the table are frozen then we can discard that table from the
plans. Since changing the "read-only" attribute would have to be
treated as a DDL operation which would invalidate any cached plans we
can trust that it won't change as long as the plan lives so no new
tuples can be inserted.

The reason I wanted to take such a roundabout route instead of having
an "abstract" or "empty" property is that a wanted to generalize this.
Once we know a table is read-only then there are lots of properties we
could find useful in planning aside from emptiness. We could have
statistics like the minimum and maximum value for a column which the
planner would be able to trust and exclude partitions without having
to explicitly declare constraints on every column.

This is all just my musings, not any kind of consensus. Does it make
sense to others or is it too baroque when a simple "abstract" flag
would do?

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>, "hero(at)cs(dot)duke(dot)edu" <hero(at)cs(dot)duke(dot)edu>, "nedyalko(at)cs(dot)duke(dot)edu" <nedyalko(at)cs(dot)duke(dot)edu>
Subject: Re: Join optimization for inheritance tables
Date: 2009-08-08 00:34:06
Message-ID: 43826FCDC252204EA7823B2E7CF3CCEC25FC2F4B@Pandora.AsterData.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together.

Short description: when the optimizer considers join paths between two tables with child tables, it only creates join paths over two append paths. In this patch we extend the optimizer to consider joins between the child tables from the two parent relations, based on the child table constraints. The idea is that only child tables with overlapping constraints need to be joined, and not the entire hierarchies with each other. This optimization can provide significant benefits in terms of execution time. A short motivation example as well as the overall algorithm description could be found in the attached presentation.

Herodotos & Nedyalko

________________________________________
From: gsstark(at)gmail(dot)com [gsstark(at)gmail(dot)com] On Behalf Of Greg Stark [gsstark(at)mit(dot)edu]
Sent: Monday, July 06, 2009 3:14 PM
To: Nedyalko Borisov
Cc: Tom Lane; pgsql-hackers(at)postgresql(dot)org; Herodotos Herodotou; Eric Friedman; John Cieslewicz; Dheeraj Pandey
Subject: Re: Join optimization for inheritance tables

On Mon, Jul 6, 2009 at 10:23 PM, Nedyalko Borisov<nedyalko(at)asterdata(dot)com> wrote:
> For example, typical observed scenario is: optimizer chooses Merge Join for
> two partitioned tables like the plan below:
> Merge Cond: (table1.id = table2.id)
> -> Sort
> Sort Key: id
> -> Append
> -> Seq Scan on table1_part1
> -> Seq Scan on table1_part2
> -> ....
> -> Seq Scan on table1_partN
> -> Sort
> Sort Key: id
> -> Append
> -> Seq Scan on table2_part1
> -> Seq Scan on table2_part2
> -> ....
> -> Seq Scan on table2_partM
>
> This plan ignores the fact there are indexes on the table2 partitions and
> that the pairwise partitions joins (index nested loop or hash join) will be
> faster than scanning all the partitions and sorting them.

To some degree my merge-append patch would mitigate this case. It
would allow the use of indexes on some or all the partitions to avoid
the sorts.

However it would still force all the partitions to be appended on each
side and then merged. If we could match up all the partitions then I
think this plan would be faster with the Append on top and separate
merge joins for each pair of partitions.

Aside from skipping the cpu cost of the merge-append I think it would
win for a few other reasons as well. Each join would be able to come
up with much better statistics which would enable it to pick a better
join when one is available. Even if the planner still picks a merge
join it would be much more likely to finish early and skip the
remainder of a partition on one side or the other.

> OK, implementing a special "abstract"/"parent" table would make more sense.

I had in mind to tackle this in a bit of a roundabout way. If we mark
the parent table "read-only" then notice that all tuples (all 0 of
them) in the table are frozen then we can discard that table from the
plans. Since changing the "read-only" attribute would have to be
treated as a DDL operation which would invalidate any cached plans we
can trust that it won't change as long as the plan lives so no new
tuples can be inserted.

The reason I wanted to take such a roundabout route instead of having
an "abstract" or "empty" property is that a wanted to generalize this.
Once we know a table is read-only then there are lots of properties we
could find useful in planning aside from emptiness. We could have
statistics like the minimum and maximum value for a column which the
planner would be able to trust and exclude partitions without having
to explicitly declare constraints on every column.

This is all just my musings, not any kind of consensus. Does it make
sense to others or is it too baroque when a simple "abstract" flag
would do?

--
greg
http://mit.edu/~gsstark/resume.pdf

Attachment Content-Type Size
join_inheritance_presentation.pdf application/pdf 598.7 KB
join_inheritance_v1.txt text/plain 101.3 KB

From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>, "hero(at)cs(dot)duke(dot)edu" <hero(at)cs(dot)duke(dot)edu>, "nedyalko(at)cs(dot)duke(dot)edu" <nedyalko(at)cs(dot)duke(dot)edu>
Subject: Re: Join optimization for inheritance tables
Date: 2009-08-10 23:41:38
Message-ID: 4A80B032.6030604@timbira.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Herodotos Herodotou escreveu:
> This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together.
>
> Short description: when the optimizer considers join paths between two tables with child tables, it only creates join paths over two append paths. In this patch we extend the optimizer to consider joins between the child tables from the two parent relations, based on the child table constraints. The idea is that only child tables with overlapping constraints need to be joined, and not the entire hierarchies with each other. This optimization can provide significant benefits in terms of execution time. A short motivation example as well as the overall algorithm description could be found in the attached presentation.
>
I added it to the next commitfest [1].

[1] https://commitfest.postgresql.org/action/commitfest_view?id=3

--
Euler Taveira de Oliveira
http://www.timbira.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>, "hero(at)cs(dot)duke(dot)edu" <hero(at)cs(dot)duke(dot)edu>, "nedyalko(at)cs(dot)duke(dot)edu" <nedyalko(at)cs(dot)duke(dot)edu>
Subject: Re: Join optimization for inheritance tables
Date: 2009-09-18 19:03:12
Message-ID: 4690.1253300592@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com> writes:
> This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together.

I looked over this patch a bit. I am of the opinion that this is a lot
of work towards a dead-end direction :-(. The patch contains a large
amount of rather inefficient code that attempts to reverse-engineer
knowledge about whether an inheritance tree forms a range-partitioned
table. We already know that reasoning from first principles using a set
of arbitrary per-table constraints doesn't scale very well, and this
would make it a lot worse. pgsql-hackers have been talking for some
time about implementing an explicit partitioning feature, which would
give the planner direct knowledge about this type of table structure
without expending nearly so much work (and then expending it all over
again for the next query). I think the right way to go at it is to
implement that first, and then see about teaching the planner to plan
like this.

We could possibly extract a subset of the patch that just deals with
pushing joins down through appends, without the constraint-exclusion
logic that tries to eliminate provably empty join pairs. But I'm
dubious that that would be worthwhile by itself. I think you need the
constraint exclusion to eliminate a good deal of work before this is
actually better than doing a single join above the append.

There are a number of other things I don't like, such as the restrictive
and not-obviously-necessary requirement that all the join clauses be
simple "Var op Var" clauses. But the main thing is that a large
fraction of the patch is doing something I don't think we should try
to do.

regards, tom lane


From: Emmanuel Cecchet <manu(at)asterdata(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>, "hero(at)cs(dot)duke(dot)edu" <hero(at)cs(dot)duke(dot)edu>, "nedyalko(at)cs(dot)duke(dot)edu" <nedyalko(at)cs(dot)duke(dot)edu>
Subject: Re: Join optimization for inheritance tables
Date: 2009-09-22 22:16:33
Message-ID: 4AB94CC1.5040700@asterdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom,

If the partitioning implementation does not make progress (and does not
make it for 8.5), don't you think that this could be an interesting
contribution to consider for this release?
I have put on the wiki
(http://wiki.postgresql.org/wiki/Join_optimization_for_inheritance_tables)
the results obtained so far and the use case where it is most used.

As I understand it, partitioning will certainly lead to some significant
changes/enhancements to the planner. Do you think it is realistic to get
that for 8.5?

manu

> Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com> writes:
>
>> This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together.
>>
>
> I looked over this patch a bit. I am of the opinion that this is a lot
> of work towards a dead-end direction :-(. The patch contains a large
> amount of rather inefficient code that attempts to reverse-engineer
> knowledge about whether an inheritance tree forms a range-partitioned
> table. We already know that reasoning from first principles using a set
> of arbitrary per-table constraints doesn't scale very well, and this
> would make it a lot worse. pgsql-hackers have been talking for some
> time about implementing an explicit partitioning feature, which would
> give the planner direct knowledge about this type of table structure
> without expending nearly so much work (and then expending it all over
> again for the next query). I think the right way to go at it is to
> implement that first, and then see about teaching the planner to plan
> like this.
>
> We could possibly extract a subset of the patch that just deals with
> pushing joins down through appends, without the constraint-exclusion
> logic that tries to eliminate provably empty join pairs. But I'm
> dubious that that would be worthwhile by itself. I think you need the
> constraint exclusion to eliminate a good deal of work before this is
> actually better than doing a single join above the append.
>
> There are a number of other things I don't like, such as the restrictive
> and not-obviously-necessary requirement that all the join clauses be
> simple "Var op Var" clauses. But the main thing is that a large
> fraction of the patch is doing something I don't think we should try
> to do.
>
> regards, tom lane
>
>

--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>, "hero(at)cs(dot)duke(dot)edu" <hero(at)cs(dot)duke(dot)edu>, "nedyalko(at)cs(dot)duke(dot)edu" <nedyalko(at)cs(dot)duke(dot)edu>
Subject: Re: Join optimization for inheritance tables
Date: 2009-09-23 00:06:58
Message-ID: 1253664418.28462.123.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2009-09-22 at 18:16 -0400, Emmanuel Cecchet wrote:
> If the partitioning implementation does not make progress (and does not
> make it for 8.5), don't you think that this could be an interesting
> contribution to consider for this release?
> I have put on the wiki
> (http://wiki.postgresql.org/wiki/Join_optimization_for_inheritance_tables)
> the results obtained so far and the use case where it is most used.

I think you mean that the planning time is in milliseconds, not seconds.
Also, you use "data" in a few places you meant "date".

The results seem good, and trading planning time for execution time
seems like a reasonable idea in the case of partitioned tables. We
already work harder planning when constraint_exclusion='partition', so
there is some precedent (I don't know if that's a good precedent to
follow or not).

How does it compare to using merge-append?

I haven't looked at the actual patch yet.

Regards,
Jeff Davis


From: Josh Berkus <josh(at)agliodbs(dot)com>
To:
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Join optimization for inheritance tables
Date: 2009-09-23 00:35:36
Message-ID: 4AB96D58.1080808@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> As I understand it, partitioning will certainly lead to some significant
> changes/enhancements to the planner. Do you think it is realistic to get
> that for 8.5?

I don't think that waiting for our plans for a more robust partitioning
implementation is a reason to put off improvements to the implementation
we have. If Simon or Pavel or someone was going all-out on getting the
new partitioning ready, that would be one thing. But to date, nobody
has volunteered to work on it; we just know we need it.

Of course, that completely leaves aside Tom's critique of the
implementation, which sounds like it needs some work. Trying to fit the
target table into a range partitioning mold would break with a lot of
real partitionings; for example I have several client DBs which are
partitioned active/inactive-by-date.

What about simply eliminating joins between partitioned tables by
checking which columns' constraints match exactly or are subsets? A lot
of partitioned DBs partition everything by month, and joining two tables
which were partitioned by month would be useful by itself.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Join optimization for inheritance tables
Date: 2009-09-23 02:52:37
Message-ID: 11534.1253674357@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> I don't think that waiting for our plans for a more robust partitioning
> implementation is a reason to put off improvements to the implementation
> we have.

The complaint I had was that this patch consisted largely of code that
we'd want to throw away once a better partitioning implementation is in
place. I don't object to partial patches that move us forward on the
agreed-on path, but this one seemed to be going sideways.

> What about simply eliminating joins between partitioned tables by
> checking which columns' constraints match exactly or are subsets?

If it could be done with a <emphasis>small</> amount of throwaway
code, maybe ...

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Join optimization for inheritance tables
Date: 2009-09-24 19:23:05
Message-ID: 4ABBC719.5020908@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> If it could be done with a <emphasis>small</> amount of throwaway
> code, maybe ...

Well, that's a reasonable goal ...

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


From: Herodotos Herodotou <hero(at)cs(dot)duke(dot)edu>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Emmanuel Cecchet <manu(at)asterdata(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>, "nedyalko(at)cs(dot)duke(dot)edu" <nedyalko(at)cs(dot)duke(dot)edu>
Subject: Re: Join optimization for inheritance tables
Date: 2009-09-24 23:49:35
Message-ID: 48f0b7a60909241649y4922519ajd45b0b3467a3c7c9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Jeff,

On Tue, Sep 22, 2009 at 8:06 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> I think you mean that the planning time is in milliseconds, not seconds.
>

The planning time is actually in seconds. Even without our feature,
planning takes a few seconds since the optimizer deals with hundreds
or even thousands of child tables. With our feature, planning time
increases by 2-3X but then again, running time improves by 6-10X. I
have added a paragraph under performance evaluation in the wiki page (
http://wiki.postgresql.org/wiki/Join_optimization_for_inheritance_tables
) in order to provide a better insight on what's happening under the
covers.

> The results seem good, and trading planning time for execution time
> seems like a reasonable idea in the case of partitioned tables. We
> already work harder planning when constraint_exclusion='partition', so
> there is some precedent (I don't know if that's a good precedent to
> follow or not).
>

With constraint_exclusion=partition, single child tables that cannot
produce tuples (based on their check constraints and filter
conditions) are filtered out. Our patch goes one step further by
identifying which joins of child tables can/cannot produce tuples
(based on their check constraints and join conditions).

> How does it compare to using merge-append?
>

Merge-append is essentially a special case of our patch. Greg Stark
compared it with our patch and made some good commends in a previous
email thread ( message id =
407d949e0907061514i5f1a044r691d1d74eaefb067(at)mail(dot)gmail(dot)com )

Thank you,

~Hero

--
Herodotos Herodotou
Graduate Student
Department of Computer Science, Duke University
Email: hero(at)cs(dot)duke(dot)edu


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>, "hero(at)cs(dot)duke(dot)edu" <hero(at)cs(dot)duke(dot)edu>, "nedyalko(at)cs(dot)duke(dot)edu" <nedyalko(at)cs(dot)duke(dot)edu>
Subject: Re: Join optimization for inheritance tables
Date: 2009-09-26 16:19:13
Message-ID: 1253981953.4449.918.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Tue, 2009-09-22 at 18:16 -0400, Emmanuel Cecchet wrote:

> If the partitioning implementation does not make progress (and does not
> make it for 8.5)

Manu, not sure if you are referring to Kedar's patch I reviewed earlier
in July, but that patch didn't implement anything like the right
internals for this to work.

As Tom says, what we need is a single internal data structure that
describes all the partitions. That can then be used to automatically
route inserts, perform the push down merge joins and speed-up constraint
exclusion. We should build this once and store it in the relcache for
the parent relation and so would need some changes in relcache
invalidation as well to make sure child ddl changes invalidate that.

--
Simon Riggs www.2ndQuadrant.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Herodotos Herodotou <hero(at)cs(dot)duke(dot)edu>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>, "nedyalko(at)cs(dot)duke(dot)edu" <nedyalko(at)cs(dot)duke(dot)edu>
Subject: Re: Join optimization for inheritance tables
Date: 2009-09-26 16:28:15
Message-ID: 22334.1253982495@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Herodotos Herodotou <hero(at)cs(dot)duke(dot)edu> writes:
> On Tue, Sep 22, 2009 at 8:06 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> I think you mean that the planning time is in milliseconds, not seconds.

> The planning time is actually in seconds.

This is exactly why I think this is a dead-end approach. Trying to do
theorem proving from an unstructured collection of constraints simply
cannot scale to hundreds of partitions, at least not if you want
reasonable planning performance. There are other aspects of our current
partitioning approach that don't scale either, eg the complexity of the
insert redirection triggers. We could handle standard cases where
there's a simple partitioning rule with far less overhead than this,
if we had an explicit model of the partitioning rule inside the system.

regards, tom lane


From: Emmanuel Cecchet <manu(at)asterdata(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>, "hero(at)cs(dot)duke(dot)edu" <hero(at)cs(dot)duke(dot)edu>, "nedyalko(at)cs(dot)duke(dot)edu" <nedyalko(at)cs(dot)duke(dot)edu>
Subject: Re: Join optimization for inheritance tables
Date: 2009-09-26 22:00:11
Message-ID: 4ABE8EEB.8050002@asterdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Simon,

Thanks for the insight. I might take that as a long term project. I have
to discuss that with my colleagues at Aster. It would certainly help to
put together a wiki page with the key insights on the design of such
implementation to be able to better scope the project and agree on what
is the right approach for this.

manu

> On Tue, 2009-09-22 at 18:16 -0400, Emmanuel Cecchet wrote:
>
>
>> If the partitioning implementation does not make progress (and does not
>> make it for 8.5)
>>
>
> Manu, not sure if you are referring to Kedar's patch I reviewed earlier
> in July, but that patch didn't implement anything like the right
> internals for this to work.
>
> As Tom says, what we need is a single internal data structure that
> describes all the partitions. That can then be used to automatically
> route inserts, perform the push down merge joins and speed-up constraint
> exclusion. We should build this once and store it in the relcache for
> the parent relation and so would need some changes in relcache
> invalidation as well to make sure child ddl changes invalidate that.
>
>

--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Join optimization for inheritance tables
Date: 2009-09-27 19:30:07
Message-ID: 4ABFBD3F.3010204@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9/26/09 3:00 PM, Emmanuel Cecchet wrote:
> Hi Simon,
>
> Thanks for the insight. I might take that as a long term project. I have
> to discuss that with my colleagues at Aster. It would certainly help to
> put together a wiki page with the key insights on the design of such
> implementation to be able to better scope the project and agree on what
> is the right approach for this.

In the meantime, can you look at the possibility I suggested? If you
could produce a patch which would implement the most common and least
complex case of partition joining: where the constraints on two
partitions match exactly (or are mathematically provable subsets) with a
small patch, it would probably work for 8.5.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com