Re: Optimizer oddness, possibly compounded in 8.1

Lists: pgsql-hackers
From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Optimizer oddness, possibly compounded in 8.1
Date: 2005-12-02 14:55:09
Message-ID: 4390604D.4080902@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


The optimizer seems to want to use sequential scans on inherited tables
when crossed with another table, as the following seems to demonstrate:

Create Table base(f1 bigserial);
create table inh1(f2 bigint) inherits (base);
create table inh2(f2 bigint) inherits (base);
create table inh3(f2 bigint) inherits (base);
create table inh4(f2 bigint) inherits (base);

insert into inh1(f2) values(1);
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;
insert into inh1(f2) select f2 from inh1;

create unique index base_f1 on base(f1);
create unique index inh1_f1 on inh1(f1);
create unique index inh2_f1 on inh2(f1);
create unique index inh3_f1 on inh3(f1);
create unique index inh4_f1 on inh4(f1);

vacuum analyze base;
vacuum analyze inh1;
vacuum analyze inh2;
vacuum analyze inh3;
vacuum analyze inh4;

create table t2(f1 bigint);
insert into t2 values(1);
insert into t2 values(2);
insert into t2 values(128);
insert into t2 values(32768);

explain analyze select * from t2,base where base.f1=t2.f1;

gives:

Hash Join (cost=1.05..1546.04 rows=150 width=16) (actual
time=0.433..436.791 rows=4 loops=1)
Hash Cond: ("outer".f1 = "inner".f1)
-> Append (cost=0.00..1181.66 rows=72366 width=8) (actual
time=0.279..331.698 rows=65536 loops=1)
-> Seq Scan on base (cost=0.00..29.40 rows=1940 width=8)
(actual time=0.002..0.002 rows=0 loops=1)
-> Seq Scan on inh1 base (cost=0.00..1073.36 rows=65536
width=8) (actual time=0.273..148.326 rows=65536 loops=1)
-> Seq Scan on inh2 base (cost=0.00..26.30 rows=1630 width=8)
(actual time=0.002..0.002 rows=0 loops=1)
-> Seq Scan on inh3 base (cost=0.00..26.30 rows=1630 width=8)
(actual time=0.003..0.003 rows=0 loops=1)
-> Seq Scan on inh4 base (cost=0.00..26.30 rows=1630 width=8)
(actual time=0.002..0.002 rows=0 loops=1)
-> Hash (cost=1.04..1.04 rows=4 width=8) (actual time=0.132..0.132
rows=0 loops=1)
-> Seq Scan on t2 (cost=0.00..1.04 rows=4 width=8) (actual
time=0.111..0.119 rows=4 loops=1)
Total runtime: 436.880 ms

unwrapping the query into a series of UNIONS on the child tables reduces
the run time by a factor of several hundred under PG8.0:

explain analyze
select z.f1 from t2,only base z where z.f1=t2.f1
UNION ALL
select z.f1 from t2,inh1 z where z.f1=t2.f1
UNION ALL
select z.f1 from t2,inh2 z where z.f1=t2.f1
UNION ALL
select z.f1 from t2,inh3 z where z.f1=t2.f1
UNION ALL
select z.f1 from t2,inh4 z where z.f1=t2.f1

Append (cost=0.00..94.87 rows=20 width=8) (actual time=0.184..0.485
rows=4 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..20.42 rows=4 width=8)
(actual time=0.096..0.096 rows=0 loops=1)
-> Nested Loop (cost=0.00..20.38 rows=4 width=8) (actual
time=0.093..0.093 rows=0 loops=1)
-> Seq Scan on t2 (cost=0.00..1.04 rows=4 width=8)
(actual time=0.033..0.043 rows=4 loops=1)
-> Index Scan using base_f1 on base z (cost=0.00..4.82
rows=1 width=8) (actual time=0.008..0.008 rows=0 loops=4)
Index Cond: (z.f1 = "outer".f1)
-> Subquery Scan "*SELECT* 2" (cost=0.00..13.18 rows=4 width=8)
(actual time=0.084..0.194 rows=4 loops=1)
-> Nested Loop (cost=0.00..13.14 rows=4 width=8) (actual
time=0.081..0.178 rows=4 loops=1)
-> Seq Scan on t2 (cost=0.00..1.04 rows=4 width=8)
(actual time=0.002..0.012 rows=4 loops=1)
-> Index Scan using inh1_f1 on inh1 z (cost=0.00..3.01
rows=1 width=8) (actual time=0.031..0.033 rows=1 loops=4)
Index Cond: (z.f1 = "outer".f1)
-> Subquery Scan "*SELECT* 3" (cost=0.00..20.42 rows=4 width=8)
(actual time=0.061..0.061 rows=0 loops=1)
-> Nested Loop (cost=0.00..20.38 rows=4 width=8) (actual
time=0.057..0.057 rows=0 loops=1)
-> Seq Scan on t2 (cost=0.00..1.04 rows=4 width=8)
(actual time=0.003..0.011 rows=4 loops=1)
-> Index Scan using inh2_f1 on inh2 z (cost=0.00..4.82
rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=4)
Index Cond: (z.f1 = "outer".f1)
-> Subquery Scan "*SELECT* 4" (cost=0.00..20.42 rows=4 width=8)
(actual time=0.058..0.058 rows=0 loops=1)
-> Nested Loop (cost=0.00..20.38 rows=4 width=8) (actual
time=0.055..0.055 rows=0 loops=1)
-> Seq Scan on t2 (cost=0.00..1.04 rows=4 width=8)
(actual time=0.002..0.011 rows=4 loops=1)
-> Index Scan using inh3_f1 on inh3 z (cost=0.00..4.82
rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=4)
Index Cond: (z.f1 = "outer".f1)
-> Subquery Scan "*SELECT* 5" (cost=0.00..20.42 rows=4 width=8)
(actual time=0.058..0.058 rows=0 loops=1)
-> Nested Loop (cost=0.00..20.38 rows=4 width=8) (actual
time=0.054..0.054 rows=0 loops=1)
-> Seq Scan on t2 (cost=0.00..1.04 rows=4 width=8)
(actual time=0.003..0.012 rows=4 loops=1)
-> Index Scan using inh4_f1 on inh4 z (cost=0.00..4.82
rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=4)
Index Cond: (z.f1 = "outer".f1)
Total runtime: 0.643 ms

Under 8.1, this unwrapping produces only a 4 times performance
improvement:, and a different plan:

Append (cost=34.25..2124.98 rows=9703 width=8) (actual
time=0.386..249.311 rows=4 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=34.25..117.00 rows=1940
width=8) (actual time=0.035..0.035 rows=0 loops=1)
-> Hash Join (cost=34.25..97.60 rows=1940 width=8) (actual
time=0.027..0.027 rows=0 loops=1)
Hash Cond: ("outer".f1 = "inner".f1)
-> Seq Scan on t2 (cost=0.00..29.40 rows=1940 width=8)
(never executed)
-> Hash (cost=29.40..29.40 rows=1940 width=8) (actual
time=0.011..0.011 rows=0 loops=1)
-> Seq Scan on base z (cost=0.00..29.40 rows=1940
width=8) (actual time=0.005..0.005 rows=0 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=135.34..1668.58 rows=1940
width=8) (actual time=0.303..249.097 rows=4 loops=1)
-> Merge Join (cost=135.34..1649.18 rows=1940 width=8)
(actual time=0.296..249.063 rows=4 loops=1)
Merge Cond: ("outer".f1 = "inner".f1)
-> Index Scan using inh1_f1 on inh1 z
(cost=0.00..1320.90 rows=65536 width=8) (actual time=0.179..133.717
rows=32769 loops=1)
-> Sort (cost=135.34..140.19 rows=1940 width=8) (actual
time=0.099..0.111 rows=4 loops=1)
Sort Key: t2.f1
-> Seq Scan on t2 (cost=0.00..29.40 rows=1940
width=8) (actual time=0.007..0.025 rows=4 loops=1)
-> Subquery Scan "*SELECT* 3" (cost=30.38..113.14 rows=1941
width=8) (actual time=0.042..0.042 rows=0 loops=1)
-> Hash Join (cost=30.38..93.73 rows=1941 width=8) (actual
time=0.035..0.035 rows=0 loops=1)
Hash Cond: ("outer".f1 = "inner".f1)
-> Seq Scan on t2 (cost=0.00..29.40 rows=1940 width=8)
(never executed)
-> Hash (cost=26.30..26.30 rows=1630 width=8) (actual
time=0.012..0.012 rows=0 loops=1)
-> Seq Scan on inh2 z (cost=0.00..26.30 rows=1630
width=8) (actual time=0.005..0.005 rows=0 loops=1)
-> Subquery Scan "*SELECT* 4" (cost=30.38..113.14 rows=1941
width=8) (actual time=0.028..0.028 rows=0 loops=1)
-> Hash Join (cost=30.38..93.73 rows=1941 width=8) (actual
time=0.021..0.021 rows=0 loops=1)
Hash Cond: ("outer".f1 = "inner".f1)
-> Seq Scan on t2 (cost=0.00..29.40 rows=1940 width=8)
(never executed)
-> Hash (cost=26.30..26.30 rows=1630 width=8) (actual
time=0.010..0.010 rows=0 loops=1)
-> Seq Scan on inh3 z (cost=0.00..26.30 rows=1630
width=8) (actual time=0.004..0.004 rows=0 loops=1)
-> Subquery Scan "*SELECT* 5" (cost=30.38..113.14 rows=1941
width=8) (actual time=0.026..0.026 rows=0 loops=1)
-> Hash Join (cost=30.38..93.73 rows=1941 width=8) (actual
time=0.019..0.019 rows=0 loops=1)
Hash Cond: ("outer".f1 = "inner".f1)
-> Seq Scan on t2 (cost=0.00..29.40 rows=1940 width=8)
(never executed)
-> Hash (cost=26.30..26.30 rows=1630 width=8) (actual
time=0.010..0.010 rows=0 loops=1)
-> Seq Scan on inh4 z (cost=0.00..26.30 rows=1630
width=8) (actual time=0.004..0.004 rows=0 loops=1)
Total runtime: 249.522 ms

(note 8.1 was installed on a different machine with more load, so the
original query took > 1 sec).

Ideally, both versions should know enough to use indexes in inherited
tables. Is there anything that I am missing here, or that I can do to
reduce the increase under 8.1? Ideally, the planner/optimizer should be
unwrapping the inherited tables, I think.

To add insult to injury, ENABLE_SEQSCAN has no effect.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Optimizer oddness, possibly compounded in 8.1
Date: 2005-12-02 15:42:48
Message-ID: 2954.1133538168@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> The optimizer seems to want to use sequential scans on inherited tables
> when crossed with another table, as the following seems to demonstrate:

Is it intentional that your test case omits an analyze on t2? Coz when
I add that, I get the same plan you show for 8.0. Without the knowledge
that t2 is small, that plan is not a good choice.

(The larger point that joins of inheritance unions aren't well-planned
is true, but it's always been true...)

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Optimizer oddness, possibly compounded in 8.1
Date: 2005-12-03 00:35:04
Message-ID: 4390E838.8080207@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>Is it intentional that your test case omits an analyze on t2?
>

No; my mistake.

>(The larger point that joins of inheritance unions aren't well-planned
>is true, but it's always been true...)

It also seems to have a probkem with unions in views.

Is there anything that can be done about this -- workarounds etc? Any
plans to address it? We've got a couple of places where it's beginning
to bite us due to growth of tables.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Optimizer oddness, possibly compounded in 8.1
Date: 2005-12-03 00:43:24
Message-ID: 10201.1133570604@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
>> (The larger point that joins of inheritance unions aren't well-planned
>> is true, but it's always been true...)

> It also seems to have a probkem with unions in views.

> Is there anything that can be done about this -- workarounds etc? Any
> plans to address it? We've got a couple of places where it's beginning
> to bite us due to growth of tables.

It's something that's on the ever-growing TODO list ... I dunno if
anyone has any near-term plans to work on it. It'd definitely be
nice to teach the planner to do joins-over-unions well, and then
make inheritance just invoke that behavior instead of being a crocky
special case.

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Optimizer oddness, possibly compounded in 8.1
Date: 2005-12-03 06:10:22
Message-ID: 439136CE.70509@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

>It's something that's on the ever-growing TODO list ... I dunno if
>anyone has any near-term plans to work on it. It'd definitely be
>nice to teach the planner to do joins-over-unions well, and then
>make inheritance just invoke that behavior instead of being a crocky
>special case.
>
>
Sounds good; currently if you use the polymorphism of inherited tables,
and happen to cross 2 such tables, you get O(n^2) performance.


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Philip Warner <pjw(at)rhyme(dot)com(dot)au>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Optimizer oddness, possibly compounded in 8.1
Date: 2005-12-03 09:21:37
Message-ID: 1133601697.2906.721.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2005-12-02 at 19:43 -0500, Tom Lane wrote:
> Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> >> (The larger point that joins of inheritance unions aren't well-planned
> >> is true, but it's always been true...)
>
> > It also seems to have a probkem with unions in views.
>
> > Is there anything that can be done about this -- workarounds etc? Any
> > plans to address it? We've got a couple of places where it's beginning
> > to bite us due to growth of tables.
>
> It's something that's on the ever-growing TODO list ... I dunno if
> anyone has any near-term plans to work on it. It'd definitely be
> nice to teach the planner to do joins-over-unions well, and then
> make inheritance just invoke that behavior instead of being a crocky
> special case.

There's a number of things that can be pushed down over a union set, in
certain circumstances. sorts, GROUPs, min/max/limit, joins etc.. so I've
been mulling over a generic push-down mechanism to avoid lots of special
cases emerging. Joins to a union set are also an interesting case.

First off, I think we need to do some more work on partitioning so that
some knowledge about the union set is understood by the optimizer. At
the moment there is no concept of partition key, so its hard to spot
when two union sets have the same key to allow pushdown.

I hope to work on some of that in the 8.2 timebox, but I suspect this is
a multi-year mission.

Best Regards, Simon Riggs


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Optimizer oddness, possibly compounded in 8.1
Date: 2005-12-03 11:06:40
Message-ID: 43917C40.7080601@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>There's a number of things that can be pushed down over a union set, in
>certain circumstances.
>
FWIW, you should also be able to push the unions up.


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Philip Warner <pjw(at)rhyme(dot)com(dot)au>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Optimizer oddness, possibly compounded in 8.1
Date: 2005-12-06 14:12:15
Message-ID: 1133878335.3608.3.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ühel kenal päeval, L, 2005-12-03 kell 09:21, kirjutas Simon Riggs:

> First off, I think we need to do some more work on partitioning so that
> some knowledge about the union set is understood by the optimizer. At
> the moment there is no concept of partition key, so its hard to spot
> when two union sets have the same key to allow pushdown.

One of the easier cases would be non-overlapping (exclusive) constraints
on union subtables on the joined column.

This could serve as a "partition key", or in case of many nonoverlapping
columns (ex.: table is partitioned by date and region), as many
partition keys.

-------------
Hannu


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Philip Warner <pjw(at)rhyme(dot)com(dot)au>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Optimizer oddness, possibly compounded in 8.1
Date: 2005-12-06 18:32:53
Message-ID: 1133893973.2906.876.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2005-12-06 at 16:12 +0200, Hannu Krosing wrote:
> Ühel kenal päeval, L, 2005-12-03 kell 09:21, kirjutas Simon Riggs:
>
> > First off, I think we need to do some more work on partitioning so that
> > some knowledge about the union set is understood by the optimizer. At
> > the moment there is no concept of partition key, so its hard to spot
> > when two union sets have the same key to allow pushdown.
>
> One of the easier cases would be non-overlapping (exclusive) constraints
> on union subtables on the joined column.
>
> This could serve as a "partition key", or in case of many nonoverlapping
> columns (ex.: table is partitioned by date and region), as many
> partition keys.

Yes, thats my planned direction.

Best Regards, Simon Riggs


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Philip Warner <pjw(at)rhyme(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Optimizer oddness, possibly compounded in 8.1
Date: 2005-12-07 01:46:57
Message-ID: 43963F11.9000101@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>One of the easier cases would be non-overlapping (exclusive) constraints
>>on union subtables on the joined column.
>>
>>This could serve as a "partition key", or in case of many nonoverlapping
>>columns (ex.: table is partitioned by date and region), as many
>>partition keys.
>
>
> Yes, thats my planned direction.

In case you didn't know btw, MySQL 5.1 is out with rather extensive
table partition support. So get coding :D

Chris


From: Michael Glaesemann <grzm(at)myrealbox(dot)com>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Optimizer oddness, possibly compounded in 8.1
Date: 2005-12-07 01:50:16
Message-ID: 52DC5701-142F-4196-A3F6-9595D175E706@myrealbox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Dec 7, 2005, at 10:46 , Christopher Kings-Lynne wrote:

> In case you didn't know btw, MySQL 5.1 is out with rather extensive
> table partition support. So get coding :D

You do mean MySQL 5.1 alpha is out, right?

Michael Glaesemann
grzm myrealbox com